Oh, sure. Please find more info about UDFs (user-defined functions) here: http://pig.apache.org/docs/r0.10.0/udf.html
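To give a sense of what such a UDF would compute, here is a minimal sketch in plain Java rather than a Pig class. As far as I know, a real UDF would extend Pig's EvalFunc<Double> and receive the grouped amounts as a bag inside the Tuple passed to exec(). That wiring is left out so the example compiles on its own. The names StdDevSketch and stdDev are hypothetical. The sketch computes the population standard deviation (dividing by n) in two passes and skips nulls, in the same way that AVG ignores null values.

```java
import java.util.List;

// Hypothetical helper: the arithmetic a standard-deviation UDF's exec()
// could perform once it has pulled the amounts out of the input bag.
final class StdDevSketch {
    private StdDevSketch() {}

    // Population standard deviation (divides by n), computed in two passes:
    // first the mean, then the mean of squared deviations from it.
    // Null entries are skipped; an input with no non-null values yields null
    // (an assumption, pick whatever your UDF should emit in that case).
    static Double stdDev(List<Double> values) {
        double sum = 0.0;
        int n = 0;
        for (Double v : values) {
            if (v == null) continue; // ignore nulls
            sum += v;
            n++;
        }
        if (n == 0) return null;

        double mean = sum / n;
        double sumSqDev = 0.0;
        for (Double v : values) {
            if (v == null) continue;
            double d = v - mean;
            sumSqDev += d * d;
        }
        return Math.sqrt(sumSqDev / n);
    }
}
```

For merchant 1234's sample amounts (299.2 and 230), stdDev returns about 34.6. Divide by n - 1 instead of n if you want the sample standard deviation.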
On Tue, Sep 25, 2012 at 8:16 PM, jamal sasha wrote:
> Hi,
> Thanks for replying.
> Err, I am new here.
> I am trying to find out what a UDF is.
>
>
> On Tue, Sep 25, 2012 at 10:41 PM, Cheolsoo Park wrote:
>
> > Hi,
> >
> > in = load 'in.txt' using PigStorage(',') as (merchant:int, customer:int, amount:float);
> > perMerchant = group in by merchant;
> > avg = foreach perMerchant generate group, AVG(in.amount);
> > dump avg;
> >
> > This returns (merchant_id, avg of amount) as follows:
> >
> > (1233,203.1999969482422)
> > (1234,264.6000061035156)
> >
> > Regarding standard deviation, you can write your own UDF that computes it.
> > Please take a look at AVG.java to see how it computes the average.
> > Basically, you need to modify the exec() method to compute the standard
> > deviation instead of the average.
> >
> > Thanks,
> > Cheolsoo
> >
> > On Tue, Sep 25, 2012 at 6:36 PM, jamal sasha wrote:
> >
> > > Hi,
> > > I have a huge text file of the following form (the data is saved in the
> > > directory data/ as data1.txt, data2.txt and so on):
> > > merchant_id, user_id, amount
> > > 1234, 9123, 299.2
> > > 1233, 9199, 203.2
> > > 1234, 0124, 230
> > > and so on.
> > >
> > > What I want to do is find the average amount for each merchant,
> > > and in the end save the output in a file, something like:
> > > merchant_id, average_amount
> > > 1234, avg_amt_1234
> > > and so on.
> > > How do I calculate the standard deviation as well?
> > >
> > > Sorry for asking such a basic question. :(
> > > Any help would be appreciated. :)
> > > Jamal
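One detail worth knowing before writing the UDF: as far as I know, Pig's built-in AVG is algebraic. It does not need all of a merchant's amounts in one place, because it combines partial (sum, count) results from different map tasks. A standard-deviation UDF can follow the same pattern by carrying (count, sum, sum of squares). The sketch below shows that bookkeeping in plain Java over the sample rows from the question, outside of Pig. MerchantStats and its methods are hypothetical names, and skipping the header row during parsing is an assumption about the input files.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-merchant mean and population standard deviation
// in one pass, keeping only (count, sum, sum of squares) per merchant.
// These partials can be merged, which is what lets an aggregate be combined
// across input splits.
final class MerchantStats {
    long count;
    double sum;
    double sumSq;

    void add(double amount) {
        count++;
        sum += amount;
        sumSq += amount * amount;
    }

    // Combine two partial results, e.g. computed on different input files.
    void merge(MerchantStats other) {
        count += other.count;
        sum += other.sum;
        sumSq += other.sumSq;
    }

    double mean() {
        return sum / count;
    }

    // Var = E[x^2] - E[x]^2; clamp tiny negative values caused by rounding.
    double stdDev() {
        double m = mean();
        double var = sumSq / count - m * m;
        return Math.sqrt(Math.max(var, 0.0));
    }

    // Parses lines of the form "merchant_id, user_id, amount", skipping
    // blank lines and the header, and returns stats keyed by merchant_id.
    static Map<String, MerchantStats> byMerchant(List<String> lines) {
        Map<String, MerchantStats> out = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length != 3) continue; // blank or malformed line
            double amount;
            try {
                amount = Double.parseDouble(f[2].trim());
            } catch (NumberFormatException e) {
                continue; // header row such as "merchant_id, user_id, amount"
            }
            out.computeIfAbsent(f[0].trim(), k -> new MerchantStats()).add(amount);
        }
        return out;
    }

    // Formats one output row: merchant_id, average_amount, std_dev
    static String row(String merchant, MerchantStats s) {
        return merchant + ", " + s.mean() + ", " + s.stdDev();
    }
}
```

The E[x^2] - E[x]^2 shortcut is fine for amounts like these. It can lose precision when the spread is tiny compared with the mean, however, and in that case a two-pass or Welford-style update is more robust.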
