Hi,
The following script computes the average amount per merchant:
in = load 'in.txt' using PigStorage(',') as (merchant:int, customer:int,
amount:float);
perMerchant = group in by merchant;
avg = foreach perMerchant generate group, AVG(in.amount);
dump avg;
This returns (merchant_id, avg of amount) as follows:
(1233,203.1999969482422)
(1234,264.6000061035156)
Regarding standard deviation, you can write your own UDF that computes it.
Please take a look at AVG.java to see how it computes the average.
Basically, you need to modify the exec() method to compute the standard
deviation instead of the average.
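If a UDF feels like too much work, a rough alternative is to stay in Pig
Latin and use the identity stddev = sqrt(mean(x^2) - mean(x)^2). The script
below is only a sketch under a few assumptions of mine: the alias names
(txns, withSq, stats, result) are made up for illustration, amount is loaded
as double rather than float to reduce rounding noise, and the result is the
population standard deviation (divide by N), not the sample one.

```pig
-- Load the same input as above; amount is read as double (my assumption).
txns = load 'in.txt' using PigStorage(',') as (merchant:int, customer:int,
    amount:double);

-- Carry the squared amount alongside the amount so both can be averaged.
withSq = foreach txns generate merchant, amount, amount * amount as amountSq;

-- Group by merchant and compute mean(x) and mean(x^2) for each group.
perMerchant = group withSq by merchant;
stats = foreach perMerchant generate group as merchant,
    AVG(withSq.amount) as mean, AVG(withSq.amountSq) as meanSq;

-- Population standard deviation: sqrt(mean(x^2) - mean(x)^2).
result = foreach stats generate merchant, mean,
    SQRT(meanSq - mean * mean) as stddev;
dump result;
```

Two caveats: this formula can lose precision when the amounts are large
compared to their spread, and rounding can push the difference slightly
below zero, in which case SQRT gives NaN. To write the result to a file
instead of the console, something like
store result into 'out' using PigStorage(','); should work in place of
dump, where 'out' is a placeholder output directory.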
Thanks,
Cheolsoo
On Tue, Sep 25, 2012 at 6:36 PM, jamal sasha wrote:
> Hi,
> I have a huge text file of the following form
> (the data is saved in a directory as data/data1.txt, data2.txt and so on):
> merchant_id, user_id, amount
> 1234, 9123, 299.2
> 1233, 9199, 203.2
> 1234, 0124, 230
> and so on..
>
> What I want to do is, for each merchant, find the average amount.
> So basically, in the end I want to save the output in a file,
> something like
> merchant_id, average_amount
> 1234, avg_amt_1234
> and so on.
> How do I calculate the standard deviation as well?
>
> Sorry for asking such a basic question. :(
> Any help would be appreciated. :)
> Jamal
>