Re: updates using Pig

TianYi Zhu Mon, 27 Aug 2012 21:56:25 -0700

Hi Srinivas,

You can write a user-defined function (UDF) for this:


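-- Pig identifiers cannot contain '-', so the fields are written trade_key etc.;
-- assumes feed1 and feed2 were loaded with that schema. "output" is a
-- reserved word in Pig, so the result uses a different alias.
feed = union feed1, feed2;
feed_grouped = group feed by trade_key;
latest = foreach feed_grouped generate
    flatten(your_user_defined_function(feed)) as (trade_key, trade_add_date,
    trade_price);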

your_user_defined_function takes all the records (one or more) that share the
same trade-key as input, and it should output only the latest tuple of
(trade-key, trade-add-date, trade-price).
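A minimal sketch of the per-key logic such a UDF needs, written as plain Python. The name latest_trade is hypothetical, and it assumes trade-add-date is an MM/DD/YYYY string as in the sample data. The date is parsed rather than compared as a string, because MM/DD/YYYY strings do not sort chronologically across years. Wiring it into Pig (for example as a Jython UDF registered with REGISTER ... USING jython) is not shown here.

```python
from datetime import datetime

# Assumed date format of trade-add-date, e.g. 05/21/2012.
DATE_FORMAT = "%m/%d/%Y"


def latest_trade(records):
    """Return the most recent (trade_key, trade_add_date, trade_price) tuple.

    `records` stands for all records sharing one trade-key, i.e. the bag that
    GROUP ... BY trade_key would hand to the UDF (hypothetical helper).
    """
    # Parse the date so that, e.g., 01/01/2012 counts as later than 12/31/2011.
    return max(records, key=lambda rec: datetime.strptime(rec[1], DATE_FORMAT))


if __name__ == "__main__":
    k1_group = [("k1", "05/21/2012", 2000), ("k1", "06/21/2012", 3000)]
    print(latest_trade(k1_group))  # ('k1', '06/21/2012', 3000)
```

If two records for the same key carry the same date, max keeps the first one it sees, so the order of the union decides the winner in that case.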


By the way, you can also sort these two files by trade-key and then merge them
with a small script; that's much faster than using Pig.
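The reply does not show such a script; the following is one possible sketch in Python of the sort-then-merge approach. The helper names parse_line and merge_feeds are hypothetical. It assumes whitespace-separated lines, keys that are unique within each feed, and both inputs sorted with the same string ordering the script uses (Python's sorted here; with Unix sort, LC_ALL=C gives the matching byte-wise order). When a key appears in both feeds, the record with the later date is kept.

```python
from datetime import datetime

# Assumed date format of trade-add-date, e.g. 05/21/2012.
DATE_FORMAT = "%m/%d/%Y"


def parse_line(line):
    """Split a 'trade-key trade-add-date trade-price' line into a tuple."""
    key, date, price = line.split()
    return key, date, int(price)


def merge_feeds(history, daily):
    """Merge two record lists, each already sorted by trade-key.

    Assumes keys are unique within each list. For a key present in both,
    the record with the later trade-add-date wins.
    """
    merged = []
    i = j = 0
    while i < len(history) and j < len(daily):
        h, d = history[i], daily[j]
        if h[0] < d[0]:
            merged.append(h)
            i += 1
        elif d[0] < h[0]:
            merged.append(d)
            j += 1
        else:
            # Same trade-key in both feeds: keep the more recent record.
            h_date = datetime.strptime(h[1], DATE_FORMAT)
            d_date = datetime.strptime(d[1], DATE_FORMAT)
            merged.append(d if d_date >= h_date else h)
            i += 1
            j += 1
    # At most one of the two lists still has records left.
    merged.extend(history[i:])
    merged.extend(daily[j:])
    return merged


if __name__ == "__main__":
    feed1 = ["k1 05/21/2012 2000", "k2 04/21/2012 3000",
             "k3 03/21/2012 4000", "k4 05/21/2012 5000"]
    feed2 = ["k5 06/22/2012 1000", "k6 06/22/2012 2000",
             "k1 06/21/2012 3000"]
    history = sorted(parse_line(line) for line in feed1)
    daily = sorted(parse_line(line) for line in feed2)
    for key, date, price in merge_feeds(history, daily):
        print(key, date, price)
```

The merge makes a single pass over both sorted inputs, so apart from the sort it only needs to compare the current record of each file.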

On Tue, Aug 28, 2012 at 2:36 PM, Srinivas Surasani <[email protected]> wrote:

> Hi,
>
> I'm trying to update records in Hadoop using Pig (I know this is not
> ideal, but I'm trying out a POC).
> The data looks like this:
>
> feed1:
> --> trade-key is unique for each order/record
> --> this is the history file
>
> trade-key    trade-add-date    trade-price
> k1           05/21/2012        2000
> k2           04/21/2012        3000
> k3           03/21/2012        4000
> k4           05/21/2012        5000
>
> feed2: --> this is the latest/daily feed
> trade-key    trade-add-date    trade-price
> k5           06/22/2012        1000
> k6           06/22/2012        2000
> k1           06/21/2012        3000
>
> Here the trade with key "k1" appears again, which means the order with
> trade key "k1" has been updated.
>
> Now I'm looking for the output below (merge both files, find the keys
> common to both feeds, and keep only the latest record for each key in the
> output file):
> k1           06/21/2012        3000
> k2           04/21/2012        3000
> k3           03/21/2012        4000
> k4           05/21/2012        5000
> k5           06/22/2012        1000
> k6           06/22/2012        2000
>
> Any help is greatly appreciated!
>
> Regards,
> Srinivas
>