now I can see it :-) very beautiful place
On Wed, Aug 29, 2012 at 5:47 AM, Srini <[email protected]> wrote:

> Thank you very much, Jonathan...
>
> On Tue, Aug 28, 2012 at 2:47 AM, Jonathan Coveney <[email protected]> wrote:
>
> > I would do this with a cogroup. Whether or not you need a UDF depends on
> > whether or not a key can appear more than once in a file.
> >
> > trade-key trade-add-date trade-price
> >
> > feed_group = cogroup feed1 by trade-key, feed2 by trade-key;
> > feed_proj = foreach feed_group generate
> >     FLATTEN( IsEmpty(feed2) ? feed1 : feed2 );
> >
> > and there you go (you may need to tweak the flatten to make it work).
> >
> > It'd be slightly more complicated if you had multiple key/date pairs.
> >
> > 2012/8/27 Srini <[email protected]>
> >
> > > Hello TianYi Zhu,
> > >
> > > Thanks!! I will get back to you.
> > >
> > > --> by the way, you can sort these 2 files by trade-key then merge
> > > them using a small script, that's much faster than using pig.
> > > ... I'm trying out a POC on updates in hadoop
> > >
> > > Thanks,
> > > Srinivas
> > >
> > > On Tue, Aug 28, 2012 at 12:55 AM, TianYi Zhu <[email protected]> wrote:
> > >
> > > > Hi Srinivas,
> > > >
> > > > you can write a user defined function for this
> > > >
> > > > feed = union feed1, feed2;
> > > > feed_grouped = group feed by trade-key;
> > > > output = foreach feed_grouped generate
> > > >     flatten(your_user_defined_function(feed))
> > > >     as (trade-key, trade-add-date, trade-price)
> > > >
> > > > your_user_defined_function takes the one or more records with the
> > > > same trade-key as input, and it should only output the latest tuple
> > > > of (trade-key, trade-add-date, trade-price)
> > > >
> > > > by the way, you can sort these 2 files by trade-key then merge them
> > > > using a small script, that's much faster than using pig.
> > > >
> > > > On Tue, Aug 28, 2012 at 2:36 PM, Srinivas Surasani <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm trying to do updates of records in hadoop using Pig (I know
> > > > > this is not ideal, but I'm trying out a POC).
> > > > > The data looks like the below:
> > > > >
> > > > > *feed1:*
> > > > > --> here trade key is unique for each order/record
> > > > > --> this is the history file
> > > > >
> > > > > trade-key trade-add-date trade-price
> > > > > *k1 05/21/2012 2000*
> > > > > k2 04/21/2012 3000
> > > > > k3 03/21/2012 4000
> > > > > k4 05/21/2012 5000
> > > > >
> > > > > *feed2:* --> this is the latest/daily feed
> > > > >
> > > > > trade-key trade-add-date trade-price
> > > > > k5 06/22/2012 1000
> > > > > k6 06/22/2012 2000
> > > > > *k1 06/21/2012 3000* ---> we can see here that the trade with key
> > > > > "k1" appears again; that means the order with trade key "k1" has
> > > > > some update
> > > > >
> > > > > Now I'm looking for the below output (merging both files, looking
> > > > > for common keys from both feeds and keeping the latest record for
> > > > > each key in the output file):
> > > > >
> > > > > *k1 06/21/2012 3000*
> > > > > k2 04/21/2012 3000
> > > > > k3 06/21/2012 4000
> > > > > k4 07/21/2012 5000
> > > > > *k5 06/22/2012 1000*
> > > > > *k6 06/22/2012 2000*
> > > > >
> > > > > Any help is greatly appreciated!!
> > > > >
> > > > > Regards,
> > > > > Srinivas
> > >
> > > --
> > > Regards,
> > > Srinivas
> > > [email protected]
>
> --
> Regards,
> Srinivas
> [email protected]
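
For the "small script" route mentioned in the thread, here is a minimal Python sketch of the merge, not anyone's actual implementation. It assumes whitespace-separated rows with the header line already removed, MM/DD/YYYY dates, and integer prices; the function names `parse_feed` and `merge_latest` are made up for illustration. It keeps one record per trade-key, preferring the later trade-add-date, with ties going to the daily feed.

```python
from datetime import datetime

def parse_feed(lines):
    """Parse 'trade-key trade-add-date trade-price' rows (no header row)."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        key, date_str, price = parts
        # Assumption: dates are MM/DD/YYYY, as in the sample data.
        records.append((key, datetime.strptime(date_str, "%m/%d/%Y"), int(price)))
    return records

def merge_latest(history, daily):
    """Keep, for each trade-key, the record with the latest trade-add-date."""
    latest = {}
    # History first, then daily, so a tie on date goes to the daily feed.
    for key, added, price in history + daily:
        current = latest.get(key)
        if current is None or added >= current[0]:
            latest[key] = (added, price)
    return [(key, added.strftime("%m/%d/%Y"), price)
            for key, (added, price) in sorted(latest.items())]

feed1 = parse_feed([
    "k1 05/21/2012 2000",
    "k2 04/21/2012 3000",
    "k3 03/21/2012 4000",
    "k4 05/21/2012 5000",
])
feed2 = parse_feed([
    "k5 06/22/2012 1000",
    "k6 06/22/2012 2000",
    "k1 06/21/2012 3000",
])

merged = merge_latest(feed1, feed2)
for row in merged:
    print(*row)
```

This version holds every key in memory, which is fine for a POC. For files that are already sorted by trade-key, a streaming merge would avoid that.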
