Re: updates using Pig

pablomar Wed, 29 Aug 2012 04:05:14 -0700

now I can see it :-)
very beautiful place


On Wed, Aug 29, 2012 at 5:47 AM, Srini <[email protected]> wrote:

> Thank you very much, Jonathan.
>
> On Tue, Aug 28, 2012 at 2:47 AM, Jonathan Coveney <[email protected]> wrote:
>
> > I would do this with a cogroup. Whether or not you need a UDF depends on
> > whether or not a key can appear more than once in a file.
> >
> > trade-key    trade-add-date       trade-price
> >
> > feed_group = cogroup feed1 by trade-key, feed2 by trade-key;
> > feed_proj = foreach feed_group generate
> >     FLATTEN( (IsEmpty(feed2) ? feed1 : feed2) );
> >
> > and there you go (you may need to tweak the flatten to make it work).
> >
> > It'd be slightly more complicated if you had multiple key/date pairs.
> >
> > 2012/8/27 Srini <[email protected]>
> >
> > > Hello  TianYi Zhu,
> > >
> > > Thanks! I will get back to you.
> > >
> > > --> by the way, you can sort these 2 files by trade-key then merge them
> > > using a small script, that's much faster than using pig.
> > > ... I'm trying out a POC on updates in Hadoop.
> > >
> > > Thanks,
> > > Srinivas
> > > On Tue, Aug 28, 2012 at 12:55 AM, TianYi Zhu <[email protected]> wrote:
> > >
> > > > Hi Srinivas,
> > > >
> > > > you can write a user defined function for this
> > > >
> > > > feed = union feed1, feed2;
> > > > feed_grouped = group feed by trade-key;
> > > > latest = foreach feed_grouped generate
> > > >     flatten(your_user_defined_function(feed))
> > > >     as (trade-key, trade-add-date, trade-price);
> > > >
> > > > your_user_defined_function takes the one or more records with the same
> > > > trade-key as input, and it should output only the latest tuple of
> > > > (trade-key, trade-add-date, trade-price).
> > > >
> > > >
> > > > By the way, you can sort these 2 files by trade-key and then merge them
> > > > using a small script; that's much faster than using Pig.
> > > >
> > > > On Tue, Aug 28, 2012 at 2:36 PM, Srinivas Surasani <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > I'm trying to do updates of records in Hadoop using Pig (I know this is
> > > > > > not ideal, but I'm trying out a POC).
> > > > > > The data looks like the below:
> > > > >
> > > > > > feed1:
> > > > > > --> here the trade key is unique for each order/record
> > > > > > --> this is the history file
> > > > > >
> > > > > > trade-key    trade-add-date    trade-price
> > > > > > k1           05/21/2012        2000
> > > > > > k2           04/21/2012        3000
> > > > > > k3           03/21/2012        4000
> > > > > > k4           05/21/2012        5000
> > > > > >
> > > > > > feed2:  --> this is the latest/daily feed
> > > > > > trade-key    trade-add-date    trade-price
> > > > > > k5           06/22/2012        1000
> > > > > > k6           06/22/2012        2000
> > > > > > k1           06/21/2012        3000
> > > > > >
> > > > > > ---> We can see here that the trade with key "k1" has appeared again.
> > > > > > That means the order with trade key "k1" has an update.
> > > > > >
> > > > > > Now I'm looking for the below output (merging both files, looking for
> > > > > > keys common to both feeds, and keeping the latest record for each key
> > > > > > in the output file):
> > > > > > k1           06/21/2012        3000
> > > > > > k2           04/21/2012        3000
> > > > > > k3           03/21/2012        4000
> > > > > > k4           05/21/2012        5000
> > > > > > k5           06/22/2012        1000
> > > > > > k6           06/22/2012        2000
> > > > >
> > > > > > Any help is greatly appreciated!
> > > > >
> > > > > Regards,
> > > > > Srinivas
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Srinivas
> > > [email protected]
> > >
> >
>
>
>
> --
> Regards,
> Srinivas
> [email protected]
>
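
A minimal sketch of the update logic discussed in this thread, written as a small standalone script rather than as Pig. It uses a dictionary keyed by trade-key instead of the sort-then-merge pass suggested above. It assumes whitespace-separated rows of trade-key, trade-add-date (MM/DD/YYYY) and trade-price, that both feeds fit in memory, and that the daily feed wins when dates tie. The names parse_feed and merge_latest are hypothetical and do not come from Pig or from any poster's code.

```python
from datetime import datetime


def parse_feed(text):
    """Parse whitespace-separated rows of (trade-key, trade-add-date, trade-price)."""
    rows = []
    for line in text.strip().splitlines():
        key, date_str, price = line.split()
        # Assumption: dates are MM/DD/YYYY, as in the sample data in the thread.
        rows.append((key, datetime.strptime(date_str, "%m/%d/%Y").date(), int(price)))
    return rows


def merge_latest(history, daily):
    """Return one record per trade-key: the one with the latest trade-add-date."""
    latest = {}
    # History rows are visited first, so ">=" lets a daily row win a date tie.
    for key, date, price in history + daily:
        if key not in latest or date >= latest[key][1]:
            latest[key] = (key, date, price)
    # Sort by trade-key for a stable, readable output order.
    return [latest[k] for k in sorted(latest)]


feed1 = parse_feed("""
k1 05/21/2012 2000
k2 04/21/2012 3000
k3 03/21/2012 4000
k4 05/21/2012 5000
""")

feed2 = parse_feed("""
k5 06/22/2012 1000
k6 06/22/2012 2000
k1 06/21/2012 3000
""")

merged = merge_latest(feed1, feed2)
for key, date, price in merged:
    print(key, date.strftime("%m/%d/%Y"), price)
```

Comparing dates rather than relying on which feed a row came from also covers the case where a key appears more than once within a single feed, which is the situation where a UDF would be needed on the Pig side.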
