> -----Original Message-----
> From: Jonathan Coveney [mailto:[email protected]]
> Sent: Wednesday, June 27, 2012 3:12 AM
> To: [email protected]
> Subject: Re: Passing a BAG to Pig UDF constructor?
> 
> You can also just pass the bag to the UDF, and have a lazy initializer
> in exec that loads the bag into memory.


Can you elaborate on what you mean by passing the bag to the UDF?
Do you mean passing it as part of the input to exec and initializing
from it only once (on the first call)? If so, that is expensive.
Or did you mean something else?
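
To make the question concrete, here is a rough sketch of the first reading (a lazy initializer in exec). It is plain Java, not the actual Pig API: a real UDF would extend org.apache.pig.EvalFunc and pull the bag out of the input Tuple. The class name LazyLookupSketch, the List<String[]> stand-in for the bag, and the loadCount counter are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a Pig UDF (assumption: a real one would extend
// org.apache.pig.EvalFunc and read the bag from a field of the input Tuple).
class LazyLookupSketch {
    private Map<String, String> lookup;  // stays null until the first exec call
    private int loadCount = 0;           // illustration only: counts how often we load

    // 'bag' is a hypothetical stand-in for a Pig bag of (key, value) tuples.
    String exec(String key, List<String[]> bag) {
        if (lookup == null) {            // lazy initializer: runs on the first call only
            lookup = new HashMap<>();
            for (String[] kv : bag) {
                lookup.put(kv[0], kv[1]);
            }
            loadCount++;
        }
        return lookup.get(key);          // later calls never look at 'bag' again
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyLookupSketch udf = new LazyLookupSketch();
        List<String[]> bag = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", "2"});
        System.out.println(udf.exec("a", bag));
        // The map is already built, so the bag argument is ignored here.
        System.out.println(udf.exec("b", Collections.<String[]>emptyList()));
        System.out.println(udf.loadCount());
    }
}
```

The map is only built once per UDF instance, but the bag still has to travel with every input record, which is the cost I am asking about.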


Regards,
Mridul



> 
> 2012/6/26 Mridul Muralidharan <[email protected]>
> 
> > You could dump the data in a dfs file and pass the location of the
> > file as param to your udf in define - so that it initializes itself
> > using that data ...
> >
> >
> > - Mridul
> >
> >
> > > -----Original Message-----
> > > From: Dexin Wang [mailto:[email protected]]
> > > Sent: Tuesday, June 26, 2012 10:58 PM
> > > To: [email protected]
> > > Subject: Passing a BAG to Pig UDF constructor?
> > >
> > > Is it possible to pass a bag to a Pig UDF constructor?
> > >
> > > Basically in the constructor I want to initialize some hash map so
> > > that on every exec operation, I can use the hashmap to do a lookup
> > > and find the value I need, and apply some algorithm to it.
> > >
> > > I realize I could just do a replicated join to achieve similar
> > > > things but the algorithm is more than a few lines and there are some
> > > > edge cases so I would rather wrap that logic inside a UDF function.
> > > I also realize I could just pass a file path to the constructor and
> > > read the files to initialize the hashmap but my files are on
> > > > Amazon's S3 and I don't want to deal with the S3 API to read
> > > > the file.
> > >
> > > > Is this possible, or are there alternative ways to achieve the
> > > > same thing?
> > >
> > > Thanks.
> > > Dexin
> >
