You can also just pass the bag to the UDF as an argument and have a lazy
initializer in exec() that loads the bag into memory on the first call.
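A minimal sketch of that lazy-initializer pattern follows. It assumes the lookup data arrives as the second argument of exec as a bag of (key, value) tuples. So that it runs with only the JDK, it models Pig's Tuple as a List<Object> and a DataBag as an Iterable. In an actual UDF the class would extend org.apache.pig.EvalFunc and override exec(Tuple). The class name BagLookup is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Pig EvalFunc. In a real UDF this class would
// extend org.apache.pig.EvalFunc<Object> and exec would take a Pig Tuple.
// Here a tuple is modeled as List<Object> and a bag as Iterable<List<Object>>
// so the sketch runs with only the JDK (assumption).
class BagLookup {
    // Built on the first exec call, then reused for every later call.
    private Map<Object, Object> lookup;

    // input.get(0): the key to look up.
    // input.get(1): the bag of (key, value) tuples used to build the map.
    @SuppressWarnings("unchecked")
    public Object exec(List<Object> input) {
        if (input == null || input.size() < 2) {
            return null;
        }
        if (lookup == null) {
            // Lazy initialization: load the bag into memory exactly once.
            lookup = new HashMap<>();
            Iterable<List<Object>> bag = (Iterable<List<Object>>) input.get(1);
            for (List<Object> tuple : bag) {
                lookup.put(tuple.get(0), tuple.get(1));
            }
        }
        return lookup.get(input.get(0));
    }
}
```

The map is built once per UDF instance, so this assumes every call passes the same bag. Bags passed on later calls are ignored.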

2012/6/26 Mridul Muralidharan <[email protected]>

> You could dump the data into a DFS file and pass the file's location as a
> parameter to your UDF in the DEFINE statement, so that the UDF initializes
> itself from that data.
>
>
> - Mridul
>
>
> > -----Original Message-----
> > From: Dexin Wang [mailto:[email protected]]
> > Sent: Tuesday, June 26, 2012 10:58 PM
> > To: [email protected]
> > Subject: Passing a BAG to Pig UDF constructor?
> >
> > Is it possible to pass a bag to a Pig UDF constructor?
> >
> > Basically, I want to initialize a hash map in the constructor so that
> > on every exec call I can use it to look up the value I need and apply
> > some algorithm to it.
> >
> > I realize I could do a replicated join to achieve something similar,
> > but the algorithm is more than a few lines and has some edge cases, so
> > I would rather wrap that logic inside a UDF. I also realize I could
> > pass a file path to the constructor and read the files to initialize
> > the hash map, but my files are on Amazon's S3 and I don't want to deal
> > with the S3 API to read them.
> >
> > Is this possible, or is there an alternative way to achieve the same
> > thing?
> >
> > Thanks.
> > Dexin
>
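Here is a sketch of the DEFINE-parameter approach suggested above. It assumes the lookup data is a tab-separated key/value file. The class name FileLookup and the file layout are hypothetical. The sketch reads a local path with java.nio, so it runs without Pig or Hadoop. The constructor only stores the path, and the file is read on the first exec call. This also covers the constructor-initialization idea from the original question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Pig UDF whose constructor receives the string
// argument given in a DEFINE statement. A Pig tuple is modeled as List<Object>.
// This sketch reads a local file with java.nio; a real UDF reading from DFS
// would open the path through Hadoop's FileSystem API instead (assumption).
class FileLookup {
    private final String path;
    private Map<String, String> lookup; // loaded lazily on the first exec call

    FileLookup(String path) {
        // Keep the constructor cheap: Pig may instantiate a UDF more than once,
        // so the file is read later, in exec, rather than here.
        this.path = path;
    }

    // Reads "key<TAB>value" lines into the lookup map.
    private void load() {
        Map<String, String> m = new HashMap<>();
        try {
            for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
                String[] fields = line.split("\t", 2);
                if (fields.length == 2) {
                    m.put(fields[0], fields[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lookup = m;
    }

    // input.get(0) is the key; returns the mapped value, or null if absent.
    public String exec(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        if (lookup == null) {
            load();
        }
        return lookup.get(input.get(0).toString());
    }
}
```

In a Pig script, the path would be supplied with something like DEFINE Lookup com.example.FileLookup('/path/to/lookup.tsv'); where the package and path are hypothetical. For a file on DFS or S3, a real UDF would open the path with Hadoop's FileSystem API (FileSystem.get(URI, Configuration), then open(Path)) rather than java.nio. Whether s3n:// paths work that way depends on the Hadoop version and the configured credentials.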
