You can also just pass the bag to the UDF, and have a lazy initializer in exec that loads the bag into memory.
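
A rough sketch of that lazy-initializer idea, in plain Java so it runs without Pig on the classpath. Everything here is a stand-in and not Pig API: the class name LazyBagLookup, the loadCount helper, and the List<String[]> of (key, value) pairs that plays the role of the bag are all hypothetical. In an actual UDF the class would presumably extend EvalFunc and pull the key and the bag out of the input Tuple. It also assumes the same bag is passed on every call, so only the first call reads it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Pig UDF; not tied to any Pig class.
class LazyBagLookup {
    // Stays null until the first exec call, then is reused for every later call.
    private Map<String, String> lookup;
    // Counts how many times the bag was actually loaded (illustration only).
    private int loads = 0;

    // key: the value to look up.
    // bag: list of {key, value} pairs standing in for the bag passed to the UDF.
    public String exec(String key, List<String[]> bag) {
        if (lookup == null) {
            // Lazy initialization: copy the bag into a hash map exactly once.
            lookup = new HashMap<>();
            for (String[] pair : bag) {
                lookup.put(pair[0], pair[1]);
            }
            loads++;
        }
        // Later calls ignore the bag argument and hit the cached map.
        // Returns null when the key is absent.
        return lookup.get(key);
    }

    public int loadCount() {
        return loads;
    }
}
```

The edge-case logic from the original question would go after the map lookup, where the sketch simply returns the value.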

2012/6/26 Mridul Muralidharan <[email protected]>

> You could dump the data in a dfs file and pass the location of the file as
> param to your udf in define - so that it initializes itself using that data
> ...
>
> - Mridul
>
> > -----Original Message-----
> > From: Dexin Wang [mailto:[email protected]]
> > Sent: Tuesday, June 26, 2012 10:58 PM
> > To: [email protected]
> > Subject: Passing a BAG to Pig UDF constructor?
> >
> > Is it possible to pass a bag to a Pig UDF constructor?
> >
> > Basically in the constructor I want to initialize some hash map so that
> > on every exec operation, I can use the hashmap to do a lookup and find
> > the value I need, and apply some algorithm to it.
> >
> > I realize I could just do a replicated join to achieve similar things
> > but the algorithm is more than a few lines and there are some edge
> > cases so I would rather wrap that logic inside a UDF function. I also
> > realize I could just pass a file path to the constructor and read the
> > files to initialize the hashmap but my files are on Amazon's S3 and I
> > don't want to deal with S3 API to read the file.
> >
> > Is this possible or is there some alternative ways to achieve the same
> > thing?
> >
> > Thanks.
> > Dexin
