[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913154#action_12913154 ]
Alan Gates commented on PIG-1337:
---------------------------------

The problem with allowing load and store functions access to the config file is that the config file they see is not the config file that goes to Hadoop. This is not all Pig's fault (see comments above on this). The other problem is that multiple instances of the same load and store function may be operating in a given script, so there are namespace issues to resolve.

The proposal for Hadoop 0.22 is that, rather than providing access to the config file at all, Hadoop will serialize objects such as InputFormat and OutputFormat and pass those to the backend. It will make sense for Pig to follow suit and serialize all UDFs on the front end. This will remove the need for the UDFContext black magic that we do at the moment and should allow all UDFs to easily transfer information from front end to backend.

So, hopefully this can get resolved when Pig migrates to Hadoop 0.22, whenever that is.

> Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1337
>                 URL: https://issues.apache.org/jira/browse/PIG-1337
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>            Reporter: Chao Wang
>
> The Zebra storage layer needs to use distributed cache to reduce name node load during job runs.
> To do this, Zebra needs to set up distributed cache related configuration information in TableLoader (which extends Pig's LoadFunc).
> It is doing this within getSchema(conf). The problem is that the conf object here is not the one that is being serialized to the map/reduce backend. As such, the distributed cache is not set up properly.
> To work around this problem, we need Pig's LoadFunc to provide a way to set up distributed cache information in a conf object, and this conf object must be the one used by the map/reduce backend.
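
The serialize-on-the-front-end idea in the comment above can be sketched roughly as follows. This is an illustration only, using plain JDK serialization: `SketchLoader`, `roundTrip`, and the file paths are hypothetical names, and the code is neither Pig's LoadFunc API nor the actual Hadoop 0.22 mechanism. It only shows why per-instance state that travels with the object sidesteps the shared-config namespace problem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializedUdfSketch {

    // Hypothetical stand-in for a load function. Each instance carries its
    // own state, so two loaders in one script cannot overwrite each other.
    static class SketchLoader implements Serializable {
        private static final long serialVersionUID = 1L;
        final String location;
        final List<String> cacheFiles = new ArrayList<>();

        SketchLoader(String location) {
            this.location = location;
        }
    }

    // "Front end": turn the configured instance into bytes.
    static byte[] toBytes(SketchLoader loader) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(loader);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buf.toByteArray();
    }

    // "Backend": rebuild the instance from the bytes it was shipped as.
    static SketchLoader fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SketchLoader) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    // Convenience helper: simulate the front end to backend hop in-process.
    static SketchLoader roundTrip(SketchLoader loader) {
        return fromBytes(toBytes(loader));
    }

    public static void main(String[] args) {
        // Two instances of the same loader class with different settings.
        SketchLoader a = new SketchLoader("table_a");
        a.cacheFiles.add("/cache/table_a.idx");
        SketchLoader b = new SketchLoader("table_b");
        b.cacheFiles.add("/cache/table_b.idx");

        // Each copy arrives with exactly the state set on its original.
        SketchLoader backendA = roundTrip(a);
        SketchLoader backendB = roundTrip(b);
        System.out.println(backendA.location + " -> " + backendA.cacheFiles);
        System.out.println(backendB.location + " -> " + backendB.cacheFiles);
    }
}
```

Because the state lives in the object rather than in a shared key/value config, there is no key to namespace per instance. Whatever was set on the front end is what the backend copy sees.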