[ https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672360#action_12672360 ]
Alan Gates commented on PIG-602:
--------------------------------

I propose the following solution. First, a singleton class will be added to Pig:

{code}
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class PigConf implements Serializable {
    private static PigConf self = new PigConf();  // the single shared instance
    // Keys are Strings; values must be Serializable to reach the back end.
    private Map<String, Serializable> userConf = new HashMap<String, Serializable>();
    private PigConf() { /* ... */ }
    public static PigConf getPigConf() { return self; }
    public Map<String, Serializable> getUserConf() { return userConf; }
}
{code}

Pig would take care of serializing this class between the front end and the back end, so users' UDFs could stash keys and values in it on the front end and be guaranteed to pick them back up on the back end. Pig's map, reduce, and combiner frameworks would need to change to explicitly deserialize this class and populate it. The front end would need to change to serialize it as part of submitting the job to Hadoop.

Furthermore, users could populate this from a configuration file by providing the file on the command line. We would add a command line argument (such as -u/-userconf). The contents of this file would be read using Properties.loadFromXML and then loaded into PigConf.userConf. The reason PigConf does not simply use a Properties object is that Properties is a Map<Object, Object>, which is too generic. We would like to constrain the keys to be Strings, and the values must be Serializable so that we can guarantee that we can transmit them from the front end to the back end.

Thoughts?

> Pass global configurations to UDF
> ---------------------------------
>
>                 Key: PIG-602
>                 URL: https://issues.apache.org/jira/browse/PIG-602
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Yiping Han
>            Assignee: Alan Gates
>
> We are seeking an easy way to pass a large number of global configurations to
> UDFs.
> Our application contains many Pig jobs and has a large number of
> configurations, so passing configurations through the command line is not ideal
> (i.e. modifying a single parameter requires changing multiple command lines).
> Putting everything into the Hadoop conf is not ideal either.
> We would like to see if Pig can provide a facility that allows us to
> pass a configuration file in some format (XML?) and then make it available
> throughout all the UDFs.
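A minimal sketch of the data flow in the proposal above: read a user XML file with Properties.loadFromXML, narrow it to String keys with Serializable values, serialize it on the front end, and rebuild it on the back end. It assumes Java 8+ (for java.util.Base64) and assumes the serialized form travels as a String, for example inside the job configuration; the proposal does not say how the bytes are shipped. UserConfSketch and its methods are hypothetical names, not Pig APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Properties;

// Hypothetical helper illustrating the proposed flow; not part of Pig.
public class UserConfSketch {

    // Front end: read a -userconf style XML file and copy it into a map
    // with String keys and Serializable values.
    static HashMap<String, Serializable> loadUserConf(InputStream xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(xml);  // expects the java.util.Properties XML format
        HashMap<String, Serializable> conf = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            conf.put(key, props.getProperty(key));  // String is Serializable
        }
        return conf;
    }

    // Front end: encode the map as a String. Assumption: the string would be
    // stored somewhere the back end can read, such as the job configuration.
    static String serialize(HashMap<String, Serializable> conf) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(conf);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Back end: rebuild the map before any UDF runs.
    @SuppressWarnings("unchecked")
    static HashMap<String, Serializable> deserialize(String encoded)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (HashMap<String, Serializable>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A hypothetical user configuration file in the Properties XML format.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "<entry key=\"threshold\">0.75</entry>\n"
                + "<entry key=\"mode\">strict</entry>\n"
                + "</properties>\n";
        HashMap<String, Serializable> conf =
                loadUserConf(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        String wire = serialize(conf);
        HashMap<String, Serializable> restored = deserialize(wire);
        System.out.println(restored.get("threshold") + " " + restored.get("mode"));  // 0.75 strict
    }
}
```

The sketch returns HashMap rather than Map because the Map interface itself is not Serializable; what ObjectOutputStream needs is a concrete container (and values) that are, which is exactly the constraint the proposal places on userConf.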