One thing I am not sure about is whether "mapred.tasktracker.map.tasks.maximum" is a client-side setting. If it is, you can create a file named "pig-cluster-hadoop-site.xml" and put the directory containing it on the classpath. pig-cluster-hadoop-site.xml holds additional Hadoop settings specific to Pig, and it has the same format as the other Hadoop config files.
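
For illustration, here is a minimal sketch of what such a file might look like, assuming the usual Hadoop configuration layout (a <configuration> element containing <property> entries). The value 2 is only a placeholder, not a recommendation, and the file only helps if the property really is honored on the client side, which is the open question above.

```xml
<?xml version="1.0"?>
<!-- pig-cluster-hadoop-site.xml: extra Hadoop settings picked up by Pig. -->
<!-- The directory containing this file must be on the classpath. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- Placeholder value, chosen only for this example. -->
    <value>2</value>
  </property>
</configuration>
```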

Daniel

On Fri, Jul 8, 2011 at 10:08 AM, Dylan Scott <[email protected]> wrote:
> I have a Hadoop job running through Pig for which I would like to limit the
> number of concurrently running mappers per task tracker. The
> mapred.tasktracker.map.tasks.maximum property seems to be just what I want
> to modify, but unfortunately I cannot modify it in mapred-site.xml as this
> configuration is shared by many different jobs, most of which don't need to
> be limited in the same way.
> I'm wondering what the best way to set this option would be. I noticed that
> using the Configuration returned by UDFContext.getJobConf() will not work,
> as it is a copy of the configuration and so writing the property here will
> not get passed back to the system. I'm given access to the Job object in my
> store func's setStoreLocation method, would setting the property on this
> Job's configuration get passed back to the system? If not is there a good
> way to set a property like this from within a Pig UDF?
>
