I am running Pig on a custom Hadoop implementation, but it doesn't respect
parameters in mapred-site.xml.
Looking into the code, I found that the following two files differ slightly
from stock Hadoop in that some patches are not present:
hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
and
src/mapred/org/apache/hadoop/mapred/JobConf.java
Given the constraint that I cannot modify these files, what change should I
make within Pig so that the mapred-site.xml parameters are recognized?
I pulled in PIG-3135 and PIG-3145, which make changes to
org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java,
but the parameters in mapred-site.xml are still not recognized. After remote
Eclipse debugging with breakpoints in the file above, this is what I found:
HExecutionEngine.java - jc = new JobConf()

1st call, from the JobConf() constructor:
    Configuration.get(String)
    Configuration.getProps() --> if (properties == null) { properties = new
    Properties(); loadResources(properties, resources, ...); }

JobConf static initializer:
    Configuration.addDefaultResource("mapred-site.xml")

2nd call, HExecutionEngine: jc.addResource("mapred-site.xml")
    Configuration.get
    val = getProps().getProperty(name);
    if (val == null) val =
    <custom_hadoop_impl_configuration_object>.getDefault(name);

3rd call, HExecutionEngine: recomputeProperties(jc, properties) --> clears
the properties that were added, so they get reloaded again.
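To make the lookup order concrete, here is a minimal, self-contained model of what the debugger shows, using plain java.util.Properties rather than actual Hadoop or Pig code (class and method names here are illustrative, not Hadoop's): get() consults the lazily rebuilt properties first and falls back to the custom implementation's defaults only when the key is absent, so once the cache is cleared without mapred-site.xml in the resource list, the defaults win on reload.

```java
import java.util.Properties;

// Illustrative model only -- NOT Hadoop code. It mimics the observed order:
// get(name) reads the lazily loaded properties, falling back to the custom
// implementation's getDefault() when the key is missing.
public class LookupOrder {
    private Properties properties;          // lazily rebuilt cache
    private final Properties siteResource;  // stands in for mapred-site.xml
    private final Properties defaults;      // stands in for custom impl defaults
    private boolean siteRegistered;         // is the site file in the resource list?

    public LookupOrder(Properties site, Properties defaults, boolean siteRegistered) {
        this.siteResource = site;
        this.defaults = defaults;
        this.siteRegistered = siteRegistered;
    }

    // Mimics getProps(): rebuild from registered resources when cache is null.
    private Properties getProps() {
        if (properties == null) {
            properties = new Properties();
            if (siteRegistered) {
                properties.putAll(siteResource);
            }
        }
        return properties;
    }

    // Mimics the observed get(): cached properties first, then defaults.
    public String get(String name) {
        String val = getProps().getProperty(name);
        if (val == null) {
            val = defaults.getProperty(name);
        }
        return val;
    }

    // Mimics the clearing step: drop the cache so it reloads on next get().
    public void clearProps() {
        properties = null;
    }

    // Drop the site file from the resource list (models the failure mode).
    public void unregisterSite() {
        siteRegistered = false;
    }

    public static void main(String[] args) {
        Properties site = new Properties();
        site.setProperty("mapred.child.java.opts", "-Xmx1g");   // site value
        Properties defs = new Properties();
        defs.setProperty("mapred.child.java.opts", "-Xmx200m"); // impl default

        LookupOrder conf = new LookupOrder(site, defs, true);
        // Site file is registered, so the site value wins on first load.
        System.out.println(conf.get("mapred.child.java.opts")); // -Xmx1g

        // If the site file is missing from the resource list when the cache
        // is cleared, the reload finds nothing and the default wins.
        conf.unregisterSite();
        conf.clearProps();
        System.out.println(conf.get("mapred.child.java.opts")); // -Xmx200m
    }
}
```

This suggests the site values survive the third call only if mapred-site.xml is still in the resource list when the properties are rebuilt.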
What do I need to do to ensure the getProps().getProperty() call does not
return null, so that the mapred-site.xml values are not overridden by the
defaults in the custom Hadoop implementation?
Thanks,
Suhas.