I want to make sure that the native libraries installed on the Node Managers get
used by all YARN containers. I first found the
mapreduce.admin.{map,reduce}.child.java.opts config properties and set them to:
'-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native'
In other words, I appended the native library paths to the default values of
these properties. This seemed to work, but now I see the warning:
WARN mapred.YARNRunner: Usage of -Djava.library.path in
mapreduce.admin.map.child.java.opts can cause programs to no longer function if
hadoop native libraries are used. These values should be set as part of the
LD_LIBRARY_PATH in the map JVM env using mapreduce.admin.user.env config
settings.
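For reference, I assume the setting the warning points to would look roughly
like the fragment below in mapred-site.xml. This is only a sketch: the paths are
the ones from my java.library.path value above, and I have not yet confirmed
which host's copy of the file is actually consulted.

```xml
<!-- Sketch only: value is a comma-separated list of VAR=value pairs. -->
<!-- The paths are taken from the java.library.path setting above.    -->
<property>
  <name>mapreduce.admin.user.env</name>
  <value>LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native</value>
</property>
```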
Okay, so I can go and set mapreduce.admin.user.env, but before I do that I have
a few questions. Where are these properties actually read in and set? Are they
read and set prior to the job being submitted by the client code, on the host
where "hadoop jar whatever.jar" is run? Or are they set by the Resource
Manager? Or the Application Master? Or are they read on the host the map or
reduce task actually runs on?
Imagine the following scenarios:
A. The mapreduce.admin.user.env property is not set explicitly by the job's
java code prior to submission. It is not set via command-line switches during
submit. It is not set in /etc/hadoop/conf/*-site.xml on the client host. It is
not set in /etc/hadoop/conf/*-site.xml on the host running the Resource
Manager. It is not set in /etc/hadoop/conf/*-site.xml on the host that runs the
Application Master. But it is set in /etc/hadoop/conf/mapred-site.xml on the
Node Manager host that runs one of the map tasks.
B. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml
on the host that runs the Application Master (not on any of the Node Managers
that run the actual tasks).
C. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml
on the Resource Manager host.
D. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml
on the client submission host.
E. Same as A, but the property is set either via command line switch, or in
the client's code (assuming these cases are the same as D).
In which cases will the map task see the default value for
mapreduce.admin.user.env, and in which will it see the explicitly set value?
What happens if it's explicitly set in more than one of the locations
referenced above?
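One way I could check these cases empirically is to print, from inside a task,
what the JVM actually ends up with. The following is a minimal sketch using only
the standard Java library; the class name NativePathProbe and its describe
helper are hypothetical names of my own, not part of Hadoop, and it would still
need to be called from a mapper or reducer to be useful.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not a Hadoop class): reports the two settings that
// decide where a JVM looks for native libraries.
public class NativePathProbe {

    // Formats LD_LIBRARY_PATH (from the environment) and java.library.path
    // (from the system properties), marking either one as <unset> if absent.
    static String describe(Map<String, String> env, Properties props) {
        String ld = env.get("LD_LIBRARY_PATH");
        String jlp = props.getProperty("java.library.path");
        return "LD_LIBRARY_PATH=" + (ld == null ? "<unset>" : ld)
                + System.lineSeparator()
                + "java.library.path=" + (jlp == null ? "<unset>" : jlp);
    }

    public static void main(String[] args) {
        // Run standalone, this reports the values for the current JVM.
        System.out.println(describe(System.getenv(), System.getProperties()));
    }
}
```

Calling describe(System.getenv(), System.getProperties()) from a task and
writing the result to the task log should show, for each scenario, whether the
value configured on a given host reached the container.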
And what about mapred.child.env: where and how does that come into play?
What about yarn.app.mapreduce.am.env and yarn.app.mapreduce.am.admin.user.env?
Will those settings trickle down to the actual tasks, or do they only affect the
Application Master's environment? The same goes for yarn.nodemanager.admin-env:
will it trickle down from the Node Manager to the container? Would it be better
to set one of these rather than the MapReduce equivalent, so that I get the
native libraries for all YARN apps, not just MapReduce ones?
-Steven Willis