I want to make sure that the native libraries installed on the NodeManagers get 
used by all YARN containers. I first found the 
mapreduce.admin.{map,reduce}.child.java.opts config properties and set them to:

    '-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native'

Basically, I appended the native library paths to the default values of these 
properties. This seemed to work, but now I see the warning:

    WARN mapred.YARNRunner: Usage of -Djava.library.path in
    mapreduce.admin.map.child.java.opts can cause programs to no longer
    function if hadoop native libraries are used. These values should be set
    as part of the LD_LIBRARY_PATH in the map JVM env using
    mapreduce.admin.user.env config settings.
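
If I read the warning correctly, it wants something along these lines in 
mapred-site.xml. This is only a sketch: the paths are the ones from my setting 
above, and I am assuming that the property takes NAME=value pairs and that 
mapred-site.xml is the right file to put it in.

```xml
<!-- Sketch only: move the native paths out of the JVM opts and into the
     task environment, as the warning suggests. -->
<property>
  <name>mapreduce.admin.user.env</name>
  <value>LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native</value>
</property>

<!-- Back to the default value, without -Djava.library.path.
     The same change would apply to mapreduce.admin.reduce.child.java.opts. -->
<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN</value>
</property>
```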

Okay, so I can go and set mapreduce.admin.user.env, but before I do that I have 
a few questions. Where are these properties actually read in and set? Are they 
read and set prior to the job being submitted by the client code, on the host 
where "hadoop jar whatever.jar" is run? Or are they set by the Resource 
Manager, or by the Application Master? Or are they read on the host the map or 
reduce task actually runs on?

Imagine the following scenarios:

 A. The mapreduce.admin.user.env property is not set explicitly by the job's 
java code prior to submission. It is not set via command-line switches during 
submit. It is not set in /etc/hadoop/conf/*-site.xml on the client host. It is 
not set in /etc/hadoop/conf/*-site.xml on the host running the Resource 
Manager. It is not set in /etc/hadoop/conf/*-site.xml on the host that runs the 
Application Master. But it is set in /etc/hadoop/conf/mapred-site.xml on the 
Node Manager host that runs one of the map tasks.
 B. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml 
on the host that runs the Application Master (not on any of the Node Managers 
that run the actual tasks).
 C. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml 
on the Resource Manager host.
 D. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml 
on the client submission host.
 E. Same as A, but the property is set either via command line switch, or in 
the client's code (assuming these cases are the same as D).

In which cases will the map task see the default value for 
mapreduce.admin.map.child.java.opts, and when will it see the explicitly set 
value? What happens if it's explicitly set in more than one of the locations 
referenced above? 

And what about mapred.child.env? Where and how does that come into play?

What about yarn.app.mapreduce.am.env and yarn.app.mapreduce.am.admin.user.env? 
Will those settings trickle down to the actual tasks, or do they only affect 
the Application Master's environment? Same question for 
yarn.nodemanager.admin-env: will it trickle down from the Node Manager to the 
container? Would it be better to set one of these rather than the MapReduce 
equivalent, so that I get the native libraries for all YARN apps, not just 
MapReduce ones?

-Steven Willis
