We are trying to get the appropriate jars into our AM's CLASSPATH, but we are
running into an issue. We are building on the distributed shell sample code.
Feel free to direct me to "the right way to do this" if our approach is
incorrect or the best practice has been revised. All we need are the default
Hadoop jars plus our AM's jar.
I am running HDP 2.2.0.2.0.6.0-76. The code that constructs the classpath for
our YARN application is derived from the distributed shell example:
    Map<String, String> env = new HashMap<String, String>();
    // Add AppMaster.jar location to classpath
    // At some point we should not be required to add
    // the hadoop specific classpaths to the env.
    // It should be provided out of the box.
    // For now setting all required classpaths including
    // the classpath to "." for the application jar
    StringBuilder classPathEnv = new StringBuilder("${CLASSPATH}:./*");
    for (String c : conf.getStrings(
            YarnConfiguration.YARN_APPLICATION_CLASSPATH,
            YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
        classPathEnv.append(':');
        classPathEnv.append(c.trim());
    }
    classPathEnv.append(":./log4j.properties");
    env.put("CLASSPATH", classPathEnv.toString());
    amContainer.setEnvironment(env);
It produces a string that looks something like this:
    "$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:..."
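To make the shape of that string concrete, here is a dependency-free sketch of
the same loop. The class name ClasspathEnvSketch and the build() helper are
made up for illustration, and the entries are passed in directly rather than
read from YarnConfiguration, so this is not the code we actually run.

```java
class ClasspathEnvSketch {
    // Hypothetical helper: mirrors the loop above, but takes the classpath
    // entries as a plain array instead of reading them from the configuration.
    static String build(String[] entries) {
        StringBuilder classPathEnv = new StringBuilder("${CLASSPATH}:./*");
        for (String c : entries) {
            classPathEnv.append(':');
            classPathEnv.append(c.trim());
        }
        classPathEnv.append(":./log4j.properties");
        return classPathEnv.toString();
    }

    public static void main(String[] args) {
        // Two sample entries in the style of yarn.application.classpath.
        String[] entries = {
            "$HADOOP_COMMON_HOME/share/hadoop/common/*",
            " $HADOOP_HDFS_HOME/share/hadoop/hdfs/*"
        };
        System.out.println(build(entries));
    }
}
```

Note that the variables and the "*" wildcards are still literal text at this
point; nothing in the client code expands them.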
When I submit the application on a single-node cluster, the classpath in the
Application Master, as given by the system property "java.class.path", has all
wildcards expanded, which produces a very long classpath. This classpath is
correct and the Application Master runs properly.
When I submit the same application to a 4-node cluster running the same version
of HDP, "java.class.path" contains "*" characters that have not been expanded
into the list of jar files in the named directory. As a result, I get "class
not found" exceptions.
On the 4-node cluster the value of "yarn.application.classpath" appears "as is"
in "java.class.path", with no wildcard expansion. On the single-node cluster
the same value appears in "java.class.path" with every wildcard expanded.
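For anyone wanting to compare the two clusters, a check along these lines could
be run inside the AM. It is only a sketch: the class name ClasspathCheck and
the unexpandedEntries() helper are hypothetical, and it simply lists the
"java.class.path" entries that still end in a literal "*" and reports whether
the directory they point at exists on that node, which might help tell a
missing directory apart from some other cause.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ClasspathCheck {
    // Hypothetical helper: returns the entries of a classpath string that
    // still end in a literal "*", i.e. wildcards that were not expanded.
    static List<String> unexpandedEntries(String classPath, String separator) {
        List<String> result = new ArrayList<String>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (entry.endsWith("*")) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        for (String entry : unexpandedEntries(cp, File.pathSeparator)) {
            // Strip the trailing "*" to get the directory the wildcard names.
            File dir = new File(entry.substring(0, entry.length() - 1));
            System.out.println(entry + " -> directory exists: " + dir.isDirectory());
        }
    }
}
```

On the single-node cluster I would expect this to print nothing, since every
wildcard there shows up already expanded.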
Is there perhaps a problem in our 4-node cluster configuration? Or is there
possibly a bug in the YARN implementation for this setup?
Thanks
John