We are trying to get the appropriate jars into our AM's CLASSPATH, but are 
running into an issue.  We are building on the distributed shell sample code.  
Feel free to direct me to "the right way to do this" if our approach is incorrect 
or the best practice has been revised.  All we need are the default Hadoop jars 
plus our AM's jar.

I am running HDP 2.2.0.2.0.6.0-76. I am developing a YARN application that 
builds on the distributed shell example.

The code for constructing the classpath is derived from the distributed shell 
example:
    Map<String, String> env = new HashMap<String, String>();
    // Add AppMaster.jar location to classpath
    // At some point we should not be required to add
    // the hadoop specific classpaths to the env.
    // It should be provided out of the box.
    // For now setting all required classpaths including
    // the classpath to "." for the application jar
    StringBuilder classPathEnv = new StringBuilder("${CLASSPATH}:./*");
    for (String c : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      classPathEnv.append(':');
      classPathEnv.append(c.trim());
    }
    classPathEnv.append(":./log4j.properties");

    env.put("CLASSPATH", classPathEnv.toString());

    amContainer.setEnvironment(env);

It produces a string that looks something like this:
"$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:
 ..."
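
As far as I understand, the entries in that string are shell-style variable references, so what each one resolves to depends on the environment of the node that launches the container. The following is only a sketch of that substitution, not YARN's own code. The class name, the simplified pattern, and the path "/usr/lib/hadoop" are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics shell-style $VAR / ${VAR} substitution.
// Not part of YARN; the pattern is simplified and ignores quoting rules.
class EnvExpansionSketch {
    private static final Pattern VAR =
        Pattern.compile("\\$\\{?([A-Za-z_][A-Za-z0-9_]*)\\}?");

    // Replaces each variable reference with its value from env,
    // or with the empty string when the variable is not set.
    static String expand(String s, Map<String, String> env) {
        Matcher m = VAR.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            m.appendReplacement(out,
                Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String entry = "$HADOOP_COMMON_HOME/share/hadoop/common/*";

        // Hypothetical node where the variable is set.
        Map<String, String> withVar = new HashMap<String, String>();
        withVar.put("HADOOP_COMMON_HOME", "/usr/lib/hadoop");
        System.out.println(expand(entry, withVar));

        // Hypothetical node where the variable is not set.
        System.out.println(expand(entry, new HashMap<String, String>()));
    }
}
```

With the variable set, the entry names a real-looking directory. With it unset, the entry collapses to "/share/hadoop/common/*", which may not exist on the node.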

When I submit the application on a single-node cluster, the classpath in the 
Application Master, as given by the system property "java.class.path", has all 
wildcard expansion done and is very long. This classpath is correct and the 
Application Master runs properly.

When I submit the same application to a 4-node cluster running the same version 
of HDP, "java.class.path" shows "*" characters that have not been expanded into 
the list of jar files in the named directories. Thus, I get "class not found" 
exceptions.

In other words, on the 4-node cluster the value of "yarn.application.classpath" 
appears as-is in "java.class.path" with no wildcard expansion, whereas on the 
single-node cluster the same value appears in "java.class.path" with all 
wildcard expansion done.

Is there perhaps a problem in our 4-node cluster configuration? Or is there 
possibly a bug in the YARN implementation for this setup?
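
For anyone trying to reproduce this, a check along the following lines could be run inside the Application Master on each cluster. It lists the "java.class.path" entries that still end in a literal "*" and reports whether the directory they name exists on that node. It is a minimal sketch using only the JDK; the class and method names are made up for illustration. My assumption, which I have not confirmed, is that the java launcher leaves a wildcard entry untouched when it cannot list the directory, so a missing directory would be one possible explanation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: reports classpath entries whose wildcard was not expanded.
class ClasspathWildcardCheck {

    // Returns the entries of classPath that still end in a literal "*".
    static List<String> unexpandedEntries(String classPath, String separator) {
        List<String> result = new ArrayList<String>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (entry.endsWith("*")) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        List<String> leftovers = unexpandedEntries(cp, File.pathSeparator);
        if (leftovers.isEmpty()) {
            System.out.println("No unexpanded wildcard entries found.");
        }
        for (String entry : leftovers) {
            // A bare "*" has no parent; treat it as the working directory.
            File dir = new File(entry).getParentFile();
            if (dir == null) {
                dir = new File(".");
            }
            System.out.println(entry + " -> directory "
                + (dir.isDirectory() ? "exists" : "is missing") + " on this node");
        }
    }
}
```

Comparing the output from the single-node cluster with the output from each node of the 4-node cluster should show whether the unexpanded entries point at directories that are absent there.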

Thanks
John

