Yun Tang created YARN-6745:
------------------------------

             Summary: Cannot parse correct Spark 2.x jars classpath in YARN on 
Windows
                 Key: YARN-6745
                 URL: https://issues.apache.org/jira/browse/YARN-6745
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications
    Affects Versions: 2.7.2
         Environment: Windows cluster, Yarn-2.7.2
            Reporter: Yun Tang


When submitting Spark 2.x applications to a YARN cluster on Windows, we found two 
errors:
# If [dynamic resource 
allocation|https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation] 
is enabled for Spark, we get an exception in thread "main": 
java.lang.NoSuchMethodError: 
org.apache.spark.network.util.JavaUtils.byteStringAs(Ljava/lang/String;Lorg/apache/spark/network/util/ByteUnit)
# We cannot open the Spark application's running web UI

Both errors stem from YARN failing to parse the Spark 2.x jars' wildcard 
classpath correctly on Windows. I checked the latest code from hadoop-3.x; this 
part of the code appears unchanged and would cause the same error there.

A typical appcache folder used to run a Spark executor/driver on our Windows 
YARN cluster looks like below:
!http://wx1.sinaimg.cn/large/62eae5a9gy1fh14j38zvbj20bb0990tm.jpg!
The link folder ‘__spark_libs__’ points to a filecache folder holding the jars 
Spark 2.x needs.
The classpath-xxx.jar contains a manifest file listing the runtime classpath, a 
workaround for the 8k maximum command-line length on Windows 
(https://issues.apache.org/jira/browse/YARN-358).
The ‘launch_container.cmd’ script starts the YARN container; please note that 
the shortcuts ‘__spark_conf__’, ‘__spark_libs__’ and ‘__app__.jar’ are only 
created after launch_container.cmd runs.


=================================================
The typical CLASSPATH of hadoop-2.7.2 in launch_container.cmd looks like below:
!http://wx4.sinaimg.cn/large/62eae5a9gy1fh14j2c801j20sh023weh.jpg!
The ‘classpath-3177336218981224920.jar’ contains a manifest file listing all 
the Hadoop runtime jars, among which we find spark-1.6.2-nao-yarn-shuffle.jar 
and servlet-api-2.5.jar. Both problems occur because the Java runtime loads 
classes from those two old jars first: the Spark 1.x external shuffle service 
is not compatible with Spark 2.x, and servlet-api-2.x is not compatible with 
the servlet-api-3.x used by Spark 2.
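
This first-match-wins behavior can be demonstrated with a minimal, hypothetical sketch (not YARN or Spark code; the class and resource names below are invented for illustration): a URLClassLoader, like the JVM application class loader, resolves a resource from whichever classpath entry comes first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class FirstWins {
    // Resolve "marker.txt" against two directories acting as classpath
    // entries; the copy in the FIRST entry shadows the second, just as the
    // old servlet-api-2.5 / spark-1.6 shuffle classes shadow the Spark 2.x
    // ones when the classpath jar precedes __spark_libs__.
    public static String resolve(Path first, Path second) throws IOException {
        URL[] urls = { first.toUri().toURL(), second.toUri().toURL() };
        try (URLClassLoader cl = new URLClassLoader(urls, null);
             InputStream in = cl.getResourceAsStream("marker.txt")) {
            return new String(in.readAllBytes());
        }
    }

    public static void main(String[] args) throws IOException {
        Path oldJars = Files.createTempDirectory("old");
        Path newJars = Files.createTempDirectory("new");
        Files.write(oldJars.resolve("marker.txt"), "old-classes".getBytes());
        Files.write(newJars.resolve("marker.txt"), "new-classes".getBytes());
        // Whichever directory comes first on the "classpath" wins.
        System.out.println(resolve(oldJars, newJars)); // old-classes
        System.out.println(resolve(newJars, oldJars)); // new-classes
    }
}
```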

That is to say, ‘xxx/__spark_libs__/*’ should be placed before the 
classpath jar. Now let's see what the CLASSPATH is on Linux.

=================================================
The classpath in launch_container.sh looks like:
!http://wx2.sinaimg.cn/large/62eae5a9gy1fh14ivycpxj20um01tjre.jpg!
We can see that ‘xxx/__spark_libs__/*’ is placed before the Hadoop jars, so 
problems #1 and #2 do not occur on Linux.

*Root cause*:
The whole process has two steps:
1. {color:blue}org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch{color}
 transforms the original CLASSPATH into the classpath jar in its 
‘sanitizeEnv’ method. The CLASSPATH is:
{code:java}
%PWD%;%PWD%/__spark_conf__;%PWD%/__app__.jar;%PWD%/__spark_libs__/*;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*;
{code}

Within this method, it calls the ‘createJarWithClassPath’ method of 
{color:blue}org.apache.hadoop.fs.FileUtil{color}.
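
As a rough sketch of the YARN-358 workaround (not the actual FileUtil.createJarWithClassPath code; the class and method names below are invented for illustration), a manifest-only jar can carry an arbitrarily long classpath in its Class-Path attribute, sidestepping the 8k command-line limit:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {
    // Write a jar containing nothing but a manifest whose Class-Path
    // attribute carries the (possibly very long) classpath.
    public static Path createClasspathJar(Path dir, String... entries) throws IOException {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", entries));
        Path jar = dir.resolve("classpath-demo.jar");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar.toFile()), mf)) {
            // no entries needed: the manifest alone carries the classpath
        }
        return jar;
    }

    // Read the Class-Path attribute back, as the JVM does when this jar
    // is placed on the command-line classpath.
    public static String readClassPath(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getManifest().getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cpjar");
        Path jar = createClasspathJar(dir, "lib/a.jar", "lib/b.jar");
        System.out.println(readClassPath(jar)); // lib/a.jar lib/b.jar
    }
}
```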

2. For a wildcard path, {color:blue}org.apache.hadoop.fs.FileUtil{color} looks 
for files in that folder with a ‘jar’ or ‘JAR’ suffix. The previous 
%PWD%/__spark_libs__/* is transformed to:
{code:java}
D:/Data/Yarn/nm-local-dir/usercache/xxx/appcache/application_1494151518127_0073/container_e3752_1494151518127_0073_01_000001/__spark_libs__/*
{code}

However, this folder does not yet exist when the classpath jar is generated; 
only after ‘launch_container.cmd’ runs does the ‘__spark_libs__’ folder appear 
in the current directory. As a result, YARN puts the ‘xxx/__spark_libs__/*’ 
entry into unexpandedWildcardClasspath, and unexpandedWildcardClasspath is 
placed after the classpath jar in CLASSPATH. That is why 
‘xxx/__spark_libs__/*’ ends up at the end.
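
A loose sketch of this failure mode (hypothetical names, not the real FileUtil logic): if the wildcard's directory is missing at jar-creation time, the raw wildcard is kept unexpanded, and the caller appends such leftovers after the classpath jar:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class WildcardExpander {
    // Expand a trailing "/*" to the .jar/.JAR files in the directory if it
    // exists; otherwise fall back to the raw wildcard, which (per the bug)
    // lands in unexpandedWildcardClasspath and is appended AFTER the
    // classpath jar.
    public static List<String> expand(String wildcard) {
        File dir = new File(wildcard.substring(0, wildcard.length() - 2)); // strip "/*"
        List<String> out = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {            // directory absent at container-setup time
            out.add(wildcard);          // left unexpanded -> sorted to the end
            return out;
        }
        for (File f : files) {
            String name = f.getName();
            if (name.endsWith(".jar") || name.endsWith(".JAR")) {
                out.add(f.getPath());
            }
        }
        return out;
    }
}
```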

In other words, the fix is either to place ‘xxx/__spark_libs__/*’ before the 
classpath jar, as in the Linux case, or to resolve the 
‘xxx/__spark_libs__/xxx.jar’ entries into the classpath jar itself; correcting 
the current ordering would satisfy the original design.
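
The proposed ordering, sketched with hypothetical helper names (this is not a patch, just an illustration of the intended CLASSPATH assembly), simply keeps unexpanded wildcards ahead of the classpath jar:

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathOrder {
    // Assemble the final CLASSPATH so any wildcard entries that could not
    // be expanded into the classpath jar still precede it, matching the
    // Linux launch script, so the __spark_libs__ jars win class loading.
    public static String build(List<String> unexpandedWildcards, String classpathJar) {
        List<String> parts = new ArrayList<>(unexpandedWildcards);
        parts.add(classpathJar);
        return String.join(";", parts); // ';' is the Windows path separator
    }
}
```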





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
