Yun Tang created YARN-6745:
------------------------------
Summary: Cannot parse correct Spark 2.x jars classpath in YARN on
Windows
Key: YARN-6745
URL: https://issues.apache.org/jira/browse/YARN-6745
Project: Hadoop YARN
Issue Type: Bug
Components: applications
Affects Versions: 2.7.2
Environment: Windows cluster, Yarn-2.7.2
Reporter: Yun Tang
When submitting Spark 2.x applications to a YARN cluster on Windows, we found two
errors:
# If [dynamic resource
allocation|https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation] is
enabled for Spark, we get an exception in thread "main":
java.lang.NoSuchMethodError:
org.apache.spark.network.util.JavaUtils.byteStringAs(Ljava/lang/String;Lorg/apache/spark/network/util/ByteUnit)
# We cannot open the running Spark application's web UI
Both errors come down to YARN not parsing the correct Spark 2.x jars
wildcard classpath on Windows. I checked the latest code from hadoop-3.x;
this part of the code seems unchanged and would cause the same error there.
A typical appcache folder used to run a Spark executor/driver on our Windows YARN
cluster looks like below:
!http://wx1.sinaimg.cn/large/62eae5a9gy1fh14j38zvbj20bb0990tm.jpg!
The link folder ‘__spark_libs__’ points to a filecache folder containing the
jars needed by Spark 2.x.
The classpath-xxx.jar contains a manifest file of the runtime classpath, which
works around the 8k maximum command-line length problem on Windows
(https://issues.apache.org/jira/browse/YARN-358).
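To illustrate this workaround, here is a minimal Java sketch (not Hadoop's actual code; the jar name and Class-Path entries are hypothetical) that packs a classpath into the Class-Path attribute of an otherwise empty jar's manifest, so the command line only needs to reference one jar:
{code:java}
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarDemo {
    public static void main(String[] args) throws Exception {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version is mandatory; without it the manifest is not written
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are space-separated URLs relative to this jar
        attrs.put(Attributes.Name.CLASS_PATH, "lib/a.jar lib/b.jar");
        try (JarOutputStream jos = new JarOutputStream(
                new FileOutputStream("classpath-demo.jar"), manifest)) {
            // intentionally empty: only the manifest's Class-Path matters
        }
        System.out.println("wrote classpath-demo.jar");
    }
}
{code}
The JVM then resolves the Class-Path attribute at load time, so the effective classpath can be far longer than the 8k command-line limit allows.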
‘launch_container.cmd’ is the script that starts the YARN container. Please note
that only after running launch_container.cmd are the shortcuts ‘__spark_conf__’,
‘__spark_libs__’ and ‘__app__.jar’ created.
=================================================
The typical CLASSPATH of hadoop-2.7.2 in launch_container.cmd looks like below:
!http://wx4.sinaimg.cn/large/62eae5a9gy1fh14j2c801j20sh023weh.jpg!
The ‘classpath-3177336218981224920.jar’ contains a manifest file listing all
the Hadoop runtime jars, among which we can find
spark-1.6.2-nao-yarn-shuffle.jar and servlet-api-2.5.jar. Both problems arise
because the Java runtime loads classes from those two old jars first, while the
Spark 1.x external shuffle service is not compatible with Spark 2.x and
servlet-api-2.x is not compatible with servlet-api-3.x (used in Spark 2).
That is to say, “xxx/__spark_libs__/*” should be placed before the
classpath jar. OK, let’s see what the CLASSPATH is on Linux.
=================================================
The classpath in launch_container.sh looks like:
!http://wx2.sinaimg.cn/large/62eae5a9gy1fh14ivycpxj20um01tjre.jpg!
We can see that “xxx/__spark_libs__/*” is placed before the Hadoop jars, so
problems #1 and #2 do not happen in a Linux environment.
*Root cause*:
The whole process takes two steps:
1. {color:blue}org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch{color}
transforms the original CLASSPATH into the classpath jar in the method
‘sanitizeEnv’. The CLASSPATH is:
{code:java}
%PWD%;%PWD%/__spark_conf__;%PWD%/__app__.jar;%PWD%/__spark_libs__/*;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*;
{code}
Within this method, it calls the ‘createJarWithClassPath’ method of
{color:blue}org.apache.hadoop.fs.FileUtil{color}.
2. For a wildcard path, {color:blue}org.apache.hadoop.fs.FileUtil{color}
finds the files in that folder with the suffix ‘jar’ or ‘JAR’. The previous
%PWD%/__spark_libs__/* is transformed to:
{code:java}
D:/Data/Yarn/nm-local-dir/usercache/xxx/appcache/application_1494151518127_0073/container_e3752_1494151518127_0073_01_000001/__spark_libs__/*
{code}
However, this folder does not exist yet when the classpath jar is generated;
only after running ‘launch_container.cmd’ does the ‘__spark_libs__’ folder
appear in the current directory. As a result, YARN puts the
“xxx/__spark_libs__/*” entry into unexpandedWildcardClasspath, and
unexpandedWildcardClasspath is placed after the classpath jar in CLASSPATH.
That is why we see “xxx/__spark_libs__/*” at the end.
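A simplified Java sketch of this behavior (not Hadoop's actual FileUtil code; method and variable names are illustrative): a wildcard entry is expanded to the jars inside the directory when the directory exists, and is kept verbatim in an "unexpanded" list when it does not:
{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class WildcardExpandDemo {
    // Expand a classpath entry ending in "/*" to the .jar/.JAR files in
    // that directory; if the directory is missing at jar-creation time,
    // defer the entry by adding it to the unexpanded list instead.
    static List<String> expand(String entry, List<String> unexpanded) {
        List<String> result = new ArrayList<>();
        if (entry.endsWith("/*")) {
            File dir = new File(entry.substring(0, entry.length() - 2));
            File[] jars = dir.listFiles(
                (d, name) -> name.endsWith(".jar") || name.endsWith(".JAR"));
            if (jars == null) {          // directory does not exist yet
                unexpanded.add(entry);   // appended after the classpath jar
            } else {
                for (File jar : jars) result.add(jar.getAbsolutePath());
            }
        } else {
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unexpanded = new ArrayList<>();
        // __spark_libs__ is created later by launch_container.cmd, so at
        // classpath-jar generation time it falls into the unexpanded list
        System.out.println(expand("__spark_libs__/*", unexpanded));
        System.out.println("unexpanded: " + unexpanded);
    }
}
{code}
Because the unexpanded entries are appended after the classpath jar, the old Spark 1.x and servlet-api jars inside it win the class-loading race.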
In other words, the correct order is “xxx/__spark_libs__/*” placed before the
classpath jar, just like the Linux case, or alternatively the
“xxx/__spark_libs__/xxx.jar” entries should be resolved into the classpath jar;
changing the current wrong order satisfies the original design.
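To make the proposed ordering concrete, a trivial sketch (paths are illustrative placeholders, not the real generated names) contrasting the current Windows order with the proposed, Linux-like order:
{code:java}
public class ClasspathOrderDemo {
    // Hypothetical helper: build CLASSPATH with wildcard entries first,
    // mirroring the Linux ordering described above.
    static String buildClasspath(String wildcards, String classpathJar) {
        return wildcards + ";" + classpathJar;
    }

    public static void main(String[] args) {
        String wildcards = "%PWD%/__spark_libs__/*";
        String jar = "%PWD%/classpath-xxx.jar";
        // current (broken) Windows order: manifest jar first, so its old
        // spark-1.x and servlet-api-2.5 classes shadow the Spark 2 jars
        System.out.println("current:  " + jar + ";" + wildcards);
        // proposed order: wildcards before the manifest jar, as on Linux
        System.out.println("proposed: " + buildClasspath(wildcards, jar));
    }
}
{code}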
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]