Yun Tang created YARN-6745:
------------------------------

             Summary: Cannot parse correct Spark 2.x jars classpath in YARN on Windows
                 Key: YARN-6745
                 URL: https://issues.apache.org/jira/browse/YARN-6745
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications
    Affects Versions: 2.7.2
         Environment: Windows cluster, Yarn-2.7.2
            Reporter: Yun Tang
When submitting Spark 2.x applications to a YARN cluster on Windows, we found two errors:
# If [dynamic resource allocation|https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation] is enabled for Spark, we get: exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.byteStringAs(Ljava/lang/String;Lorg/apache/spark/network/util/ByteUnit)
# We cannot open the Spark application's running web UI.

Both errors come from YARN failing to parse the Spark 2.x jars wildcard classpath correctly on Windows. I also checked the latest code from hadoop-3.x; this part of the code has not changed and would cause the same error.

A typical appcache folder used to run a Spark executor/driver on our Windows YARN cluster looks like below:
!http://wx1.sinaimg.cn/large/62eae5a9gy1fh14j38zvbj20bb0990tm.jpg!

The link folder '__spark_libs__' points to a filecache folder holding the jars Spark 2.x needs. The classpath-xxx.jar contains a manifest file listing the runtime classpath, a workaround for the 8k maximum command-line length on Windows (https://issues.apache.org/jira/browse/YARN-358). 'launch_container.cmd' is the script that starts the YARN container; please note that only after running launch_container.cmd are the shortcuts '__spark_conf__', '__spark_libs__' and '__app__.jar' created.

=================================================

The typical CLASSPATH of hadoop-2.7.2 in launch_container.cmd looks like below:
!http://wx4.sinaimg.cn/large/62eae5a9gy1fh14j2c801j20sh023weh.jpg!

The 'classpath-3177336218981224920.jar' contains a manifest file listing all the hadoop runtime jars, among which we can find spark-1.6.2-nao-yarn-shuffle.jar and servlet-api-2.5.jar. Both problems arise because the Java runtime loads classes from these two old jars first: the Spark 1.x external shuffle service is not compatible with Spark 2.x, and servlet-api-2.x is not compatible with servlet-api-3.x (used in Spark 2).
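To make the classpath-xxx.jar workaround concrete, here is a minimal sketch of how such a manifest-only jar can be built with plain java.util.jar APIs. This is an illustration of the YARN-358 technique, not the actual Hadoop code; the class name and the entry paths are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarDemo {
    // Build a jar whose only purpose is to carry a Class-Path manifest
    // attribute, so the real classpath never appears on the (length-limited)
    // Windows command line. Entry paths are illustrative placeholders.
    static Path buildClasspathJar(Path out, String... entries) throws IOException {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.CLASS_PATH, String.join(" ", entries));
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(out), mf)) {
            // no entries needed: the manifest alone carries the classpath
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildClasspathJar(Files.createTempFile("classpath-", ".jar"),
                "lib/a.jar", "lib/b.jar");
        try (JarFile jf = new JarFile(jar.toFile())) {
            System.out.println(jf.getManifest().getMainAttributes()
                    .getValue(Attributes.Name.CLASS_PATH)); // lib/a.jar lib/b.jar
        }
    }
}
```

Running `java -cp classpath-xxx.jar Main` then lets the JVM resolve the listed entries through the manifest, which is why a single short jar path can stand in for a very long CLASSPATH.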
So, that is to say, the "xxx/__spark_libs__/*" entry should be placed before the classpath jar. OK, let's see what the CLASSPATH is on Linux.

=================================================

The classpath in launch_container.sh looks like:
!http://wx2.sinaimg.cn/large/62eae5a9gy1fh14ivycpxj20um01tjre.jpg!

We can see "xxx/__spark_libs__/*" placed before the hadoop jars, so problems #1 and #2 do not happen in a Linux environment.

*Root cause*: the whole process takes two steps.

1. {color:blue}org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch{color} transforms the original CLASSPATH into the classpath jar in its 'sanitizeEnv' method. The CLASSPATH is:
{code:java}
%PWD%;%PWD%/__spark_conf__;%PWD%/__app__.jar;%PWD%/__spark_libs__/*;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*;
{code}
Within this method, it calls the 'createJarWithClassPath' method of {color:blue}org.apache.hadoop.fs.FileUtil{color}.

2. For each wildcard path, {color:blue}org.apache.hadoop.fs.FileUtil{color} finds the files in that folder with a 'jar' or 'JAR' suffix. The previous %PWD%/__spark_libs__/* is transformed to
{code:java}
D:/Data/Yarn/nm-local-dir/usercache/xxx/appcache/application_1494151518127_0073/container_e3752_1494151518127_0073_01_000001/__spark_libs__/*
{code}
However, this folder does not exist yet when the classpath jar is generated; only after running 'launch_container.cmd' does the '__spark_libs__' folder appear in the current directory. As a result, YARN puts the "xxx/__spark_libs__/*" classpath into unexpandedWildcardClasspath.
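The wildcard handling described in step 2 can be sketched as follows. This is a simplified illustration of the behavior reported above, not the actual FileUtil.createJarWithClassPath source; the class and method names here are invented for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WildcardExpansionSketch {
    // Sketch of the described behavior: a wildcard entry whose directory
    // exists is expanded to the .jar/.JAR files inside it; a wildcard entry
    // whose directory is missing at jar-creation time (like __spark_libs__
    // before launch_container.cmd runs) is set aside as "unexpanded" and
    // later appended AFTER the classpath jar.
    static List<String> expand(String entry, List<String> unexpanded) {
        if (!entry.endsWith("*")) {
            return Collections.singletonList(entry);   // plain entry, pass through
        }
        File dir = new File(entry.substring(0, entry.length() - 1));
        File[] jars = dir.listFiles(f -> f.getName().endsWith(".jar")
                                      || f.getName().endsWith(".JAR"));
        if (jars == null) {                            // directory does not exist yet
            unexpanded.add(entry);
            return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        for (File j : jars) out.add(j.getPath());
        return out;
    }

    public static void main(String[] args) {
        List<String> unexpanded = new ArrayList<>();
        // __spark_libs__ is created only later, so its wildcard stays unexpanded
        expand("no_such_dir/__spark_libs__/*", unexpanded);
        System.out.println(unexpanded);
    }
}
```

The key point is the `jars == null` branch: because the directory check happens before the container launch script creates the `__spark_libs__` link, the Spark jars never make it into the manifest and end up in the trailing unexpanded list instead.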
And the unexpandedWildcardClasspath is placed after the classpath jar in CLASSPATH, which is why we see "xxx/__spark_libs__/*" at the end. In other words, the correct order is "xxx/__spark_libs__/*" placed before the classpath jar, just like the Linux case, or alternatively the "xxx/__spark_libs__/xxx.jar" entries expanded into the classpath jar itself; changing the current wrong order would satisfy the original design.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
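Why the ordering matters can be shown with a small self-contained demo: the JVM resolves a class or resource from the first matching classpath entry, so whichever jar comes first shadows the rest. The jar names and the "version.txt" marker here are made up to stand in for the spark-1.6 vs spark-2.x jars.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class FirstWinsDemo {
    // Write a tiny jar containing a single marker entry.
    static Path jarWith(String prefix, String content) throws IOException {
        Path p = Files.createTempFile(prefix, ".jar");
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(p))) {
            jos.putNextEntry(new JarEntry("version.txt"));
            jos.write(content.getBytes("UTF-8"));
            jos.closeEntry();
        }
        return p;
    }

    // Resolve the marker through a classloader whose search order is
    // (first, second); the first jar on the path always wins.
    static String lookup(Path first, Path second) throws IOException {
        try (URLClassLoader cl = new URLClassLoader(
                new URL[]{first.toUri().toURL(), second.toUri().toURL()}, null)) {
            try (InputStream in = cl.getResourceAsStream("version.txt")) {
                return new BufferedReader(new InputStreamReader(in)).readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path spark2 = jarWith("spark2-", "spark-2.x");
        Path spark1 = jarWith("spark1-", "spark-1.x");
        // Same two jars, different order, different winner: this is why the
        // stale spark-1.6 shuffle jar ahead of __spark_libs__ breaks Spark 2.x.
        System.out.println(lookup(spark2, spark1)); // spark-2.x
        System.out.println(lookup(spark1, spark2)); // spark-1.x
    }
}
```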