Andrew, brilliant! I built with Java 7 but was still running our cluster on Java 6. Upgrading the cluster made it work, with slight tweaks to the args (it looks like the app args come first and yarn-standalone comes last):

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  --class org.apache.spark.examples.SparkPi \
  --args 10 \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1
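
For the archives, I believe the equivalent spark-submit invocation looks something like this, going by the 1.0 docs (I haven't run this exact command yet, so treat it as my translation of the old flags rather than a verified recipe):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  10

Here yarn-cluster is the new name for the old yarn-standalone mode, and the trailing 10 is the slices argument to SparkPi.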
I'll make sure to use spark-submit from here on out.
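
One more note in case it helps the next person: before upgrading, it should be possible to confirm the Java mismatch directly. This is an untested sketch on my part (paths are from my build above; adjust per node):

# On each executor node, print the Java version the containers will use
# (the one JAVA_HOME points to, not just whatever is first on PATH):
$JAVA_HOME/bin/java -version

# An assembly packaged by Java 7 can be unreadable by Java 6 (SPARK-1520).
# If a Java 6 jar tool cannot even list the assembly, packaging is the culprit:
jar tf ./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  > /dev/null && echo "assembly readable by this JVM"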
Thanks very much!
Jon

On Thu, May 22, 2014 at 12:40 PM, Andrew Or <and...@databricks.com> wrote:

> Hi Jon,
>
> Your configuration looks largely correct. I have very recently confirmed
> that the way you launch SparkPi also works for me.
>
> I have run into the same problem a bunch of times. My best guess is that
> this is a Java version issue. If the Spark assembly jar is built with
> Java 7, it cannot be opened by Java 6, because the two versions use
> different packaging schemes. This is a known issue:
> https://issues.apache.org/jira/browse/SPARK-1520.
>
> The workaround is to make sure that all your executor nodes are running
> Java 7 and, very importantly, that JAVA_HOME points to this version. You
> can achieve this through
>
> export SPARK_YARN_USER_ENV="JAVA_HOME=/path/to/java7/home"
>
> in spark-env.sh. Another safe alternative, of course, is to just build
> the jar with Java 6. An additional debugging step is to review the launch
> environment of all the containers; this is detailed in the last paragraph
> of this section:
> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/running-on-yarn.html#debugging-your-application.
> It may not be necessary, but I have personally found it immensely useful.
>
> One last thing: launching Spark applications through
> org.apache.spark.deploy.yarn.Client is deprecated in Spark 1.0. You
> should use bin/spark-submit instead. You can find information about its
> usage in the docs linked above, or simply through the --help option.
>
> Cheers,
> Andrew
>
>
> 2014-05-22 11:38 GMT-07:00 Jon Bender <jonathan.ben...@gmail.com>:
>
>> Hey all,
>>
>> I'm working through the basic SparkPi example on a YARN cluster, and I'm
>> wondering why my containers don't pick up the Spark assembly classes.
>>
>> I built the latest Spark code against CDH5.0.0, then ran the following:
>>
>> SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>> ./bin/spark-class org.apache.spark.deploy.yarn.Client \
>>   --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>>   --class org.apache.spark.examples.SparkPi \
>>   --args yarn-standalone \
>>   --num-workers 3 \
>>   --master-memory 4g \
>>   --worker-memory 2g \
>>   --worker-cores 1
>>
>> The job dies, and in the stderr from the containers I see:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/spark/deploy/yarn/ApplicationMaster
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.spark.deploy.yarn.ApplicationMaster
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>>
>> My yarn-site.xml contains the following classpath:
>>
>> <property>
>>   <name>yarn.application.classpath</name>
>>   <value>
>>     /etc/hadoop/conf/,
>>     /usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,
>>     /usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,
>>     /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
>>     /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
>>     /usr/lib/avro/*
>>   </value>
>> </property>
>>
>> I've confirmed that the spark-assembly JAR contains this class. Does it
>> actually need to be defined in yarn.application.classpath, or should the
>> Spark client take care of ensuring the necessary JARs are added during
>> job submission?
>>
>> Any tips would be greatly appreciated!
>> Cheers,
>> Jon
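
P.S. For anyone following Andrew's container-debugging tip: per the running-on-yarn docs he linked, you can keep finished containers' directories around long enough to inspect them by setting a large delete delay in yarn-site.xml (36000 is the example value the docs use), then look for launch_container.sh under the directories in yarn.nodemanager.local-dirs; it records each container's full launch environment, JAVA_HOME included.

<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>36000</value>
</property>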