I am trying to run Spark version 0.8.1 on Hadoop 2.2.0-cdh5.0.0-beta-1 with YARN.
I am using the YARN Client in yarn-standalone mode as described here: http://spark.incubator.apache.org/docs/latest/running-on-yarn.html

To simplify matters, say my application code is all contained in application.jar and it additionally depends on code in dependency.jar. I launch my Spark application as follows:

    SPARK_JAR=<SPARK_ASSEMBLY_JAR_FILE> ./spark-class org.apache.spark.deploy.yarn.Client \
      --jar application.jar \
      --class <My main class> \
      --args <app specific arguments> \
      --num-workers <NUMBER_OF_WORKER_MACHINES> \
      --master-memory <MEMORY_FOR_MASTER> \
      --worker-memory <MEMORY_PER_WORKER> \
      --worker-cores <CORES_PER_WORKER> \
      --name <application_name> \
      --addJars dependency.jar

YARN accepts the job and starts executing it, but as the job runs it quickly dies with ClassNotFoundExceptions for classes that are defined in dependency.jar.

As an attempted fix, I tried bundling all of the dependencies into a single jar, application-with-dependencies.jar. I specified this jar with the --jar option and removed the --addJars line. Unfortunately this did not alleviate the issue and the ClassNotFoundExceptions continued.
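For reference, here is a stripped-down sketch of what my driver does. The names are placeholders (Helper stands in for the real classes that live in dependency.jar); the failing call is something like the filter closure below:

    import org.apache.spark.SparkContext
    import com.example.dep.Helper // placeholder for a class packaged in dependency.jar

    object MyApp {
      def main(args: Array[String]) {
        // "yarn-standalone" master, per the running-on-yarn docs for 0.8.1
        val sc = new SparkContext("yarn-standalone", "application_name")
        val lines = sc.textFile(args(0))
        // The closure uses Helper from dependency.jar; the workers die here
        // with ClassNotFoundException for Helper.
        val count = lines.filter(line => Helper.isInteresting(line)).count()
        println("matching lines: " + count)
        sc.stop()
      }
    }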
