My problem seems to be related to this:
https://issues.apache.org/jira/browse/MAPREDUCE-4052
So, I will try running my setup from a Linux client and see if I have
better luck.
On 1/15/2014 11:38 AM, Philip Ogren wrote:
Great question! I was writing up a similar question this morning and
decided to investigate some more before sending. Here's what I'm
trying: I have created a new Scala project that contains only
spark-examples-assembly-0.8.1-incubating.jar and
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the
classpath, and I am trying to create a yarn-client SparkContext with
the following:
val spark = new SparkContext("yarn-client", "my-app")
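For completeness, here is the fuller constructor variant I have also been
experimenting with. This is only a sketch: the four-argument constructor is
in the 0.8.1 API, but the jar path below is a placeholder, not my real
project jar.

import org.apache.spark.SparkContext

// Sketch only: the jar path is a placeholder for whatever jar holds
// your job code.
val spark = new SparkContext(
  "yarn-client",                  // master
  "my-app",                       // application name
  System.getenv("SPARK_HOME"),    // Spark home on the driver machine
  Seq("target/my-app.jar"))       // jars to ship to the cluster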
My hope is to run this on my laptop and have it execute/connect on the
YARN application master. If I can get this to work, then I can do the
same from a web application. I'm trying to unpack run-example.sh,
compute-classpath, SparkPi, and *.yarn.Client to figure out what
environment variables I need to set up, etc.
I grabbed all the .xml files out of my cluster's conf directory (in my
case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and put them
on my classpath. I also set up the environment variables SPARK_JAR,
SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.
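Before creating the context, I also run a quick sanity check that those
variables are actually visible to the JVM. This check is my own sketch,
nothing Spark-specific:

// Sketch: fail fast if any of the variables just mentioned are missing
// from the JVM's environment.
for (name <- Seq("SPARK_JAR", "SPARK_YARN_APP_JAR", "SPARK_YARN_USER_ENV", "SPARK_HOME")) {
  sys.env.get(name) match {
    case Some(value) => println(name + "=" + value)
    case None        => sys.error(name + " is not set")
  }
}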
When I run my simple Scala script, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
    at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
    at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
    at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)
I can look at my YARN UI and see that it registers a failed
application, so I take this as incremental progress. However, I'm not
sure how to troubleshoot from here, or whether what I'm trying to do
is even sensible/possible. Any advice is appreciated.
Thanks,
Philip
On 1/15/2014 11:25 AM, John Zhao wrote:
Now I am working on a web application and I want to submit a Spark
job to Hadoop YARN.
I have already done my own assembly and can run it from the command
line with the following script:
export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1
It works fine.
Then I realized that it is hard to submit the job from a web
application. It looks like
spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and
spark-examples-assembly-0.8.1-incubating.jar are really big jars; I
believe they contain everything.
So my questions are:
1) When I run the above script, which jar is being submitted to the
YARN server?
2) It looks like
spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the
client-side role, while spark-examples-assembly-0.8.1-incubating.jar
carries the Spark runtime and examples that will run in YARN. Am I
right?
3) Does anyone have similar experience? I have done lots of Hadoop MR
work and want to follow the same logic to submit a Spark job. For now
I can only find the command-line way to submit a Spark job to YARN,
but I believe there should be an easier way to integrate Spark into a
web application.
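What I am imagining is something along these lines. This is just a sketch
on my part: the flags are copied from the script above, and I have not
verified that calling Client.main in-process from a servlet container
actually works.

import org.apache.spark.deploy.yarn.Client

// Sketch: the same arguments the spark-class invocation passes, handed
// straight to the YARN Client's main(). Assumes SPARK_JAR and
// YARN_CONF_DIR are set in this JVM's environment, as in the script.
val args = Array(
  "--jar", "./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar",
  "--class", "org.apache.spark.examples.SparkPi",
  "--args", "yarn-standalone",
  "--num-workers", "3",
  "--master-memory", "1g",
  "--worker-memory", "512m",
  "--worker-cores", "1")
Client.main(args)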
Thanks.
John.