My problem seems to be related to this:
https://issues.apache.org/jira/browse/MAPREDUCE-4052
So, I will try running my setup from a Linux client and see if I have
better luck.
On 1/15/2014 11:38 AM, Philip Ogren wrote:
Great question! I was writing up a similar question this morning and
decided to investigate some more before sending. Here's what I'm
trying: I have created a new Scala project that contains only
spark-examples-assembly-0.8.1-incubating.jar and
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the
classpath, and I am trying to create a yarn-client SparkContext with
the following:
val spark = new SparkContext("yarn-client", "my-app")
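For completeness, here is the fuller constructor variant I have also been
experimenting with. This is only a sketch: the four-argument constructor is
in the 0.8.1 API, but the jar path below is a placeholder, not my real
project jar.

import org.apache.spark.SparkContext

// Sketch only: the jar path is a placeholder for whatever jar holds
// your job code.
val spark = new SparkContext(
  "yarn-client",                  // master
  "my-app",                       // application name
  System.getenv("SPARK_HOME"),    // Spark home on the driver machine
  Seq("target/my-app.jar"))       // jars to ship to the cluster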
My hope is to run this on my laptop and have it execute/connect on the
YARN application master. If I can get this to work, then I can do the
same from a web application. I'm trying to unpack run-example.sh,
compute-classpath, SparkPi, and *.yarn.Client to figure out what
environment variables I need to set up, etc.
I grabbed all the .xml files out of my cluster's conf directory (in my
case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and put them
on my classpath. I also set up the environment variables SPARK_JAR,
SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.
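Before creating the context, I also run a quick sanity check that those
variables are actually visible to the JVM. This check is my own sketch,
nothing Spark-specific:

// Sketch: fail fast if any of the variables just mentioned are missing
// from the JVM's environment.
for (name <- Seq("SPARK_JAR", "SPARK_YARN_APP_JAR", "SPARK_YARN_USER_ENV", "SPARK_HOME")) {
  sys.env.get(name) match {
    case Some(value) => println(name + "=" + value)
    case None        => sys.error(name + " is not set")
  }
}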
When I run my simple Scala script, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
    at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
    at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
    at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)
I can look at my YARN UI and see that it registers a failed
application, so I take this as incremental progress. However, I'm not
sure how to troubleshoot from here, or whether what I'm trying to do
is even sensible/possible. Any advice is appreciated.
Thanks,
Philip
On 1/15/2014 11:25 AM, John Zhao wrote:
Now I am working on a web application and I want to submit a Spark
job to Hadoop YARN.
I have already done my own assembly and can run it from the command
line with the following script:
export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1
It works fine.
Then I realized that it is hard to submit the job from a web
application. It looks like
spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and
spark-examples-assembly-0.8.1-incubating.jar are really big jars; I
believe they contain everything.
So my questions are:
1) When I run the above script, which jar is being submitted to the
YARN server?
2) It looks like
spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the
client-side role, while spark-examples-assembly-0.8.1-incubating.jar
carries the Spark runtime and examples that will run in YARN. Am I
right?
3) Does anyone have similar experience? I have done lots of Hadoop MR
work and want to follow the same logic to submit a Spark job. For now
I can only find the command-line way to submit a Spark job to YARN,
but I believe there should be an easier way to integrate Spark into a
web application.
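What I am imagining is something along these lines. This is just a sketch
on my part: the flags are copied from the script above, and I have not
verified that calling Client.main in-process from a servlet container
actually works.

import org.apache.spark.deploy.yarn.Client

// Sketch: the same arguments the spark-class invocation passes, handed
// straight to the YARN Client's main(). Assumes SPARK_JAR and
// YARN_CONF_DIR are set in this JVM's environment, as in the script.
val args = Array(
  "--jar", "./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar",
  "--class", "org.apache.spark.examples.SparkPi",
  "--args", "yarn-standalone",
  "--num-workers", "3",
  "--master-memory", "1g",
  "--worker-memory", "512m",
  "--worker-cores", "1")
Client.main(args)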
Thanks.
John.