I have a few questions about yarn-standalone and yarn-client deployment modes that are described on the Launching Spark on YARN <http://spark.incubator.apache.org/docs/latest/running-on-yarn.html> page.

1) Can someone give me a basic conceptual overview? I am struggling to understand the difference between the yarn-standalone and yarn-client deployment modes. I understand that yarn-standalone runs on the name node and that yarn-client can be run from a remote machine, but otherwise I don't see how they differ. yarn-client seems like the obviously better approach because it can run from anywhere, yet presumably there is some advantage to yarn-standalone (otherwise, why not just run yarn-client on the name node or from a remote machine?). I'm also curious what "standalone" refers to here.
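
For concreteness, the only code-level difference I can find is the master string handed to SparkContext, along the lines of this sketch (this is just my reading of the 0.8.x examples; ModeDemo is a made-up name, and I may have the semantics wrong):

    import org.apache.spark.SparkContext

    object ModeDemo {
      def main(args: Array[String]) {
        // Pass "yarn-standalone" or "yarn-client" as the first argument.
        // - "yarn-standalone": the app is submitted through
        //   org.apache.spark.deploy.yarn.Client and, as I understand it,
        //   the driver runs inside the YARN ApplicationMaster on the cluster.
        // - "yarn-client": the driver runs in this local JVM, and only the
        //   executors run in YARN containers.
        val master = if (args.length > 0) args(0) else "yarn-client"
        val sc = new SparkContext(master, "ModeDemo")
        println("Default parallelism: " + sc.defaultParallelism)
        sc.stop()
      }
    }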

2) I was able to run SparkPi in yarn-client mode from a simple Scala main method by providing only the SPARK_JAR and SPARK_YARN_APP_JAR environment variables and by putting the various *-site.xml files on my classpath. That is, I didn't call run-example; I just called my Scala app directly (a rough sketch of what worked is included below). We've had trouble duplicating this success in our own app and are in the process of applying the patch detailed here:

https://github.com/apache/incubator-spark/pull/371
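
For reference, here is roughly the shape of the SparkPi-style main that did work for me (trimmed down; YarnClientPi is a placeholder name, and it assumes SPARK_JAR and SPARK_YARN_APP_JAR are exported and the Hadoop *-site.xml files are on the classpath so that "yarn-client" can find the ResourceManager):

    import org.apache.spark.SparkContext

    object YarnClientPi {
      def main(args: Array[String]) {
        // "yarn-client" as the master string: the driver runs here,
        // executors are requested from YARN.
        val sc = new SparkContext("yarn-client", "YarnClientPi")
        val n = 100000
        // Monte Carlo estimate of pi: count random points inside the
        // unit circle.
        val inside = sc.parallelize(1 to n).map { _ =>
          val x = math.random * 2 - 1
          val y = math.random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        println("Pi is roughly " + 4.0 * inside / n)
        sc.stop()
      }
    }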

However, one thing I think I learned is that Spark doesn't have to be installed on the name node. Is that correct? Do I need to have Spark installed at all, either on my remote machine or on the name node? It would be great if all that was needed were the SPARK_JAR and the SPARK_YARN_APP_JAR.

3) Finally, is it possible to pre-stage the assembly jar files so they don't need to be copied over every time I start a new Spark job in yarn-client mode? Any advice here is appreciated.

Thanks!
Philip
