yarn-client mode is much like Spark client mode, except that the executors run in YARN containers managed by the YARN resource manager on the cluster, rather than as Spark workers managed by the Spark master.  The driver executes as a local client in your local JVM and communicates with the executors on the cluster.  Transformations are scheduled on the cluster by the driver's logic.  Actions involve communication between the local driver and the remote cluster executors, so there is some additional network overhead, especially if the driver is not co-located on the cluster.  In yarn-cluster mode, by contrast, the driver runs as a thread in a YARN application master on the cluster.

In either case, the assembly JAR must be available to the application on the cluster.  It is best to copy it to HDFS and export that location as SPARK_JAR.
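
A minimal sketch of that step, assuming a Hadoop 2 client is on the PATH of the machine that launches the driver.  The local path and HDFS directory are hypothetical placeholders, not values from this thread:

```shell
# Hypothetical paths: substitute your own assembly JAR and HDFS directory.
LOCAL_JAR=/path/to/your-assembly.jar
HDFS_DIR=/user/$USER/jars

# Upload the assembly so that every node in the cluster can fetch it.
hadoop fs -mkdir -p "$HDFS_DIR"
hadoop fs -put "$LOCAL_JAR" "$HDFS_DIR/"

# Export the HDFS location before starting the driver JVM.
# "hdfs://" followed by an absolute path resolves against the default namenode.
export SPARK_JAR="hdfs://$HDFS_DIR/$(basename "$LOCAL_JAR")"
```

The export has to be in the environment of the process that creates the SparkContext, so run it in the same shell that launches your application.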

Kevin Markey

On 06/19/2014 11:22 AM, Koert Kuipers wrote:
i am trying to understand how yarn-client mode works. i am not using spark-submit, but instead launching a spark job from within my own application.

i can see my application contacting yarn successfully, but then in yarn i get an immediate error:

Application application_1403117970283_0014 failed 2 times due to AM Container for appattempt_1403117970283_0014_000002 exited with exitCode: -1000 due to: File file:/home/koert/test-assembly-0.1-SNAPSHOT.jar does not exist
.Failing this attempt.. Failing the application.

why is yarn trying to fetch my jar, and why as a local file? i would expect the jar to be sent to yarn over the wire upon job submission?

