Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/601#discussion_r12169839
--- Diff: docs/running-on-yarn.md ---
@@ -47,83 +49,42 @@ System Properties:
# Launching Spark on YARN
Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which
contains the (client side) configuration files for the Hadoop cluster.
-These configs are used to connect to the cluster, write to the dfs, and
-connect to the YARN ResourceManager.
+These configs are used to write to the dfs, and connect to the YARN
+ResourceManager.
There are two deploy modes that can be used to launch Spark applications
on YARN. In yarn-cluster mode, the Spark driver runs inside an application
master process which is managed by YARN on the cluster, and the client can go
away after initiating the application. In yarn-client mode, the driver runs in
the client process, and the application master is only used for requesting
resources from YARN.
Unlike in Spark standalone and Mesos mode, in which the master's address
is specified in the "master" parameter, in YARN mode the ResourceManager's
address is picked up from the Hadoop configuration. Thus, the master parameter
is simply "yarn-client" or "yarn-cluster".
-The spark-submit script described in the [cluster mode
-overview](cluster-overview.html) provides the most straightforward way to
-submit a compiled Spark application to YARN in either deploy mode. For info on
-the lower-level invocations it uses, read ahead. For running spark-shell
-against YARN, skip down to the yarn-client section.
-
-## Launching a Spark application with yarn-cluster mode.
-
-The command to launch the Spark application on the cluster is as follows:
-
- SPARK_JAR=<SPARK_ASSEMBLY_JAR_FILE> ./bin/spark-class org.apache.spark.deploy.yarn.Client \
- --jar <YOUR_APP_JAR_FILE> \
- --class <APP_MAIN_CLASS> \
- --arg <APP_MAIN_ARGUMENT> \
- --num-executors <NUMBER_OF_EXECUTOR_PROCESSES> \
- --driver-memory <MEMORY_FOR_ApplicationMaster> \
- --executor-memory <MEMORY_PER_EXECUTOR> \
- --executor-cores <CORES_PER_EXECUTOR> \
- --name <application_name> \
- --queue <queue_name> \
- --addJars <any_local_files_used_in_SparkContext.addJar> \
- --files <files_for_distributed_cache> \
- --archives <archives_for_distributed_cache>
-
-To pass multiple arguments the "arg" option can be specified multiple
-times. For example:
-
- # Build the Spark assembly JAR and the Spark examples JAR
- $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
-
- # Configure logging
- $ cp conf/log4j.properties.template conf/log4j.properties
-
- # Submit Spark's ApplicationMaster to YARN's ResourceManager, and instruct Spark to run the SparkPi example
- $ SPARK_JAR=./assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
- ./bin/spark-class org.apache.spark.deploy.yarn.Client \
- --jar examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
- --class org.apache.spark.examples.SparkPi \
- --arg yarn-cluster \
- --arg 5 \
- --num-executors 3 \
- --driver-memory 4g \
- --executor-memory 2g \
- --executor-cores 1
-
-The above starts a YARN client program which starts the default
-Application Master. Then SparkPi will be run as a child thread of Application
-Master. The client will periodically poll the Application Master for status
-updates and display them in the console. The client will exit once your
-application has finished running. Refer to the "Viewing Logs" section below
-for how to see driver and executor logs.
-
-Because the application is run on a remote machine where the Application
-Master is running, applications that involve local interaction, such as
-spark-shell, will not work.
-
-## Launching a Spark application with yarn-client mode.
-
-With yarn-client mode, the application will be launched locally, just like
-running an application or spark-shell on Local / Mesos / Standalone client
-mode. The launch method is also the same, just make sure to specify the master
-URL as "yarn-client". You also need to export the env value for SPARK_JAR.
+To launch a Spark application in yarn-cluster mode:
-Configuration in yarn-client mode:
+ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
--- End diff ---
This works, but I thought the preferred way was:
--master yarn --deploy-mode [client|cluster]
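
For illustration, the SparkPi invocation from the removed section would look
roughly like this in that form. This is a sketch only: the jar path, resource
settings, and the trailing "5" argument are carried over from the old example
above, not taken from this PR.

    # Hypothetical rewrite of the old SparkPi example using the suggested
    # --master/--deploy-mode split; options carried over for illustration.
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 3 \
      --driver-memory 4g \
      --executor-memory 2g \
      --executor-cores 1 \
      examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
      5   # app argument; the old "--arg yarn-cluster" is subsumed by --master

yarn-client mode would presumably then be the same command with
--deploy-mode client, or spark-shell run with --master yarn.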