Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/601#discussion_r12169839
--- Diff: docs/running-on-yarn.md ---
@@ -47,83 +49,42 @@ System Properties:
# Launching Spark on YARN
Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which
contains the (client side) configuration files for the Hadoop cluster.
-These configs are used to connect to the cluster, write to the dfs, and
-connect to the YARN ResourceManager.
+These configs are used to write to the dfs, and connect to the YARN
+ResourceManager.
There are two deploy modes that can be used to launch Spark applications
on YARN. In yarn-cluster mode, the Spark driver runs inside an application
master process which is managed by YARN on the cluster, and the client can go
away after initiating the application. In yarn-client mode, the driver runs in
the client process, and the application master is only used for requesting
resources from YARN.
Unlike in Spark standalone and Mesos mode, in which the master's address
is specified in the "master" parameter, in YARN mode the ResourceManager's
address is picked up from the Hadoop configuration. Thus, the master parameter
is simply "yarn-client" or "yarn-cluster".
-The spark-submit script described in the [cluster mode
-overview](cluster-overview.html) provides the most straightforward way to
-submit a compiled Spark application to YARN in either deploy mode. For info on
-the lower-level invocations it uses, read ahead. For running spark-shell
-against YARN, skip down to the yarn-client section.
-
-## Launching a Spark application with yarn-cluster mode.
-
-The command to launch the Spark application on the cluster is as follows:
-
- SPARK_JAR=<SPARK_ASSEMBLY_JAR_FILE> ./bin/spark-class org.apache.spark.deploy.yarn.Client \
- --jar <YOUR_APP_JAR_FILE> \
- --class <APP_MAIN_CLASS> \
- --arg <APP_MAIN_ARGUMENT> \
- --num-executors <NUMBER_OF_EXECUTOR_PROCESSES> \
- --driver-memory <MEMORY_FOR_ApplicationMaster> \
- --executor-memory <MEMORY_PER_EXECUTOR> \
- --executor-cores <CORES_PER_EXECUTOR> \
- --name <application_name> \
- --queue <queue_name> \
- --addJars <any_local_files_used_in_SparkContext.addJar> \
- --files <files_for_distributed_cache> \
- --archives <archives_for_distributed_cache>
-
-To pass multiple arguments the "arg" option can be specified multiple
-times. For example:
-
- # Build the Spark assembly JAR and the Spark examples JAR
- $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
-
- # Configure logging
- $ cp conf/log4j.properties.template conf/log4j.properties
-
- # Submit Spark's ApplicationMaster to YARN's ResourceManager, and instruct Spark to run the SparkPi example
- $ SPARK_JAR=./assembly/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
- ./bin/spark-class org.apache.spark.deploy.yarn.Client \
- --jar examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
- --class org.apache.spark.examples.SparkPi \
- --arg yarn-cluster \
- --arg 5 \
- --num-executors 3 \
- --driver-memory 4g \
- --executor-memory 2g \
- --executor-cores 1
-
-The above starts a YARN client program which starts the default
-Application Master. Then SparkPi will be run as a child thread of Application
-Master. The client will periodically poll the Application Master for status
-updates and display them in the console. The client will exit once your
-application has finished running. Refer to the "Viewing Logs" section below
-for how to see driver and executor logs.
-
-Because the application is run on a remote machine where the Application
-Master is running, applications that involve local interaction, such as
-spark-shell, will not work.
-
-## Launching a Spark application with yarn-client mode.
-
-With yarn-client mode, the application will be launched locally, just like
-running an application or spark-shell on Local / Mesos / Standalone client
-mode. The launch method is also the same, just make sure to specify the master
-URL as "yarn-client". You also need to export the env value for SPARK_JAR.
+To launch a Spark application in yarn-cluster mode:
-Configuration in yarn-client mode:
+ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
--- End diff ---
This works, but I thought the preferred way was:
--master yarn --deploy-mode [client|cluster]
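
For illustration, the SparkPi invocation from the removed section would look
roughly like this in that form. This is a sketch only: the jar path, resource
settings, and the trailing "5" argument are carried over from the old example
above, not taken from this PR.

    # Hypothetical rewrite of the old SparkPi example using the suggested
    # --master/--deploy-mode split; options carried over for illustration.
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 3 \
      --driver-memory 4g \
      --executor-memory 2g \
      --executor-cores 1 \
      examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
      5   # app argument; the old "--arg yarn-cluster" is subsumed by --master

yarn-client mode would presumably then be the same command with
--deploy-mode client, or spark-shell run with --master yarn.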