Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/86#discussion_r10960773
--- Diff: docs/cluster-overview.md ---
@@ -50,6 +50,47 @@ The system currently supports three cluster managers:
In addition, Spark's [EC2 launch scripts](ec2-scripts.html) make it easy
to launch a standalone
cluster on Amazon EC2.
+# Launching Applications
+
+The recommended way to launch a compiled Spark application is through the
spark-submit script (located in the
+bin directory), which takes care of setting up the classpath with Spark
and its dependencies, and
+provides a layer over the different cluster managers and deploy modes that
Spark supports. Its usage is:
+
+ spark-submit <jar> <options>
+
+Where options are any of:
+
+- **--class** - The main class to run.
+- **--master** - The URL of the cluster manager master, e.g.
spark://host:port, mesos://host:port, yarn,
+ or local.
+- **--deploy-mode** - "client" to run the driver in the client process or
"cluster" to run the driver in
+ a process on the cluster. For Mesos, only "client" is supported.
+- **--executor-memory** - Memory per executor (e.g. 1000M, 2G).
+- **--executor-cores** - Number of cores per executor.
+- **--driver-memory** - Memory for driver (e.g. 1000M, 2G).
+- **--name** - Name of the application.
+- **--arg** - Argument to be passed to the application's main class. This
option can be specified
+ multiple times to pass multiple arguments.
+- **--jars** - A comma-separated list of local jars to include on the
driver classpath and that
+ SparkContext.addJar will work with. Doesn't work on standalone with
'cluster' deploy mode.
+
+The following currently only work for Spark standalone with cluster deploy
mode:
+- **--driver-cores** - Cores for driver (Default: 1).
+- **--supervise** - If given, restarts the driver on failure.
+
+The following currently only works for Spark standalone and Mesos:
+- **--total-executor-cores** - Total cores for all executors.
+
+The following currently only work for YARN:
+
+- **--queue** - The YARN queue to place the application in.
+- **--files** - Comma-separated list of files to be placed next to all
executors.
+- **--archives** - Comma-separated list of archives to be extracted next
to all executors.
+- **--num-executors** - Number of executors to start.
+
+The master and deploy mode can also be set with the MASTER and DEPLOY_MODE
environment variables.
--- End diff --
I think "deploy mode" is a new term that this PR introduces. Would you mind
adding it to the glossary below? I think it's something like:
```
Deploy mode: Distinguishes who is responsible for launching the driver. In
"cluster" mode the driver is launched inside of the cluster. In "client" mode,
the driver is launched outside of the cluster.
```
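To make the distinction concrete, here is a rough sketch of how the two modes might be selected on the command line. It assumes the `spark-submit <jar> <options>` form proposed in the diff above. The jar name, main class, and master URL are hypothetical placeholders, and the commands are only printed, not run:
```shell
#!/bin/sh
# Hypothetical values, for illustration only (not taken from the PR).
APP_JAR="my-app.jar"
MAIN_CLASS="com.example.MyApp"
MASTER_URL="spark://host:7077"

# Print (rather than execute) a spark-submit command line for the given
# deploy mode, following the `spark-submit <jar> <options>` form in the diff.
submit_cmd() {
  echo "spark-submit $APP_JAR --class $MAIN_CLASS --master $MASTER_URL --deploy-mode $1"
}

submit_cmd client   # driver runs in the client process, outside the cluster
submit_cmd cluster  # driver runs in a process inside the cluster
```
Only the value of `--deploy-mode` differs between the two command lines. What changes is where the driver is launched.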