srowen commented on a change in pull request #23919: [MINOR][DOC] Documentation improvement: More detailed explanation of possible "deploy-mode"s
URL: https://github.com/apache/spark/pull/23919#discussion_r265146573
 
 

 ##########
 File path: docs/submitting-applications.md
 ##########
 @@ -48,13 +48,28 @@ Some of the commonly used options are:
 * `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
 * `application-arguments`: Arguments passed to the main method of your main class, if any
 
-<b>&#8224;</b> A common deployment strategy is to submit your application from a gateway machine
-that is
-physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster).
-In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
-within the `spark-submit` process which acts as a *client* to the cluster. The input and
-output of the application is attached to the console. Thus, this mode is especially suitable
-for applications that involve the REPL (e.g. Spark shell).
+<b>&#8224;</b> The exact behaviour depends on the resource-manager (standalone, YARN, Mesos etc.)
+used on the cluster. While some of them offer advanced features (see for example, the
+[support for Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode),
+where the driver can run inside a Kubernetes pod or on a physical host),
+generally, the following is applicable: in `client` mode,
+the driver is launched directly within the `spark-submit` process on
+the machine which was used to submit the Spark job, and it will act as a *client* to the cluster.
+In this mode, the input and output of the application is attached to the console, thus, this mode is
+especially suitable for applications that involve the REPL (e.g. Spark shell).
+Depending on the resource-manager used and its configuration, the handling of driver failures
+(termination or disconnect) might be different, but in most of the cases that ends the execution
+of the job on the (remote) cluster as well.
+
+If `cluster` mode is specified, the driver program is executed on one of the cluster machines, requiring no
+connection from the (client) machine which was used for submitting the Spark job: the Spark job will run
 
 Review comment:
   The word 'client' is probably confusing here; it's just the machine that ran 
spark-submit. If "client" here is used to mean the driver's machine, then this 
isn't the client.
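
For context on the two modes discussed in the quoted diff, here is a minimal sketch of how they are selected with `spark-submit`; the class name, master URL, and jar path are placeholders for illustration, not values taken from this PR:

    # Client mode: the driver runs inside the spark-submit process on the
    # machine that ran spark-submit, with the application's input and output
    # attached to this console.
    ./bin/spark-submit \
      --class org.example.MyApp \
      --master yarn \
      --deploy-mode client \
      /path/to/my-app.jar

    # Cluster mode: the driver is launched on one of the cluster machines;
    # the machine that ran spark-submit only submits the application and does
    # not host the driver.
    ./bin/spark-submit \
      --class org.example.MyApp \
      --master yarn \
      --deploy-mode cluster \
      /path/to/my-app.jar

This illustrates the distinction raised above: in `cluster` mode the machine that ran spark-submit is only the submitter, while the driver itself runs inside the cluster.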

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
