[GitHub] [spark] srowen commented on a change in pull request #23919: [MINOR][DOC] Documentation improvement: More detailed explanation of possible "deploy-mode"s

GitBox Sat, 09 Mar 2019 10:49:55 -0800

srowen commented on a change in pull request #23919: [MINOR][DOC] Documentation 
improvement: More detailed explanation of possible "deploy-mode"s
URL: https://github.com/apache/spark/pull/23919#discussion_r264010134


 ##########
 File path: docs/submitting-applications.md
 ##########
 @@ -48,13 +48,22 @@ Some of the commonly used options are:
 * `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
 * `application-arguments`: Arguments passed to the main method of your main class, if any
 
-<b>&#8224;</b> A common deployment strategy is to submit your application from a gateway machine
-that is
-physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster).
-In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
-within the `spark-submit` process which acts as a *client* to the cluster. The input and
-output of the application is attached to the console. Thus, this mode is especially suitable
-for applications that involve the REPL (e.g. Spark shell).
+<b>&#8224;</b>In `client` mode, the driver is launched directly within the `spark-submit` process on
+the machine which was used to submit the Spark job, and it will act as a *client* to the cluster.
 
 Review comment:
   I think I agree with @vanzin here. The comment above is accurate, and
probably goes as far as we need to in distinguishing the modes for the purposes
of what this comment is trying to explain. I'd propose removing the special
comment about K8S.
   
   I might phrase the distinction as:
   
   - In `client` mode, `spark-submit` runs the driver directly, in its own
process on the machine where it was invoked. That process is the direct
'client' to the cluster workers running the executors, and runs as long as the
job does. If this `spark-submit` process is terminated, the driver terminates
and thus the job does too.
   - In `cluster` mode, `spark-submit` launches the driver on the cluster. The
`spark-submit` process and its machine are not involved in running the job
after submission. The job continues to run even if `spark-submit` terminates,
as the driver 'client' is on the cluster. The job runs as long as the driver
continues to execute on the cluster.
   
   That kind of thing.
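
   To make the distinction concrete, here is a minimal sketch of how the two
invocations differ only in the `--deploy-mode` value. The helper
`build_spark_submit`, the master URL, the class name, and the jar path are all
hypothetical placeholders for illustration, not anything from the PR; only the
`--class`, `--master`, and `--deploy-mode` flags are taken from `spark-submit`
itself.

```python
import shlex

VALID_DEPLOY_MODES = ("client", "cluster")


def build_spark_submit(deploy_mode, master, main_class, app_jar, app_args=()):
    """Build the argv list for a spark-submit call (hypothetical helper).

    This only assembles the command line; it does not launch anything.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(
            f"deploy mode must be one of {VALID_DEPLOY_MODES}, got {deploy_mode!r}"
        )
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--deploy-mode", deploy_mode,
        app_jar,          # application-jar
        *app_args,        # application-arguments
    ]


# Placeholder values, assumed for the example only.
MASTER = "spark://master.example.com:7077"
MAIN_CLASS = "org.example.MyApp"
APP_JAR = "hdfs:///apps/my-app.jar"

# client mode: the driver runs inside this spark-submit process.
client_cmd = build_spark_submit("client", MASTER, MAIN_CLASS, APP_JAR, ["100"])

# cluster mode: the driver is launched on the cluster instead.
cluster_cmd = build_spark_submit("cluster", MASTER, MAIN_CLASS, APP_JAR, ["100"])

print(shlex.join(client_cmd))
print(shlex.join(cluster_cmd))
```

   The two command lines are identical apart from the mode, which is the
point: where the driver lives, and therefore what happens when `spark-submit`
exits, is decided by that one option.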

