Hi,

I have explained this in my LinkedIn article "The Operational
Advantages of Spark as a Distributed Processing Framework"
<https://www.linkedin.com/pulse/operational-advantages-spark-distributed-processing-mich/>

An extract:

*2) YARN Deployment Modes*

The term *Deployment mode of Spark* simply means "where the driver
program will be run". There are two modes, namely *Spark Client Mode*
<https://spark.apache.org/docs/latest/running-on-yarn.html> and *Spark
Cluster Mode* <https://spark.apache.org/docs/latest/cluster-overview.html>.
These are described below:

*In Client mode,* the driver process runs on the node from which you
submit the Spark job to your cluster. This is often done through the
Edge Node. This mode is valuable when you want to use Spark
interactively, as in our case where we would like to display high-value
prices in the dashboard. In Client mode you do not need to reserve any
resources from your cluster for the driver process.

*In Cluster mode,* you submit the Spark job to your cluster and the
driver process runs inside the cluster, within the Application Master.
In this mode you cannot use the Spark job interactively, because the
client through which you submit the job exits as soon as it has
successfully submitted the job to the cluster. You will have to reserve
some resources for the driver process, as it will be running inside
your cluster.
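For reference, the two modes above correspond to the --deploy-mode flag
of spark-submit when running on YARN. A minimal sketch; the class name
and jar path below are placeholders, not taken from this thread:

```shell
# Client mode: the driver runs on the submitting (edge) node, so the
# job can be used interactively and no cluster resources are reserved
# for the driver.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyJob \
  my-job.jar

# Cluster mode: the driver runs inside the cluster, within the
# Application Master; the submitting client exits once the job is
# accepted, and driver memory must be reserved from the cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --class com.example.MyJob \
  my-job.jar
```
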

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 23 Mar 2019 at 21:13, Pat Ferrel <p...@occamsmachete.com> wrote:

> I have researched this for a significant amount of time and find answers
> that seem to be for a slightly different question than mine.
>
> The Spark 2.3.3 cluster is running fine. I see the GUI on
> "http://master-address:8080", there are 2 idle workers, as configured.
>
> I have a Scala application that creates a context and starts execution of
> a Job. I *do not use spark-submit*, I start the Job programmatically and
> this is where many explanations fork from my question.
>
> In "my-app" I create a new SparkConf, with the following code (slightly
> abbreviated):
>
>       conf.setAppName("my-job")
>       conf.setMaster("spark://master-address:7077")
>       conf.set("deployMode", "cluster")
>       // other settings like driver and executor memory requests
>       // the driver and executor memory requests are for all mem on the
>       // slaves, more than mem available on the launching machine with "my-app"
>       val jars = listJars("/path/to/lib")
>       conf.setJars(jars)
>       …
>
> When I launch the job I see 2 executors running on the 2 workers/slaves.
> Everything seems to run fine and sometimes completes successfully. Frequent
> failures are the reason for this question.
>
> Where is the Driver running? I don’t see it in the GUI, I see 2 Executors
> taking all cluster resources. With a Yarn cluster I would expect the
> “Driver" to run on/in the Yarn Master but I am using the Spark Standalone
> Master, where is the Driver part of the Job running?
>
> If it is running in the Master, we are in trouble because I start the
> Master on one of my 2 Workers sharing resources with one of the Executors.
> Executor mem + driver mem is > available mem on a Worker. I can change this
> but need to understand where the Driver part of the Spark Job runs. Is it
> in the Spark Master, or inside an Executor, or ???
>
> The “Driver” creates and broadcasts some large data structures so the need
> for an answer is more critical than with more typical tiny Drivers.
>
> Thanks for your help!
>
