Mich, what Jacek is saying is not that you implied that YARN relies on two masters. He's just clarifying that yarn-client and yarn-cluster modes are really both using the same (type of) master (simply "yarn"). In fact, if you specify "--master yarn-client" or "--master yarn-cluster", spark-submit will translate that into using a master URL of "yarn" and a deploy-mode of "client" or "cluster".
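That translation can be sketched as a tiny shell function. This is an illustration only, not the actual spark-submit source — it just expresses the documented mapping of the deprecated yarn-client / yarn-cluster master URLs onto "--master yarn" plus a deploy mode:

```shell
# Sketch (not Spark's implementation): how spark-submit normalizes the
# deprecated yarn-client / yarn-cluster master URLs.
normalize_master() {
  case "$1" in
    yarn-client)  echo "--master yarn --deploy-mode client"  ;;
    yarn-cluster) echo "--master yarn --deploy-mode cluster" ;;
    *)            echo "--master $1" ;;  # other master URLs pass through unchanged
  esac
}

normalize_master yarn-cluster   # prints: --master yarn --deploy-mode cluster
```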
And thanks, Jacek, for the tips on the "less-common master URLs". I had no idea that was an option!

~ Jonathan

On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Good points, but I am an experimentalist.
>
> In Local mode I have this:
>
> --master local
>
> This will start with one thread, equivalent to --master local[1]. You can
> also start with more than one thread by specifying the number of threads k
> in --master local[k], or use all available threads with --master local[*],
> which in my case would be local[12].
>
> The important thing about Local mode is that the number of JVMs spawned is
> controlled by you, and you can start as many spark-submit jobs as you wish
> within the constraints of the resources available.
>
> ${SPARK_HOME}/bin/spark-submit \
>   --packages com.databricks:spark-csv_2.11:1.3.0 \
>   --driver-memory 2G \
>   --num-executors 1 \
>   --executor-memory 2G \
>   --master local \
>   --executor-cores 2 \
>   --conf "spark.scheduler.mode=FIFO" \
>   --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
>   --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>   --class "${FILE_NAME}" \
>   --conf "spark.ui.port=4040" \
>   ${JAR_FILE} \
>   >> ${LOG_FILE}
>
> That works fine, although some of those parameters are implicit (for
> example spark.scheduler.mode = FIFO or FAIR), and I can start different
> Spark jobs in Local mode. Great for testing.
>
> With regard to your comments on Standalone:
>
>> Spark Standalone – a simple cluster manager included with Spark that
>> makes it easy to set up a cluster.
>>
>> s/simple/built-in
>
> What is stated as "included" implies that, i.e. it comes as part of
> running Spark in standalone mode.
>
> Your other points on YARN cluster mode and YARN client mode:
>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>
> Yes, that is, I believe, what the text implied. I would be very surprised
> if YARN as a resource manager relied on two masters :)
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>>> Spark Local - Spark runs on the local host. This is the simplest set up
>>> and best suited for learners who want to understand different concepts
>>> of Spark and those performing unit testing.
>>
>> There are also the less-common master URLs:
>>
>> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
>> threads and maxRetries as the number of failures tolerated.
>> * local-cluster[n, cores, memory] — for simulating a Spark local cluster
>> with n workers, and the given cores and memory per worker.
>>
>> As of Spark 2.0.0, you could also have your own scheduling system — see
>> https://issues.apache.org/jira/browse/SPARK-13904 — with the only known
>> implementation of the ExternalClusterManager contract in Spark being
>> YarnClusterManager, i.e. whenever you call Spark with --master yarn.
>>
>>> Spark Standalone – a simple cluster manager included with Spark that
>>> makes it easy to set up a cluster.
>>
>> s/simple/built-in
>>
>>> YARN Cluster Mode, the Spark driver runs inside an application master
>>> process which is managed by YARN on the cluster, and the client can go
>>> away after initiating the application. This is invoked with --master
>>> yarn and --deploy-mode cluster.
>>>
>>> YARN Client Mode, the driver runs in the client process, and the
>>> application master is only used for requesting resources from YARN.
>>> Unlike Spark standalone mode, in which the master's address is
>>> specified in the --master parameter, in YARN mode the ResourceManager's
>>> address is picked up from the Hadoop configuration. Thus, the --master
>>> parameter is yarn. This is invoked with --deploy-mode client.
>>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>>
>> The same applies to Spark Standalone and Mesos and is controlled by
>> --deploy-mode, i.e. client (default) or cluster.
>>
>> Please update your notes accordingly ;-)
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski