Mich, what Jacek is saying is not that you implied that YARN relies on two masters. He's just clarifying that yarn-client and yarn-cluster modes are really both using the same (type of) master (simply "yarn"). In fact, if you specify "--master yarn-client" or "--master yarn-cluster", spark-submit will translate that into using a master URL of "yarn" and a deploy-mode of "client" or "cluster".
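That translation can be sketched as a tiny shell function. This is an illustration only, not the actual spark-submit source — it just expresses the documented mapping of the deprecated yarn-client / yarn-cluster master URLs onto "--master yarn" plus a deploy mode:

```shell
# Sketch (not Spark's implementation): how spark-submit normalizes the
# deprecated yarn-client / yarn-cluster master URLs.
normalize_master() {
  case "$1" in
    yarn-client)  echo "--master yarn --deploy-mode client"  ;;
    yarn-cluster) echo "--master yarn --deploy-mode cluster" ;;
    *)            echo "--master $1" ;;  # other master URLs pass through unchanged
  esac
}

normalize_master yarn-cluster   # prints: --master yarn --deploy-mode cluster
```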
And thanks, Jacek, for the tips on the "less-common master URLs". I had no idea that was an option!

~ Jonathan

On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Good points, but I am an experimentalist.
>
> In Local mode I have this:
>
> --master local
>
> This will start with one thread, equivalent to --master local[1]. You can
> also start with more than one thread by specifying the number of threads k
> in --master local[k], or use all available threads with --master local[*],
> which in my case would be local[12].
>
> The important thing about Local mode is that the number of JVMs spawned is
> controlled by you, and you can start as many spark-submit jobs as you wish
> within the constraints of the resources available.
>
> ${SPARK_HOME}/bin/spark-submit \
>   --packages com.databricks:spark-csv_2.11:1.3.0 \
>   --driver-memory 2G \
>   --num-executors 1 \
>   --executor-memory 2G \
>   --master local \
>   --executor-cores 2 \
>   --conf "spark.scheduler.mode=FIFO" \
>   --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
>   --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>   --class "${FILE_NAME}" \
>   --conf "spark.ui.port=4040" \
>   ${JAR_FILE} \
>   >> ${LOG_FILE}
>
> That works fine, although some of those parameters are implicit (for
> example spark.scheduler.mode = FIFO or FAIR), and I can start different
> Spark jobs in Local mode. Great for testing.
>
> With regard to your comments on Standalone:
>
>> Spark Standalone – a simple cluster manager included with Spark that
>> makes it easy to set up a cluster.
>>
>> s/simple/built-in
>
> What is stated as "included" implies that, i.e. it comes as part of
> running Spark in standalone mode.
>
> Your other points on YARN cluster mode and YARN client mode:
>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>
> Yes, that is, I believe, what the text implied. I would be very surprised
> if YARN as a resource manager relied on two masters :)
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>>> Spark Local - Spark runs on the local host. This is the simplest set up
>>> and best suited for learners who want to understand different concepts
>>> of Spark and those performing unit testing.
>>
>> There are also the less-common master URLs:
>>
>> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
>> threads and maxRetries as the number of failures tolerated.
>> * local-cluster[n, cores, memory] — for simulating a Spark local cluster
>> with n workers, and the given cores and memory per worker.
>>
>> As of Spark 2.0.0, you could also have your own scheduling system — see
>> https://issues.apache.org/jira/browse/SPARK-13904 — with the only known
>> implementation of the ExternalClusterManager contract in Spark being
>> YarnClusterManager, i.e. whenever you call Spark with --master yarn.
>>
>>> Spark Standalone – a simple cluster manager included with Spark that
>>> makes it easy to set up a cluster.
>>
>> s/simple/built-in
>>
>>> YARN Cluster Mode, the Spark driver runs inside an application master
>>> process which is managed by YARN on the cluster, and the client can go
>>> away after initiating the application. This is invoked with --master
>>> yarn and --deploy-mode cluster.
>>>
>>> YARN Client Mode, the driver runs in the client process, and the
>>> application master is only used for requesting resources from YARN.
>>> Unlike Spark standalone mode, in which the master's address is
>>> specified in the --master parameter, in YARN mode the ResourceManager's
>>> address is picked up from the Hadoop configuration. Thus, the --master
>>> parameter is yarn. This is invoked with --deploy-mode client.
>>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>>
>> The same applies to Spark Standalone and Mesos and is controlled by
>> --deploy-mode, i.e. client (default) or cluster.
>>
>> Please update your notes accordingly ;-)
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski