OK, this is basically from my notes for Spark standalone. The Worker process is the slave process.
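As a side note, the worker hosts are listed in $SPARK_HOME/conf/slaves, which is just a plain-text file of host names, one per line. A minimal sketch (the host names are made-up examples, not a real cluster):

  # conf/slaves - one worker host per line (start-slaves.sh launches a Worker on each)
  worker-node-1
  worker-node-2
  worker-node-3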
You start the workers as you showed with $SPARK_HOME/sbin/start-slaves.sh. That picks up the worker host names from the $SPARK_HOME/conf/slaves file, so you still have to tell Spark where to run the workers. However, if I am correct, regardless of what you have specified in slaves, in this standalone mode there will not be any Spark process spawned by the driver on the slaves. In all probability you will be running one spark-submit process on the driver node. You can see this through the output of jps | grep SparkSubmit, and you can see the details by running jmonitor for that SparkSubmit job.

However, I still doubt whether Scheduling Across Applications is feasible in standalone mode. The doc says:

*Standalone mode:* By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use *all available nodes*. You can limit the number of nodes an application uses by setting the spark.cores.max configuration property in it, or change the default for applications that don’t set this setting through spark.deploy.defaultCores. Finally, in addition to controlling cores, each application’s spark.executor.memory setting controls its memory use.

It uses the phrase *all available nodes*, but I am not convinced it will actually use all of those nodes. Perhaps someone can clarify this.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 20 May 2016 at 02:03, Mathieu Longtin <math...@closetwork.org> wrote:

> Okay:
> *host=my.local.server*
> *port=someport*
>
> This is the spark-submit command, which runs on my local server:
> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
> --executor-memory 4g python-script.py with args*
>
> If I want 200 worker cores, I tell the cluster scheduler to run this
> command on 200 cores:
> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port*
>
> That's it. When the task starts, it uses all available workers. If for
> some reason not enough cores are available immediately, it still starts
> processing with whatever it gets, and the load will be spread further as
> workers come online.
>
>
> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> In normal operation we tell Spark which nodes the worker processes can
>> run on by adding the node names to conf/slaves.
>>
>> It is not very clear to me whether in your case all the jobs run locally
>> with, say, 100 executor cores, like below:
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>   --master local[*] \
>>   --driver-memory xg \     -- default would be 512M
>>   --num-executors=1 \      -- this is the constraint in a stand-alone Spark cluster, whether specified or not
>>   --executor-memory=xG \
>>   --executor-cores=n \
>>
>> --master local[*] means all cores, so --executor-cores in your case need
>> not be specified, or you can cap it as above with --executor-cores=n. If it
>> is not specified then the Spark app will go and grab every core, although
>> in practice that does not happen; it is just an upper ceiling. It is FIFO.
>>
>> What typical executor memory is specified in your case?
>>
>> Do you have a sample snapshot of a spark-submit job by any chance, Mathieu?
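As an aside, to make the template above concrete: a filled-in local-mode submission might look like the sketch below. The memory and core values and the script name are placeholders for illustration, not Mathieu's actual settings.

  ${SPARK_HOME}/bin/spark-submit \
    --master local[*] \
    --driver-memory 2g \
    --executor-memory 4G \
    --executor-cores 2 \
    your_script.py

Against a real standalone cluster you would swap --master local[*] for --master spark://<master-host>:7077, and the application's total core usage could then be capped with the spark.cores.max property quoted from the docs above.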
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 20 May 2016 at 00:27, Mathieu Longtin <math...@closetwork.org> wrote:
>>
>>> Mostly, the resource management is not up to the Spark master.
>>>
>>> We routinely start 100 executor-cores for a 5-minute job, and they just
>>> quit when they are done. Then those processor cores can do something else
>>> entirely; they are not reserved for Spark at all.
>>>
>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Then in theory every user can fire multiple spark-submit jobs. Do you
>>>> cap it with settings in $SPARK_HOME/conf/spark-defaults.conf? I guess
>>>> in reality every user submits one job only.
>>>>
>>>> This is an interesting model for two reasons:
>>>>
>>>> - It uses parallel processing across all the nodes, or most of the
>>>>   nodes, to minimise the processing time
>>>> - It requires less intervention
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> On 19 May 2016 at 21:33, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>
>>>>> Driver memory is default. Executor memory depends on the job; the
>>>>> caller decides how much memory to use. We don't specify --num-executors,
>>>>> as we want all cores assigned to the local master, since they were
>>>>> started by the current user. No local executor.
>>>>> --master=spark://localhost:someport. 1 core per executor.
>>>>>
>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Mathieu
>>>>>>
>>>>>> So it would be interesting to see what resources are allocated in
>>>>>> your case, especially the num-executors and executor-cores. I gather
>>>>>> every node has enough memory and cores.
>>>>>>
>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>   --master local[2] \
>>>>>>   --driver-memory 4g \
>>>>>>   --num-executors=1 \
>>>>>>   --executor-memory=4G \
>>>>>>   --executor-cores=2 \
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>
>>>>>>> The driver (the process started by spark-submit) runs locally. The
>>>>>>> executors run on any of thousands of servers. So far, I haven't tried
>>>>>>> more than 500 executors.
>>>>>>>
>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>
>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> OK, so you are using some form of NFS-mounted file system shared
>>>>>>>> among the nodes, and basically you start the processes through
>>>>>>>> spark-submit.
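Picking up the spark-defaults.conf question a few messages up: a sketch of the kind of per-application caps that file can hold (the property names are standard Spark settings; the values are invented for illustration):

  # $SPARK_HOME/conf/spark-defaults.conf
  # Upper bound on the total cores a single standalone application may take
  spark.cores.max        16
  # Default executor memory for applications that do not set it themselves
  spark.executor.memory  4g

The cluster-wide default for applications that do not set spark.cores.max is controlled on the master side through spark.deploy.defaultCores (typically via SPARK_MASTER_OPTS in spark-env.sh).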
>>>>>>>>
>>>>>>>> Stand-alone mode uses a simple cluster manager included with
>>>>>>>> Spark. It does the management of resources, so it is not clear to
>>>>>>>> me what you are referring to as the worker manager here.
>>>>>>>>
>>>>>>>> This is my take on your model.
>>>>>>>> The application will go and grab all the cores in the cluster.
>>>>>>>> You only have one worker that lives within the driver JVM process.
>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>> running on. The Driver requests resources from the Cluster Manager
>>>>>>>> to run tasks. In this case there is only one executor for the Driver?
>>>>>>>> The Executor runs tasks for the Driver.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>
>>>>>>>>> No master and no node manager, just the processes that do actual
>>>>>>>>> work.
>>>>>>>>>
>>>>>>>>> We use the "stand alone" version because we have a shared file
>>>>>>>>> system and a way of allocating computing resources already (Univa
>>>>>>>>> Grid Engine). If an executor were to die, we have other ways of
>>>>>>>>> restarting it; we don't need the worker manager to deal with it.
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mathieu
>>>>>>>>>>
>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>
>>>>>>>>>> So basically each node has its own master in this model.
>>>>>>>>>>
>>>>>>>>>> Are these supposed to be individual stand-alone servers?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> First a bit of context:
>>>>>>>>>>> We use Spark on a platform where each user starts workers as
>>>>>>>>>>> needed. This has the advantage that all permission management is
>>>>>>>>>>> handled by the OS, so the users can only read files they have
>>>>>>>>>>> permission to.
>>>>>>>>>>>
>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>> - start a master
>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>> - the driver then talks to the master and tells it how many
>>>>>>>>>>>   executors it needs
>>>>>>>>>>> - the master tells the worker nodes to start executors and talk
>>>>>>>>>>>   to the driver
>>>>>>>>>>> - the executors are started
>>>>>>>>>>>
>>>>>>>>>>> From here on, the master doesn't do much, and neither does the
>>>>>>>>>>> process manager on the worker nodes.
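To put rough commands to the flow just described, a sketch using the standard standalone scripts (the host name, memory, and script name are placeholders, not the actual setup):

  # on the master host
  $SPARK_HOME/sbin/start-master.sh        # the master listens on spark://<master-host>:7077 by default

  # on each worker host (one core per executor, as described earlier in the thread)
  $SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077 --cores 1 --memory 4g

  # from the submit host
  $SPARK_HOME/bin/spark-submit --master spark://<master-host>:7077 your_script.py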
>>>>>>>>>>>
>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>> - Start the driver program
>>>>>>>>>>> - Start executors on a number of servers, telling them where to
>>>>>>>>>>>   find the driver
>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>
>>>>>>>>>>> Is there a way I could do this without the master and worker
>>>>>>>>>>> managers?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>> 1-514-803-8977