Re: Starting executor without a master

2016-05-20 Thread Mathieu Longtin
Correct. What I do to start workers is the equivalent of start-slaves.sh; it ends up running the same command on the worker servers as start-slaves.sh does. It definitely uses all workers, and workers starting later pick up work as well. If you have a long running job, you can add workers dynamically
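For context, a minimal sketch of what start-slaves.sh effectively runs on each worker host; the master hostname is hypothetical, and 7077 is the conventional standalone port:

```shell
# Hypothetical master URL; start-slaves.sh runs the equivalent of
# start-slave.sh with this URL on every host listed in conf/slaves.
MASTER_URL="spark://my.local.server:7077"
if [ -x "${SPARK_HOME:-/nonexistent}/sbin/start-slave.sh" ]; then
  # With a real Spark install, launch a worker that registers with the master.
  "$SPARK_HOME/sbin/start-slave.sh" "$MASTER_URL"
else
  # No Spark install here: just show the command that would run.
  echo "would run: start-slave.sh $MASTER_URL"
fi
```

A worker started this way at any later time registers with the same master and begins receiving tasks, which is what makes the add-workers-dynamically pattern described above work.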

Re: Starting executor without a master

2016-05-19 Thread Mich Talebzadeh
OK, this is basically from my notes for Spark standalone. The worker process is the slave process. You start workers as you showed: $SPARK_HOME/sbin/start-slaves.sh That picks up the worker host names from the $SPARK_HOME/conf/slaves file. So you still have to tell Spark
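The conf/slaves file is just a list of hostnames, one per line. A sketch of the format start-slaves.sh reads (the hostnames are made up; the real file lives at $SPARK_HOME/conf/slaves):

```shell
# Write an example slaves file to a temp location to show the format.
SLAVES_FILE=$(mktemp)
cat > "$SLAVES_FILE" <<'EOF'
worker-node-01
worker-node-02
worker-node-03
EOF
# start-slaves.sh would ssh to each listed host and start a worker there.
NUM_WORKERS=$(wc -l < "$SLAVES_FILE")
echo "workers: $NUM_WORKERS"
```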

Re: Starting executor without a master

2016-05-19 Thread Marcelo Vanzin
On Thu, May 19, 2016 at 6:06 PM, Mathieu Longtin wrote: > I'm looking to bypass the master entirely. I manage the workers outside of > Spark. So I want to start the driver, then start workers that connect > directly to the driver. It should be possible to do that if you extend the interface I mentioned

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
I'm looking to bypass the master entirely. I manage the workers outside of Spark, so I want to start the driver, then start workers that connect directly to the driver. Anyway, it looks like I will have to live with our current solution for a while. On Thu, May 19, 2016 at 8:32 PM Marcelo Vanzin

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
Okay: *host=my.local.server* *port=someport* This is the spark-submit command, which runs on my local server: *$SPARK_HOME/bin/spark-submit --master spark://$host:$port --executor-memory 4g python-script.py with args* If I want 200 worker cores, I tell the cluster scheduler to run this command on
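The per-worker command that the cluster scheduler would run in each of those slots might look like the following sketch; the port and the --cores value are placeholders inferred from the thread, not verified settings:

```shell
host=my.local.server            # same host the driver runs on
port=7077                       # "someport" in the message; hypothetical
# Each scheduler slot starts one single-core worker pointed at the
# standalone master that spark-submit is talking to:
WORKER_CMD="start-slave.sh --cores 1 spark://$host:$port"
# With Spark installed this would be run as "$SPARK_HOME/sbin/$WORKER_CMD";
# here we only print it.
echo "$WORKER_CMD"
```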

Re: Starting executor without a master

2016-05-19 Thread Marcelo Vanzin
Hi Mathieu, There's nothing like that in Spark currently. For that, you'd need a new cluster manager implementation that knows how to start executors on those remote machines (e.g. by running ssh or something). In the current master there's an interface you can implement to try that if you really

Re: Starting executor without a master

2016-05-19 Thread Mich Talebzadeh
In normal operation we tell Spark which nodes the worker processes can run on by adding the node names to conf/slaves. I am not very clear on this; in your case do all the jobs run locally with, say, 100 executor cores, like below: ${SPARK_HOME}/bin/spark-submit \ --master local[*] \
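A hedged sketch of the local-mode submit being described; the memory value is hypothetical, and local[100] would pin the core count where local[*] uses all local cores:

```shell
MASTER="local[*]"               # all local cores; local[100] would cap at 100
# Assemble the argument list; a real run would invoke
# ${SPARK_HOME}/bin/spark-submit with these arguments plus a script name.
set -- --master "$MASTER" --executor-memory 4g
echo "spark-submit $*"
```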

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
Mostly, the resource management is not up to the Spark master. We routinely start 100 executor cores for a 5-minute job, and they just quit when they are done. Then those processor cores can do something else entirely; they are not reserved for Spark at all. On Thu, May 19, 2016 at 4:55 PM Mich Tal

Re: Starting executor without a master

2016-05-19 Thread Mich Talebzadeh
Then in theory every user can fire multiple spark-submit jobs. Do you cap it with settings in $SPARK_HOME/conf/spark-defaults.conf? I guess in reality every user submits one job only. This is an interesting model for two reasons: - It uses parallel processing across all the nodes or most
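One way to cap what a single spark-submit can take in standalone mode is spark.cores.max in spark-defaults.conf; a sketch, with illustrative values rather than recommendations:

```
# $SPARK_HOME/conf/spark-defaults.conf
# Upper bound on total cores one application may claim (standalone mode):
spark.cores.max        8
# Default executor memory per application:
spark.executor.memory  4g
```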

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
Driver memory is the default. Executor memory depends on the job; the caller decides how much memory to use. We don't specify --num-executors, as we want all cores assigned to the local master, since they were started by the current user. No local executor. --master=spark://localhost:someport. 1 core per executor

Re: Starting executor without a master

2016-05-19 Thread Mich Talebzadeh
Thanks Mathieu. So it would be interesting to see what resources are allocated in your case, especially num-executors and executor-cores. I gather every node has enough memory and cores. ${SPARK_HOME}/bin/spark-submit \ --master local[2] \ --driver-memory 4g \

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
The driver (the process started by spark-submit) runs locally. The executors run on any of thousands of servers. So far, I haven't tried more than 500 executors. Right now, I run a master on the same server as the driver. On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh wrote: > ok so you are us

Re: Starting executor without a master

2016-05-19 Thread Mich Talebzadeh
OK, so you are using some form of NFS-mounted file system shared among the nodes, and basically you start the processes through spark-submit. Standalone mode is a simple cluster manager included with Spark; it does the management of resources, so it is not clear to me what you are referring to as workers

Re: Starting executor without a master

2016-05-19 Thread Mathieu Longtin
No master and no node manager, just the processes that do the actual work. We use the "standalone" version because we have a shared file system and a way of allocating computing resources already (Univa Grid Engine). If an executor were to die, we have other ways of restarting it; we don't need the w
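Under that model, the grid engine takes the place of the Spark master for resource allocation. A hypothetical sketch of submitting workers through UGE's qsub; the hostname and port are assumptions, and the real qsub call is left as a comment since it needs a UGE and Spark install:

```shell
MASTER_URL="spark://my.local.server:7077"   # driver-side master; hypothetical
NWORKERS=3                                  # small here; hundreds in practice
i=1
while [ "$i" -le "$NWORKERS" ]; do
  # A real submission would hand the worker launch to Univa Grid Engine, e.g.:
  #   qsub -b y "$SPARK_HOME/sbin/start-slave.sh" "$MASTER_URL"
  echo "submit worker $i -> $MASTER_URL"
  i=$((i + 1))
done
```

Because each worker registers with the master on its own, workers granted late by the scheduler simply join the running job, matching the behavior described at the top of the thread.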

Re: Starting executor without a master

2016-05-19 Thread Mich Talebzadeh
Hi Mathieu, What does this approach provide that the norm lacks? So basically each node has its own master in this model. Are these supposed to be individual standalone servers? Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Starting executor without a master

2016-05-19 Thread Mathieu Longtin
First, a bit of context: we use Spark on a platform where each user starts workers as needed. This has the advantage that all permission management is handled by the OS, so users can only read files they have permission to read. To do this, we have a utility that does the following: - start a master