You mean that once a job is in a waiting queue, it won't take advantage of
additional workers that happened to be added after the job was put into the
waiting queue?
That would be less than optimal, but it would be OK with us for now as long
as the additional workers are taken advantage of by jobs submitted after they join.
The acid test will come when you start two or more Spark applications
simultaneously. If you see queuing (i.e. the second job waiting for the first
job to finish in the Spark UI), then you may not have enough resources in the
cluster to accommodate two jobs despite the additional worker process.
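For example, something along these lines (the master URL spark://master1:7077 and the examples jar path are only placeholders here; adjust for your own setup and Spark version):

  # submit the same example application twice in parallel
  spark-submit --master spark://master1:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-*.jar 1000 &
  spark-submit --master spark://master1:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-*.jar 1000 &

If the second application sits in WAITING state on the master web UI (port 8080 by default) while the first one runs, the cluster cannot fit both with the resources it has.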
Dr Mich Talebzadeh
Yeah, that seems to be the case. It seems that dynamically resizing a
standalone Spark cluster is very simple.
Thanks!
On Mon, Mar 28, 2016 at 10:22 PM, Mich Talebzadeh wrote:
start-all.sh starts the master and anything else in the slaves file.
start-master.sh starts the master only.
I use start-slaves.sh for my purposes, with the added nodes in the slaves file.
When you run start-slave.sh you are creating another
worker process on the master host. You can check the status on the Spark master web UI.
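A rough sketch of the differences (assuming the master host is called master1 and the default ports):

  # starts the master plus a worker on every host listed in conf/slaves
  $SPARK_HOME/sbin/start-all.sh

  # starts only the master daemon
  $SPARK_HOME/sbin/start-master.sh

  # starts a worker on every host listed in conf/slaves
  $SPARK_HOME/sbin/start-slaves.sh

  # run locally on a host: starts one more worker process on that host
  $SPARK_HOME/sbin/start-slave.sh spark://master1:7077

The extra worker should then appear on the master web UI at http://master1:8080.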
It seems that the conf/slaves file is only for consumption by the following
scripts:
sbin/start-slaves.sh
sbin/stop-slaves.sh
sbin/start-all.sh
sbin/stop-all.sh
I.e., the conf/slaves file doesn't affect a running cluster.
Is this true?
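(For concreteness, by conf/slaves I mean the plain list of worker host names, one per line; the names below are just examples:

  worker1.example.com
  worker2.example.com
  worker3.example.com

i.e. the list those scripts iterate over when they ssh out to start or stop workers.)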
On Mon, Mar 28, 2016 at 9:31 PM, Sung Hwan Chung wrote:
No, I didn't add it to the conf/slaves file.
What I want to do is leverage auto-scale from AWS, without needing to stop
all the slaves (e.g. if a lot of slaves are idle, terminate those).
Also, the book-keeping is easier if I don't have to deal with some
centralized list of slaves that needs to be kept up to date.
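The workflow I have in mind is roughly this (a sketch; it assumes each instance has the same $SPARK_HOME layout):

  # on an idle worker instance, before the auto-scaler terminates it
  $SPARK_HOME/sbin/stop-slave.sh

so the worker goes away cleanly and nothing on the master has to be edited.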
Have you added the slave host name to $SPARK_HOME/conf/slaves?
Then you can use start-slaves.sh or stop-slaves.sh for all instances.
The assumption is that the slave boxes have Spark installed under the same
$SPARK_HOME directory as on the master.
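Roughly, on the master (the host name below is just an example):

  # add the new node to the slaves file
  echo "new-worker.example.com" >> $SPARK_HOME/conf/slaves

  # ssh out and start a worker on every listed host
  $SPARK_HOME/sbin/start-slaves.sh

This relies on passwordless ssh from the master to the slaves and on Spark sitting under the same $SPARK_HOME path on each of them; hosts already running a worker should simply report that one is running.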
HTH
Dr Mich Talebzadeh
Hello,
I found that I could dynamically add and remove workers in a running
standalone Spark cluster by simply triggering:
start-slave.sh (SPARK_MASTER_ADDR)
and
stop-slave.sh
E.g., I could instantiate a new AWS instance and just add it to a running
cluster without needing to add it to the conf/slaves file.
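Concretely, on the freshly launched instance I run something like (the master address is just a placeholder):

  # register a new worker with the running master
  $SPARK_HOME/sbin/start-slave.sh spark://master1.example.com:7077

and before terminating the instance:

  # stop the worker running on this host
  $SPARK_HOME/sbin/stop-slave.sh

and the worker shows up on / drops off the master web UI without me touching conf/slaves on the master.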