Thanks, Patrick. I could try that.
But the idea was to be able to write a fully automated benchmark, varying the dataset size, the number of workers, the memory, and so on, without having to stop/start the cluster by hand each time. I was thinking something like SparkConf.set("spark.max_number_workers", n) would be useful in this context, but maybe that is too specific to be implemented.

Thanks anyway,

Cheers,
Pierre

On 12 Mar 2014, at 22:50, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey Pierre,
>
> Currently, modifying the "slaves" file is the best way to do this,
> because in general we expect that users will want to launch workers on
> every slave.
>
> I think you could hack something together pretty easily to allow this.
> For instance, if you modify this line in slaves.sh:
>
> for slave in `cat "$HOSTLIST"|sed "s/#.*$//;/^$/d"`; do
>
> to this:
>
> for slave in `cat "$HOSTLIST"| head -n $NUM_SLAVES | sed "s/#.*$//;/^$/d"`; do
>
> then you could just set NUM_SLAVES before you stop/start. Not sure if
> this helps much, but maybe it's a bit faster.
>
> - Patrick
>
> On Wed, Mar 12, 2014 at 10:18 AM, Pierre Borckmans
> <pierre.borckm...@realimpactanalytics.com> wrote:
>> Hi there!
>>
>> I was performing some tests for benchmarking purposes, among other things
>> to observe the evolution of performance versus the number of workers.
>>
>> In that context, I was wondering: is there an easy way to choose the
>> number of workers to be used in standalone mode, without having to change
>> the "slaves" file, dispatch it, and restart the cluster?
>>
>> Cheers,
>>
>> Pierre
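For reference, Patrick's NUM_SLAVES hack can be wrapped into the kind of automated sweep Pierre describes. The sketch below is untested and makes several assumptions: slaves.sh has been patched as above; the cluster is managed with sbin/stop-all.sh and sbin/start-all.sh; conf/slaves contains one hostname per line with no comment or blank lines (since the patched loop applies head -n before sed, raw lines are what get counted); and run-benchmark.sh is a hypothetical driver script standing in for the actual benchmark job.

    #!/usr/bin/env bash
    # Sweep the number of standalone workers; assumes slaves.sh honors
    # NUM_SLAVES as in Patrick's patch above.
    SPARK_HOME=${SPARK_HOME:-/opt/spark}   # adjust to your installation
    SLAVES_FILE="$SPARK_HOME/conf/slaves"

    # Count the listed slaves, using the same filtering slaves.sh applies.
    TOTAL=$(sed "s/#.*$//;/^$/d" "$SLAVES_FILE" | wc -l)

    for n in 1 2 4 8; do
      # Stop every worker left over from the previous iteration, then
      # start only the first n hosts listed in conf/slaves.
      NUM_SLAVES="$TOTAL" "$SPARK_HOME/sbin/stop-all.sh"
      NUM_SLAVES="$n" "$SPARK_HOME/sbin/start-all.sh"
      sleep 30   # crude wait for the workers to register with the master
      ./run-benchmark.sh --workers "$n"   # hypothetical benchmark driver
    done

Passing NUM_SLAVES as an environment variable works because stop-all.sh and start-all.sh invoke slaves.sh as a child process, so the patched head -n $NUM_SLAVES sees the value; the cluster still restarts between runs, but the whole sweep runs unattended.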