How about hacking your way around it?
Start with the maximum number of workers and keep killing them off after each run.
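Something like this could work as an untested sketch (it assumes one worker
per host, passwordless SSH, the same $SPARK_HOME layout on every machine, and
a hypothetical run-benchmark.sh driver):

  # Start all workers once, then drop one after each benchmark run.
  HOSTS=($(sed "s/#.*$//;/^$/d" "$SPARK_HOME/conf/slaves"))
  "$SPARK_HOME/sbin/start-slaves.sh"
  for ((n=${#HOSTS[@]}; n>=1; n--)); do
    run-benchmark.sh "$n"   # hypothetical: one benchmark run with n workers
    # Stop the worker on the last remaining host so the next run has n-1.
    ssh "${HOSTS[n-1]}" "$SPARK_HOME/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1"
  done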

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Mar 13, 2014 at 2:00 AM, Pierre Borckmans <
pierre.borckm...@realimpactanalytics.com> wrote:

> Thanks Patrick.
>
> I could try that.
>
> But the idea was to be able to write a fully automated benchmark, varying
> the dataset size, the number of workers, the memory, … without having to
> stop/start the cluster each time.
>
> I was thinking something like SparkConf.set("spark.max_number_workers", n)
> would be useful in this context but maybe too specific to be implemented.
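>
> In the meantime, here is a rough, untested sketch of the automation I have
> in mind, building on your NUM_SLAVES hack quoted below (run-benchmark.sh is
> a placeholder for my actual driver); it still stops/starts the workers, but
> at least it is scriptable:
>
>  for n in 1 2 4 8; do
>    export NUM_SLAVES=$n
>    "$SPARK_HOME/sbin/start-slaves.sh"   # starts only the first n slaves
>    run-benchmark.sh "$n"                # placeholder benchmark driver
>    "$SPARK_HOME/sbin/stop-slaves.sh"    # stops the same n slaves
>  done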
>
> Thanks anyway,
>
> Cheers
>
> Pierre
>
>
>
> On 12 Mar 2014, at 22:50, Patrick Wendell <pwend...@gmail.com> wrote:
>
> > Hey Pierre,
> >
> > Currently modifying the "slaves" file is the best way to do this
> > because in general we expect that users will want to launch workers on
> > any slave.
> >
> > I think you could hack something together pretty easily to allow this.
> > For instance if you modify the line in slaves.sh from this:
> >
> >  for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
> >
> > to this
> >
> >  for slave in `cat "$HOSTLIST" | sed "s/#.*$//;/^$/d" | head -n $NUM_SLAVES`; do
> >
> > Then you could just set NUM_SLAVES before you stop/start. Not sure if
> > this helps much but maybe it's a bit faster.
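> >
> > For example (untested):
> >
> >  export NUM_SLAVES=4
> >  sbin/stop-slaves.sh
> >  sbin/start-slaves.sh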
> >
> > - Patrick
> >
> > On Wed, Mar 12, 2014 at 10:18 AM, Pierre Borckmans
> > <pierre.borckm...@realimpactanalytics.com> wrote:
> >> Hi there!
> >>
> >> I was performing some tests for benchmarking purposes, among other
> >> things to observe how performance evolves with the number of workers.
> >>
> >> In that context, I was wondering if there is an easy way to choose the
> >> number of workers used in standalone mode, without having to change the
> >> "slaves" file, dispatch it, and restart the cluster?
> >>
> >>
> >> Cheers,
> >>
> >> Pierre
>
>
