One related question. Is there any way to automatically determine the optimal number of workers in YARN based on the data size and available resources, without explicitly specifying it when the job is launched?
Thanks.

Sincerely,

DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/

On Wed, Mar 12, 2014 at 2:50 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Hey Pierre,
>
> Currently modifying the "slaves" file is the best way to do this
> because in general we expect that users will want to launch workers on
> any slave.
>
> I think you could hack something together pretty easily to allow this.
> For instance, if you modify this line in slaves.sh:
>
> for slave in `cat "$HOSTLIST"|sed "s/#.*$//;/^$/d"`; do
>
> to this:
>
> for slave in `cat "$HOSTLIST" | head -n $NUM_SLAVES | sed "s/#.*$//;/^$/d"`; do
>
> then you could just set NUM_SLAVES before you stop/start. Not sure if
> this helps much, but maybe it's a bit faster.
>
> - Patrick
>
> On Wed, Mar 12, 2014 at 10:18 AM, Pierre Borckmans
> <pierre.borckm...@realimpactanalytics.com> wrote:
>> Hi there!
>>
>> I was performing some tests for benchmarking purposes, among other things
>> to observe how performance evolves with the number of workers.
>>
>> In that context, I was wondering if there is an easy way to choose the
>> number of workers used in standalone mode, without having to change the
>> "slaves" file, dispatch it, and restart the cluster?
>>
>> Cheers,
>>
>> Pierre
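For reference, Patrick's suggested edit can be tried in isolation as a small sketch. The host names and the temp-file setup below are illustrative assumptions, not part of the actual slaves.sh; only the `head -n $NUM_SLAVES | sed` pipeline mirrors the change described in the thread.

```shell
#!/bin/sh
# Sketch of the modified slaves.sh loop from the thread:
# limit how many slaves are iterated by piping the host list
# through `head` before the existing sed cleanup.
# HOSTLIST contents and NUM_SLAVES here are made-up examples.

HOSTLIST=$(mktemp)
cat > "$HOSTLIST" <<'EOF'
node1
node2
node3
node4
# trailing comment lines are stripped by the sed step
EOF

NUM_SLAVES=2   # only the first two hosts will be used

started=""
for slave in `cat "$HOSTLIST" | head -n $NUM_SLAVES | sed "s/#.*$//;/^$/d"`; do
  started="$started $slave"
  echo "would start worker on: $slave"
done

rm -f "$HOSTLIST"
```

One caveat of this approach: `head` runs before `sed`, so any comment or blank lines among the first NUM_SLAVES lines of the slaves file count against the limit; keeping the real host names at the top of the file avoids surprises.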