btw, there is an open PR to allow spreadOut to be configured per-app, instead of per-cluster
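For context, the spread-out vs. consolidated allocation the thread discusses can be sketched in plain Scala. This is a simplified, hypothetical model of the standalone Master's per-application core assignment (loosely modeled on the Master.scala logic linked below); the names `WorkerInfo` and `allocate` are illustrative, not the real API:

```scala
// Hypothetical, simplified model of the standalone Master's core allocation.
// spreadOut = true:  round-robin one core at a time across workers.
// spreadOut = false: fill each worker completely before moving on.
case class WorkerInfo(id: String, coresFree: Int)

def allocate(workers: Seq[WorkerInfo], coresWanted: Int, spreadOut: Boolean): Seq[Int] = {
  val usable = workers.filter(_.coresFree > 0)
  val assigned = Array.fill(usable.length)(0)
  var toAssign = math.min(coresWanted, usable.map(_.coresFree).sum)
  if (spreadOut) {
    var pos = 0
    while (toAssign > 0) {
      // Give this worker one more core if it still has a free one.
      if (usable(pos).coresFree - assigned(pos) > 0) {
        assigned(pos) += 1
        toAssign -= 1
      }
      pos = (pos + 1) % usable.length
    }
  } else {
    for (i <- usable.indices if toAssign > 0) {
      val take = math.min(toAssign, usable(i).coresFree)
      assigned(i) = take
      toAssign -= take
    }
  }
  assigned.toSeq
}
```

With three 4-core workers and an app wanting 6 cores, spread-out yields 2 cores on each worker, while the consolidated mode yields 4 on the first worker and 2 on the second.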
https://github.com/apache/incubator-spark/pull/136

On Tue, Nov 19, 2013 at 11:20 AM, Mark Hamstra <[email protected]> wrote:
> No, it's my fault for not reading more carefully. We do use a somewhat
> overloaded and specialized lexicon to describe Spark, which helps when it
> is used uniformly but penalizes those who leap to misunderstanding.
> Prashant is correct that the largest-granularity thing a user launches to
> do Spark work, and that is associated with its own SparkContext, is what
> we call an application. A job is what is launched by invoking a Spark
> action on an RDD. There can be multiple jobs within an application, and
> those jobs are scheduled either FIFO or with the fair scheduler. Going to
> even smaller granularities, jobs can contain multiple stages (defined or
> broken up at shuffle boundaries), and stages are associated with task
> sets containing multiple tasks, the units of work that actually run on
> worker nodes.
>
> Anyway, Prashant's response about spreadOut is appropriate for
> application-level scheduling.
>
> On Tue, Nov 19, 2013 at 8:03 AM, Yadid Ayzenberg <[email protected]> wrote:
>> My bad - I should have stated that up front. I guess it was kind of
>> implicit in my question.
>>
>> Thanks for your help,
>>
>> Yadid
>>
>> On 11/19/13 10:59 AM, Mark Hamstra wrote:
>> Ah, sorry -- misunderstood the question.
>>
>> On Nov 19, 2013, at 7:48 AM, Prashant Sharma <[email protected]> wrote:
>> I think that is Scheduling Within an Application, and he asked across
>> apps. Actually, Spark standalone supports two ways of scheduling; both
>> are FIFO type.
>> http://spark.incubator.apache.org/docs/latest/spark-standalone.html
>>
>> One is spread-out mode and the other is use-as-few-nodes-as-possible [1]
>>
>> 1. https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L383
>>
>> On Tue, Nov 19, 2013 at 9:02 PM, Mark Hamstra <[email protected]> wrote:
>>>> According to the documentation, spark standalone currently only
>>>> supports a FIFO scheduling system.
>>>
>>> That's not true.
>>>
>>> [sorry for the prior misfire]
>>>
>>> On Tue, Nov 19, 2013 at 7:30 AM, Mark Hamstra <[email protected]> wrote:
>>>> On Tue, Nov 19, 2013 at 6:50 AM, Yadid Ayzenberg <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> According to the documentation, Spark standalone currently only
>>>>> supports a FIFO scheduling system.
>>>>> I understand it's possible to limit the number of cores a job uses by
>>>>> setting spark.cores.max.
>>>>> When running a job, will Spark try using the max number of cores on
>>>>> each machine until it reaches the set limit, or will it do this
>>>>> round-robin style - utilize a single core on each machine, and if it
>>>>> has already used a core on all of the slaves and the limit has not
>>>>> been reached, utilize an additional core on each machine, and so on?
>>>>>
>>>>> I think the latter makes more sense, but I want to be sure that is
>>>>> the case.
>>>>>
>>>>> Thanks,
>>>>> Yadid
>>
>> --
>> s
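As for the spark.cores.max limit Yadid asked about: in this era of Spark, application-level settings were typically supplied as Java system properties before the SparkContext was constructed. A minimal sketch, assuming that convention (the property name is from the thread; the master URL and app name are placeholders):

```scala
// Cap this application's total core usage across the cluster.
// Must be set before the SparkContext is created so the Master sees it.
System.setProperty("spark.cores.max", "6")

// val sc = new SparkContext("spark://master:7077", "MyApp")  // placeholder
```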
