No, it's my fault for not reading more carefully. We do use a somewhat overloaded and specialized lexicon to describe Spark, which helps when it is used uniformly, but penalizes those who leap to misunderstanding. Prashant is correct: the largest-granularity thing that a user launches to do Spark work, and that is associated with its own SparkContext, is what we call an application. A job is what is launched by invoking a Spark action on an RDD. There can be multiple jobs within an application, and those jobs are scheduled either FIFO or with the fair scheduler. Going to even smaller granularities, jobs can contain multiple stages (defined, or broken up, at shuffle boundaries), and stages are associated with task sets containing multiple tasks, the units of work that actually run on the worker nodes.
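In code, that hierarchy looks roughly like the sketch below (the master URL, app name, and input path are just placeholders, not anything from this thread):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // brings in the pair-RDD functions like reduceByKey

// One SparkContext == one application.
val sc = new SparkContext("spark://master:7077", "TerminologyDemo")

// Transformations are lazy, so nothing has run yet -- no job has been launched.
val counts = sc.textFile("input.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)        // introduces a shuffle boundary, hence a stage boundary

// Each action launches a job within this application:
val total = counts.count()   // job 1: a shuffle-map stage plus a result stage
val top   = counts.take(5)   // job 2
// Each stage is submitted as a task set; one task runs per partition on the workers.

sc.stop()
```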
Anyway, Prashant's response about spreadOut is appropriate for application-level scheduling. (There is a short sketch of those two settings after the quoted thread below.)

On Tue, Nov 19, 2013 at 8:03 AM, Yadid Ayzenberg <[email protected]> wrote:

> My bad - I should have stated that up front. I guess it was kind of
> implicit within my question.
>
> Thanks for your help,
>
> Yadid
>
>
> On 11/19/13 10:59 AM, Mark Hamstra wrote:
>
> Ah, sorry -- misunderstood the question.
>
> On Nov 19, 2013, at 7:48 AM, Prashant Sharma <[email protected]> wrote:
>
> I think that is Scheduling Within an Application, and he asked across
> apps. Actually, Spark standalone supports two ways of scheduling; both are
> FIFO type.
> http://spark.incubator.apache.org/docs/latest/spark-standalone.html
>
> One is spread-out mode and the other is to use as few nodes as possible [1]
>
> 1. https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L383
>
>
> On Tue, Nov 19, 2013 at 9:02 PM, Mark Hamstra <[email protected]> wrote:
>
>> According to the documentation, spark standalone currently only
>> supports a FIFO scheduling system.
>
> That's not true.
>
> [sorry for the prior misfire]
>
>
> On Tue, Nov 19, 2013 at 7:30 AM, Mark Hamstra <[email protected]> wrote:
>>
>> On Tue, Nov 19, 2013 at 6:50 AM, Yadid Ayzenberg <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> According to the documentation, spark standalone currently only
>>> supports a FIFO scheduling system.
>>> I understand it's possible to limit the number of cores a job uses by
>>> setting spark.cores.max.
>>> When running a job, will spark try using the max number of cores on
>>> each machine until it reaches the set limit, or will it do this round-robin
>>> style - utilize a single core on each machine, and if it has already used a
>>> core on all of the slaves and the limit has not been reached, utilize an
>>> additional core on each machine, and so on?
>>>
>>> I think the latter makes more sense, but I want to be sure that is the
>>> case.
>>>
>>> Thanks,
>>> Yadid
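For reference, here is roughly how the two knobs discussed above are set; the values are only examples, and (if I am reading the linked Master.scala right) spark.deploy.spreadOut is read as a system property by the master process, not by the application:

```scala
import org.apache.spark.SparkContext

// Application side: cap the total cores this app may take across the cluster.
// Set the property before the SparkContext is created; "8" is just an example.
System.setProperty("spark.cores.max", "8")
val sc = new SparkContext("spark://master:7077", "CappedApp")

// Master side: a JVM system property on the master process (e.g. via spark-env.sh):
//   -Dspark.deploy.spreadOut=true    // default: spread the app's cores across workers
//   -Dspark.deploy.spreadOut=false   // consolidate onto as few workers as possible

sc.stop()
```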
