No, it's my fault for not reading more carefully. We do use a somewhat overloaded and specialized lexicon to describe Spark, which helps when it is used uniformly, but penalizes those who leap to misunderstanding. Prashant is correct: the largest-granularity thing that a user launches to do Spark work, and that is associated with its own SparkContext, is what we call an application. A job is what is launched by invoking a Spark action on an RDD. There can be multiple jobs within an application, and those jobs are scheduled either FIFO or with the fair scheduler. Going to even smaller granularities, jobs can contain multiple stages (defined, or broken up, at shuffle boundaries), and stages are associated with task sets containing multiple tasks, the units of work that actually run on the worker nodes.
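In code, that hierarchy looks roughly like the sketch below (the master URL, app name, and input path are just placeholders, not anything from this thread):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // brings in the pair-RDD functions like reduceByKey

// One SparkContext == one application.
val sc = new SparkContext("spark://master:7077", "TerminologyDemo")

// Transformations are lazy, so nothing has run yet -- no job has been launched.
val counts = sc.textFile("input.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)        // introduces a shuffle boundary, hence a stage boundary

// Each action launches a job within this application:
val total = counts.count()   // job 1: a shuffle-map stage plus a result stage
val top   = counts.take(5)   // job 2
// Each stage is submitted as a task set; one task runs per partition on the workers.

sc.stop()
```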
Anyway, Prashant's response about spreadOut is appropriate for application-level scheduling. (There is a short sketch of those two settings after the quoted thread below.)

On Tue, Nov 19, 2013 at 8:03 AM, Yadid Ayzenberg <[email protected]> wrote:

> My bad - I should have stated that up front. I guess it was kind of
> implicit within my question.
>
> Thanks for your help,
>
> Yadid
>
>
> On 11/19/13 10:59 AM, Mark Hamstra wrote:
>
> Ah, sorry -- misunderstood the question.
>
> On Nov 19, 2013, at 7:48 AM, Prashant Sharma <[email protected]> wrote:
>
> I think that is Scheduling Within an Application, and he asked across
> apps. Actually, Spark standalone supports two ways of scheduling; both are
> FIFO type.
> http://spark.incubator.apache.org/docs/latest/spark-standalone.html
>
> One is spread-out mode and the other is to use as few nodes as possible [1]
>
> 1. https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L383
>
>
> On Tue, Nov 19, 2013 at 9:02 PM, Mark Hamstra <[email protected]> wrote:
>
>> According to the documentation, spark standalone currently only
>> supports a FIFO scheduling system.
>
> That's not true.
>
> [sorry for the prior misfire]
>
>
> On Tue, Nov 19, 2013 at 7:30 AM, Mark Hamstra <[email protected]> wrote:
>>
>> On Tue, Nov 19, 2013 at 6:50 AM, Yadid Ayzenberg <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> According to the documentation, spark standalone currently only
>>> supports a FIFO scheduling system.
>>> I understand it's possible to limit the number of cores a job uses by
>>> setting spark.cores.max.
>>> When running a job, will spark try using the max number of cores on
>>> each machine until it reaches the set limit, or will it do this round-robin
>>> style - utilize a single core on each machine, and if it has already used a
>>> core on all of the slaves and the limit has not been reached, utilize an
>>> additional core on each machine, and so on?
>>>
>>> I think the latter makes more sense, but I want to be sure that is the
>>> case.
>>>
>>> Thanks,
>>> Yadid
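For reference, here is roughly how the two knobs discussed above are set; the values are only examples, and (if I am reading the linked Master.scala right) spark.deploy.spreadOut is read as a system property by the master process, not by the application:

```scala
import org.apache.spark.SparkContext

// Application side: cap the total cores this app may take across the cluster.
// Set the property before the SparkContext is created; "8" is just an example.
System.setProperty("spark.cores.max", "8")
val sc = new SparkContext("spark://master:7077", "CappedApp")

// Master side: a JVM system property on the master process (e.g. via spark-env.sh):
//   -Dspark.deploy.spreadOut=true    // default: spread the app's cores across workers
//   -Dspark.deploy.spreadOut=false   // consolidate onto as few workers as possible

sc.stop()
```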
