btw, there is an open PR to allow spreadOut to be configured per-app, instead of per-cluster
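For context, the spread-out vs. consolidated allocation the thread discusses can be sketched in plain Scala. This is a simplified, hypothetical model of the standalone Master's per-application core assignment (loosely modeled on the Master.scala logic linked below); the names `WorkerInfo` and `allocate` are illustrative, not the real API:

```scala
// Hypothetical, simplified model of the standalone Master's core allocation.
// spreadOut = true:  round-robin one core at a time across workers.
// spreadOut = false: fill each worker completely before moving on.
case class WorkerInfo(id: String, coresFree: Int)

def allocate(workers: Seq[WorkerInfo], coresWanted: Int, spreadOut: Boolean): Seq[Int] = {
  val usable = workers.filter(_.coresFree > 0)
  val assigned = Array.fill(usable.length)(0)
  var toAssign = math.min(coresWanted, usable.map(_.coresFree).sum)
  if (spreadOut) {
    var pos = 0
    while (toAssign > 0) {
      // Give this worker one more core if it still has a free one.
      if (usable(pos).coresFree - assigned(pos) > 0) {
        assigned(pos) += 1
        toAssign -= 1
      }
      pos = (pos + 1) % usable.length
    }
  } else {
    for (i <- usable.indices if toAssign > 0) {
      val take = math.min(toAssign, usable(i).coresFree)
      assigned(i) = take
      toAssign -= take
    }
  }
  assigned.toSeq
}
```

With three 4-core workers and an app wanting 6 cores, spread-out yields 2 cores on each worker, while the consolidated mode yields 4 on the first worker and 2 on the second.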
https://github.com/apache/incubator-spark/pull/136

On Tue, Nov 19, 2013 at 11:20 AM, Mark Hamstra <[email protected]> wrote:
> No, it's my fault for not reading more carefully. We do use a somewhat
> overloaded and specialized lexicon to describe Spark, which helps when it
> is used uniformly but penalizes those who leap to misunderstanding.
> Prashant is correct that the largest-granularity thing a user launches to
> do Spark work, and that is associated with its own SparkContext, is what
> we call an application. A job is what is launched by invoking a Spark
> action on an RDD. There can be multiple jobs within an application, and
> those jobs are scheduled either FIFO or with the fair scheduler. Going to
> even smaller granularities, jobs can contain multiple stages (defined or
> broken up at shuffle boundaries), and stages are associated with task
> sets containing multiple tasks, the units of work that actually run on
> worker nodes.
>
> Anyway, Prashant's response about spreadOut is appropriate for
> application-level scheduling.
>
> On Tue, Nov 19, 2013 at 8:03 AM, Yadid Ayzenberg <[email protected]> wrote:
>> My bad - I should have stated that up front. I guess it was kind of
>> implicit in my question.
>>
>> Thanks for your help,
>>
>> Yadid
>>
>> On 11/19/13 10:59 AM, Mark Hamstra wrote:
>> Ah, sorry -- misunderstood the question.
>>
>> On Nov 19, 2013, at 7:48 AM, Prashant Sharma <[email protected]> wrote:
>> I think that is Scheduling Within an Application, and he asked across
>> apps. Actually, Spark standalone supports two ways of scheduling; both
>> are FIFO type.
>> http://spark.incubator.apache.org/docs/latest/spark-standalone.html
>>
>> One is spread-out mode and the other is use-as-few-nodes-as-possible [1]
>>
>> 1. https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L383
>>
>> On Tue, Nov 19, 2013 at 9:02 PM, Mark Hamstra <[email protected]> wrote:
>>>> According to the documentation, spark standalone currently only
>>>> supports a FIFO scheduling system.
>>>
>>> That's not true.
>>>
>>> [sorry for the prior misfire]
>>>
>>> On Tue, Nov 19, 2013 at 7:30 AM, Mark Hamstra <[email protected]> wrote:
>>>> On Tue, Nov 19, 2013 at 6:50 AM, Yadid Ayzenberg <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> According to the documentation, Spark standalone currently only
>>>>> supports a FIFO scheduling system.
>>>>> I understand it's possible to limit the number of cores a job uses by
>>>>> setting spark.cores.max.
>>>>> When running a job, will Spark try using the max number of cores on
>>>>> each machine until it reaches the set limit, or will it do this
>>>>> round-robin style - utilize a single core on each machine, and if it
>>>>> has already used a core on all of the slaves and the limit has not
>>>>> been reached, utilize an additional core on each machine, and so on?
>>>>>
>>>>> I think the latter makes more sense, but I want to be sure that is
>>>>> the case.
>>>>>
>>>>> Thanks,
>>>>> Yadid
>>
>> --
>> s
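As for the spark.cores.max limit Yadid asked about: in this era of Spark, application-level settings were typically supplied as Java system properties before the SparkContext was constructed. A minimal sketch, assuming that convention (the property name is from the thread; the master URL and app name are placeholders):

```scala
// Cap this application's total core usage across the cluster.
// Must be set before the SparkContext is created so the Master sees it.
System.setProperty("spark.cores.max", "6")

// val sc = new SparkContext("spark://master:7077", "MyApp")  // placeholder
```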
