Yes, I see - I misused the term job.

On 11/19/13 12:20 PM, Mark Hamstra wrote:
No, it's my fault for not reading more carefully. We do use a somewhat overloaded and specialized lexicon to describe Spark, which helps when it is used uniformly, but penalizes those who leap to misunderstanding. Prashant is correct that the largest-granularity thing a user launches to do Spark work, and that is associated with its own SparkContext, is what we call an application. A job is what is launched by invoking a Spark action on an RDD. There can be multiple jobs within an application, and those jobs are scheduled either FIFO or with the fair scheduler. Going to even smaller granularities, jobs can contain multiple stages (defined or broken up at shuffle boundaries), and stages are associated with task sets containing multiple tasks, the units of work that actually run on worker nodes.
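[Editor's note: the FIFO-vs-fair choice for jobs within one application that Mark describes is a configuration setting. A minimal sketch, assuming the documented `spark.scheduler.mode` property; the exact key and default are as documented for Spark of this era:]

```
# Within a single application, jobs from concurrently submitted actions
# are scheduled FIFO by default; the fair scheduler is opted into per
# application via its configuration:
spark.scheduler.mode   FAIR    # default: FIFO
```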

Anyway, Prashant's response about spreadOut is appropriate for application-level scheduling.
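[Editor's note: to make the spread-out vs. consolidate distinction concrete, here is a minimal Python sketch. This is not Spark's actual code (the Master.scala link later in the thread shows the real implementation); the `allocate` helper, worker capacities, and demand value are illustrative.]

```python
def allocate(free, to_assign, spread_out=True):
    """Simulate the standalone master's two core-allocation strategies.

    free:      free cores currently offered by each worker
    to_assign: the application's remaining core demand
               (e.g. capped by spark.cores.max)
    Returns the number of cores assigned on each worker.
    """
    assigned = [0] * len(free)
    if spread_out:
        # Spread-out: round-robin, one core per worker per pass,
        # until demand is met or no worker has capacity left.
        pos = 0
        while to_assign > 0 and any(f - a > 0 for f, a in zip(free, assigned)):
            if free[pos] - assigned[pos] > 0:
                assigned[pos] += 1
                to_assign -= 1
            pos = (pos + 1) % len(free)
    else:
        # Consolidate: fill each worker completely before moving on,
        # so the application lands on as few nodes as possible.
        for i, f in enumerate(free):
            take = min(f, to_assign)
            assigned[i] = take
            to_assign -= take
            if to_assign == 0:
                break
    return assigned
```

For example, with three workers offering 4 free cores each and a demand of 6 cores, spread-out yields `[2, 2, 2]` while consolidation yields `[4, 2, 0]` — which is also the answer to the round-robin question below.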



On Tue, Nov 19, 2013 at 8:03 AM, Yadid Ayzenberg <[email protected]> wrote:

    My bad - I should have stated that up front. I guess it was kind
    of implicit within my question.

    Thanks for your help,

    Yadid



    On 11/19/13 10:59 AM, Mark Hamstra wrote:
    Ah, sorry -- misunderstood the question.


    On Nov 19, 2013, at 7:48 AM, Prashant Sharma
    <[email protected] <mailto:[email protected]>> wrote:

    I think that is scheduling within an application, whereas he
    asked about scheduling across applications. Spark standalone
    actually supports two modes of scheduling, both FIFO:
    http://spark.incubator.apache.org/docs/latest/spark-standalone.html

    One spreads applications out across nodes; the other consolidates
    them onto as few nodes as possible [1].

    1. https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L383




    On Tue, Nov 19, 2013 at 9:02 PM, Mark Hamstra
    <[email protected] <mailto:[email protected]>> wrote:
    >>
    >> According to the documentation, spark standalone currently
    only supports a FIFO scheduling system.
    >
    >
    > That's not true.
    >
    > [sorry for the prior misfire]
    >
    >
    >
    > On Tue, Nov 19, 2013 at 7:30 AM, Mark Hamstra
    <[email protected] <mailto:[email protected]>> wrote:
    >>
    >>
    >>
    >>
    >> On Tue, Nov 19, 2013 at 6:50 AM, Yadid Ayzenberg
    <[email protected] <mailto:[email protected]>> wrote:
    >>>
    >>> Hi all,
    >>>
    >>> According to the documentation, spark standalone currently
    only supports a FIFO scheduling system.
    >>> I understand it's possible to limit the number of cores a
    >>> job uses by setting spark.cores.max.
    >>> When running a job, will Spark use the maximum number of
    >>> cores on each machine until it reaches the set limit, or
    >>> will it do this round-robin style: utilize a single core on
    >>> each machine, and once it has already used a core on all of
    >>> the slaves and the limit has not been reached, utilize an
    >>> additional core on each machine, and so on?
    >>>
    >>> I think the latter makes more sense, but I want to be sure
    >>> that is the case.
    >>>
    >>> Thanks,
    >>> Yadid
    >>>
    >>
    >



    --
    s


