GitHub user dhruve opened a pull request:

    https://github.com/apache/spark/pull/19157

    [SPARK-20589][Core][Scheduler] Allow limiting task concurrency per job group

    ## What changes were proposed in this pull request?
    This change allows the user to specify the maximum number of tasks running
in a given job group. (See the comments on the JIRA for more context on why
this is implemented at the job group level rather than the stage level.) This
change is beneficial when the user wants to avoid causing a denial of service
(DoS) on an external service accessed from multiple executors, without needing
to repartition or coalesce existing RDDs.
    
    This code change introduces a new user-level configuration,
`spark.job.[userJobGroup].maxConcurrentTasks`, which sets the maximum number
of tasks from that job group that may execute at any given point in time.
    
    To use the feature, set the appropriate job group and pass the config:
    
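    ```
    // Set on the SparkConf used to create the SparkContext
    conf.set("spark.job.group1.maxConcurrentTasks", "10")
    ...
    // Jobs submitted from this thread while "group1" is set run at most
    // 10 tasks concurrently
    sc.setJobGroup("group1", "", false)
    sc.parallelize(1 to 100000, 10).map(x => x + 1).count
    sc.clearJobGroup
    ```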
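    In Spark, `setJobGroup` stores the group id as a thread-local property under the key `spark.jobGroup.id`, and that property travels with every job submitted from the thread. The sketch below shows one way the limit for a job could be resolved from those properties and the `spark.job.[userJobGroup].maxConcurrentTasks` key. `JobGroupLimit` is a hypothetical helper, the config is modelled as a plain `Map`, and ignoring non-numeric or non-positive values is an assumption (the patch may reject them instead).

    ```scala
    import java.util.Properties
    import scala.util.Try

    // Hypothetical helper: resolve the concurrency limit for a job from the
    // job's local properties and the application's configuration entries.
    object JobGroupLimit {
      // Property key under which SparkContext.setJobGroup stores the group id.
      val JobGroupIdKey = "spark.jobGroup.id"

      def maxConcurrentTasks(localProps: Properties, conf: Map[String, String]): Option[Int] =
        for {
          group <- Option(localProps.getProperty(JobGroupIdKey)) // no group => no limit
          raw   <- conf.get(s"spark.job.$group.maxConcurrentTasks")
          limit <- Try(raw.trim.toInt).toOption if limit > 0      // assumption: ignore bad values
        } yield limit
    }
    ```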
    
    #### Changes proposed in this fix
    This change limits the number of tasks (and, in turn, the number of
executors to be acquired) that can run simultaneously in a given job group and
its subsequent jobs and stages, if the appropriate job group and max
concurrency configs are set.
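    To see why capping tasks also caps executors, consider a simplified model of dynamic allocation (an assumption, not the exact logic in `ExecutorAllocationManager`): each executor runs `executorCores / cpusPerTask` tasks at once, and the executor target is the ceiling of runnable tasks divided by that number. With hypothetical numbers, 100 pending tasks on 4-core executors would ask for 25 executors, but a group limit of 10 caps the request at 3. `ExecutorBound` and its methods are illustrative names only.

    ```scala
    // Simplified model (assumption): the number of executors worth requesting
    // is bounded by how many tasks of the job group may run at once.
    object ExecutorBound {
      /** Tasks one executor can run concurrently: executor cores / CPUs per task. */
      def tasksPerExecutor(executorCores: Int, cpusPerTask: Int): Int = {
        require(cpusPerTask > 0 && executorCores >= cpusPerTask, "invalid core settings")
        executorCores / cpusPerTask
      }

      /** Executors needed for `pendingOrRunning` tasks, capped by an optional group limit. */
      def maxExecutorsNeeded(pendingOrRunning: Int, groupLimit: Option[Int],
                             executorCores: Int, cpusPerTask: Int = 1): Int = {
        val runnable = groupLimit.fold(pendingOrRunning)(limit => math.min(pendingOrRunning, limit))
        val perExec = tasksPerExecutor(executorCores, cpusPerTask)
        (runnable + perExec - 1) / perExec // integer ceiling division
      }
    }
    ```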
    
    ## How was this patch tested?
    Ran unit tests and multiple manual tests with various combinations of:
    - single/multiple/no job groups
    - executors with single/multi cores
    - dynamic allocation on/off


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhruve/spark impr/SPARK-20589

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19157
    
----
commit 824396c82977171c38ab5d7f6c0f84bc19eccaba
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-15T14:18:21Z

    [SPARK-20589] Allow limiting task concurrency per stage

commit d3f8162dab4ca7065d7f296fd03528ce6ddfb923
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-15T14:45:18Z

    Merge branch 'master' of github.com:apache/spark into impr/SPARK-20589

commit 824621286ffb107010409c4d0d3442550628247d
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-21T16:51:41Z

    Allow limiting task concurrency per stage in concurrent job groups

commit 517acb490ae5938a22c4175347f6bbc24b47781f
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-21T19:30:17Z

    Remove comment

commit 65941f7884551e84a13a6cc2e7488a01e7d8beec
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-21T19:42:05Z

    Fix comment style

commit 7aba73a31808f6b1017b85dfd4dd19e28365bd97
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-22T14:54:10Z

    Merge branch 'master' of github.com:apache/spark into impr/SPARK-20589

commit 0e518f00ce97fd5d17fe89792c2503d2514b0473
Author: Dhruve Ashar <[email protected]>
Date:   2017-08-22T15:38:01Z

    Fix new unit test and add comments

commit 8b3830004d69bd5f109fd9846f59583c23a910c7
Author: Dhruve Ashar <[email protected]>
Date:   2017-09-05T20:14:02Z

    Resolve merge conflict and add test for speculative task

----

