GitHub user dhruve opened a pull request:

    https://github.com/apache/spark/pull/19194

    [SPARK-20589] Allow limiting task concurrency per stage

    ## What changes were proposed in this pull request?
    This change allows the user to cap the number of tasks running concurrently 
in a given job group. (See the comments section of the JIRA for more context on 
why this is implemented at the job group level rather than the stage level.) 
The change is useful when the user wants to avoid overwhelming an external 
service, effectively a DoS, with requests from many executors, without having 
to repartition or coalesce existing RDDs.
    
    This change introduces a new user-level configuration, 
`spark.job.[userJobGroup].maxConcurrentTasks`, which sets the maximum number of 
tasks that may be executing at any point in time for that job group.
    
    The user can use the feature by setting the appropriate jobGroup and 
passing the conf:
    
    ```
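    // Cap the job group named "group1" at 10 concurrently running tasks.
    conf.set("spark.job.group1.maxConcurrentTasks", "10")
    ...
    // Jobs submitted between setJobGroup and clearJobGroup belong to "group1".
    sc.setJobGroup("group1", "", false)
    sc.parallelize(1 to 100000, 10).map(x => x + 1).count()
    sc.clearJobGroup()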
    ```
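
    As a rough illustration of the key format above, the following sketch 
builds the per-group key and reads the limit from a plain `Map`. The object 
and method names (`MaxConcurrentTasksConf`, `key`, `limitFor`) are hypothetical 
and are not taken from the patch. Treating a missing or non-positive value as 
"no limit" is also an assumption.

    ```scala
    import scala.util.Try

    // Hypothetical helper, not part of the patch: resolves the per-group limit.
    object MaxConcurrentTasksConf {
      // Builds the key in the form spark.job.[userJobGroup].maxConcurrentTasks.
      def key(jobGroup: String): String =
        s"spark.job.$jobGroup.maxConcurrentTasks"

      // Returns Some(limit) for a positive integer value, otherwise None
      // (assumption: unset, malformed, or non-positive means "no limit").
      def limitFor(conf: Map[String, String], jobGroup: String): Option[Int] =
        conf.get(key(jobGroup))
          .flatMap(v => Try(v.trim.toInt).toOption)
          .filter(_ > 0)
    }
    ```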
    
    #### Changes proposed in this fix
    This change limits the number of tasks (and, in turn, the number of 
executors to be acquired) that can run simultaneously in a given job group and 
in its jobs and stages, provided the appropriate job group and max concurrency 
configs are set.
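
    The arithmetic behind "in turn also the number of executors" can be 
sketched as follows. This is a simplified model under stated assumptions, not 
the logic in the patch: `ConcurrencyCap`, `runnableTasks` and `executorsNeeded` 
are hypothetical names, and `tasksPerExecutor` is assumed to be 
`spark.executor.cores / spark.task.cpus`.

    ```scala
    // Hypothetical model, not part of the patch.
    object ConcurrencyCap {
      // Tasks allowed to run at once: the demand, clamped by the optional limit.
      def runnableTasks(demand: Int, limit: Option[Int]): Int =
        limit.fold(demand)(l => math.min(demand, l))

      // Executors needed to run that many tasks at once, rounding up.
      def executorsNeeded(tasks: Int, tasksPerExecutor: Int): Int = {
        require(tasksPerExecutor > 0, "tasksPerExecutor must be positive")
        (tasks + tasksPerExecutor - 1) / tasksPerExecutor
      }
    }
    ```

    With 100 pending tasks, a limit of 10 and 4 task slots per executor, this 
model allows 10 concurrent tasks on 3 executors. Without a limit the same 
demand would ask for 25 executors.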
    
    ## How was this patch tested?
    Ran unit tests and multiple manual tests with various combinations of:
    - single/multiple/no job groups
    - executors with single/multi cores
    - dynamic allocation on/off


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhruve/spark impr/SPARK-20589

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19194
    
----
commit 4281151df9010b4e9fe91e588c07e872b8e0dd69
Author: Dhruve Ashar <[email protected]>
Date:   2017-09-11T16:45:49Z

    [SPARK-20589] Allow limiting task concurrency per stage

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
