GitHub user dhruve opened a pull request:
https://github.com/apache/spark/pull/19194
[SPARK-20589] Allow limiting task concurrency per stage
## What changes were proposed in this pull request?
This change allows the user to specify the maximum number of tasks running
concurrently in a given job group. (See the comments on the JIRA issue for
context on why this is implemented at the job group level rather than the
stage level.) This is useful when the user wants to avoid causing a denial of
service (DoS) on an external service that is accessed from multiple executors,
without having to repartition or coalesce existing RDDs.
This code change introduces a new user-level configuration,
`spark.job.[userJobGroup].maxConcurrentTasks`, which sets the maximum number
of tasks that can be active at any given point in time.
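As an illustration of the key naming only, the following minimal sketch builds the per-group key and reads the limit from a plain settings map. The object `JobGroupConfSketch`, both helper names, and the choice to ignore non-numeric or non-positive values are assumptions made for this example; they are not taken from the patch.

```scala
import scala.util.Try

// Hypothetical helpers, not part of Spark or of this patch.
object JobGroupConfSketch {
  // Builds the per-job-group key described above.
  def maxConcurrentTasksKey(jobGroup: String): String =
    s"spark.job.$jobGroup.maxConcurrentTasks"

  // Returns the configured limit, or None if the key is unset,
  // not an integer, or not positive (an assumption of this sketch).
  def maxConcurrentTasks(settings: Map[String, String], jobGroup: String): Option[Int] =
    settings
      .get(maxConcurrentTasksKey(jobGroup))
      .flatMap(v => Try(v.trim.toInt).toOption)
      .filter(_ > 0)
}
```

With this sketch, a job group that has no matching key yields `None`, which a caller could treat as "no limit".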
The user can use the feature by setting the appropriate jobGroup and
passing the conf:
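```
// Cap job group "group1" at 10 concurrently running tasks.
conf.set("spark.job.group1.maxConcurrentTasks", "10")
...
// Jobs submitted from this thread now belong to "group1".
sc.setJobGroup("group1", "", false)
sc.parallelize(1 to 100000, 10).map(x => x + 1).count()
sc.clearJobGroup()
```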
#### changes proposed in this fix
This change limits the number of tasks (and, in turn, the number of executors
to be acquired) that can run simultaneously in a given job group and its
subsequent jobs and stages, provided the appropriate job group and max
concurrency configs are set.
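The arithmetic behind such a limit can be sketched roughly as follows. This is not the scheduler code from the patch: `ConcurrencyLimitSketch`, `launchableTasks`, and `executorsNeeded` are hypothetical names, and the sketch assumes each executor runs `coresPerExecutor / cpusPerTask` tasks at a time.

```scala
// Hypothetical helpers, not part of Spark: they illustrate the arithmetic only.
object ConcurrencyLimitSketch {
  // Number of additional tasks that may start in a job group right now,
  // given how many are pending and how many are already running.
  def launchableTasks(pending: Int, running: Int, maxConcurrentTasks: Int): Int =
    math.max(0, math.min(pending, maxConcurrentTasks - running))

  // Upper bound on the executors needed to run `maxConcurrentTasks` tasks
  // at once, assuming a fixed number of task slots per executor.
  def executorsNeeded(maxConcurrentTasks: Int, coresPerExecutor: Int, cpusPerTask: Int = 1): Int = {
    val tasksPerExecutor = math.max(1, coresPerExecutor / cpusPerTask)
    // Integer ceiling of maxConcurrentTasks / tasksPerExecutor.
    (maxConcurrentTasks + tasksPerExecutor - 1) / tasksPerExecutor
  }
}
```

Under these assumptions, a limit of 10 tasks with 4-core executors needs at most 3 executors, which is why capping tasks also caps the executors worth acquiring.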
## How was this patch tested?
Ran unit tests and multiple manual tests with various combinations of:
- single/multiple/no job groups
- executors with single/multi cores
- dynamic allocation on/off
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dhruve/spark impr/SPARK-20589
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19194.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19194
----
commit 4281151df9010b4e9fe91e588c07e872b8e0dd69
Author: Dhruve Ashar <[email protected]>
Date: 2017-09-11T16:45:49Z
[SPARK-20589] Allow limiting task concurrency per stage
----