GitHub user dhruve opened a pull request:
https://github.com/apache/spark/pull/22288
[SPARK-22148] Acquire new executors to avoid hang because of blacklisting
## What changes were proposed in this pull request?
Every time a task is unschedulable because of the condition where the number of
task failures is less than the number of executors available, we currently abort
the taskSet, failing the job. This change tries to acquire new executors when
dynamic allocation is turned on so that the job can complete successfully.
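To illustrate the idea, here is a rough sketch of the intended handling, not the actual patch: `requestNewExecutor`, `scheduleAbortTimer`, `abortTaskSet` and the timeout value are hypothetical placeholders for the scheduler hooks involved, and the conf name/default are still to be agreed upon (see the todo list below).
```scala
// Rough sketch only. All names below are hypothetical placeholders.
object UnschedulableTaskSketch {
  // Hypothetical timeout; the real conf name and default are still to be agreed upon.
  val unschedulableTaskSetTimeoutMs: Long = 120 * 1000L

  def handleCompletelyBlacklistedTask(
      dynamicAllocationEnabled: Boolean,
      requestNewExecutor: () => Unit,    // ask the cluster manager for one more executor
      scheduleAbortTimer: Long => Unit,  // abort the taskSet if nothing gets scheduled in time
      abortTaskSet: String => Unit): Unit = {
    if (dynamicAllocationEnabled) {
      // Instead of aborting right away, acquire a new executor that is not yet
      // blacklisted for this task and give it time to register and run the task.
      requestNewExecutor()
      scheduleAbortTimer(unschedulableTaskSetTimeoutMs)
    } else {
      // Without dynamic allocation we cannot get new executors, so keep the
      // current fail-fast behavior.
      abortTaskSet("Task cannot be scheduled: all available executors are blacklisted for it.")
    }
  }
}
```
The key point is that without dynamic allocation there is no way to obtain a fresh, non-blacklisted executor, so the existing fail-fast behavior is kept in that case.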
## How was this patch tested?
I performed some manual tests to validate the behavior, using the snippet below.
```scala
import org.apache.spark.TaskContext

val rdd = sc.parallelize(Seq(1 to 10), 3)
// Partition 2 sleeps and then fails its first few attempts so the task gets blacklisted.
val mapped = rdd.mapPartitionsWithIndex { (index, iterator) =>
  if (index == 2) {
    Thread.sleep(30 * 1000)
    val attemptNum = TaskContext.get.attemptNumber
    if (attemptNum < 3) throw new Exception("Fail for blacklisting")
  }
  iterator.toList.map(x => x + " -> " + index).iterator
}
mapped.collect
```
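For context, a reproducer like the one above needs blacklisting and dynamic allocation turned on. The snippet below only shows the standard Spark configuration keys involved, with illustrative values; it is not the exact setup used for the manual test.
```scala
import org.apache.spark.SparkConf

// Illustrative values; not the exact setup used for the manual test.
val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")          // enable task/executor blacklisting
  .set("spark.task.maxFailures", "4")              // allowed task failures before the job is failed
  .set("spark.dynamicAllocation.enabled", "true")  // required for acquiring new executors
  .set("spark.shuffle.service.enabled", "true")    // needed by dynamic allocation
```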
Note: I am putting up this PR as an initial draft to review the approach.
Todo List:
- Add unit tests
- Agree upon the conf name & value and update the docs
We can build on this approach further by:
- Taking static allocation into account.
- Querying the RM to figure out whether it is a small cluster, and then either
waiting longer or aborting immediately.
- Distinguishing between time spent waiting to acquire an executor and time
spent being unable to schedule a task.
Open to suggestions.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dhruve/spark bug/SPARK-22148
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22288.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22288
----
commit 5253b3134119b2a28cdaa1406d7bafb55f62cbc1
Author: Dhruve Ashar <dhruveashar@...>
Date: 2018-08-30T18:08:58Z
[SPARK-22148] Acquire new executors to avoid hang because of blacklisting
----