GitHub user jiangxb1987 opened a pull request:
https://github.com/apache/spark/pull/22158
[SPARK-25161][Core] Fix several bugs in failure handling of barrier execution mode
## What changes were proposed in this pull request?
Fix several bugs in failure handling of barrier execution mode:
* Mark the TaskSet of a barrier stage as zombie when a task attempt fails;
* Multiple barrier task failures from a single barrier stage should not
trigger multiple stage retries;
* Barrier task failure from a previous failed stage attempt should not
trigger stage retry;
* Fail the job when a task from a barrier ResultStage fails;
* `RDD.isBarrier()` should not rely on `ShuffleDependency`s.
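To make the rules above concrete, here is a toy Python model. It is not Spark code: all names (`ToyRDD`, `Dep`, `ToyStage`, `on_barrier_task_failure`) are hypothetical, and the real logic lives in Spark's Scala `DAGScheduler`, `TaskSetManager` and `RDD`, which this sketch does not reproduce. It assumes a stage attempt is identified by an integer and that a retry simply bumps that integer.

```python
"""Toy model of the barrier failure-handling rules in this PR (hypothetical, not Spark's API)."""
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dep:
    rdd: "ToyRDD"
    shuffle: bool  # True models a ShuffleDependency, i.e. a stage boundary


@dataclass
class ToyRDD:
    barrier_marked: bool = False  # hypothetical flag: RDD built via a barrier transformation
    deps: List[Dep] = field(default_factory=list)

    def is_barrier(self) -> bool:
        # Barrier-ness propagates only through narrow dependencies. A shuffle
        # dependency starts a new stage, so the parent's flag is ignored there.
        return self.barrier_marked or any(
            d.rdd.is_barrier() for d in self.deps if not d.shuffle
        )


@dataclass
class ToyStage:
    is_result_stage: bool
    latest_attempt: int = 0
    zombie_attempts: set = field(default_factory=set)
    job_failed: bool = False
    retries: int = 0


def on_barrier_task_failure(stage: ToyStage, attempt: int) -> str:
    """Return the action taken when a task of the given stage attempt fails."""
    if attempt < stage.latest_attempt:
        # Failure from an older, already superseded attempt: no new retry.
        return "ignored-stale"
    if attempt in stage.zombie_attempts:
        # Another task of this attempt already failed and was handled.
        return "ignored-duplicate"
    # Mark this attempt's TaskSet as zombie so no more of its tasks launch.
    stage.zombie_attempts.add(attempt)
    if stage.is_result_stage:
        # Per the PR, a failed barrier ResultStage fails the job instead of retrying.
        stage.job_failed = True
        return "job-failed"
    # Otherwise retry the whole stage exactly once for this attempt.
    stage.retries += 1
    stage.latest_attempt = attempt + 1
    return "stage-retried"
```

For example, two failures reported by attempt 0 of a shuffle map stage yield `"stage-retried"` and then `"ignored-stale"`, so only one retry happens; the same two failures in a ResultStage yield `"job-failed"` and then `"ignored-duplicate"`.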
## How was this patch tested?
Added corresponding test cases in `DAGSchedulerSuite` and
`TaskSchedulerImplSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiangxb1987/spark failure
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22158.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22158
commit 32ea946c68c5f3108fb18f7e936ba440f7537144
Author: Xingbo Jiang
Date: 2018-08-20T17:19:35Z
update