GitHub user liutang123 opened a pull request:
https://github.com/apache/spark/pull/22202
[SPARK-25211][Core] speculation and fetch failed result in hang of job
## What changes were proposed in this pull request?
In the current `DAGScheduler.handleTaskCompletion` code, when a shuffleMapStage
that belongs to a job is not in runningStages and its `pendingPartitions` is
empty, the job of this shuffleMapStage will never complete.
*Consider the following scenario:*
1. Stage 0 runs and generates shuffle output data.
2. Stage 1 reads the output from stage 0 and generates more shuffle data.
It has two tasks for the same partition (for example, because of speculation):
ShuffleMapTask0 and ShuffleMapTask0.1.
3. ShuffleMapTask0 fails to fetch blocks and sends a FetchFailed to the
driver. The driver resubmits stage 0 and stage 1. The driver will place stage 0
in runningStages and place stage 1 in waitingStages.
4. ShuffleMapTask0.1 finishes successfully and sends Success back to the
driver. The driver adds the MapStatus to the set of output locations of
stage 1. Because stage 1 is not in runningStages, the job will not complete.
5. Stage 0 completes and the driver runs stage 1 again. However, because the
output set of stage 1 is already complete, the driver will not submit any
tasks and will mark stage 1 as complete right away. Because job completion
relies on a `CompletionEvent` and no `CompletionEvent` will ever arrive, the
job hangs.
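The sequence above can be replayed with a small toy model. This is only a sketch of the bookkeeping described in the steps, not Spark code: `ToyScheduler`, its methods, and `run_scenario` are hypothetical names, and the model assumes a single partition per stage and that the job is signalled as done only while handling a task success for a stage that is currently running.

```python
class ToyScheduler:
    """Hypothetical, simplified model of the stage bookkeeping (not Spark code)."""

    def __init__(self, num_partitions, final_stage):
        self.num_partitions = num_partitions
        self.final_stage = final_stage
        self.running = set()   # stands in for runningStages
        self.waiting = set()   # stands in for waitingStages
        self.outputs = {}      # stage id -> partitions with registered output
        self.job_done = False

    def is_available(self, stage):
        return len(self.outputs.get(stage, set())) == self.num_partitions

    def submit(self, stage):
        """Submit a stage and return the number of tasks launched."""
        self.waiting.discard(stage)
        missing = self.num_partitions - len(self.outputs.get(stage, set()))
        if missing == 0:
            # Nothing to run: the stage is treated as finished immediately,
            # so no task runs and no completion event is ever produced.
            return 0
        self.running.add(stage)
        return missing

    def on_task_success(self, stage, partition):
        self.outputs.setdefault(stage, set()).add(partition)
        # Completion is only checked for stages that are currently running.
        if stage in self.running and self.is_available(stage):
            self.running.discard(stage)
            if stage == self.final_stage:
                self.job_done = True

    def on_fetch_failed(self, map_stage, failed_stage):
        # The map output is considered lost; both stages are resubmitted.
        self.outputs[map_stage] = set()
        self.running.discard(failed_stage)
        self.waiting.add(failed_stage)
        self.submit(map_stage)


def run_scenario():
    s = ToyScheduler(num_partitions=1, final_stage=1)
    s.submit(0)
    s.on_task_success(0, 0)      # step 1: stage 0 produces its output
    s.submit(1)                  # step 2: stage 1 starts
    s.on_fetch_failed(0, 1)      # step 3: ShuffleMapTask0 reports FetchFailed
    s.on_task_success(1, 0)      # step 4: ShuffleMapTask0.1 succeeds
    s.on_task_success(0, 0)      # step 5: the rerun of stage 0 completes
    launched = s.submit(1)       # stage 1 has nothing left to run
    return launched, s.job_done


if __name__ == "__main__":
    print(run_scenario())
```

In this model, `run_scenario()` returns zero launched tasks and `job_done` stays `False`: the only success for stage 1 arrived while the stage was waiting, and the later resubmission launches nothing that could trigger completion.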
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liutang123/spark SPARK-25211
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22202.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22202
----
commit 4f51199daafec0466a5ac836c4f6281f5ba45381
Author: liulijia <liutang123@...>
Date: 2018-08-23T13:42:13Z
[SPARK-25211][Core] speculation and fetch failed result in hang of job
----