Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/21698
    Actually I have been thinking about the issue and feel we can still go
further with the current patch to fix the issue with child stages @mridulm
mentioned above. That requires tracking, for each active/finished stage, all
the preceding stages that can generate non-deterministic output, and also
tracking the stages that have partially finished (some tasks in the stage have
finished, while the rest are still running or pending). In case of FetchFailure
(or ExecutorLost/WorkerLost), if the affected stage can generate
non-deterministic output, then we clear all the shuffle files from that stage
(to make sure we'll retry all tasks for that stage), and we also check all
partially finished stages that are succeeding stages of the affected stage and
clear all of their shuffle files too. (If a stage is a succeeding stage of the
affected stage but has already finished successfully and none of its shuffle
files are lost, then it's fine; we don't need to retry it.) A rough sketch of
this bookkeeping follows below.
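    To make that concrete, here is a minimal standalone sketch, not the actual
DAGScheduler code; all names here (`IndeterminateStageTracker`,
`indeterminateAncestors`, `clearShuffleFiles`, ...) are hypothetical and only
illustrate the tracking and the FetchFailure handling described above:

```scala
import scala.collection.mutable

// Illustrative only; the real change would live inside DAGScheduler.
object IndeterminateStageTracker {

  // For each active/finished stage id, the ids of preceding stages that can
  // generate non-deterministic output. With M tracked stages and N
  // non-deterministic stages this holds at most M * N Int values.
  private val indeterminateAncestors = mutable.Map.empty[Int, Set[Int]]

  // Stage ids that may themselves generate non-deterministic output.
  private val indeterminateStages = mutable.Set.empty[Int]

  // Stage ids that are partially finished: some tasks have succeeded while
  // the rest are still running or pending.
  private val partiallyFinishedStages = mutable.Set.empty[Int]

  // Stand-in for unregistering all map output of a stage, which forces a
  // full re-run of every task in that stage on retry.
  private def clearShuffleFiles(stageId: Int): Unit =
    println(s"clearing all shuffle files of stage $stageId")

  // Called on FetchFailure (or ExecutorLost/WorkerLost) affecting `stageId`.
  def onFetchFailure(stageId: Int): Unit = {
    if (indeterminateStages.contains(stageId)) {
      // The stage may regenerate different output, so retry all its tasks.
      clearShuffleFiles(stageId)
      // A partially finished succeeding stage may have consumed a mix of
      // old and new output, so clear its shuffle files as well. Succeeding
      // stages that finished completely with no lost shuffle files are
      // left alone.
      partiallyFinishedStages.foreach { succ =>
        if (indeterminateAncestors.getOrElse(succ, Set.empty).contains(stageId)) {
          clearShuffleFiles(succ)
        }
      }
    }
  }
}
```

    Note that all of the extra work happens inside `onFetchFailure`, so the
common failure-free path pays nothing.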
    The above fix proposal requires more refactoring of DAGScheduler, and it
will consume some memory to store additional information (assuming M
active/finished stages and N stages that may generate non-deterministic
output, it takes at most M * N Int values to store the stage ids). Normally N
is by far smaller than the total number of stages, so I think the extra cost
should be acceptable.
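    For example (numbers picked purely for illustration), with M = 10,000
tracked stages and N = 10 non-deterministic stages, that is at most 100,000
Ints, roughly 400 KB of raw Int data on the driver, which seems negligible.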
    Actually, you always pay a lot of extra cost on a FetchFailure anyway;
this approach may increase that cost by several times, but that should be
statistically fine considering most Spark jobs are short-running and don't hit
FetchFailure very often. (The major advantage of this approach is that you
don't pay any penalty if you don't hit a FetchFailure.)