otterc opened a new pull request, #36293:
URL: https://github.com/apache/spark/pull/36293
### What changes were proposed in this pull request?
This change fixes scenarios where, with push-based shuffle enabled, a stage
re-attempt doesn't complete successfully even though all of its tasks
complete. With Adaptive Merge Finalization, a stage may be triggered for
finalization when it is in the following state:
- The stage is not running (not in the running set of the DAGScheduler): it
has failed, been canceled, or is waiting, and
- The stage has no pending partitions (all the tasks completed at least once)
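The two conditions above can be sketched as a single predicate. This is a minimal illustration only: `StageView`, `inProblemState`, and `runningStageIds` are hypothetical names for this sketch and are not the DAGScheduler's actual types or methods.

```scala
object FinalizationTriggerSketch {
  // Hypothetical, simplified view of a stage; not Spark's real Stage class.
  final case class StageView(id: Int, pendingPartitions: Set[Int])

  // True when the stage is outside the running set and has no pending partitions,
  // i.e. it matches both conditions listed above.
  def inProblemState(stage: StageView, runningStageIds: Set[Int]): Boolean =
    !runningStageIds.contains(stage.id) && stage.pendingPartitions.isEmpty
}
```

A stage that is still running, or that still has pending partitions, does not match and is not affected by the issue described here.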
For such a stage, when the finalization completes, the stage is still not
marked as mergeFinalized.
The state of the stage will be:
- `stage.shuffleDependency.mergeFinalized = false`
- `stage.shuffleDependency.getFinalizeTask != Nil`
- Merged statuses of the stage are unregistered
When the stage is resubmitted, the newer attempt of the stage will never
complete even though all of its tasks may complete. This is because the newer
attempt of the stage will have `shuffleMergeEnabled = true`, since with the
previous attempt the stage was never marked as mergeFinalized, and the
finalizeTask is still present (from the finalization attempt for the previous
stage attempt).
So, when all the tasks of the newer attempt complete, then these conditions
will be true:
- The stage will be running
- There will be no pending partitions since all the tasks completed
- `stage.shuffleDependency.shuffleMergeEnabled = true`
- `stage.shuffleDependency.shuffleMergeFinalized = false`
- `stage.shuffleDependency.getFinalizeTask` is not empty
This leads the DAGScheduler to try scheduling finalization instead of
triggering the completion of the stage. However, because of the last condition
(a finalize task is already present), it never actually schedules the
finalization, so the stage never completes.
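The stall described above can be modeled as a small decision function. This is a simplified sketch under stated assumptions, not the scheduler's real code: `MergeState`, `Outcome`, and `onAllTasksDone` are hypothetical names, and the three flags stand in for `shuffleMergeEnabled`, `shuffleMergeFinalized`, and the presence of a finalize task.

```scala
object StallSketch {
  // Hypothetical stand-in for the shuffle dependency fields named above.
  final case class MergeState(
      shuffleMergeEnabled: Boolean,
      shuffleMergeFinalized: Boolean,
      finalizeTaskPresent: Boolean)

  sealed trait Outcome
  case object MarkStageComplete extends Outcome
  case object ScheduleFinalization extends Outcome
  case object Stalled extends Outcome

  // Decision taken once a running stage has no pending partitions left.
  def onAllTasksDone(s: MergeState): Outcome =
    if (s.shuffleMergeEnabled && !s.shuffleMergeFinalized) {
      // Merge must be finalized before the stage can complete, but a new
      // finalization is only scheduled when no finalize task exists yet.
      if (s.finalizeTaskPresent) Stalled else ScheduleFinalization
    } else {
      MarkStageComplete
    }
}
```

With the state left behind by the previous attempt (merge enabled, not finalized, finalize task present), the function returns `Stalled`: neither completion nor a new finalization is triggered.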
### Why are the changes needed?
The change fixes the above issue where the application gets stalled as some
stages don't complete successfully.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I have only modified the existing unit test (UT). A stage is now marked
finalized irrespective of its state, and for a deterministic stage we don't
want to unregister merge results.
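The behavior the test checks can be sketched roughly as follows. This is an illustrative model only: `Stage`, `onMergeFinalized`, and the field names are hypothetical, and the branch that unregisters merge results for a non-running, non-deterministic stage is an assumption, since the description only states that deterministic stages keep their merge results.

```scala
object FinalizeFixSketch {
  // Hypothetical, simplified stage state; not Spark's real classes.
  final case class Stage(
      running: Boolean,
      deterministic: Boolean,
      mergeFinalized: Boolean = false,
      mergeResultsRegistered: Boolean = true)

  // Called when merge finalization completes for the stage.
  def onMergeFinalized(stage: Stage): Stage = {
    // Mark finalized regardless of whether the stage is running.
    val finalized = stage.copy(mergeFinalized = true)
    // Assumption: only a non-running, non-deterministic stage loses its merge results.
    if (!stage.running && !stage.deterministic) finalized.copy(mergeResultsRegistered = false)
    else finalized
  }
}
```

Under this model a failed or waiting deterministic stage ends up finalized with its merge results intact, so a later attempt does not hit the stalled state.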