mridulm commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r796929787
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1821,7 +1834,7 @@ private[spark] class DAGScheduler(
}
if (runningStages.contains(shuffleStage) &&
shuffleStage.pendingPartitions.isEmpty) {
- if (!shuffleStage.shuffleDep.shuffleMergeFinalized &&
+ if (!shuffleStage.shuffleDep.isShuffleMergeFinalizedMarked &&
shuffleStage.shuffleDep.getMergerLocs.nonEmpty) {
checkAndScheduleShuffleMergeFinalize(shuffleStage)
} else {
Review comment:
In `processShuffleMapStageCompletion`, add something like
```
if (!shuffleStage.isIndeterminate &&
    shuffleStage.shuffleDep.shuffleMergeEnabled) {
  shuffleStage.shuffleDep.setShuffleMergeAllowed(false)
}
```
to ensure we don't retry merge for determinate stages?
This is not strictly related to this PR, so I am fine with doing it in a
follow-up PR to keep the scope contained (we can file a JIRA in that
case).
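To make the intent concrete, here is a minimal, self-contained sketch of the suggested guard. It uses stand-in types rather than Spark's real classes: `FakeShuffleDep`, `FakeShuffleStage`, `MergeRetryGuard`, `onStageCompleted`, and `hasMergerLocs` are hypothetical names, and the rule that merge is "enabled" only when it is allowed and merger locations exist is an assumption of the sketch, not a statement about the actual `ShuffleDependency` implementation.

```scala
// Stand-in types for illustration only; these are NOT Spark's real classes.
final class FakeShuffleDep(hasMergerLocs: Boolean) {
  private var mergeAllowed: Boolean = true

  def setShuffleMergeAllowed(allowed: Boolean): Unit = {
    mergeAllowed = allowed
  }

  def shuffleMergeAllowed: Boolean = mergeAllowed

  // Assumption of this sketch: merge is enabled only when it is
  // allowed and at least one merger location is available.
  def shuffleMergeEnabled: Boolean = mergeAllowed && hasMergerLocs
}

final case class FakeShuffleStage(
    isIndeterminate: Boolean,
    shuffleDep: FakeShuffleDep)

object MergeRetryGuard {
  // Mirrors the suggested check: once a determinate stage completes,
  // disallow merge so that a later retry of the stage does not merge again.
  // Indeterminate stages are left untouched.
  def onStageCompleted(stage: FakeShuffleStage): Unit = {
    if (!stage.isIndeterminate && stage.shuffleDep.shuffleMergeEnabled) {
      stage.shuffleDep.setShuffleMergeAllowed(false)
    }
  }
}
```

With these stand-ins, calling `MergeRetryGuard.onStageCompleted` on a determinate stage flips `shuffleMergeAllowed` to `false`, while an indeterminate stage keeps merge allowed so that a retry can still merge.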
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]