venkata91 commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r797872918
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -2313,7 +2326,7 @@ private[spark] class DAGScheduler(
// Register merge statuses if the stage is still running and shuffle merge
is not finalized yet.
// TODO: SPARK-35549: Currently merge statuses results which come after
shuffle merge
// TODO: is finalized is not registered.
- if (runningStages.contains(stage) &&
!stage.shuffleDep.shuffleMergeFinalized) {
+ if (runningStages.contains(stage) &&
!stage.shuffleDep.isShuffleMergeFinalizedMarked) {
Review comment:
@mridulm If `shuffleMergeId` is not same we are not finalizing the merge
right? But there could be a `finalize` request for the correct `shuffleMergeId`
but we could have messed up the `MergeStatuses` in that case coming from a
`shuffleMergeId`. Ok makes sense. Will file a separate JIRA for this one as
well.
On a side note, this is also another reason why we can
`setShuffleMergeAllowed(false)` for determinate stages in
`handleShuffleMergeFinalized` because it is possible that we might not finalize
due to different `shuffleMergeId` finalize request.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]