venkata91 commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r779745057
##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -144,12 +144,16 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C:
ClassTag](
_shuffleMergedFinalized = true
}
+ def shuffleMergeFinalized: Boolean = {
Review comment:
Yes I agree. With Adaptively fetching shuffle mergers, we need to check
whether the `shuffleId` is `shuffleMergeFinalized` or not but that internally
checks `shuffleMergeEnabled` which would be false since we didn't have enough
mergers during the start of the stage. Which is why we need to have an
additional method distinguishing whether shuffle merge is finalized vs whether
shuffleMergeEnabled and then shuffleMergeFinalized whenever we try to figure
which stage to be computed next. We need to prevent the next stage from
starting if the previous stage shuffle merge finalization is yet to be
completed
([here](https://github.com/apache/spark/blob/f6128a6f4215dc45a19209d799dd9bf98fab6d8a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L731)).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]