venkata91 commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r785245294
##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -144,12 +144,16 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
_shuffleMergedFinalized = true
}
+ def shuffleMergeFinalized: Boolean = {
Review comment:
I refactored this into `shuffleMergeAllowed` and `shuffleMergeEnabled` to draw
a clear distinction between the two. `shuffleMergeAllowed` covers the static
knobs, such as:
1. `Is RDD barrier?`
2. `Can Push shuffle be enabled?`
3. `Disabling push shuffle for retry once the determinate stage attempt is
finalized` etc.
`shuffleMergeEnabled` is checked only when `shuffleMergeAllowed` is true, and
it becomes `true` only if sufficient mergers are also available.
Given all the above, I think we still require two separate methods:
`isShuffleMergeFinalized`, which only checks the `shuffleMergedFinalized` value
and `numPartitions > 0`, and `isShuffleMergeFinalizedIfEnabled`, which checks
both `shuffleMergeEnabled` and `isShuffleMergeFinalized`.
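A minimal sketch of the split described above, not the actual
`ShuffleDependency` code. `MergeState`, `minMergers`, `setMergerCount` and
`markShuffleMergeFinalized` are hypothetical names used only for illustration.
The comment does not spell out how `numPartitions > 0` combines with the
finalized flag, so the sketch assumes an empty RDD counts as trivially
finalized.

```scala
// Hypothetical stand-in for the merge-related state of a shuffle dependency.
final class MergeState(
    val numPartitions: Int,
    val shuffleMergeAllowed: Boolean, // static knobs: barrier RDD, push shuffle config, ...
    val minMergers: Int) {            // assumed threshold for "sufficient mergers"

  private var mergerCount: Int = 0
  private var shuffleMergedFinalized: Boolean = false

  def setMergerCount(n: Int): Unit = { mergerCount = n }
  def markShuffleMergeFinalized(): Unit = { shuffleMergedFinalized = true }

  // Dynamic: true only when statically allowed AND enough mergers were found.
  def shuffleMergeEnabled: Boolean =
    shuffleMergeAllowed && mergerCount >= minMergers

  // Looks only at the flag and the partition count.
  // Assumption: with zero partitions there is nothing to merge, so report finalized.
  def isShuffleMergeFinalized: Boolean =
    if (numPartitions > 0) shuffleMergedFinalized else true

  // A shuffle that never had merge enabled has nothing to wait for.
  def isShuffleMergeFinalizedIfEnabled: Boolean =
    if (shuffleMergeEnabled) isShuffleMergeFinalized else true
}
```

With no mergers registered, `isShuffleMergeFinalizedIfEnabled` is `true` while
`isShuffleMergeFinalized` is still `false`, which is the case the two methods
are meant to tell apart.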
In the existing code, before the `ShuffleMapStage` starts we check
`if (!shuffleMergeFinalized)` and only then call
`prepareShuffleServicesForShuffleMapStage`. But it is possible that the previous
stage never had `shuffleMergeEnabled` because there were not enough mergers, in
which case the retry would not be shuffle merge enabled either; ideally the
retry can be shuffle merge enabled if enough mergers are available. For
proceeding with the next stage, however, we need to check
`isShuffleMergeFinalizedIfEnabled`, which is
`if (shuffleMergeEnabled) isShuffleMergeFinalized else true`.
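The retry scenario can be sketched as two separate checks. This is an
illustration under stated assumptions, not the `DAGScheduler` code:
`MergeSnapshot`, `StageChecks`, `shouldPrepareShuffleServices` and
`canProceedToNextStage` are hypothetical names, and gating the first check on
`shuffleMergeAllowed` is an assumption.

```scala
// Hypothetical snapshot of a shuffle dependency's merge state at decision time.
final case class MergeSnapshot(
    shuffleMergeAllowed: Boolean,
    shuffleMergeEnabled: Boolean,
    isShuffleMergeFinalized: Boolean) {

  def isShuffleMergeFinalizedIfEnabled: Boolean =
    if (shuffleMergeEnabled) isShuffleMergeFinalized else true
}

object StageChecks {
  // Before a ShuffleMapStage (re)starts: try to find mergers whenever merge is
  // still allowed and not yet finalized, even if the earlier attempt ran
  // without merge because too few mergers were available.
  def shouldPrepareShuffleServices(s: MergeSnapshot): Boolean =
    s.shuffleMergeAllowed && !s.isShuffleMergeFinalized

  // Before moving on to the next stage: wait for finalization only when merge
  // was actually enabled for this shuffle.
  def canProceedToNextStage(s: MergeSnapshot): Boolean =
    s.isShuffleMergeFinalizedIfEnabled
}

object RetryExample {
  // Earlier attempt: merge allowed, but not enabled (not enough mergers).
  val noMergersYet: MergeSnapshot = MergeSnapshot(
    shuffleMergeAllowed = true,
    shuffleMergeEnabled = false,
    isShuffleMergeFinalized = false)

  // The retry still gets a chance to pick up mergers ...
  val retryPrepares: Boolean = StageChecks.shouldPrepareShuffleServices(noMergersYet)
  // ... while the next stage is not blocked on a merge that never ran.
  val nextStageUnblocked: Boolean = StageChecks.canProceedToNextStage(noMergersYet)
}
```

Using the if-enabled variant for the first check would report this shuffle as
already finalized and skip the merger lookup on retry, which is why the two
checks are kept apart.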
Let me know what you think.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]