mridulm commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r780010060
##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -144,12 +144,16 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C:
ClassTag](
_shuffleMergedFinalized = true
}
+ def shuffleMergeFinalized: Boolean = {
Review comment:
Let us split the state out into two variables - since they are
fundamentally two things represented earlier by one boolean.
* `shuffleMergeAllowed`
* `shufflerMergeEnabled`
If `shuffleMergeAllowed` is `false`, then all push based shuffle paths are
ignored - and is statically set at driver.
It defaults to `canShuffleMergeBeEnabled` - but can be explicitly disabled :
for example, when missing tasks are being computed for a determinate shuffle
after it has been finalized.
This would be set at driver before the stage starts - and can be depended
upon at executors to enable push based shuffle paths.
`shufflerMergeEnabled` can be set to `true` only if `shuffleMergeAllowed` is
`true` and additional conditions are met - for example, sufficient mergers are
available.
Earlier, this was determined statically when stage started - with this PR,
this can become `true` adaptively after stage starts.
`shuffleMergedFinalized` can remain as it was defined earlier.
Thoughts ?
+CC @Ngone51, @otterc
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]