mridulm commented on a change in pull request #34122:
URL: https://github.com/apache/spark/pull/34122#discussion_r780010060



##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -144,12 +144,16 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: 
ClassTag](
     _shuffleMergedFinalized = true
   }
 
+  def shuffleMergeFinalized: Boolean = {

Review comment:
       Let us split the state out into two variables - since they are 
fundamentally two things represented earlier by one boolean.
   
   * `shuffleMergeAllowed`
   * `shufflerMergeEnabled`
   
   If `shuffleMergeAllowed` is `false`, then all push based shuffle paths are 
ignored - and is statically set at driver.
   It defaults to `canShuffleMergeBeEnabled` - but can be explicitly disabled : 
for example, when missing tasks are being computed for a determinate shuffle 
after it has been finalized.
   This would be set at driver before the stage starts - and can be dependent 
upon at executors to enable push based shuffle paths.
   
   `shufflerMergeEnabled` can be set to `true` only if `shuffleMergeAllowed` is 
`true` and additional conditions are met - for example, sufficient mergers are 
available.
   Earlier, this was determined statically when stage started - with this PR, 
this can become `true` adaptively after stage starts.
   `shuffleMergedFinalized` can remain as it was defined earlier.
   
   Thoughts ?
   
   +CC @Ngone51, @otterc 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to