Victsm commented on a change in pull request #34156:
URL: https://github.com/apache/spark/pull/34156#discussion_r719496790



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1448,7 +1448,7 @@ private[spark] object MapOutputTracker extends Logging {
     // TODO: improve push based shuffle to read partial merged blocks satisfying the start/end
     // TODO: map indexes
     if (mergeStatuses.exists(_.nonEmpty) && startMapIndex == 0
-      && endMapIndex == mapStatuses.length) {
+      && endMapIndex == mapStatuses.length && endPartition - startPartition == 1) {

Review comment:
   Push-based shuffle shouldn't, in practice, interfere with partition coalescing.
   A reducer should be able to fetch multiple shuffle partitions as separate merged shuffle partitions, and that is the behavior this code currently implements.
   The only thing preventing it from working properly is that `ShuffleBlockFetcherIterator` later tries to merge the continuous blocks generated in this case for batch fetch, which is unnecessary.
   In 3.2.0, AQE, partition coalescing, and batch fetch are all enabled by default.
   My concern with this condition is that it will effectively disable push-based shuffle in 3.2.0 even when `spark.shuffle.push.enabled` is set to true.
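
   To make the concern concrete, here is a minimal sketch of the new condition in isolation. `MergedFetchSketch`, `useMergedBlocks`, `hasMergeStatuses`, and `numMaps` are hypothetical names used only for illustration; they stand in for the `mergeStatuses.exists(_.nonEmpty)` and `mapStatuses.length` checks in the diff and are not the actual `MapOutputTracker` code.

```scala
object MergedFetchSketch {
  // Hypothetical, simplified stand-in for the condition in the diff above.
  // The real code inspects mergeStatuses and mapStatuses; here they are
  // reduced to a flag and a count so the predicate can be run on its own.
  def useMergedBlocks(
      hasMergeStatuses: Boolean,
      numMaps: Int,
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int): Boolean = {
    hasMergeStatuses &&
      startMapIndex == 0 &&
      endMapIndex == numMaps &&
      endPartition - startPartition == 1 // the clause added by this change
  }

  def main(args: Array[String]): Unit = {
    // A reducer reading exactly one shuffle partition, [3, 4):
    // the merged-block path is still taken.
    println(useMergedBlocks(true, 10, 0, 10, 3, 4))
    // A coalesced reducer reading partitions [3, 6):
    // the added clause is false, so the merged-block path is skipped.
    println(useMergedBlocks(true, 10, 0, 10, 3, 6))
  }
}
```

   Under these assumptions, any reducer whose partition range spans more than one partition, which is what partition coalescing produces, never takes the merged-block path.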






