Victsm commented on a change in pull request #34156:
URL: https://github.com/apache/spark/pull/34156#discussion_r719496790
##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1448,7 +1448,7 @@ private[spark] object MapOutputTracker extends Logging {
// TODO: improve push based shuffle to read partial merged blocks satisfying the start/end
// TODO: map indexes
if (mergeStatuses.exists(_.nonEmpty) && startMapIndex == 0
- && endMapIndex == mapStatuses.length) {
+ && endMapIndex == mapStatuses.length && endPartition - startPartition == 1) {
Review comment:
In practice, push-based shuffle shouldn't interfere with partition
coalescing.
A reducer should be able to fetch multiple shuffle partitions as
separate merged shuffle partitions, and that is the behavior this code
currently implements.
The only thing that prevents it from working properly is that
`ShuffleBlockFetcherIterator` later tries to merge the contiguous blocks
generated in this case for batch fetch, which is entirely unnecessary.
In 3.2.0, AQE, partition coalescing, and batch fetch are all enabled by
default.
My concern with this condition is that it would effectively disable
push-based shuffle in 3.2.0 even when the push.enabled flag is set to true,
because a coalesced read spans more than one partition.
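
To illustrate the concern, here is a minimal standalone sketch of the two
conditions in the hunk above. It is a simplified model, not the actual
`MapOutputTracker` code: `MergedFetchCheck`, `FetchRange`, `originalCheck`,
and `proposedCheck` are hypothetical names, and `hasMergeStatuses` and
`numMaps` stand in for `mergeStatuses.exists(_.nonEmpty)` and
`mapStatuses.length`.

```scala
// Simplified, hypothetical model of the check under review (not Spark's code).
object MergedFetchCheck {
  // Ranges requested by one reducer. End indexes are assumed to be exclusive,
  // which is what `endPartition - startPartition == 1` implies for one partition.
  final case class FetchRange(
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int)

  // Condition before the change: merged blocks are used whenever the reducer
  // reads the full map range and merge statuses are available.
  def originalCheck(hasMergeStatuses: Boolean, numMaps: Int, r: FetchRange): Boolean =
    hasMergeStatuses && r.startMapIndex == 0 && r.endMapIndex == numMaps

  // Condition after the change: additionally requires a single reduce partition.
  def proposedCheck(hasMergeStatuses: Boolean, numMaps: Int, r: FetchRange): Boolean =
    originalCheck(hasMergeStatuses, numMaps, r) && r.endPartition - r.startPartition == 1
}
```

With four map outputs, a read of one partition such as
`FetchRange(0, 4, 3, 4)` passes both checks. A coalesced read such as
`FetchRange(0, 4, 3, 6)` passes only `originalCheck`, so under the proposed
condition it would fall back to fetching the unmerged blocks.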
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]