ulysses-you commented on a change in pull request #33310:
URL: https://github.com/apache/spark/pull/33310#discussion_r673612589
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
##########
@@ -181,6 +187,9 @@ class ShuffledRowRDD(
case PartialMapperPartitionSpec(mapIndex, _, _) =>
tracker.getMapLocation(dependency, mapIndex, mapIndex + 1)
+
+ case CoalescedMapperPartitionSpec(startMapIndex, endMapIndex, numReducers) =>
+ tracker.getMapLocation(dependency, startMapIndex, endMapIndex)
Review comment:
I see this can reduce the partition number, but I'm not sure this approach brings any performance benefit. The original idea of `OptimizeLocalShuffleReader` is to make the reducer task run on the same executor as its target mapper, so that it can reduce some network IO.
I think ordering by partition size only solves the issue partially. If we want to coalesce mappers, shall we check that the coalesced mappers are on the same executor or node?
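Just a rough sketch of what I mean, outside this PR: `CoalescedMapperLocality`, `commonPreferredLocations`, and the `executor_host_execId` location format below are only assumptions for illustration (loosely mirroring what `ExecutorCacheTaskLocation#toString` produces), not the actual API used here.

```scala
// Sketch: only keep a preferred location for a coalesced map range when all
// mappers in that range sit on the same executor (or at least the same host),
// so the local-read benefit of OptimizeLocalShuffleReader is preserved.
object CoalescedMapperLocality {

  // Assumes location strings look like "executor_host_execId"; this is only
  // an assumption for the sketch.
  private def hostOf(location: String): String = {
    val parts = location.split("_")
    if (parts.length >= 2) parts(1) else location
  }

  /** Common locations if every mapper reports the same executor, falling back
    * to a common host; otherwise no locality preference at all. */
  def commonPreferredLocations(mapperLocations: Seq[String]): Seq[String] = {
    val hosts = mapperLocations.map(hostOf).distinct
    if (mapperLocations.isEmpty) {
      Nil
    } else if (mapperLocations.distinct.size == 1) {
      mapperLocations.distinct           // all mappers on one executor
    } else if (hosts.size == 1) {
      Seq(hosts.head)                    // same node, different executors
    } else {
      Nil                                // spread out: reads still cross the network
    }
  }

  def main(args: Array[String]): Unit = {
    println(commonPreferredLocations(Seq("executor_host1_1", "executor_host1_1")))
    println(commonPreferredLocations(Seq("executor_host1_1", "executor_host1_2")))
    println(commonPreferredLocations(Seq("executor_host1_1", "executor_host2_3")))
  }
}
```

With something along these lines, a coalesced map range that spans executors/nodes would simply report no preferred location instead of pretending the read is local.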
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]