cloud-fan commented on a change in pull request #26434: [SPARK-29544] [SQL]
optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r345349809
##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1013,4 +1018,54 @@ private[spark] object MapOutputTracker extends Logging {
splitsByAddress.iterator
}
+
+ /**
+ * Given an array of map statuses, a range map output partitions and a range
+ * mappers (startMapId, endMapId),returns a sequence that, for each block
manager ID,
+ * lists the shuffle block IDs and corresponding shuffle
+ * block sizes stored at that block manager.
+ * Note that empty blocks are filtered in the result.
+ *
+ * If any of the statuses is null (indicating a missing location due to a
failed mapper),
+ * throws a FetchFailedException.
+ *
+ * @param shuffleId Identifier for the shuffle
+ * @param startPartition Start map output partition ID
+ * @param endPartition End map output partition ID
+ * @param statuses List of map statuses, indexed by map partition index.
+ * @param startMapId Start Map ID
+ * @param endMapId End map ID
+ * @return A sequence of 2-item tuples, where the first item in the tuple is
a BlockManagerId,
+ * and the second item is a sequence of (shuffle block id, shuffle
block size, map index)
+ * tuples describing the shuffle blocks that are stored at that
block manager.
+ */
+ def convertMapStatuses(
Review comment:
can we merge the two `convertMapStatuses`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]