mridulm commented on a change in pull request #34156:
URL: https://github.com/apache/spark/pull/34156#discussion_r720675561
##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -519,17 +521,19 @@ private[spark] abstract class MapOutputTracker(conf:
SparkConf) extends Logging
* but endMapIndex is excluded). If endMapIndex=Int.MaxValue, the actual
endMapIndex will be
* changed to the length of total map outputs.
*
- * @return A sequence of 2-item tuples, where the first item in the tuple is
a BlockManagerId,
- * and the second item is a sequence of (shuffle block id, shuffle
block size, map index)
- * tuples describing the shuffle blocks that are stored at that
block manager.
- * Note that zero-sized blocks are excluded in the result.
+ * @return A case class object which includes two attributes. The first
attribute is a sequence
+ * of 2-item tuples, where the first item in the tuple is a
BlockManagerId, and the
+ * second item is a sequence of (shuffle block id, shuffle block
size, map index) tuples
+ * tuples describing the shuffle blocks that are stored at that
block manager. Note that
+ * zero-sized blocks are excluded in the result. The second
attribute is a boolean flag,
+ * indicating whether batch fetch can be enabled.
*/
def getMapSizesByExecutorId(
shuffleId: Int,
startMapIndex: Int,
endMapIndex: Int,
startPartition: Int,
- endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
+ endPartition: Int): MapSizesByExecutorId
Review comment:
@Ngone51 We would be trying to infer behavior in this case based on the
response, instead of being explicit about what should be enabled/disabled (btw,
`blocksByAddress` is iterator, so we would need a `.toList` or some such as
well).
The implication of that is, for example, without the
`mergeStatuses.exists(_.exists(_ != null))` in `convertMapStatuses`,
`hasMergedBlock` will be false - and so we would enable batch fetch - and fail.
I want to avoid tight coupling based on pattern of blocks in response, and
instead explicitly let the response indicate whether we should enable/disable
feature.
This should allow us to reasonably independently evolve both features in
future as well.
Essentially, I want to get the interfaces right - the actual impl, as long
as it is correct for now, can be evolved in future as well.
Thoughts ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]