[GitHub] [spark] mridulm commented on a change in pull request #34156: [WIP] [SPARK-36892] [Core] Disable batch fetch for a shuffle when push based shuffle is enabled

GitBox Fri, 01 Oct 2021 20:03:06 -0700


mridulm commented on a change in pull request #34156:
URL: https://github.com/apache/spark/pull/34156#discussion_r720616373




##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -519,17 +521,19 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    * but endMapIndex is excluded). If endMapIndex=Int.MaxValue, the actual endMapIndex will be
    * changed to the length of total map outputs.
    *
-   * @return A sequence of 2-item tuples, where the first item in the tuple is a BlockManagerId,
-   *         and the second item is a sequence of (shuffle block id, shuffle block size, map index)
-   *         tuples describing the shuffle blocks that are stored at that block manager.
-   *         Note that zero-sized blocks are excluded in the result.
+   * @return A case class object which includes two attributes. The first attribute is a sequence
+   *         of 2-item tuples, where the first item in the tuple is a BlockManagerId, and the
+   *         second item is a sequence of (shuffle block id, shuffle block size, map index) tuples
+   *         describing the shuffle blocks that are stored at that block manager. Note that
+   *         zero-sized blocks are excluded in the result. The second attribute is a boolean flag,
+   *         indicating whether batch fetch can be enabled.
    */
   def getMapSizesByExecutorId(
       shuffleId: Int,
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
+      endPartition: Int): MapSizesByExecutorId

Review comment:
   Given how close we are to RC, I am fine with this approach.

   In general though, we should be very careful about setting the expectation
   that there are compatibility guarantees for `private[spark]` classes; they
   are explicitly marked that way to make it very clear not to depend on them.
   If projects/users depend on them in spite of that, it is up to them to ensure
   compatibility, not the Spark project.

   @zhouyejoe Can you evaluate the change that @Ngone51 proposed, please?
   Internally, the existing method could delegate to the same
   `getMapSizesByExecutorIdImpl` (which would be what you have here) and simply
   return `response.iter`.
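
   A minimal sketch of the delegation described above, not the actual patch.
   `BlockManagerId` and `BlockId` are simplified stand-ins so the snippet
   compiles on its own, and the field names `iter` and `enableBatchFetch` on
   `MapSizesByExecutorId` are assumptions based on the scaladoc in the diff.
   `TrackerSketch` and `DemoTracker` are hypothetical names used only for
   illustration.

```scala
// Stand-ins for Spark's private types so the sketch compiles on its own.
// These are simplified assumptions, not the real definitions.
final case class BlockManagerId(executorId: String)
final case class BlockId(name: String)

// Assumed shape of the new result type: the per-executor block listing
// plus the flag saying whether batch fetch can be enabled.
final case class MapSizesByExecutorId(
    iter: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])],
    enableBatchFetch: Boolean)

abstract class TrackerSketch {
  // New internal entry point that returns both the listing and the flag.
  def getMapSizesByExecutorIdImpl(
      shuffleId: Int,
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int): MapSizesByExecutorId

  // Existing method keeps its original signature and only unwraps the result,
  // so callers of the old method do not need to change.
  def getMapSizesByExecutorId(
      shuffleId: Int,
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] =
    getMapSizesByExecutorIdImpl(
      shuffleId, startMapIndex, endMapIndex, startPartition, endPartition).iter
}

// Hypothetical concrete tracker with hard-coded data, for illustration only.
object DemoTracker extends TrackerSketch {
  def getMapSizesByExecutorIdImpl(
      shuffleId: Int,
      startMapIndex: Int,
      endMapIndex: Int,
      startPartition: Int,
      endPartition: Int): MapSizesByExecutorId = {
    val blocks = Seq((BlockId(s"shuffle_${shuffleId}_0_$startPartition"), 128L, 0))
    MapSizesByExecutorId(
      Iterator((BlockManagerId("exec-1"), blocks)),
      enableBatchFetch = false)
  }
}
```

   With this shape, only code that needs the batch-fetch flag has to call the
   `Impl` variant; everything else keeps using the original method.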




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


