[GitHub] [spark] mridulm commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

GitBox Tue, 27 Apr 2021 09:30:46 -0700


mridulm commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r621391061




##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int, reduceId: Int) exte
   override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + "_" + reduceId
 }
 
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId {
+  override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+  appId: String,
+  shuffleId: Int,
+  reduceId: Int) extends BlockId {
+  override def name: String =
+    "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"

Review comment:
       This is a good point. Unfortunately, we do not set the attempt id in the Spark conf the way we do for the app id.
   A few options here would be:
   
   a) Also propagate the attempt id via "spark.app.attemptId" when it is available (and use a default value if it is missing).
   b) Defer registration/directory creation until the first task runs: ShuffleMapTask has `appAttemptId` as part of it.
   c) For YARN, the CONTAINER_ID env variable can be parsed to fetch the attempt id, though this might not be optimal.
   
   I am not very keen on modifying the protocol if we can avoid it.
   
   Thoughts?
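
   To make options (a) and (c) concrete, here is a rough sketch only, not the PR's implementation. It assumes a plain `Map` standing in for the Spark conf, a hypothetical default of -1 when no attempt id is known, and the usual YARN container id layout (`container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>`). The object and method names are made up for illustration.

```scala
import scala.util.Try

object AttemptIdSketch {
  // Hypothetical default used when no attempt id can be determined.
  val DefaultAttemptId: Int = -1

  // Option (a): read the attempt id from the conf, falling back to the default.
  // A plain Map stands in for the Spark conf; "spark.app.attemptId" is the key
  // proposed in this comment, not an existing setting.
  def fromConf(conf: Map[String, String]): Int =
    conf.get("spark.app.attemptId")
      .flatMap(s => Try(s.trim.toInt).toOption)
      .getOrElse(DefaultAttemptId)

  // Option (c): pull the attempt id out of a YARN container id.
  // Assumed layout: container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>
  private val ContainerIdPattern = """container_(?:e\d+_)?\d+_\d+_(\d+)_\d+""".r

  def fromContainerId(containerId: String): Option[Int] = containerId match {
    case ContainerIdPattern(attempt) => Try(attempt.toInt).toOption
    case _ => None // not a recognizable container id
  }
}
```

   On YARN this could be called as `AttemptIdSketch.fromContainerId(sys.env.getOrElse("CONTAINER_ID", ""))`. The dependence on the container id string format is part of why (c) might not be optimal.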
   




