[GitHub] [spark] otterc commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

GitBox Thu, 15 Apr 2021 19:12:31 -0700


otterc commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r614514919




##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int, 
reduceId: Int) exte
   override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + 
"_" + reduceId
 }
 
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) 
extends BlockId {
+  override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" 
+ reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+  appId: String,
+  shuffleId: Int,
+  reduceId: Int) extends BlockId {
+  override def name: String =
+    "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"

Review comment:
       The merged shuffle files are already written by the server and the file 
identifier doesn't include the attemptId. To include attemptId we do require a 
change on the push side as well where attempt Id is also included in the 
PushBlockMessage and the server uses that to create merged files. I think this 
is a much bigger change and we have this jira [SPARK-32923 
](https://issues.apache.org/jira/browse/SPARK-32923) to address it.
   
   One thing we can do here is that we introduce attemptId in these new Ids 
that we are creating but not use them while trying to find file and add a TODO.
   What do you think @mridulm @Victsm @zhouyejoe ?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] otterc commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

Reply via email to