[GitHub] [spark] zhouyejoe commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

GitBox Wed, 05 May 2021 23:26:49 -0700


zhouyejoe commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r627110838




##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int, 
reduceId: Int) exte
   override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + 
"_" + reduceId
 }
 
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) 
extends BlockId {
+  override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" 
+ reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+  appId: String,
+  shuffleId: Int,
+  reduceId: Int) extends BlockId {
+  override def name: String =
+    "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"

Review comment:
       Thanks for all the discussion above. I will update the PR with the 
proposed ideas above: 1. No new RPC needs to be introduced. 2. The dirs will 
still be created by Executor as of the the permission issue. 3. Will let the 
Executor manage the merge_dir creation. 4. Will let the Executor to delete the 
last attempt's created merge_dir if it exists. 5. Register the merge attempt 
Dirs through ExecutorShuffleInfo, where a new String for shuffleManager field 
to distinguish the attemptID, e.g., "sort_merge_manager_attemptX", suggested by 
@Ngone51 . 
   
   @mridulm  IIUC, if we have the item 5 there, we don't need the shuffle 
service to list the dirs and figure out the largest attemptIDs for merge dirs, 
right? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] zhouyejoe commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

Reply via email to