Ngone51 commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r623036330



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int, reduceId: Int) exte
   override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + "_" + reduceId
 }
 
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId {
+  override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+  appId: String,
+  shuffleId: Int,
+  reduceId: Int) extends BlockId {
+  override def name: String =
+    "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"

Review comment:
       I have a somewhat tricky idea here, which is based on solution 3: 
   
   When an executor of application attempt X tries to create the merge 
directory, it first checks whether the dir `merge_manager_X-1` exists and, if 
so, deletes it. Then, if the dir `merge_manager_X` does not exist, it creates 
`merge_manager_X`. If the executor turns out to be the one that created the 
merge dir, it sends the `ExecutorShuffleInfo` with a special `shuffleManager` 
value, e.g., "sort_merge_manager_attemptX". `ExternalBlockHandler` can then 
parse the `shuffleManager` into two parts, so `ExternalShuffleBlockResolver` 
can still register the normal `ExecutorShuffleInfo` while 
`RemoteBlockPushResolver` knows whether to update the merge dir.
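   
   A minimal sketch of the two steps above, to make the idea concrete. 
Everything here is hypothetical: `MergeDirSketch`, `ShuffleManagerTag`, and 
the `_merge_manager_attempt` suffix are illustrative names, not existing Spark 
APIs, and races between executors deleting the old dir are not handled.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import java.util.Comparator
import scala.util.Try

// Hypothetical helper: not part of Spark.
object MergeDirSketch {
  // Deletes a directory tree, children first.
  private def deleteRecursively(dir: Path): Unit = {
    val stream = Files.walk(dir)
    try {
      stream.sorted(Comparator.reverseOrder[Path]())
        .forEach((p: Path) => { Files.deleteIfExists(p); () })
    } finally {
      stream.close()
    }
  }

  /** Removes merge_manager_(X-1) if present, then tries to create
    * merge_manager_X. Returns true only for the caller that created it. */
  def prepareMergeDir(localDir: Path, attemptId: Int): Boolean = {
    val previous = localDir.resolve(s"merge_manager_${attemptId - 1}")
    if (Files.exists(previous)) deleteRecursively(previous)
    val current = localDir.resolve(s"merge_manager_$attemptId")
    try {
      // createDirectory fails if the dir already exists, so exactly one
      // caller observes success.
      Files.createDirectory(current)
      true
    } catch {
      case _: FileAlreadyExistsException => false
    }
  }
}

// Hypothetical encoding of the attempt into the shuffleManager string.
object ShuffleManagerTag {
  private val Suffix = "_merge_manager_attempt"

  def encode(shuffleManager: String, attemptId: Int): String =
    s"$shuffleManager$Suffix$attemptId"

  /** Splits a tagged value into (plain shuffleManager, Some(attemptId)).
    * Untagged or malformed values come back unchanged with None. */
  def decode(tagged: String): (String, Option[Int]) = {
    val idx = tagged.lastIndexOf(Suffix)
    if (idx < 0) {
      (tagged, None)
    } else {
      Try(tagged.substring(idx + Suffix.length).toInt).toOption match {
        case Some(id) => (tagged.substring(0, idx), Some(id))
        case None => (tagged, None)
      }
    }
  }
}
```

   With this shape, only the executor for which `prepareMergeDir` returns true 
would send the tagged value, and the handler would pass the first half of 
`decode` to the normal registration path.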
   
   
   That said, I'd prefer solution 1. I think it's reasonable to add a new 
message for push-based shuffle. We can have a new type for it, e.g., 
`RegisterMergeDirectory`, which carries the merge directory directly along 
with the attempt ID (of course). 
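   
   Roughly what I have in mind for the new message, as a sketch only. The 
field names and the `MergeDirRegistry` class are assumptions for illustration; 
the real shuffle service messages are Java classes with their own wire 
encoding, which this does not model.

```scala
import scala.collection.mutable

// Hypothetical message shape; field names are assumptions, not from the PR.
final case class RegisterMergeDirectory(
    appId: String,
    attemptId: Int,
    mergeDir: String)

// Hypothetical server-side bookkeeping: keeps the merge dir of the newest
// attempt seen for each application.
final class MergeDirRegistry {
  private val dirs = mutable.Map.empty[String, (Int, String)]

  /** Returns true if the message updated the registered merge dir. */
  def handle(msg: RegisterMergeDirectory): Boolean = synchronized {
    dirs.get(msg.appId) match {
      // Ignore registrations from the same or an older attempt.
      case Some((attempt, _)) if attempt >= msg.attemptId => false
      case _ =>
        dirs(msg.appId) = (msg.attemptId, msg.mergeDir)
        true
    }
  }

  def mergeDir(appId: String): Option[String] = synchronized {
    dirs.get(appId).map(_._2)
  }
}
```

   The point of a dedicated message is that the normal executor registration 
stays untouched and nothing has to be parsed out of `shuffleManager`.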
   
   



