Ngone51 commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r623036330
##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int,
reduceId: Int) exte
 override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + "_" + reduceId
}
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId {
+ override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+ appId: String,
+ shuffleId: Int,
+ reduceId: Int) extends BlockId {
+ override def name: String =
+ "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"
Review comment:
I have a rather tricky idea here, which is based on solution 3:
When an executor of application attempt X tries to create the merge
directory, it could first check whether the dir `merge_manager_X-1` exists and,
if it does, delete it. Then, if the dir `merge_manager_X` does not exist, it
creates `merge_manager_X`. If the executor is the one that creates the merge
dir, it sends the `ExecutorShuffleInfo` with a special `shuffleManager`, e.g.,
"sort_merge_manager_attemptX". `ExternalBlockHandler` can then parse the
`shuffleManager` into two parts, so `ExternalShuffleBlockResolver` can still
register the normal `ExecutorShuffleInfo` and `RemoteBlockPushResolver` can
know whether to update the merge dir.
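
To make the idea concrete, here is a rough sketch of the directory rotation and the string split. Only the `merge_manager_X` naming and the "sort_merge_manager_attemptX" example come from the description above; `MergeDirSketch`, its method names, and the choice of `_merge_manager_attempt` as the separator are assumptions made for illustration, not existing Spark code. It also ignores races between executors deleting the old dir at the same time.

```scala
import java.io.File
import scala.util.Try

// Hypothetical helper; none of these names exist in Spark.
object MergeDirSketch {
  // Assumed separator between the normal shuffle manager name and the attempt.
  private val Marker = "_merge_manager_attempt"

  // Delete a file or directory tree; listFiles() is null for non-directories.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
    ()
  }

  /** Drop the previous attempt's dir, then try to create this attempt's dir.
    * Returns true only if this call created `merge_manager_<attempt>`. */
  def prepareMergeDir(localDir: File, attempt: Int): Boolean = {
    val previous = new File(localDir, s"merge_manager_${attempt - 1}")
    if (previous.exists()) deleteRecursively(previous)
    // mkdir() returns false when the directory already exists.
    new File(localDir, s"merge_manager_$attempt").mkdir()
  }

  /** The value to send as `shuffleManager`: tagged only by the creator. */
  def shuffleManagerFor(base: String, attempt: Int, createdMergeDir: Boolean): String =
    if (createdMergeDir) s"$base$Marker$attempt" else base

  /** Split a possibly tagged value into (normal shuffle manager, attempt). */
  def parseShuffleManager(s: String): (String, Option[Int]) = {
    val idx = s.lastIndexOf(Marker)
    if (idx < 0) (s, None)
    else Try(s.substring(idx + Marker.length).toInt).toOption match {
      case Some(attempt) => (s.substring(0, idx), Some(attempt))
      case None => (s, None) // malformed suffix: treat as an untagged value
    }
  }
}
```

With this shape, the first element of the parsed pair is what `ExternalShuffleBlockResolver` would register as usual, and a defined attempt is the signal for `RemoteBlockPushResolver` to update the merge dir.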
That said, I'd prefer solution 1. I think it's reasonable to add a new message
for push-based shuffle. We can have a new type for it,
e.g., `RegisterMergeDirectory`, which carries the merge directory directly and
the attempt ID too (of course).
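
For comparison, a minimal sketch of what such a message could carry. Only the name `RegisterMergeDirectory` and its two pieces of content (merge directory, attempt) come from the suggestion above; the field names, the `MergeDirRegistry` class, and the "newer attempt wins" rule are assumptions for illustration. A real message would also need to fit the existing shuffle protocol's encoding, which is not shown here.

```scala
import scala.collection.mutable

// Hypothetical message shape; field names are assumptions.
final case class RegisterMergeDirectory(appId: String, attemptId: Int, mergeDir: String)

// Hypothetical shuffle-service-side bookkeeping: keep one merge dir per
// application and replace it only when a strictly newer attempt registers.
class MergeDirRegistry {
  private val latest = mutable.Map.empty[String, RegisterMergeDirectory]

  /** Returns true if the registration was accepted (first or newer attempt). */
  def register(msg: RegisterMergeDirectory): Boolean = synchronized {
    val isNewer = latest.get(msg.appId).forall(_.attemptId < msg.attemptId)
    if (isNewer) latest(msg.appId) = msg
    isNewer
  }

  /** The merge dir currently recorded for the application, if any. */
  def mergeDirFor(appId: String): Option[String] = synchronized {
    latest.get(appId).map(_.mergeDir)
  }
}
```

Compared with encoding the attempt into `shuffleManager`, the explicit message keeps `ExecutorShuffleInfo` untouched and needs no string parsing on the shuffle service side.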