otterc commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r627590658



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int, 
reduceId: Int) exte
   override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + 
"_" + reduceId
 }
 
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) 
extends BlockId {
+  override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" 
+ reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+  appId: String,
+  shuffleId: Int,
+  reduceId: Int) extends BlockId {
+  override def name: String =
+    "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"

Review comment:
       @zhouyejoe If we are sending the merge directory name via 
RegisterExecutor message which is the 5th step, then I don't think 4th step is 
necessary where the executor deletes directory of old attempt.
   Currently, the executors don't delete the previous previous attempts block 
manager dirs and neither does the shuffle service. It is deleted by Yarn when 
the application is finally finished. All the blockMgr dirs hang around until 
the app is done.
   With merge_manager directory it will be the same case. In this solution, the 
remote shuffle service doesn't depend on  listing the dirs so what do we get 
from having the executor delete old attempt merge_manager dir?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to