otterc commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r616005406
##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int,
reduceId: Int) exte
override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + "_" + reduceId
}
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId {
+ override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+ appId: String,
+ shuffleId: Int,
+ reduceId: Int) extends BlockId {
+ override def name: String =
+ "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"
Review comment:
Also, the issue @mridulm pointed out cannot be solved either by creating
a random merge dir or by creating it under a block manager directory. The Spark
shuffle server does **not** try to figure out which blockmgr directories
belong to a specific attempt and delete only those. In fact, it relies on
YARN to delete the application local directories. `blockHandler.applicationRemoved(...)`
takes a flag for cleaning up local directories, but that flag is false when
`stopApplication` is invoked in `YarnShuffleService`:
https://github.com/apache/spark/blob/d37d18dd7f628bfa84df2478c84ee52b089e7651/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java#L368
So this issue exists for the blockmgr dirs as well. Either we create these
dirs outside the application local dirs, which are managed by YARN, or, as I
think is preferable, this is fixed in YARN rather than in Spark: YARN should
create app local dirs per attempt and delete only those when an attempt fails.
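
To make the point about the cleanup flag concrete, here is a minimal, self-contained sketch. It is not Spark code: `ToyBlockHandler`, `dirsLeftAfterStop`, and the directory names are hypothetical stand-ins, and the only thing borrowed from the real code is the idea of an `applicationRemoved(appId, cleanupLocalDirs)` call whose second argument is false on the stop path. Under those assumptions, it shows that directories from every attempt stay on disk until the owner of the parent directory removes them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CleanupFlagSketch {

  /** Hypothetical stand-in for the shuffle server's per-application state. */
  static final class ToyBlockHandler {
    private final Map<String, List<Path>> localDirsByApp = new ConcurrentHashMap<>();

    void registerDir(String appId, Path dir) {
      localDirsByApp.computeIfAbsent(appId, k -> new ArrayList<>()).add(dir);
    }

    /** Forgets the application; deletes its directories only when the flag is true. */
    void applicationRemoved(String appId, boolean cleanupLocalDirs) throws IOException {
      List<Path> dirs = localDirsByApp.remove(appId);
      if (dirs == null || !cleanupLocalDirs) {
        return; // Deletion is left to whoever owns the parent directory.
      }
      for (Path dir : dirs) {
        deleteRecursively(dir);
      }
    }
  }

  /** Deletes a directory tree, children before parents. */
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> deepestFirst;
    try (Stream<Path> paths = Files.walk(root)) {
      deepestFirst = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : deepestFirst) {
      Files.delete(p);
    }
  }

  /** Creates two fake per-attempt dirs, stops the app, and counts the dirs still on disk. */
  static int dirsLeftAfterStop(boolean cleanupLocalDirs) {
    try {
      Path appLocalDir = Files.createTempDirectory("app-local-dir");
      try {
        ToyBlockHandler handler = new ToyBlockHandler();
        for (String name : new String[] {"blockmgr-attempt1", "blockmgr-attempt2"}) {
          Path dir = Files.createDirectory(appLocalDir.resolve(name));
          Files.createFile(dir.resolve("shuffle_0_0_0.data"));
          handler.registerDir("app_1", dir);
        }
        handler.applicationRemoved("app_1", cleanupLocalDirs);
        try (Stream<Path> children = Files.list(appLocalDir)) {
          return (int) children.count();
        }
      } finally {
        deleteRecursively(appLocalDir); // Keep the sketch from leaving temp files behind.
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("flag=false, dirs left: " + dirsLeftAfterStop(false));
    System.out.println("flag=true, dirs left: " + dirsLeftAfterStop(true));
  }
}
```

With the flag set to false, both per-attempt directories survive the stop call, which is the situation described above for the blockmgr dirs.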