GitHub user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4984#discussion_r33726149
--- Diff:
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -124,10 +124,16 @@ private[spark] class DiskBlockManager(blockManager:
BlockManager, conf: SparkCon
(blockId, getFile(blockId))
}
+ /**
+ * Create local directories for storing block data. These directories are
+ * located inside configured local directories and won't
+ * be deleted on JVM exit when using the external shuffle service.
--- End diff --
I just read your comment again. I still don't see how the directory layout
is related to cleaning up shuffle files. The reason why we don't clean up
shuffle files in Mesos (and standalone mode) is simply because the shuffle
service doesn't know when an application exits. When the shuffle service is
enabled, [executors no longer clean up the shuffle files on
exit](https://github.com/apache/spark/blob/1ce6428907b4ddcf52dbf0c86196d82ab7392442/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala#L162),
so no one cleans these files up anymore. All we need to do then is to add this
missing code path.
Since the external shuffle service already
[knows](https://github.com/apache/spark/blob/1ce6428907b4ddcf52dbf0c86196d82ab7392442/network/shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java#L147)
about the `localDirs` on each executor, it can just go ahead and delete these
directories (which contain the shuffle files written). Could you explain why
the directory structure needs to change? Why is it not sufficient to just
remove the shuffle directories?
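
To make the proposal concrete, here is a minimal sketch of what that missing code path could look like. It is not Spark's actual code: `ShuffleDirCleaner`, `registerExecutor`, and `onApplicationExit` are hypothetical names, and the sketch assumes the service is told in some way that an application has exited, which is exactly the piece that is missing in Mesos and standalone mode today.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch only; these names are not part of Spark.
final class ShuffleDirCleaner {
  // appId -> local directories reported by that application's executors
  private final Map<String, List<Path>> localDirsByApp = new ConcurrentHashMap<>();

  /** Remember the local dirs an executor reports when it registers with the service. */
  void registerExecutor(String appId, List<Path> localDirs) {
    localDirsByApp
        .computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
        .addAll(localDirs);
  }

  /**
   * Assumed hook: invoked once the service learns the application has exited.
   * Deletes every registered local dir and returns how many were removed.
   */
  int onApplicationExit(String appId) throws IOException {
    List<Path> dirs = localDirsByApp.remove(appId);
    if (dirs == null) {
      return 0; // unknown or already cleaned application
    }
    int deleted = 0;
    for (Path dir : dirs) {
      if (Files.exists(dir)) {
        deleteRecursively(dir);
        deleted++;
      }
    }
    return deleted;
  }

  /** Delete a directory tree bottom-up: files first, then each directory. */
  private static void deleteRecursively(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```

Nothing in this sketch depends on how the directories are laid out: the service only needs the list of `localDirs` it was given at registration time.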