GitHub user dragos commented on the pull request:
https://github.com/apache/spark/pull/4984#issuecomment-117351436
No worries. When I chatted to @tnachen there were a few misunderstandings
w.r.t. who deletes the shuffle files. Here's the story in a nutshell:
There are three levels on the file system:
- configured local dirs (this is what the user can specify in config)
- Spark *local root dir*. This is created by Spark to hold all temp
files, and is deleted by a shutdown hook on exit. This level is **missing**
in YARN and Standalone.
- shuffle dir. This is **not** deleted by Spark on exit when the
external shuffle service is on.
What happens on Mesos is that the shuffle dir is indeed not deleted on
exit, but its **parent** directory is! You won't notice that in YARN, though,
since the intermediate temporary directory is not part of the picture.
The relevant code is in `Utils.getOrCreateLocalRootDirsImpl`.
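
To make the three levels concrete, here is a rough Python sketch that only simulates the directory layout described above. It is not Spark's code: the directory names (`spark-local-root`, `blockmgr-shuffle`), the file name, and the helpers `simulate_exit` and `run_demo` are all made up for illustration, and the "shutdown hook" is just a plain `shutil.rmtree` call.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_exit(configured_local_dir: Path, has_local_root_dir: bool) -> bool:
    """Return True if the shuffle file survives the simulated exit."""
    if has_local_root_dir:
        # Mesos-like layout: configured dir / local root dir / shuffle dir.
        # "spark-local-root" is a hypothetical name for the middle level.
        parent = configured_local_dir / "spark-local-root"
        parent.mkdir()
    else:
        # YARN/Standalone-like layout: the middle level is missing.
        parent = configured_local_dir

    # "blockmgr-shuffle" is a hypothetical name for the shuffle dir.
    shuffle_dir = parent / "blockmgr-shuffle"
    shuffle_dir.mkdir()
    shuffle_file = shuffle_dir / "shuffle_0_0_0.data"
    shuffle_file.write_text("map output")

    # Simulated shutdown hook with the external shuffle service on:
    # the shuffle dir itself is never deleted directly, but the local
    # root dir (when that level exists) is removed recursively.
    if has_local_root_dir:
        shutil.rmtree(parent)

    return shuffle_file.exists()


def run_demo() -> dict:
    """Run both layouts in throwaway temp dirs and report survival."""
    results = {}
    for name, has_root in (("mesos", True), ("yarn", False)):
        with tempfile.TemporaryDirectory() as tmp:
            results[name] = simulate_exit(Path(tmp), has_root)
    return results


if __name__ == "__main__":
    print(run_demo())
```

In the Mesos-like layout the shuffle file disappears even though nothing deletes the shuffle dir explicitly, because its parent is removed; in the YARN-like layout it survives.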