Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4412#discussion_r24267365
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -93,6 +93,14 @@ class SparkEnv (
// actorSystem.awaitTermination()
// Note that blockTransferService is stopped by BlockManager since it is started by it.
+
+ // If we only stop sc but the driver process still runs as a service, we need to delete
+ // the tmp dir; otherwise it will create too many tmp dirs
+ try {
+ Utils.deleteRecursively(new File(sparkFilesDir))
--- End diff ---
I agree; this seems unsafe. It would be a disaster if we accidentally deleted
directories that we didn't create, so we can't delete any path that could point
to the CWD. Instead, we might be able to either ensure that the CWD is a
subfolder of a Spark local directory (so it gets cleaned up as part of our
baseDir cleanup) or change `sparkFilesDir` so that it doesn't download files to
the CWD (e.g. create a temporary directory in both the driver and the
executors). Old versions of the `addFile` API contract said that files would be
downloaded to the CWD, but we haven't made that promise since Spark 0.7-ish, I
think; we only technically guarantee that `SparkFiles.get` will return the file
paths.
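
To illustrate the second option, here's a rough sketch of what downloading into
a dedicated temporary directory could look like (the object and names below are
made up for illustration and aren't Spark's actual internals; in practice we'd
probably just use `Utils.createTempDir`, which I believe already registers the
directory for deletion on shutdown):

```scala
import java.io.File
import java.nio.file.Files

object SparkFilesDirSketch {
  // Create a dedicated temporary directory for downloaded files. Because this
  // process created the directory itself, deleting it recursively on shutdown
  // can never touch a pre-existing working directory.
  val sparkFilesDir: File = Files.createTempDirectory("spark-userFiles-").toFile

  // Clean up exactly (and only) the directory we created.
  sys.addShutdownHook {
    def deleteRecursively(f: File): Unit = {
      Option(f.listFiles).foreach(_.foreach(deleteRecursively))
      f.delete()
    }
    deleteRecursively(sparkFilesDir)
  }
}
```

With something like this on both the driver and the executors, user code would
still resolve files through `SparkFiles.get(fileName)` rather than relying on
them landing in the CWD, which is the only behavior we actually guarantee.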