Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21390#discussion_r190105740

    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -97,6 +97,10 @@ private[deploy] class Worker(
       private val APP_DATA_RETENTION_SECONDS =
         conf.getLong("spark.worker.cleanup.appDataTtl", 7 * 24 * 3600)

    +  // Whether or not cleanup the non-shuffle files on executor finishes.
    +  private val CLEANUP_NON_SHUFFLE_FILES_ENABLED =
    +    conf.getBoolean("spark.worker.cleanup.nonShuffleFiles.enabled", true)
    --- End diff --

    I think these configurations might actually be independent:

    - `spark.worker.cleanup.enabled` controls whether directories from finished applications (i.e. terminated drivers) are cleaned up, a decision which can affect users' ability to debug failed applications. In environments where clusters / underlying physical workers are ephemeral this setting tends to matter a little less, but in environments with long-lived machines that host many Spark drivers over a long period of time (e.g. on-prem) it can matter. For example, I think that setting `spark.worker.cleanup.enabled = true` could mean that completed applications' executor logs get purged from the underlying host machine after a time period.
    - The new configuration proposed here generally matters only _within a long-running Spark application_ and is orthogonal to the choice to keep or discard failed applications' executor directories. I think that we're only removing files within block managers' nested folders, right? So there should be no user-controlled / user-generated files there, at least in theory. Therefore the impact of this configuration on ordinary users' ability to debug is likely to be small (I think you'd only ever really want access to the files we're deleting if you were going to hex-dump them to debug corruption issues that trigger segfaults, an unlikely scenario).
    Given the different motivations, lifecycles, and default choices for these two cleanup mechanisms, I'm thinking that it might make sense to rename the new configuration so that the similar names don't confuse users. What do you think about something like `spark.storage.removeNonShuffleBlockManagerFilesAfterExecutorDeath`? Really verbose, but a precise description of what's going on. I'm open to better name suggestions (since I tend to pick overly-complicated ones); this is just a conversation-starting strawman.
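    To make the "independent configurations" point concrete, here is a minimal, self-contained sketch (not Spark's actual implementation): a plain `Map` stands in for `SparkConf`, and the long config name is the hypothetical rename suggested above, while the PR itself uses `spark.worker.cleanup.nonShuffleFiles.enabled`. The point is that each flag defaults and toggles on its own, so a debug-friendly setup can keep finished apps' directories while still reclaiming block manager files after executor death.

    ```scala
    // Sketch only: two independently toggled cleanup settings.
    object CleanupConfigSketch {
      // Real Worker setting controlling finished-application directory cleanup.
      val AppDirCleanupKey = "spark.worker.cleanup.enabled"
      // Hypothetical rename proposed in this review thread.
      val NonShuffleFilesKey =
        "spark.storage.removeNonShuffleBlockManagerFilesAfterExecutorDeath"

      // Stand-in for SparkConf.getBoolean: parse the entry if present,
      // otherwise fall back to the per-setting default.
      def getBoolean(conf: Map[String, String], key: String, default: Boolean): Boolean =
        conf.get(key).map(_.toBoolean).getOrElse(default)

      def main(args: Array[String]): Unit = {
        // Keep finished apps' directories (debuggability), but still clean
        // non-shuffle block manager files when executors die.
        val conf = Map(AppDirCleanupKey -> "false")
        val cleanAppDirs    = getBoolean(conf, AppDirCleanupKey, default = false)
        val cleanNonShuffle = getBoolean(conf, NonShuffleFilesKey, default = true)
        println(s"cleanAppDirs=$cleanAppDirs cleanNonShuffle=$cleanNonShuffle")
      }
    }
    ```

    Because neither flag's value is consulted when resolving the other, every combination of the two is expressible, which is the argument for giving them clearly distinct names.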