This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 338bb31c2fac [SPARK-46997][CORE] Enable `spark.worker.cleanup.enabled` by default

338bb31c2fac is described below

commit 338bb31c2fac79fbc3482c23310b77d5306bd6c8
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed Feb 7 22:51:14 2024 -0800

    [SPARK-46997][CORE] Enable `spark.worker.cleanup.enabled` by default

    ### What changes were proposed in this pull request?

    This PR aims to enable `spark.worker.cleanup.enabled` by default as part of Apache Spark 4.0.0.

    ### Why are the changes needed?

    The Apache Spark community has recommended (from Apache Spark 3.0 to 3.5) enabling `spark.worker.cleanup.enabled` when `spark.shuffle.service.db.enabled` is true, and `spark.shuffle.service.db.enabled` has been `true` since SPARK-26288.

    https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/docs/spark-standalone.md?plain=1#L443
    https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/docs/spark-standalone.md?plain=1#L473
    https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/core/src/main/scala/org/apache/spark/internal/config/package.scala#L718-L724

    Although `spark.shuffle.service.enabled` is disabled by default, `spark.worker.cleanup.enabled` is crucial for long-running Spark Standalone clusters to avoid running out of disk space.

    https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/core/src/main/scala/org/apache/spark/internal/config/package.scala#L692-L696

    ### Does this PR introduce _any_ user-facing change?

    Yes, but this has long been the recommended configuration for real production-level Spark Standalone clusters.

    ### How was this patch tested?

    Passed the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45055 from dongjoon-hyun/SPARK-46997.

    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
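For readers who want to opt back into the pre-4.0 behavior, the settings involved are ordinary `spark.worker.cleanup.*` properties. The short Scala sketch below is illustrative only and is not part of this commit: the property names and the defaults mentioned in the comments come from the Spark documentation, but in a real deployment these worker-side properties are normally passed to the worker JVM (for example via `SPARK_WORKER_OPTS`) rather than set on an application's `SparkConf`.

```scala
import org.apache.spark.SparkConf

// Illustrative sketch (not part of this commit): restore the pre-4.0 behavior
// by disabling periodic worker cleanup, and show the related tuning knobs.
val conf = new SparkConf()
  .set("spark.worker.cleanup.enabled", "false")      // default becomes true in Spark 4.0
  .set("spark.worker.cleanup.interval", "1800")      // seconds between cleanup sweeps (documented default)
  .set("spark.worker.cleanup.appDataTtl", "604800")  // keep stopped app dirs for 7 days (documented default)

// Sanity check that the override is in place.
assert(conf.get("spark.worker.cleanup.enabled") == "false")
```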
---
 core/src/main/scala/org/apache/spark/internal/config/Worker.scala | 2 +-
 docs/core-migration-guide.md                                       | 2 ++
 docs/spark-standalone.md                                           | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/Worker.scala b/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
index c53e181df002..5a67f3398a7d 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
@@ -62,7 +62,7 @@ private[spark] object Worker {
   val WORKER_CLEANUP_ENABLED = ConfigBuilder("spark.worker.cleanup.enabled")
     .version("1.0.0")
     .booleanConf
-    .createWithDefault(false)
+    .createWithDefault(true)
 
   val WORKER_CLEANUP_INTERVAL = ConfigBuilder("spark.worker.cleanup.interval")
     .version("1.0.0")
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 7a5b17397bec..26e6b0f1f444 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -28,6 +28,8 @@ license: |
 
 - Since Spark 4.0, Spark will compress event logs. To restore the behavior before Spark 4.0, you can set `spark.eventLog.compress` to `false`.
 
+- Since Spark 4.0, Spark workers will clean up worker and stopped application directories periodically. To restore the behavior before Spark 4.0, you can set `spark.worker.cleanup.enabled` to `false`.
+
 - Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB` by default which means Spark will use RocksDB store for shuffle service. To restore the behavior before Spark 4.0, you can set `spark.shuffle.service.db.backend` to `LEVELDB`.
 
 - In Spark 4.0, support for Apache Mesos as a resource manager was removed.
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index fbc83180d6b6..1eab3158e2e5 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -436,7 +436,7 @@ SPARK_WORKER_OPTS supports the following system properties:
 </tr>
 <tr>
   <td><code>spark.worker.cleanup.enabled</code></td>
-  <td>false</td>
+  <td>true</td>
   <td>
     Enable periodic cleanup of worker / application directories. Note that this only affects
     standalone mode, as YARN works differently. Only the directories of stopped applications are cleaned up.

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org