GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/20011
[SPARK-20654][core] Add config to limit disk usage of the history server.

This change adds a new configuration option and support code that limits how much disk space the Spark History Server (SHS) will use. The default value is generous, so applications should only rarely need to be replayed because their disk store was evicted.

This works by keeping track of how much data each application is using. Because it is not possible to know, before replaying, how much space will be needed, usage may temporarily exceed the configured limit. The code uses the concept of a "lease" to try to limit how far the SHS exceeds the limit in those cases.

Active UIs are also tracked, so they are never deleted. This works in tandem with the existing option that controls how many active UIs are loaded: because unused UIs will be unloaded, their disk stores also become candidates for deletion. If the data has not been deleted, though, re-loading the UI is pretty quick.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-20654

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20011.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20011

----
commit 8b43f4d2c06ab0a9916113fc216711573db367e7
Author: Marcelo Vanzin <van...@cloudera.com>
Date:   2017-04-25T19:59:42Z

    [SPARK-20654][core] Add config to limit disk usage of the history server.

----
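
For readers unfamiliar with the "lease" idea in the description above, here is a minimal sketch of how such accounting could work. It is not the code in this pull request: the class name `DiskUsageTracker`, its methods, and the oldest-first eviction order are all hypothetical, and Python is used only to keep the illustration short. The sketch assumes a caller supplies a size estimate before replaying, then reports the real size afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class DiskUsageTracker:
    """Hypothetical tracker of per-application store sizes against a limit."""
    max_usage: int                                  # configured limit, in bytes
    committed: dict = field(default_factory=dict)   # app_id -> bytes on disk
    active: set = field(default_factory=set)        # apps whose UI is loaded
    leased: int = 0                                 # bytes reserved by in-flight replays

    def usage(self) -> int:
        """Bytes on disk plus bytes reserved by outstanding leases."""
        return sum(self.committed.values()) + self.leased

    def lease(self, estimate: int) -> int:
        """Reserve an estimated size before a replay.

        Evicts inactive stores, oldest first, until the estimate fits or
        nothing else can be evicted. The lease is granted either way, so
        usage can temporarily exceed the limit.
        """
        for app_id in list(self.committed):
            if self.usage() + estimate <= self.max_usage:
                break
            if app_id not in self.active:
                del self.committed[app_id]  # stand-in for deleting the disk store
        self.leased += estimate
        return estimate

    def commit(self, app_id: str, lease: int, actual_size: int) -> None:
        """Replace the estimate with the real size once the replay finishes."""
        self.leased -= lease
        self.committed[app_id] = actual_size
        self.active.add(app_id)

    def release(self, app_id: str) -> None:
        """Mark a UI as unloaded; its store becomes a candidate for eviction."""
        self.active.discard(app_id)
```

In this sketch a store is only ever deleted when its application is not in the active set, which mirrors the statement that active UIs are never deleted. A store that survives after `release` can simply be reused, which corresponds to the quick re-load case mentioned in the description.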