GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/20011
[SPARK-20654][core] Add config to limit disk usage of the history server.
This change adds a new configuration option and support code that limits
how much disk space the SHS will use. The default value is pretty generous
so that applications will, hopefully, only rarely need to be replayed
because of their disk stored being evicted.
This works by keeping track of how much data each application is using.
Also, because it's not possible to know, before replaying, how much space
will be needed, it's possible that usage will exceed the configured limit
temporarily. The code uses the concept of a "lease" to try to limit how
much the SHS will exceed the limit in those cases.
Active UIs are also tracked, so they're never deleted. This works in
tandem with the existing option of how many active UIs are loaded; because
unused UIs will be unloaded, their disk stores will also become candidates
for deletion. If the data is not deleted, though, re-loading the UI is
pretty quick.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-20654
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20011.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20011
----
commit 8b43f4d2c06ab0a9916113fc216711573db367e7
Author: Marcelo Vanzin <[email protected]>
Date: 2017-04-25T19:59:42Z
[SPARK-20654][core] Add config to limit disk usage of the history server.
This change adds a new configuration option and support code that limits
how much disk space the SHS will use. The default value is pretty generous
so that applications will, hopefully, only rarely need to be replayed
because of their disk stored being evicted.
This works by keeping track of how much data each application is using.
Also, because it's not possible to know, before replaying, how much space
will be needed, it's possible that usage will exceed the configured limit
temporarily. The code uses the concept of a "lease" to try to limit how
much the SHS will exceed the limit in those cases.
Active UIs are also tracked, so they're never deleted. This works in
tandem with the existing option of how many active UIs are loaded; because
unused UIs will be unloaded, their disk stores will also become candidates
for deletion. If the data is not deleted, though, re-loading the UI is
pretty quick.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]