GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/20011

    [SPARK-20654][core] Add config to limit disk usage of the history server.

    This change adds a new configuration option and support code that limits
    how much disk space the Spark History Server (SHS) will use. The default
    value is pretty generous so that applications will, hopefully, only rarely
    need to be replayed because their disk stores were evicted.
    
    This works by keeping track of how much disk space each application's
    store is using. Because it's not possible to know how much space a replay
    will need before it runs, usage may temporarily exceed the configured
    limit. The code uses the concept of a "lease" to try to limit how far the
    SHS goes over the limit in those cases.
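The lease idea above can be illustrated with a small Python sketch. It is not the SHS implementation: the class and method names (`DiskManager`, `Lease`, `lease()`, `commit()`, `rollback()`) are hypothetical, sizes are plain byte counts, and eviction uses simple least-recently-used order. A replay first leases an estimated amount of space, which counts against the limit while the replay runs and lets the manager evict older stores to make room; when the replay finishes, the estimate is replaced by the store's actual size.

```python
from collections import OrderedDict


class Lease:
    """Space reserved for one replay whose final size is not yet known (hypothetical)."""

    def __init__(self, manager, app_id, reserved):
        self.manager = manager
        self.app_id = app_id
        self.reserved = reserved

    def commit(self, actual_size):
        # Replay finished: swap the estimate for the store's real size.
        self.manager._commit(self, actual_size)

    def rollback(self):
        # Replay failed: return the reserved space.
        self.manager._release(self)


class DiskManager:
    """Hypothetical sketch of lease-based disk accounting, not Spark's code."""

    def __init__(self, max_usage):
        self.max_usage = max_usage
        self.committed = OrderedDict()  # app_id -> bytes, least recently used first
        self.leased = 0  # bytes reserved by in-flight replays

    def current_usage(self):
        return sum(self.committed.values()) + self.leased

    def lease(self, app_id, estimated_size):
        # Space is reserved even if eviction could not free enough room,
        # so usage can temporarily exceed max_usage.
        self._make_room(estimated_size)
        self.leased += estimated_size
        return Lease(self, app_id, estimated_size)

    def _make_room(self, needed):
        # Evict least recently used stores until the reservation fits,
        # or until there is nothing left to evict.
        while self.committed and self.current_usage() + needed > self.max_usage:
            self.committed.popitem(last=False)  # a real server would delete files here

    def _release(self, lease):
        self.leased -= lease.reserved

    def _commit(self, lease, actual_size):
        self._release(lease)
        self.committed[lease.app_id] = actual_size
        self.committed.move_to_end(lease.app_id)  # mark as most recently used
```

Because leased space is counted in `current_usage()`, concurrent replays don't all assume the same free space is available; a single replay larger than the limit can still overshoot it, which matches the temporary excess described above.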
    
    Active UIs are also tracked, so their disk stores are never deleted. This
    works in tandem with the existing option that limits how many UIs are
    loaded at once; because unused UIs are unloaded, their disk stores also
    become candidates for deletion. If the data has not been deleted yet,
    though, re-loading the UI is pretty quick, since no replay is needed.
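The interaction between loaded UIs and eviction can be sketched separately. This is again a hypothetical, simplified Python model (no leases; `StoreCache`, `open_ui()` and `close_ui()` are illustrative names, not Spark APIs): stores of applications with a loaded UI are pinned, closing a UI only makes its store eligible for eviction, and re-opening an application whose store survived skips the replay.

```python
from collections import OrderedDict


class StoreCache:
    """Hypothetical sketch: disk stores of apps with a loaded UI are never evicted."""

    def __init__(self, max_usage):
        self.max_usage = max_usage
        self.sizes = OrderedDict()  # app_id -> bytes on disk, least recently used first
        self.active = set()  # app_ids whose UI is currently loaded
        self.replays = 0  # number of times an app had to be replayed

    def usage(self):
        return sum(self.sizes.values())

    def open_ui(self, app_id, size_if_replayed):
        if app_id in self.sizes:
            # Store is still on disk: reloading is cheap, no replay needed.
            self.sizes.move_to_end(app_id)
        else:
            self.replays += 1
            self.sizes[app_id] = size_if_replayed
        self.active.add(app_id)
        self._evict()

    def close_ui(self, app_id):
        # An unloaded UI's store becomes an eviction candidate; it is only
        # deleted if the cache is over its limit.
        self.active.discard(app_id)
        self._evict()

    def _evict(self):
        # Walk stores oldest first, skipping pinned ones, until under the limit.
        for app_id in list(self.sizes):
            if self.usage() <= self.max_usage:
                break
            if app_id not in self.active:
                del self.sizes[app_id]  # a real server would delete files here
```

If every store belongs to an active UI, usage stays above the limit until some UI is unloaded; bounding the number of loaded UIs is what keeps that case from growing without limit.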

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-20654

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20011
    
----
commit 8b43f4d2c06ab0a9916113fc216711573db367e7
Author: Marcelo Vanzin <van...@cloudera.com>
Date:   2017-04-25T19:59:42Z

    [SPARK-20654][core] Add config to limit disk usage of the history server.
    

----

