[GitHub] spark pull request #19582: [SPARK-20644][core] Initial ground work for kvsto...

vanzin Thu, 26 Oct 2017 15:38:04 -0700

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/19582


    [SPARK-20644][core] Initial ground work for kvstore UI backend.

    There are two somewhat unrelated things going on in this patch, but
    both are meant to make integration of individual UI pages later on
    much easier.
    
    The first part is some tweaking of the code in the listener so that
    it performs fewer updates of the kvstore for data that changes fast;
    for example, it avoids writing changes down to the store for every
    task-related event, since those can arrive very quickly at times.
    Instead, for these kinds of events, it only flushes data to the store
    if a certain interval has passed. The interval is based on how often
    the current spark-shell code updates the progress bar for jobs, so
    that users can get reasonably accurate data.
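
As a rough illustration of the interval-based flushing described above, here is a minimal Java sketch. It is not the code in this patch: the class ThrottledWriter, its methods, and the plain maps standing in for the listener state and the kvstore are all hypothetical, and the interval is left as a constructor parameter rather than the value the patch uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Spark.
class ThrottledWriter {
    private final long minIntervalNanos;
    // Latest in-memory value per key (stands in for the listener's live state).
    private final Map<String, Long> live = new HashMap<>();
    // Stand-in for the kvstore; every put here counts as a "store write".
    private final Map<String, Long> store = new HashMap<>();
    // Time of the last store write per key.
    private final Map<String, Long> lastWrite = new HashMap<>();
    private int storeWrites = 0;

    ThrottledWriter(long minIntervalNanos) {
        this.minIntervalNanos = minIntervalNanos;
    }

    /** Records a fast-changing value; writes it to the store only if the interval has passed. */
    void update(String key, long value, long nowNanos) {
        live.put(key, value);
        Long last = lastWrite.get(key);
        if (last == null || nowNanos - last >= minIntervalNanos) {
            flush(key, nowNanos);
        }
    }

    /** Unconditionally writes the latest in-memory value for the key, if any. */
    void flush(String key, long nowNanos) {
        Long value = live.get(key);
        if (value != null) {
            store.put(key, value);
            lastWrite.put(key, nowNanos);
            storeWrites++;
        }
    }

    Long stored(String key) {
        return store.get(key);
    }

    int storeWrites() {
        return storeWrites;
    }
}
```

In this sketch an event that must always be persisted would call flush directly, while task-like events would go through update and be written at most once per interval.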
    
    The code also delays hitting the underlying kvstore for as long as
    possible when replaying apps in the history server. This is to avoid
    unnecessary writes to disk.
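
One way to picture the deferred writes during replay is a buffer that keeps only the latest state per entity in memory and writes it out once replay finishes. The sketch below is an assumption about the general idea, not the patch's implementation; ReplayBuffer, onEvent, and finishReplay are hypothetical names, and a map stands in for the disk-backed store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Spark.
class ReplayBuffer {
    // Latest state per entity, held in memory while the event log is replayed.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Stand-in for the disk-backed store.
    private final Map<String, String> disk = new LinkedHashMap<>();
    private int diskWrites = 0;

    /** Called for each replayed event; only updates memory. */
    void onEvent(String key, String state) {
        pending.put(key, state);
    }

    /** Called once replay ends; writes each entity a single time. */
    void finishReplay() {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            disk.put(e.getKey(), e.getValue());
            diskWrites++;
        }
        pending.clear();
    }

    String onDisk(String key) {
        return disk.get(key);
    }

    int diskWrites() {
        return diskWrites;
    }
}
```

With this shape, an entity that changes many times during replay costs one write instead of one write per event.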
    
    The second set of changes prepares the history server and SparkUI for
    integrating with the kvstore. A new class, AppStatusStore, is used
    for translating between the stored data and the types used in the
    UI / API. The SHS now populates a kvstore with data loaded from
    event logs when an application UI is requested.
    
    Because this store can hold references to disk-based resources, the
    code was modified to retrieve data from the store under a read lock.
    This allows the SHS to detect when the store is still being used, and
    only update it (e.g. because an updated event log was detected) when
    there is no other thread using the store.
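
The locking scheme above can be sketched with java.util.concurrent's ReentrantReadWriteLock: readers hold the read lock while they use the store, and an updater replaces the store only if it can take the write lock without waiting. This is a minimal sketch under that assumption; GuardedStore, withStore, and tryReplace are hypothetical names and do not reflect the actual ApplicationCache code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Spark.
class GuardedStore<S> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private S store;

    GuardedStore(S initial) {
        this.store = initial;
    }

    /** Runs the reader against the store while holding the read lock. */
    <T> T withStore(Function<S, T> reader) {
        lock.readLock().lock();
        try {
            return reader.apply(store);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Replaces the store only if no read or write lock is currently held. */
    boolean tryReplace(Supplier<S> rebuilt) {
        if (!lock.writeLock().tryLock()) {
            return false; // store is in use; the caller can retry later
        }
        try {
            store = rebuilt.get();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Using tryLock rather than lock means an update never blocks behind a long-running reader; it simply reports that the store is busy.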
    
    This change ended up creating a lot of churn in the ApplicationCache
    code, which was cleaned up considerably in the process. I also removed
    some metrics which don't make much sense with the new code.
    
    Tested with existing and added unit tests, and by making sure the SHS
    still works on a real cluster.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-20644

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19582.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19582
    
----
commit f73af34cabb8f4e7e993e6c6d88d4de603776b8e
Author: Marcelo Vanzin <[email protected]>
Date:   2016-11-23T21:59:35Z

    [SPARK-20644][core] Initial ground work for kvstore UI backend.
    

----


