GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/718
[SPARK-1768] History server enhancements.
Two improvements to the history server:
- Separate the HTTP handling from history fetching, so that it's easy to add
  new backends later (thinking about SPARK-1537 in the long run).
- Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
  memory for faster access. This allows the app limit to go away, since
  holding just the listing in memory shouldn't be too expensive unless the
  user has millions of completed apps in the history (at which point I'd
  expect other issues to arise aside from history server memory usage, such
  as FileSystem.listStatus() starting to become ridiculously expensive).

I also fixed a few minor things along the way which aren't really worth
mentioning.

I also removed the app's log path from the UI, since that information may not
even exist depending on which backend is used (even though there is only one
now).
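The backend split described above can be pictured as a small provider
interface that the web layer talks to. This is only an illustrative sketch
(the interface and class names here are made up, not Spark's actual API):

```java
import java.util.List;

// Illustrative sketch: the HTTP layer depends only on this interface,
// so an HDFS-backed or Yarn-backed provider can be plugged in later.
interface HistoryProvider {
    /** List identifiers of completed applications (cheap: no UI loading). */
    List<String> listApplicationIds();

    /** Load the full UI data for one application on demand. */
    String loadApplication(String appId);
}

// A trivial in-memory provider, just to show the shape of the contract.
class InMemoryProvider implements HistoryProvider {
    public List<String> listApplicationIds() {
        return List.of("app-001", "app-002");
    }

    public String loadApplication(String appId) {
        return "UI for " + appId;
    }
}
```

With this split, adding a new storage backend means writing one new
provider class; the serving code does not change.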
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark hist-server
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/718.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #718
----
commit 60e07aed62d9c6632775ad600a8c80fc37844201
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-07T23:30:02Z
Separate history server from history backend.
This change does two things, mainly:

- Separate the logic of serving application history from fetching
  application history from the underlying storage. Not only does this
  clean up the code a little bit, but it also serves as initial work
  for SPARK-1537, where we may want to fetch application data from
  Yarn instead of HDFS.

  I've kept the current command line options working, but I changed
  the way configuration is set to be mostly based on SparkConf, so
  that it's easy to support new providers later.

- Make it so the UI for each application is loaded lazily. The UIs
  are cached in memory (cache size configurable) for faster subsequent
  access. This means that we don't need a limit for the number of
  applications listed; the list should fit comfortably in memory
  (since it holds much less data). Because of this, I lowered the
  number of applications kept in memory to 50 (since that setting
  doesn't influence the number of apps listed anymore).

Later, we may want to provide paging in the listing UI, and also spill
the listing to disk and load it on demand to avoid large memory usage
and slow startup.
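The lazy-loading idea can be sketched as a small bounded LRU cache: at most
N loaded UIs stay in memory, and an evicted UI is simply rebuilt on the next
access. This is a minimal stand-in, not Spark's actual implementation (the
class name and the string placeholder for a "UI" are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache: holds at most maxEntries loaded UIs; older entries
// are evicted and will be rebuilt lazily on the next access.
class UiCache extends LinkedHashMap<String, String> {
    private final int maxEntries;

    UiCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true -> LRU eviction order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > maxEntries;
    }

    // Return the cached UI, rebuilding it if it was evicted or never loaded.
    String getOrLoad(String appId) {
        String ui = get(appId);              // get() marks the entry as recently used
        if (ui == null) {
            ui = "rebuilt UI for " + appId;  // stand-in for the real (expensive) load
            put(appId, ui);                  // may evict the eldest entry
        }
        return ui;
    }
}
```

Because only the lightweight listing is held permanently, the number of apps
shown no longer needs to be capped; only the cache of fully loaded UIs is.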
commit 0dff2e631f367435bbb6cacd8dab79e14d657e19
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-08T21:48:04Z
Rudimentary paging support for the history UI.
The provider's list API was tweaked a little bit so that the caller
can get an atomic view of the data currently held in the provider.
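Paging over such an atomic snapshot might look like the following sketch
(names and parameters are hypothetical): the caller takes an immutable
snapshot of the listing once, then slices pages out of it, so the view stays
consistent even if the provider updates in the meantime.

```java
import java.util.List;

class Paging {
    // Slice one page out of an immutable snapshot of the app listing.
    // Out-of-range pages yield an empty list rather than an error.
    static List<String> page(List<String> snapshot, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, snapshot.size());
        int to = Math.min(from + pageSize, snapshot.size());
        return snapshot.subList(from, to);
    }
}
```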
commit 286d9eb96e4fcff97ca109e46e5c72a16be7d19d
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-09T20:39:26Z
Ensure server.stop() is called when shutting down.
Also remove the cleanup code from the fs provider. It would be better
to clean up, but there's a race between that code's cleanup and
Hadoop's shutdown hook, which closes all file systems kept in the
cache. So if you try to clean up the fs provider in a shutdown hook,
you may end up with ugly exceptions in the output. But leave the
stop() functionality around in case it's useful for future provider
implementations.
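Wiring stop() into shutdown could be sketched as below (class and method
names are hypothetical, not Spark's). The key points are that stop() is
idempotent, and that it deliberately skips file-system cleanup, since
Hadoop's own shutdown hook may already have closed the cached FileSystem
instances:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class HistoryServerSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    void start() {
        // ensure stop() runs when the JVM shuts down
        Runtime.getRuntime().addShutdownHook(new Thread(this::stop));
    }

    // Idempotent stop: safe to call from the shutdown hook and from user code.
    void stop() {
        if (stopped.compareAndSet(false, true)) {
            // stop the embedded HTTP server here; skip fs provider cleanup
            // to avoid racing with Hadoop's shutdown hook, which closes
            // all cached FileSystem instances
        }
    }

    boolean isStopped() {
        return stopped.get();
    }
}
```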
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---