[GitHub] spark pull request: [SPARK-1768] History server enhancements.

vanzin Tue, 17 Jun 2014 11:34:25 -0700

GitHub user vanzin reopened a pull request:

    https://github.com/apache/spark/pull/718


    [SPARK-1768] History server enhancements.

    Two improvements to the history server:
    
    - Separate the HTTP handling from history fetching, so that it's easy to add
      new backends later (thinking about SPARK-1537 in the long run)
    
    - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
      memory for faster access. This allows the app limit to go away, since
      holding just the listing in memory shouldn't be too expensive unless the
      user has millions of completed apps in the history (at which point I'd
      expect other issues to arise aside from history server memory usage, such
      as FileSystem.listStatus() starting to become ridiculously expensive).
    
    I also fixed a few minor things along the way which aren't really worth
    mentioning.
    I also removed the app's log path from the UI since that information may
    not even exist depending on which backend is used (even though there is
    only one now).
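
To make the first bullet concrete, here is a minimal sketch of what separating
HTTP handling from history fetching could look like. It is written in Python
for brevity and is not the code in this pull request; every name in it
(HistoryProvider, InMemoryProvider, HistoryServer, AppInfo) is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AppInfo:
    """Lightweight listing entry (hypothetical fields)."""
    app_id: str
    name: str


class HistoryProvider(ABC):
    """Hypothetical backend interface: knows about storage, not about HTTP."""

    @abstractmethod
    def get_listing(self) -> List[AppInfo]:
        """Return the applications this backend knows about."""

    @abstractmethod
    def load_ui(self, app_id: str) -> str:
        """Build the (potentially expensive) UI for one application."""


class InMemoryProvider(HistoryProvider):
    """Stand-in backend used only for this illustration."""

    def __init__(self, apps: Dict[str, str]) -> None:
        self._apps = apps

    def get_listing(self) -> List[AppInfo]:
        return [AppInfo(app_id, name) for app_id, name in self._apps.items()]

    def load_ui(self, app_id: str) -> str:
        return f"<ui for {self._apps[app_id]}>"


class HistoryServer:
    """Serving side: delegates every storage question to the provider."""

    def __init__(self, provider: HistoryProvider) -> None:
        self._provider = provider

    def list_page(self) -> List[str]:
        return [info.name for info in self._provider.get_listing()]

    def app_page(self, app_id: str) -> str:
        return self._provider.load_ui(app_id)
```

With this shape, adding a new backend would mean writing another
HistoryProvider subclass, and the serving code would stay unchanged.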

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark hist-server

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #718
    
----
commit b28447862b515a45a6b7798b256014df23b55799
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-07T23:30:02Z

    Separate history server from history backend.
    
    This change does two things, mainly:
    
    - Separate the logic of serving application history from fetching
      application history from the underlying storage. Not only does this
      clean up the code a little, it also serves as initial work for
      SPARK-1537, where we may want to fetch application data from YARN
      instead of HDFS.
    
      I've kept the current command line options working, but I changed
      the way configuration is set to be mostly based on SparkConf,
      so that it's easy to support new providers later.
    
    - Make it so the UI for each application is loaded lazily. The
      UIs are cached in memory (cache size configurable) for faster
      subsequent access. This means that we don't need a limit for
      the number of applications listed; the list should fit
      comfortably in memory (since it holds much less data).
    
      Because of this I lowered the number of applications kept in
      memory to 50 (since that setting doesn't influence the number
      of apps listed anymore).
    
    Later, we may want to provide paging in the listing UI, and also
    spill the listing to disk and load it on demand to avoid
    large memory usage / slow startup.
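
The lazy loading described in this commit can be sketched as a small
size-bounded cache. This is an illustration in Python, not the patch itself:
the class name LazyUICache is hypothetical, and least-recently-used eviction
is an assumption, since the commit message only says the cache size is
configurable.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

V = TypeVar("V")


class LazyUICache(Generic[V]):
    """Load a UI on first access and keep at most max_size of them in memory."""

    def __init__(self, loader: Callable[[str], V], max_size: int = 50) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._loader = loader
        self._max_size = max_size
        self._entries: "OrderedDict[str, V]" = OrderedDict()

    def get(self, app_id: str) -> V:
        if app_id in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(app_id)
            return self._entries[app_id]
        # Cache miss: pay the loading cost only now.
        ui = self._loader(app_id)
        self._entries[app_id] = ui
        if len(self._entries) > self._max_size:
            # Assumed policy: evict the least recently used entry.
            self._entries.popitem(last=False)
        return ui

    def cached_ids(self) -> List[str]:
        """Ids currently held, least recently used first."""
        return list(self._entries)
```

The listing itself stays outside this cache, which is why the number of
cached UIs no longer limits the number of applications listed.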

commit bda2fa14142cc9dee5b309092c649eac152931b1
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-08T21:48:04Z

    Rudimentary paging support for the history UI.
    
    The provider's list api was tweaked a little bit so that the caller
    can get an atomic view of the data currently held in the provider.
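
One way to give a caller an atomic view is to hand out an immutable snapshot
and do the paging against that snapshot only. The sketch below is in Python
and is an assumption about the general technique, not the provider API from
this commit; ListingHolder and page are hypothetical names.

```python
import threading
from typing import List, Sequence, Tuple


class ListingHolder:
    """Holds the current listing; readers always get an immutable snapshot."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._apps: Tuple[str, ...] = ()

    def replace(self, apps: Sequence[str]) -> None:
        # Build the new tuple first, then swap the reference under the lock.
        snapshot = tuple(apps)
        with self._lock:
            self._apps = snapshot

    def snapshot(self) -> Tuple[str, ...]:
        with self._lock:
            return self._apps


def page(snapshot: Sequence[str], page_number: int,
         page_size: int) -> Tuple[List[str], int]:
    """Return (entries on the 1-based page, total number of pages)."""
    total_pages = max(1, -(-len(snapshot) // page_size))  # ceiling division
    start = (page_number - 1) * page_size
    return list(snapshot[start:start + page_size]), total_pages
```

Because the page contents and the page count come from the same snapshot, a
background refresh that happens mid-request cannot make them disagree.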

commit eee2f5a5c500a74dd0c9fe454b3d91635b61fc25
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-09T20:39:26Z

    Ensure server.stop() is called when shutting down.
    
    Also remove the cleanup code from the fs provider. It would be
    better to clean up, but there's a race between that code's cleanup
    and Hadoop's shutdown hook, which closes all file systems kept in
    the cache. So if you try to clean up the fs provider in a shut
    down hook, you may end up with ugly exceptions in the output.
    
    But leave the stop() functionality around in case it's useful for
    future provider implementations.

commit 6fbe0d8d41c6a95a3e1a8db4445359afa134ccd8
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-09T22:00:45Z

    Better handle failures when loading app info.
    
    Instead of failing to load all the applications, just
    ignore the one that failed.
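
The behavior described here amounts to handling errors per application
instead of around the whole loop. A minimal Python sketch, with hypothetical
names (load_all, load_one) and an assumed logger:

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("history")


def load_all(paths: Iterable[str], load_one: Callable[[str], T]) -> List[T]:
    """Load every application that can be loaded; skip the ones that fail."""
    loaded: List[T] = []
    for path in paths:
        try:
            loaded.append(load_one(path))
        except Exception:
            # One bad log must not hide all the other applications.
            log.exception("Skipping application log %s", path)
    return loaded
```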

commit 91e96ca81ccfe8b57ed4b89bdb2db97ec31f80bb
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-09T22:27:04Z

    Fix scalastyle issues.

commit 49d2fd3227d63dfe800981f1a9d4adbec8a05c2e
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-09T23:41:15Z

    Fix a comment.

commit e8026f4eeb83152c88ceb87793625bc22d67f362
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-02T23:36:42Z

    Review feedback.
    
    Use monotonic time, plus other stylistic things.
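
The commit message does not say where monotonic time is used. As an
illustration only, a periodic refresh check is a typical place where it
matters, because a wall-clock adjustment would otherwise skip or repeat a
refresh. The Python sketch below assumes that use; RefreshTimer is a
hypothetical name.

```python
import time
from typing import Callable


class RefreshTimer:
    """Report when at least interval_s seconds have elapsed on a monotonic clock."""

    def __init__(self, interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval_s = interval_s
        self._clock = clock
        self._last = clock()

    def due(self) -> bool:
        now = self._clock()
        if now - self._last >= self._interval_s:
            # Restart the interval from the moment the refresh became due.
            self._last = now
            return True
        return False
```

The clock is injectable so the timer can be exercised with a fake clock
instead of sleeping.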

commit e8521499051fdef2c091d9f302277b55e3c1f953
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-02T23:56:23Z

    Initialize new app array to expected size.
    
    To avoid reallocations.

commit 4406f6159f7dcc620c23031c5f562c8380db1851
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-03T17:34:47Z

    Cosmetic change to listing header.

commit b2c570ad0c16bb1296f01e996d04671a17cce421
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-03T17:58:31Z

    Make class package-private.

commit 6e2432fc5ad29e05b5d446fa6e940256587c64ee
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-04T17:42:07Z

    Second round of feedback.
    
    - Simplify some mt code.
    - Remove new argument that wasn't in 1.0.0, reword some comments.

commit ca5d3200d24ae4cc35f1e6b4e60593ec59144bad
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-04T18:02:20Z

    Remove code that deals with unfinished apps.
    
    The HS only reads logs from finished apps, so remove the code
    that checked whether the app was actually finished.

commit 249bcea9f32fc19cb40501a73b30a15adaf4858e
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-04T20:31:27Z

    Remove offset / count from provider interface.

commit 4e72c771da6759db0d3b7a9b27170a5f8bf40dc5
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-04T20:34:28Z

    Remove comment about ordering.
    
    While ordering is nice to have, it's hard to guarantee that, even with
    ordering, the listing won't change between two client requests (and
    thus end up with different info when the UI applies the paging
    parameters). So don't make it a requirement (even if an informal one).

commit 2a7f68d6d2fa7a9b259f00130e1bf1adce91fb29
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-10T00:25:02Z

    Address review feedback.
    
    Main changes:
    - Restore old command line handling.
    - Fix pagination.
    - Restore showing the log directory in the listing page.

commit 4da3a525060e537311439886d2a0bc2b71d2439c
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-10T18:52:57Z

    Remove UI from ApplicationHistoryInfo.
    
    This reduces the needed memory when lots of applications are listed,
    since there were 2 pointers wasted per entry to hold UI-specific
    information.

commit dd8cc4b6af0d739fdefbb59857855453bb4177c3
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-10T22:13:42Z

    Standardize on using spark.history.* configuration.
    
    Update documentation to mention the config options instead of the old
    command line argument, and update the startup script.

commit c21f8d84bb621dc788df5bad70012f71180a9873
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-11T17:29:35Z

    Feedback: formatting, docs.

commit 53620c9ace474ebf5cf733dc9c87ce24c3538edc
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-17T17:55:51Z

    Add mima exclude, fix scaladoc wording.

----


