GitHub user vanzin reopened a pull request:
https://github.com/apache/spark/pull/718
[SPARK-1768] History server enhancements.
Two improvements to the history server:
- Separate the HTTP handling from history fetching, so that it's easy to add
new backends later (thinking about SPARK-1537 in the long run)
- Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
memory for faster access. This allows the app limit to go away, since holding
just the listing in memory shouldn't be too expensive unless the user has
millions of completed apps in the history (at which point I'd expect other
issues to arise aside from history server memory usage, such as
FileSystem.listStatus() starting to become ridiculously expensive).
I also fixed a few minor things along the way which aren't really worth
mentioning.
I also removed the app's log path from the UI since that information may not
even exist depending on which backend is used (even though there is only one
now).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark hist-server
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/718.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #718
----
commit b28447862b515a45a6b7798b256014df23b55799
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-07T23:30:02Z
Separate history server from history backend.
This change does two things, mainly:
- Separate the logic of serving application history from fetching
application history from the underlying storage. Not only does this
clean up the code a little bit, it also serves as initial
work for SPARK-1537, where we may want to fetch application data
from Yarn instead of HDFS.
I've kept the current command line options working, but I changed
the way configuration is set to be mostly based on SparkConf,
so that it's easy to support new providers later.
- Make it so the UI for each application is loaded lazily. The
UIs are cached in memory (cache size configurable) for faster
subsequent access. This means that we don't need a limit for
the number of applications listed; the list should fit
comfortably in memory (since it holds much less data).
Because of this I lowered the number of applications kept in
memory to 50 (since that setting doesn't influence the number
of apps listed anymore).
Later, we may want to provide paging in the listing UI, and also
spill the listing to disk and load it on demand to avoid
large memory usage / slow startup.
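The lazy loading described in this commit can be pictured with a small LRU cache. This is only a sketch under stated assumptions, not the code in the pull request: it is written in Python rather than Spark's Scala, and `LazyUICache`, `load_ui`, and `cached_ids` are hypothetical names. Only the default of 50 cached applications comes from the commit message.

```python
from collections import OrderedDict
from typing import Callable, List


class LazyUICache:
    """Hypothetical LRU cache that builds an application's UI on first access."""

    def __init__(self, load_ui: Callable[[str], object], max_entries: int = 50):
        self._load_ui = load_ui          # assumed to be expensive (e.g. reading logs)
        self._max_entries = max_entries  # 50 mirrors the default mentioned above
        self._cache: "OrderedDict[str, object]" = OrderedDict()

    def get(self, app_id: str) -> object:
        if app_id in self._cache:
            self._cache.move_to_end(app_id)   # mark as most recently used
            return self._cache[app_id]
        ui = self._load_ui(app_id)            # lazy load on a cache miss
        self._cache[app_id] = ui
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)   # evict the least recently used UI
        return ui

    def cached_ids(self) -> List[str]:
        # Oldest entry first, most recently used last.
        return list(self._cache)
```

Because only the loaded UIs are bounded, the listing itself can grow without an application limit, which is the trade-off the commit describes.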
commit bda2fa14142cc9dee5b309092c649eac152931b1
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-08T21:48:04Z
Rudimentary paging support for the history UI.
The provider's list API was tweaked a little bit so that the caller
can get an atomic view of the data currently held in the provider.
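One way to give callers an atomic view is to publish the listing as an immutable snapshot and let the caller page over that snapshot. The sketch below assumes this approach; it is written in Python rather than Scala, and `ListingProvider`, `refresh`, `listing`, and `page` are illustrative names, not the provider API from this pull request.

```python
import threading
from typing import List, Tuple


class ListingProvider:
    """Hypothetical provider; names are illustrative, not Spark's API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._apps: Tuple[str, ...] = ()   # immutable snapshot of the listing

    def refresh(self, apps: List[str]) -> None:
        snapshot = tuple(apps)             # build the new listing off to the side
        with self._lock:
            self._apps = snapshot          # publish it in a single step

    def listing(self) -> Tuple[str, ...]:
        with self._lock:
            return self._apps              # caller keeps a consistent view


def page(apps: Tuple[str, ...], offset: int, count: int) -> Tuple[str, ...]:
    # Paging is applied by the caller to one snapshot, so the slice is
    # taken from the same data the caller used to compute the page count.
    return apps[offset:offset + count]
```

A snapshot taken before a refresh is unaffected by it, so a single request never mixes entries from two different listings.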
commit eee2f5a5c500a74dd0c9fe454b3d91635b61fc25
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-09T20:39:26Z
Ensure server.stop() is called when shutting down.
Also remove the cleanup code from the fs provider. It would be
better to clean up, but there's a race between that code's cleanup
and Hadoop's shutdown hook, which closes all file systems kept in
the cache. So if you try to clean up the fs provider in a shut
down hook, you may end up with ugly exceptions in the output.
But leave the stop() functionality around in case it's useful for
future provider implementations.
commit 6fbe0d8d41c6a95a3e1a8db4445359afa134ccd8
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-09T22:00:45Z
Better handle failures when loading app info.
Instead of failing to load all the applications, just
ignore the one that failed.
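The failure handling described here amounts to catching errors per application instead of around the whole scan. A minimal sketch, assuming a hypothetical `load_app_info` callback that raises when a log cannot be read (Python is used for illustration; the actual change is in Spark's Scala code):

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger("history")


def load_listing(log_dirs: Iterable[str],
                 load_app_info: Callable[[str], dict]) -> List[dict]:
    """Load info for each app, skipping entries that fail to load."""
    apps = []
    for path in log_dirs:
        try:
            apps.append(load_app_info(path))
        except Exception:
            # One unreadable log should not hide every other application.
            log.exception("Failed to load application info from %s", path)
    return apps
```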
commit 91e96ca81ccfe8b57ed4b89bdb2db97ec31f80bb
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-09T22:27:04Z
Fix scalastyle issues.
commit 49d2fd3227d63dfe800981f1a9d4adbec8a05c2e
Author: Marcelo Vanzin <[email protected]>
Date: 2014-05-09T23:41:15Z
Fix a comment.
commit e8026f4eeb83152c88ceb87793625bc22d67f362
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-02T23:36:42Z
Review feedback.
Use monotonic time, plus other stylistic things.
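The commit does not say where monotonic time is used, so the following is only an illustration of why it matters for interval checks: a wall clock can jump when the system time is adjusted, while a monotonic clock cannot go backwards. `RefreshTimer` is a hypothetical helper written in Python; on the JVM the usual monotonic source is System.nanoTime().

```python
import time


class RefreshTimer:
    """Hypothetical helper that reports when a refresh interval has elapsed."""

    def __init__(self, interval_s: float, clock=time.monotonic):
        self._interval_s = interval_s
        self._clock = clock          # injectable so the timer can be tested
        self._last = clock()

    def due(self) -> bool:
        now = self._clock()
        if now - self._last >= self._interval_s:
            self._last = now         # start the next interval from this check
            return True
        return False
```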
commit e8521499051fdef2c091d9f302277b55e3c1f953
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-02T23:56:23Z
Initialize new app array to expected size.
To avoid reallocations.
commit 4406f6159f7dcc620c23031c5f562c8380db1851
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-03T17:34:47Z
Cosmetic change to listing header.
commit b2c570ad0c16bb1296f01e996d04671a17cce421
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-03T17:58:31Z
Make class package-private.
commit 6e2432fc5ad29e05b5d446fa6e940256587c64ee
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-04T17:42:07Z
Second round of feedback.
- Simplify some multi-threaded code.
- Remove new argument that wasn't in 1.0.0, reword some comments.
commit ca5d3200d24ae4cc35f1e6b4e60593ec59144bad
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-04T18:02:20Z
Remove code that deals with unfinished apps.
The history server only reads logs from finished apps, so remove the code
that checked whether the app was actually finished.
commit 249bcea9f32fc19cb40501a73b30a15adaf4858e
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-04T20:31:27Z
Remove offset / count from provider interface.
commit 4e72c771da6759db0d3b7a9b27170a5f8bf40dc5
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-04T20:34:28Z
Remove comment about ordering.
While ordering is nice to have, it's hard to guarantee that, even with
ordering, the listing won't change between two client requests (and
thus end up with different info when the UI applies the paging
parameters). So don't make it a requirement (even if an informal one).
commit 2a7f68d6d2fa7a9b259f00130e1bf1adce91fb29
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-10T00:25:02Z
Address review feedback.
Main changes:
- Restore old command line handling.
- Fix pagination.
- Restore showing the log directory in the listing page.
commit 4da3a525060e537311439886d2a0bc2b71d2439c
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-10T18:52:57Z
Remove UI from ApplicationHistoryInfo.
This reduces the needed memory when lots of applications are listed,
since there were 2 pointers wasted per entry to hold UI-specific
information.
commit dd8cc4b6af0d739fdefbb59857855453bb4177c3
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-10T22:13:42Z
Standardize on using spark.history.* configuration.
Update documentation to mention the config options instead of the old
command line argument, and update the startup script.
commit c21f8d84bb621dc788df5bad70012f71180a9873
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-11T17:29:35Z
Feedback: formatting, docs.
commit 53620c9ace474ebf5cf733dc9c87ce24c3538edc
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-17T17:55:51Z
Add mima exclude, fix scaladoc wording.
----