GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/185
[Hot Fix #42] Do not stop a SparkUI that has not bind()'ed
In Master, we create a SparkUI from event logs without binding it to a
port. This avoids creating a new Jetty server for each application. However,
because bind() is never called on these SparkUIs, calling stop() on them
throws an assertion failure.
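To illustrate the failure mode described above, here is a minimal Python
sketch. It is not Spark's Scala code: SketchUI and stop_if_bound are
hypothetical names, and the guard shown is only one possible way to avoid
stopping a UI that was never bound.

```python
class SketchUI:
    """Hypothetical stand-in for a UI that may or may not own a server."""

    def __init__(self):
        self.server = None  # set only once bind() has been called

    def bind(self):
        self.server = object()  # placeholder for a real server handle

    def stop(self):
        # Mirrors the failure mode above: stopping an unbound UI fails.
        assert self.server is not None, "attempted to stop an unbound UI"
        self.server = None


def stop_if_bound(ui):
    """Stop the UI only if it was bound; return whether stop() was called."""
    if ui.server is None:
        return False
    ui.stop()
    return True


rendered_only = SketchUI()          # never bound, like a replayed UI
skipped = stop_if_bound(rendered_only)

live = SketchUI()
live.bind()
stopped = stop_if_bound(live)
```

Here the caller decides whether stop() applies, so the assertion inside
stop() can stay in place for UIs that really were bound.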
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark ui-fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/185.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #185
----
commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
Author: Andrew Or <[email protected]>
Date: 2014-02-04T02:18:04Z
Relax assumptions on compressors and serializers when batching
This commit introduces an intermediate input stream layer at the batch
level.
This guards against interference from higher-level streams (i.e.
compression and deserialization streams), especially pre-fetching, without
specifically targeting particular libraries (Kryo) or forcing shuffle spill
compression to use LZF.
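The idea of a batch-level stream can be sketched in Python as follows. This
is an illustration under assumptions, not the Spark implementation:
BoundedReader is a hypothetical name, and zlib plus io.BufferedReader stand
in for the compression and pre-fetching layers mentioned above.

```python
import io
import zlib


class BoundedReader(io.RawIOBase):
    """Exposes at most `limit` bytes of an underlying binary stream."""

    def __init__(self, raw, limit):
        super().__init__()
        self._raw = raw
        self._remaining = limit

    def readable(self):
        return True

    def readinto(self, buf):
        if self._remaining <= 0:
            return 0  # end of this batch, even if the file holds more data
        chunk = self._raw.read(min(len(buf), self._remaining))
        n = len(chunk)
        buf[:n] = chunk
        self._remaining -= n
        return n


# Two batches written back to back, with their compressed sizes recorded.
batches = [b"first batch " * 10, b"second batch " * 10]
blobs = [zlib.compress(b) for b in batches]
sizes = [len(b) for b in blobs]
spill = io.BytesIO(b"".join(blobs))

decoded = []
for size in sizes:
    # The buffering layer may read ahead, but never past this batch.
    buffered = io.BufferedReader(BoundedReader(spill, size), buffer_size=4096)
    decoded.append(zlib.decompress(buffered.read()))
```

Because the bound is enforced below the buffering and decompression layers,
this sketch does not depend on how eagerly those layers read.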
commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
Author: Andrew Or <[email protected]>
Date: 2014-02-04T02:18:04Z
Relax assumptions on compressors and serializers when batching
This commit introduces an intermediate input stream layer at the batch
level.
This guards against interference from higher-level streams (i.e.
compression and deserialization streams), especially pre-fetching, without
specifically targeting particular libraries (Kryo) or forcing shuffle spill
compression to use LZF.
commit 3df700509955f7074821e9aab1e74cb53c58b5a5
Author: Andrew Or <[email protected]>
Date: 2014-02-04T02:27:49Z
Merge branch 'master' of github.com:andrewor14/incubator-spark
commit 287ef44e593ad72f7434b759be3170d9ee2723d2
Author: Andrew Or <[email protected]>
Date: 2014-02-04T21:38:32Z
Avoid reading the entire batch into memory; also simplify streaming logic
Additionally, address formatting comments.
commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
Author: Andrew Or <[email protected]>
Date: 2014-02-04T21:44:24Z
Typo: phyiscal -> physical
commit 13920c918efe22e66a1760b14beceb17a61fd8cc
Author: Andrew Or <[email protected]>
Date: 2014-02-05T00:34:15Z
Update docs
commit 090544a87a0767effd0c835a53952f72fc8d24f0
Author: Andrew Or <[email protected]>
Date: 2014-02-05T18:58:23Z
Privatize methods
commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
Author: Andrew Or <[email protected]>
Date: 2014-02-05T20:09:32Z
Also privatize fields
commit e3ae35f4fb1ce8e2d7398afdbabab3dbf4bb2ffe
Author: Andrew Or <[email protected]>
Date: 2014-02-11T00:15:15Z
Merge github.com:apache/incubator-spark
Conflicts:
core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala
commit 8e09306f6dd4ab421447d769572de58035d3d66a
Author: Andrew Or <[email protected]>
Date: 2014-02-12T01:48:16Z
Use JSON for ExecutorsUI
commit 10ed49dffe4a515bff42762cb025a3f64d9cd407
Author: Andrew Or <[email protected]>
Date: 2014-02-12T18:53:32Z
Merge github.com:apache/incubator-spark into persist-ui
commit dcbd312b1e4585445868dfb562f9c64ac2fc8cda
Author: Andrew Or <[email protected]>
Date: 2014-02-12T23:58:39Z
Add JSON Serializability for all SparkListenerEvent's
This also involves a clean-up in the way these events are structured. The
existing way in which these events are defined maintains a lot of extraneous
information. To avoid serializing the whole tree of RDD dependencies, for
instance, this commit cherry-picks only the relevant fields. However, this
means sacrificing JobLogger's functionality of tracing the entire RDD tree.
Additionally, this commit also involves minor formatting and naming
clean-ups within the scope of the above changes.
commit bb222b9f7422cdf9e3a4c682bb271da1f75f4f75
Author: Andrew Or <[email protected]>
Date: 2014-02-13T04:35:09Z
ExecutorUI: render completely from JSON
Additionally, this commit fixes the bug in local mode where the executor
IDs of tasks do not match those of storage statuses (more detail in
ExecutorsUI.scala).
This commit does not yet serialize the SparkListenerEvents, but instead
serializes changes to each executor JSON. This is a big TODO for the
upcoming commit.
commit bf0b2e9e92d760d49ba7b26aaa41b9e3aef2420f
Author: Andrew Or <[email protected]>
Date: 2014-02-14T03:12:53Z
ExecutorUI: Serialize events rather than arbitrary executor information
This involves adding a new SparkListenerStorageFetchEvent, and adding JSON
serializability to all of the objects it depends on.
commit de8a1cdb833d80423aba629ba932b6f403ecd4ab
Author: Andrew Or <[email protected]>
Date: 2014-02-15T03:22:50Z
Serialize events both to and from JSON (rather than just to)
This requires every field of every event to be completely reconstructible
from its JSON representation. This commit may contain incomplete state.
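The round-trip requirement above can be sketched in Python. This is an
illustration only: TaskEndEvent, its fields, and the "Event" type tag are
hypothetical and do not reflect the actual JSON layout used by the patch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskEndEvent:
    """Hypothetical event; the field names are illustrative only."""
    stage_id: int
    task_id: int
    executor_id: str
    duration_ms: int


# Registry used to pick the right class when reading an event back.
EVENT_TYPES = {"TaskEndEvent": TaskEndEvent}


def event_to_json(event):
    payload = asdict(event)
    payload["Event"] = type(event).__name__  # type tag (an assumption)
    return json.dumps(payload, sort_keys=True)


def event_from_json(line):
    payload = json.loads(line)
    cls = EVENT_TYPES[payload.pop("Event")]
    return cls(**payload)


original = TaskEndEvent(stage_id=3, task_id=17, executor_id="1",
                        duration_ms=42)
restored = event_from_json(event_to_json(original))
```

Checking that restored equals original for every event type is a cheap way
to catch fields that are written but cannot be read back.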
commit 8a2ebe6ba37b2d5efe344aa3bea343cda1411212
Author: Andrew Or <[email protected]>
Date: 2014-02-15T06:01:21Z
Fix bugs for EnvironmentUI and ExecutorsUI
In particular, EnvironmentUI did not render until a job began, and
ExecutorsUI reported an incorrect number (format) of total tasks.
commit c4cd48022b3a8dbf60f458196e21ba8c9cb3b88f
Author: Andrew Or <[email protected]>
Date: 2014-02-15T06:53:43Z
Also deserialize new events
This includes SparkListenerLoadEnvironment and
SparkListenerStorageStatusFetch
commit d859efc34c9a5f07bae7eca7b4ab72fa19fb7e29
Author: Andrew Or <[email protected]>
Date: 2014-02-15T22:01:14Z
BlockManagerUI: Add JSON functionality
commit 8add36bb08126fbcd02d23c446dd3ec970f1f549
Author: Andrew Or <[email protected]>
Date: 2014-02-15T22:40:49Z
JobProgressUI: Add JSON functionality
In addition, refactor FileLogger to log in one directory per logger
commit b3976b0a2eb21b4a887d01fd16869a0f37c36f8b
Author: Andrew Or <[email protected]>
Date: 2014-02-16T06:52:43Z
Add functionality of reconstructing a persisted UI from SparkContext
With this commit, any reconstructed SparkUI resides on a port from the
default of 14040 onwards.
Logged events are posted separately from live events, such that the live
SparkListeners are not affected.
This commit also fixes a few JSON de/serialization bugs.
commit 4dfcd224504f392302a49ac82280b294c381f381
Author: Andrew Or <[email protected]>
Date: 2014-02-17T19:14:20Z
Merge git://git.apache.org/incubator-spark into persist-ui
commit f3fc13b53725cdfeddcecb2068ab5a533566772f
Author: Andrew Or <[email protected]>
Date: 2014-02-17T21:22:01Z
General refactor
This includes reverting previous formatting and naming changes that are
irrelevant to this patch.
commit 3fd584e30aaf6552179bf9e9b350b130fa92d0ad
Author: Andrew Or <[email protected]>
Date: 2014-02-18T02:01:12Z
Fix two major bugs
First, JobProgressListener uses HashSets of TaskInfo and StageInfo, and
relies on the equality of these objects to remove from the corresponding
HashSets correctly. This is not a luxury that deserialized StageInfos and
TaskInfos have. Instead, when removing from these collections, we must match
by the ID rather than the object itself.
Second, although SparkUI differentiates between persisted and live UIs, its
children UIs and their corresponding listeners do not. Thus, each revived UI
essentially duplicated all the logs that reconstructed it in the first
place. Further, these zombie UIs continued to respond to live
SparkListenerEvents. This has been fixed by requiring that revived UIs do
not register their listeners with the current SparkContext.
With the former fix, there were major incompatibility issues with the
existing way UI classes access and mutate the collections. Formatting
improvements associated with smoothing out these inconsistencies are
included as part of this commit.
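The first bug can be illustrated with a short Python sketch. It is an
analogy rather than the Scala code in the patch: this StageInfo is a
hypothetical class with default identity-based equality, standing in for an
object rebuilt from a log.

```python
class StageInfo:
    """Hypothetical record with default (identity-based) equality."""

    def __init__(self, stage_id, name):
        self.stage_id = stage_id
        self.name = name


active_stages = {StageInfo(1, "map"), StageInfo(2, "reduce")}

# A deserialized copy carries the same data but is a different object,
# so a membership test (and hence removal by object) does not find it.
replayed = StageInfo(1, "map")
found_by_object = replayed in active_stages


def remove_by_id(stages, stage_id):
    """Return the stages whose ID differs from stage_id."""
    return {s for s in stages if s.stage_id != stage_id}


remaining = remove_by_id(active_stages, replayed.stage_id)
remaining_ids = sorted(s.stage_id for s in remaining)
```

Matching on the ID works for both live and replayed objects, which is why
it is the safer key for these collections.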
commit 5ac906d4dfd546c5d6b6e80540c8774f3985fecc
Author: Andrew Or <[email protected]>
Date: 2014-02-18T05:38:16Z
Mostly naming, formatting, and code style changes
commit 904c7294ac221a0cd9806af843219aaa8a847085
Author: Andrew Or <[email protected]>
Date: 2014-02-18T06:06:46Z
Fix another major bug
Previously, rendering the old, persisted UI continued to trigger load
environment and storage status fetch events. These are now only triggered
for the live UI.
A related TODO: Under JobProgressUI, the total duration is inaccurate;
right now it uses the time when the old UI is revived, rather than when it
was live. This should be fixed.
commit 427301371117e9e7889f5df0f6bba51e5916e425
Author: Andrew Or <[email protected]>
Date: 2014-02-18T23:27:39Z
Add a gateway SparkListener to simplify event logging
Instead of having each SparkListener log an independent set of events,
centralize event logging to avoid differentiating events across UIs and
thus duplicating logged events.
Also rename the "fromDisk" parameter to "live".
TODO: Storage page currently still relies on the previous SparkContext and
is not rendering correctly.
commit 64d2ce1efee3aa5a8166c5fe108932b2279217fc
Author: Andrew Or <[email protected]>
Date: 2014-02-19T02:29:21Z
Fix BlockManagerUI bug by introducing new event
Previously, the storage information of persisted RDDs continued to rely on
the old SparkContext, which is no longer accessible if the UI is rendered
from disk. This fix solves it by introducing an event,
SparkListenerGetRDDInfo, which captures this information.
Per discussion with Patrick, an alternative is to encapsulate this
information within SparkListenerTaskEnd. This would bypass the need to
create a new event, but would also require a non-trivial refactor of
BlockManager / BlockStore.
commit 6814da0cf9af2a29810b6773463acee3b259c95f
Author: Andrew Or <[email protected]>
Date: 2014-02-19T18:36:01Z
Explicitly register each UI listener rather than through some magic
This (1) allows UISparkListener to be a simple trait and (2) is more
intuitive, since it mirrors sc.addSparkListener(listener), for all other
non-UI listeners.
commit d646df6786737d67d5ca1dbf593740a02a600991
Author: Andrew Or <[email protected]>
Date: 2014-02-20T02:47:35Z
Completely decouple SparkUI from SparkContext
This involves storing additional fields, such as the scheduling mode and
the app name, into the new event, SparkListenerApplicationStart, since these
attributes are no longer accessible without a SparkContext. Further,
environment information is refactored to be loaded on application start
(rather than on job start).
Persisted Spark UIs can no longer be created from SparkContext. The new way
of constructing them is through a standalone Scala program.
org.apache.spark.ui.UIReloader is introduced as an example of how to do
this.
commit e9e1c6dede36788d3cefe3c65366f5a79be97a1d
Author: Andrew Or <[email protected]>
Date: 2014-02-21T07:51:08Z
Move all JSON de/serialization logic to JsonProtocol
This makes all classes involved appear less cluttered.
commit 70e7e7acf09d8efd2c7e459ee450c1db140b8f5a
Author: Andrew Or <[email protected]>
Date: 2014-02-22T02:56:26Z
Formatting changes
commit 6631c02a8791d0321f003bb339344445f4dd0cab
Author: Andrew Or <[email protected]>
Date: 2014-02-24T18:52:21Z
More formatting changes, this time mainly for Json DSL
commit bbe3501c63029ffa9c1fd9053e7ab868d0f28b10
Author: Andrew Or <[email protected]>
Date: 2014-02-26T23:27:43Z
Embed storage status and RDD info in Task events
This commit achieves three main things. First and foremost, it embeds the
information from the SparkListenerFetchStorageStatus and
SparkListenerGetRDDInfo events into events that are more descriptive of the
SparkListenerInterface. In particular, every Task now maintains a list of
blocks whose storage statuses have been updated as a result of the task.
Previously, this information was retrieved by fetching storage status from
the driver, an action arbitrarily associated with a stage. This change
involves keeping track of what blocks are dropped during each call to an RDD
persist. A big TODO is to also capture the behavior of an RDD unpersist in a
SparkListenerEvent.
Second, the SparkListenerEvent interface now handles the dynamic nature of
Executors. In particular, a new event, SparkListenerExecutorStateChange, is
introduced, which triggers a storage status fetch from the driver. The
purpose of this is mainly to decouple fetching storage status from the
driver from the Stage. Note that storage status is not ready until the
remote BlockManagers have been registered, so this involves attaching a
registration listener to the BlockManagerMasterActor.
Third, changes in environment properties are now supported. This accounts
for the fact that the user can invoke sc.addFile and sc.addJar in their own
application, which should be reflected appropriately on the EnvironmentUI.
In the previous implementation, coupling this information with application
start prevented this from happening.
Other relatively minor changes include: 1) Refactoring BlockStatus and
BlockManagerInfo to not be a part of the BlockManagerMasterActor object,
2) Formatting changes, especially those involving multi-line arguments, and
3) Making all UI widgets and listeners private[ui] instead of
private[spark].
commit 28019caa5712b8d7f1db039dc41876d91e530998
Author: Andrew Or <[email protected]>
Date: 2014-02-27T00:47:00Z
Merge github.com:apache/spark
Conflicts:
core/src/main/scala/org/apache/spark/CacheManager.scala
core/src/main/scala/org/apache/spark/SparkEnv.scala
core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala
core/src/main/scala/org/apache/spark/storage/BlockManager.scala
core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
core/src/main/scala/org/apache/spark/ui/SparkUI.scala
core/src/main/scala/org/apache/spark/ui/env/EnvironmentUI.scala
core/src/main/scala/org/apache/spark/ui/exec/ExecutorsUI.scala
core/src/main/scala/org/apache/spark/ui/jobs/IndexPage.scala
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
core/src/main/scala/org/apache/spark/ui/jobs/PoolPage.scala
core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
core/src/main/scala/org/apache/spark/ui/storage/IndexPage.scala
core/src/main/scala/org/apache/spark/ui/storage/RDDPage.scala
core/src/main/scala/org/apache/spark/util/TimeStampedHashMap.scala
core/src/main/scala/org/apache/spark/util/Utils.scala
core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala
commit d1f428591d6c33c2bb86f85468c7842b5ca00311
Author: Andrew Or <[email protected]>
Date: 2014-02-27T01:19:20Z
Migrate from lift-json to json4s-jackson
commit 7b2f8112795a53c35b10bc3d72e5be7b699ceb65
Author: Andrew Or <[email protected]>
Date: 2014-02-27T19:24:32Z
Guard against TaskMetrics NPE + Fix tests
commit 996d7a2f42d4e02c1e40ec22b0c4d7db86aa03e3
Author: Andrew Or <[email protected]>
Date: 2014-02-27T20:26:23Z
Reflect RDD unpersist on UI
This introduces a new event, SparkListenerUnpersistRDD.
commit 472fd8a4845e39a38f8d993a3527a7e77571ffad
Author: Andrew Or <[email protected]>
Date: 2014-02-27T23:03:59Z
Fix a couple of tests
commit d47585f22f243fc7e840af90132edb7e84b003ed
Author: Andrew Or <[email protected]>
Date: 2014-02-28T00:15:21Z
Clean up FileLogger
commit faa113e674a276ddf5cd7dc643c16b7bed2b5e44
Author: Andrew Or <[email protected]>
Date: 2014-02-28T01:12:19Z
General clean up
commit 4d2fb0c3b667284af31a27698083b2074d2797dc
Author: Andrew Or <[email protected]>
Date: 2014-02-28T01:49:38Z
Fix format fail
commit 0503e4b9e0b988d213e23d792e8f3a21415054d3
Author: Andrew Or <[email protected]>
Date: 2014-02-28T21:51:17Z
Fix PySpark tests + remove sc.clearFiles/clearJars
PySpark tests failed because this PR previously introduced a default
parameter to a few methods in SparkContext, and this was not understood by
the Py4j conversion of JavaSparkContext, since Java does not have default
parameters. The fix gets rid of the use of default parameters, which also
simplifies the logic of triggering SparkListenerEnvironmentUpdate events.
This commit also deprecates sc.clearFiles and sc.clearJars, since they
achieve little beyond deleting a few map entries when the SparkContext is
already terminating anyway.
commit 5d2cec1daaca58c8c4e09a454db9a92d8d4cc1da
Author: Andrew Or <[email protected]>
Date: 2014-02-28T23:23:35Z
JobLogger: ID -> Id
commit 2981d611e6802d8b5a1e3ae0de2fa97de1342956
Author: Andrew Or <[email protected]>
Date: 2014-03-01T04:17:11Z
Move SparkListenerBus out of DAGScheduler + Clean up
This PR introduces new SparkListenerEvents that are generated outside of
DAGScheduler. Instead of going through multiple layers (SparkContext ->
DAGScheduler -> SparkListenerBus) to post the event, we post them directly
to sc.listenerBus (SparkContext -> SparkListenerBus).
This commit also cleans up the initialization order of the UI and the
schedulers in SparkContext, as well as variable names in DAGScheduler.
Further, some tests create events with null TaskInfo, which causes NPE on
certain UI listeners. This is now fixed.
commit 2fee310538da401156c2823af61ca0e740e1b78e
Author: Andrew Or <[email protected]>
Date: 2014-03-01T05:18:48Z
Address Patrick's comments
This mainly involves (1) making event logging configurable and (2) setting
the log boundary to be on the granularity of application rather than job.
Other more minor changes include variable name changes, and directly
assigning TaskMetrics.updatedBlocks rather than appending to it.
commit cceff2b71bb0a8bfae7a47939edb97d5cc13effb
Author: andrewor14 <[email protected]>
Date: 2014-03-01T05:55:11Z
Fix 100 char format fail
commit 36b3e5da9be9c5cb0c29fca62a76ccd66fe01c1c
Author: Andrew Or <[email protected]>
Date: 2014-03-03T07:05:17Z
Add HDFS support for event logging
commit 03eda0b267f47c539c12c2ea990836db154bd8c5
Author: Andrew Or <[email protected]>
Date: 2014-03-04T00:14:18Z
Fix HDFS flush behavior
org.apache.hadoop.fs.FSDataOutputStream only supports sync(), but not
flush(). This means that flushing higher level streams doesn't actually do
anything if the Hadoop FileSystem is used. This is now fixed.
Further, this clarifies why logging local files is handled as a special
case by referencing a known, unresolved bug in HDFS (HADOOP-7844).
commit aef411cfe1d8bbf3cdc53ae03b5cfb3401a660ba
Author: Andrew Or <[email protected]>
Date: 2014-03-04T02:08:55Z
Fix bug: storage status was not reflected on UI in the local case
This is because the executor ID from task events and that from the storage
status list are different. The former is "localhost", but the latter is
"<driver>". To be consistent, we will only use "<driver>".
commit bb4c503597f291fd26bd7e26639416a6faf4488a
Author: Andrew Or <[email protected]>
Date: 2014-03-04T02:36:13Z
Use a more mnemonic path for logging
Also get rid of a couple of unused vals.
commit 18b256d37c7a6335c002759a65d2df36dc8faf2e
Author: Andrew Or <[email protected]>
Date: 2014-03-05T00:27:35Z
Refactor out event logging and replaying logic from UI
This involves taking apart what was once GatewayUISparkListener, and
introducing in its place the EventLoggingListener and SparkReplayerBus (and
AppNameListener, but this one is not important). This allows the event
logging and replaying functionalities to be used outside of the context of
the SparkUI.
This commit also ensures all file system modifications go through the
Hadoop FileSystem. This adds the functionality of reading event logs from
HDFS. This also fixes a bug in which Spark would attempt to create a log
directory in the local file system when the path in fact refers to an HDFS
directory.
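The split between logging and replaying can be sketched in Python. The
names EventLogger and replay are hypothetical, the events are plain
dictionaries, and an in-memory io.StringIO stands in for a log file; none
of this mirrors the actual EventLoggingListener or SparkReplayerBus APIs.

```python
import io
import json


class EventLogger:
    """Hypothetical listener that appends each event as one JSON line."""

    def __init__(self, sink):
        self._sink = sink

    def on_event(self, event):
        self._sink.write(json.dumps(event, sort_keys=True) + "\n")


def replay(source, listeners):
    """Post every logged event to the listeners; return the event count."""
    count = 0
    for line in source:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        for listener in listeners:
            listener(event)
        count += 1
    return count


log = io.StringIO()  # stands in for a file on local disk or HDFS
logger = EventLogger(log)
logger.on_event({"Event": "StageSubmitted", "Stage ID": 0})
logger.on_event({"Event": "StageCompleted", "Stage ID": 0})

log.seek(0)
seen = []
replayed_count = replay(log, [seen.append])
```

Neither piece knows anything about a UI, so the same log could feed any
listener, which is the decoupling the commit message describes.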
commit e3754310b165609251251c3da8dc34478e8ad55b
Author: Andrew Or <[email protected]>
Date: 2014-03-05T01:45:56Z
Add new constructors for SparkUI
This is done for two reasons. First, the existing way is ugly because we
have to instantiate the SparkUI by calling new SparkUI(null). Second, this
provides a way to configure the persisted port through SparkConf.
commit 1ba34070f4d2e463e2aa37e0d05c518bd8771da5
Author: Andrew Or <[email protected]>
Date: 2014-03-05T02:39:47Z
Add a few configurable options to event logging
This includes compression and output buffer size.
commit 291b2be0f663643121b462c5d6dcda101943b2e0
Author: Andrew Or <[email protected]>
Date: 2014-03-05T18:49:27Z
Correct directory in log message "INFO: Logging events to <dir>"
commit 4f69c4a3cb61582eb5b82f1f341446067b97afa2
Author: Andrew Or <[email protected]>
Date: 2014-03-05T23:43:34Z
Master UI - Rebuild SparkUI on application finish
The Master UI now links to a reconstructed SparkUI when an application
finishes, instead of keeping around a broken link. This involves exposing
the path to the directory used for event logging.
This commit also allows us to get rid of the SparkListenerApplicationStart
event, which contains only the application name. Further, the app URL is a
duplicated parameter in ApplicationInfo and ApplicationDescription, and the
app name is passed as an extraneous argument to creating the backend
schedulers in SparkContext. These are both fixed.
TODO: Master currently does not know how to read compressed event logs.
commit 176e68e6c2e3dc528f51f35719fb6c3160579f2e
Author: Andrew Or <[email protected]>
Date: 2014-03-06T00:04:50Z
Fix deprecated message for JavaSparkContext (minor)
commit ca258a44af514ac6096ce63bcb28922f8aa4d884
Author: Andrew Or <[email protected]>
Date: 2014-03-06T01:03:12Z
Master UI - add support for reading compressed event logs
In addition to passing an event log dir to ApplicationDescription, we also
pass the compression codec we're using, if we decide to compress logged
events.
commit d6e3b4aefbf80bf52ec4012b8f234a870d516196
Author: Andrew Or <[email protected]>
Date: 2014-03-07T01:19:59Z
Merge github.com:apache/spark
Conflicts:
core/src/main/scala/org/apache/spark/CacheManager.scala
core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala
core/src/main/scala/org/apache/spark/storage/BlockManager.scala
core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
commit d59da5f87bbe860cc7273115997cf528e0be2db0
Author: Andrew Or <[email protected]>
Date: 2014-03-07T19:00:29Z
Avoid logging all the blocks on each executor
SparkListenerExecutorsStateChange is refactored into two events:
SparkListenerBlockManagerGained and SparkListenerBlockManagerLost.
Both of these convey the minimum amount of information needed to
reconstruct the storage status (i.e. the BlockManagerId, and in the
registration case, the maximum memory associated with the block
manager).
Further, each executor state change no longer involves logging
storage statuses for ALL executors, when only one has been updated.
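As a rough illustration of why two small events are enough, the Python
sketch below rebuilds a view of the registered block managers by replaying
them. The tuple layout and the "gained"/"lost" tags are assumptions made
for this example, not the event format used in the patch.

```python
def replay_block_managers(events):
    """Rebuild {block_manager_id: max_memory} from gained/lost events."""
    max_memory = {}
    for kind, block_manager_id, *rest in events:
        if kind == "gained":
            max_memory[block_manager_id] = rest[0]  # memory at registration
        elif kind == "lost":
            max_memory.pop(block_manager_id, None)
    return max_memory


events = [
    ("gained", "exec-1", 512),
    ("gained", "exec-2", 1024),
    ("lost", "exec-1"),
]
state = replay_block_managers(events)
```

Each event touches only the block manager it names, so a state change on
one executor never requires re-logging the others.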
commit b6eaea77c52d82012fc32059940fccb845c9f04e
Author: Andrew Or <[email protected]>
Date: 2014-03-10T22:48:57Z
Treating SparkUI as a handler of MasterUI
The main purpose for this is to avoid starting a new Jetty server for
each reconstructed SparkUI. This involves refactoring the existing way
of organizing handlers.
In particular, we currently use an immutable HandlerList to group all
handlers belonging to the same UI. However, this commit requires us to
attach handlers dynamically to a server after it has already started.
A further complication is that the simple HandlerCollection, which can
be mutable, does not perform longest prefix matching, such that a new
context can be engulfed by an existing one (which is extremely difficult
to debug, as the Jetty API is just absolutely superb).
With this commit, attached SparkUIs no longer need to start their own
servers, but simply reside under the /history prefix of the Master Web
UI.
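The difference between first-match and longest-prefix dispatch can be shown
with a small Python sketch. It does not use or model the Jetty API; the
handler table and both functions are hypothetical.

```python
def _matches(prefix, path):
    """True if path equals prefix or lies underneath it."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def first_match(handlers, path):
    """Return the first handler, in registration order, that matches."""
    for prefix, handler in handlers.items():
        if _matches(prefix, path):
            return handler
    return None


def longest_prefix_match(handlers, path):
    """Return the handler registered under the longest matching prefix."""
    best = None
    for prefix in handlers:
        if _matches(prefix, path) and (best is None or len(prefix) > len(best)):
            best = prefix
    return handlers[best] if best is not None else None


# Registration order matters for first_match: "/" was attached first.
handlers = {
    "/": "master",
    "/history": "history index",
    "/history/app-1": "app-1 UI",
}

engulfed = first_match(handlers, "/history/app-1/stages")
routed = longest_prefix_match(handlers, "/history/app-1/stages")
```

With first-match dispatch the root context swallows the request, which is
the engulfing behavior the commit message warns about.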
commit 77ba28379dce8e6cfdf1064cf286debdcb385c66
Author: Andrew Or <[email protected]>
Date: 2014-03-11T01:51:48Z
Address Kay's and Patrick's comments
The biggest changes include synchronizing all methods of listeners used by
the UI, refactoring the StorageStatusListener such that listeners no longer
extend it, and moving the StorageStatusListener to storage.
commit dc93915ae051ffc2d855af73b5f7f174f34d56a1
Author: Andrew Or <[email protected]>
Date: 2014-03-11T05:06:41Z
Imports, comments, and code formatting (minor)
commit d801d117f2b60787c41ddfd371afd0d81d443e0b
Author: Andrew Or <[email protected]>
Date: 2014-03-12T02:13:46Z
Merge github.com:apache/spark (major)
The recent security patch pretty much redefined the way we use Jetty
handlers. This requires a major refactoring of our context handler
collection interface to also use the SecurityManager.
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/SparkEnv.scala
core/src/main/scala/org/apache/spark/deploy/master/Master.scala
core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerWebUI.scala
core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala
core/src/main/scala/org/apache/spark/storage/BlockManager.scala
core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
core/src/main/scala/org/apache/spark/ui/SparkUI.scala
core/src/main/scala/org/apache/spark/ui/env/EnvironmentUI.scala
core/src/main/scala/org/apache/spark/ui/storage/BlockManagerUI.scala
core/src/main/scala/org/apache/spark/util/Utils.scala
commit ac69ec897ba47acc1a8ffe7a94c03d1ab185e313
Author: Andrew Or <[email protected]>
Date: 2014-03-12T03:10:48Z
Fix test fail
commit bf80e3db07d700cdec1a6e6f00ae3a7b96c074ff
Author: Andrew Or <[email protected]>
Date: 2014-03-12T03:39:16Z
Imports, comments, and code formatting, once again (minor)
commit 3456090b6fa959825421bbb40ad5f5b0f2e0df0a
Author: Andrew Or <[email protected]>
Date: 2014-03-13T21:32:10Z
Address Patrick's comments
commit c5c2c8f04eda980feeacdcd345c71a145c06e8af
Author: Andrew Or <[email protected]>
Date: 2014-03-13T23:48:10Z
Remove list of (TaskInfo, TaskMetrics) from StageInfo
From an experiment, I discovered that up to 38% of the logged bytes are
made up of StageInfo (compared to 45% for TaskEnd, which is unavoidable).
This is because the StageInfo in the StageCompleted events currently
stores lists of (TaskInfo, TaskMetrics) objects, which are duplicated in
TaskEnd events anyway.
This commit gets rid of this list, which significantly cuts down on log
size and thus log time.
commit 45fd84c838aab5b51ff7e6eee56670c5d1f73bea
Author: Andrew Or <[email protected]>
Date: 2014-03-14T00:03:09Z
Remove now deprecated test
commit 650eb12c09fdf351bab7dfda91f87c7af99f74c8
Author: Andrew Or <[email protected]>
Date: 2014-03-15T04:06:40Z
Add unit tests + Fix bugs found through tests
This covers all JSON de/serialization logic and block manager
reporting blocks with updated storage statuses during put.
commit 6740e49df5562ea990df03442c6d5fe57e450f03
Author: Andrew Or <[email protected]>
Date: 2014-03-15T04:22:59Z
Fix comment nits
commit 9e14f9714b78eb7cfb4f6372d6d2ee76096e621c
Author: Andrew Or <[email protected]>
Date: 2014-03-15T04:54:18Z
Moved around functionality + renamed classes per Patrick
commit f80bd31ef1446996d1be80fdae30c1a5a2eaf149
Author: Andrew Or <[email protected]>
Date: 2014-03-17T23:07:54Z
Simplify static handler and BlockManager status update logic
This commit gets rid of BlockManagerStatusListener, which is there only
because of initialization ordering issues. The solution is to declare
LiveListenerBus from the onset, pass this into BlockManagerMasterActor,
and have the LiveListenerBus buffer all events until all relevant listeners
are registered.
This is done by putting the creation of the asynchronous listener thread
into a start() method, such that all queued events are not actually
released to registered listeners until this is called and the listener
thread is created.
This also includes a couple of smaller clean-ups suggested by @pwendell.
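The buffering behavior described above can be sketched in Python. This is a
simplified illustration, not the LiveListenerBus implementation:
BufferedListenerBus and its methods are hypothetical, and listeners are
plain callables.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the listener thread to exit


class BufferedListenerBus:
    """Hypothetical bus: events queue up until start() is called."""

    def __init__(self):
        self._events = queue.Queue()
        self._listeners = []
        self._thread = None

    def add_listener(self, listener):
        self._listeners.append(listener)

    def post(self, event):
        self._events.put(event)  # safe to call before start()

    def start(self):
        if self._thread is not None:
            raise RuntimeError("bus already started")
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            event = self._events.get()
            if event is _STOP:
                break
            for listener in self._listeners:
                listener(event)

    def stop(self):
        if self._thread is None:
            raise RuntimeError("bus was never started")
        self._events.put(_STOP)
        self._thread.join()  # returns once all earlier events are delivered


bus = BufferedListenerBus()
bus.post("early")             # posted before any listener exists
seen = []
bus.add_listener(seen.append)
bus.start()                   # buffered events are released only now
bus.post("late")
bus.stop()
```

The event posted before any listener was registered is still delivered,
because nothing is dequeued until start() creates the listener thread.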
commit 124429f89eb5575fed9453472fe7ec5bbcf4eb30
Author: Andrew Or <[email protected]>
Date: 2014-03-18T04:53:19Z
Clarify LiveListenerBus behavior + Add tests for new behavior
The new behavior being the buffering of events before the bus is started.
commit 222adcd92ecee8113c56f082a8e986748f6c2ae7
Author: Andrew Or <[email protected]>
Date: 2014-03-18T18:03:56Z
Merge github.com:apache/spark
Conflicts:
core/src/main/scala/org/apache/spark/SparkEnv.scala
commit 83af656cfafdc789ee514ea7ee704e5f40e74b3c
Author: Andrew Or <[email protected]>
Date: 2014-03-19T01:42:53Z
Scraps and pieces (no functionality change)
The biggest changes would probably include
(1) The refactoring of helper methods in StorageUtils
(2) The renaming of SparkListenerBlockManagerLost to
SparkListenerBlockManagerRemoved.
(3) Rendering an empty default page if the user requests an RDD
page for an RDD that doesn't exist (otherwise it just crashes)
The rest are mostly formatting and comments.
commit b8ba8173b16e294cb728e09f7cb78f9e2e6d4b45
Author: Andrew Or <[email protected]>
Date: 2014-03-19T01:47:11Z
Remove UI from map when removing application in Master
commit a1c5cd92487153d2e3d5b19bda19b08490d070ac
Author: Andrew Or <[email protected]>
Date: 2014-03-19T07:27:12Z
Merge github.com:apache/spark
Conflicts:
core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerWebUI.scala
core/src/main/scala/org/apache/spark/ui/SparkUI.scala
commit e5f14fa5e63636c5eee5df084c913d938fdee541
Author: Andrew Or <[email protected]>
Date: 2014-03-19T17:43:39Z
Merge github.com:apache/spark
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
commit 4de48237f741c40cc0aa6cc52167ae31e7b942b1
Author: Andrew Or <[email protected]>
Date: 2014-03-19T21:23:19Z
Merge github.com:apache/spark
commit 98b24c5aef37f9e1dffe505ff1307897d573766d
Author: Andrew Or <[email protected]>
Date: 2014-03-20T02:25:04Z
Do not stop rendered SparkUIs
----