[GitHub] spark pull request: [Hot Fix #42] Do not stop a SparkUI that has n...

andrewor14 Wed, 19 Mar 2014 19:29:29 -0700

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/185


    [Hot Fix #42] Do not stop a SparkUI that has not bind()'ed

    In Master, we create a SparkUI from event logs without binding it to a port. This
    avoids creating a new Jetty server for each application. However, since bind() is
    never called on these SparkUIs, calling stop() on any of them throws an assertion
    failure.
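
The failure mode described above can be illustrated with a minimal sketch. It is written in Python rather than Scala, and every name in it (SketchUI, stop_if_bound, the server field) is hypothetical; it does not reproduce the actual SparkUI code or the actual fix in this pull request.

```python
class SketchUI:
    """Hypothetical stand-in for a web UI that may or may not own a server."""

    def __init__(self):
        self.server = None  # set only by bind()

    def bind(self, port):
        # A real implementation would start an HTTP server here.
        self.server = ("localhost", port)

    def stop(self):
        # Stopping a UI that was never bound trips this assertion.
        assert self.server is not None, "attempted to stop a UI that was not bound"
        self.server = None


def stop_if_bound(ui):
    """Stop the UI only if bind() was called; return whether stop() ran."""
    if ui.server is None:
        return False
    ui.stop()
    return True
```

Under these assumptions, a UI rendered from event logs is simply skipped on shutdown, while a live, bound UI is stopped as before.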

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark ui-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #185
    
----
commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
Author: Andrew Or <[email protected]>
Date:   2014-02-04T02:18:04Z

    Relax assumptions on compressors and serializers when batching
    
    This commit introduces an intermediate layer of an input stream on the batch level.
    This guards against interference from higher level streams (i.e. compression and
    deserialization streams), especially pre-fetching, without specifically targeting
    particular libraries (Kryo) and forcing shuffle spill compression to use LZF.

commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
Author: Andrew Or <[email protected]>
Date:   2014-02-04T02:18:04Z

    Relax assumptions on compressors and serializers when batching
    
    This commit introduces an intermediate layer of an input stream on the batch level.
    This guards against interference from higher level streams (i.e. compression and
    deserialization streams), especially pre-fetching, without specifically targeting
    particular libraries (Kryo) and forcing shuffle spill compression to use LZF.

commit 3df700509955f7074821e9aab1e74cb53c58b5a5
Author: Andrew Or <[email protected]>
Date:   2014-02-04T02:27:49Z

    Merge branch 'master' of github.com:andrewor14/incubator-spark

commit 287ef44e593ad72f7434b759be3170d9ee2723d2
Author: Andrew Or <[email protected]>
Date:   2014-02-04T21:38:32Z

    Avoid reading the entire batch into memory; also simplify streaming logic
    
    Additionally, address formatting comments.

commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
Author: Andrew Or <[email protected]>
Date:   2014-02-04T21:44:24Z

    Typo: phyiscal -> physical

commit 13920c918efe22e66a1760b14beceb17a61fd8cc
Author: Andrew Or <[email protected]>
Date:   2014-02-05T00:34:15Z

    Update docs

commit 090544a87a0767effd0c835a53952f72fc8d24f0
Author: Andrew Or <[email protected]>
Date:   2014-02-05T18:58:23Z

    Privatize methods

commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
Author: Andrew Or <[email protected]>
Date:   2014-02-05T20:09:32Z

    Also privatize fields

commit e3ae35f4fb1ce8e2d7398afdbabab3dbf4bb2ffe
Author: Andrew Or <[email protected]>
Date:   2014-02-11T00:15:15Z

    Merge github.com:apache/incubator-spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala

commit 8e09306f6dd4ab421447d769572de58035d3d66a
Author: Andrew Or <[email protected]>
Date:   2014-02-12T01:48:16Z

    Use JSON for ExecutorsUI

commit 10ed49dffe4a515bff42762cb025a3f64d9cd407
Author: Andrew Or <[email protected]>
Date:   2014-02-12T18:53:32Z

    Merge github.com:apache/incubator-spark into persist-ui

commit dcbd312b1e4585445868dfb562f9c64ac2fc8cda
Author: Andrew Or <[email protected]>
Date:   2014-02-12T23:58:39Z

    Add JSON Serializability for all SparkListenerEvent's
    
    This also involves a clean-up in the way these events are structured. The existing way
    in which these events are defined maintains a lot of extraneous information. To avoid
    serializing the whole tree of RDD dependencies, for instance, this commit cherry-picks
    only the relevant fields. However, this means sacrificing JobLogger's functionality of
    tracing the entire RDD tree.
    
    Additionally, this commit also involves minor formatting and naming clean-ups within
    the scope of the above changes.

commit bb222b9f7422cdf9e3a4c682bb271da1f75f4f75
Author: Andrew Or <[email protected]>
Date:   2014-02-13T04:35:09Z

    ExecutorUI: render completely from JSON
    
    Additionally, this commit fixes the bug in the local mode, where executor IDs of tasks
    do not match those of storage statuses (more detail in ExecutorsUI.scala).
    
    This commit currently does not serialize the SparkListenerEvents yet, but instead
    serializes changes to each executor JSON. This is a big TODO in the upcoming commit.

commit bf0b2e9e92d760d49ba7b26aaa41b9e3aef2420f
Author: Andrew Or <[email protected]>
Date:   2014-02-14T03:12:53Z

    ExecutorUI: Serialize events rather than arbitrary executor information
    
    This involves adding a new SparkListenerStorageFetchEvent, and adding JSON serializability
    to all of the objects it depends on.

commit de8a1cdb833d80423aba629ba932b6f403ecd4ab
Author: Andrew Or <[email protected]>
Date:   2014-02-15T03:22:50Z

    Serialize events both to and from JSON (rather than just to)
    
    This requires every field of every event to be completely reconstructible from its
    JSON representation. This commit may contain incomplete state.

commit 8a2ebe6ba37b2d5efe344aa3bea343cda1411212
Author: Andrew Or <[email protected]>
Date:   2014-02-15T06:01:21Z

    Fix bugs for EnvironmentUI and ExecutorsUI
    
    In particular, EnvironmentUI was not rendering until a job begins, and ExecutorsUI
    reports an incorrect number (format) of total tasks.

commit c4cd48022b3a8dbf60f458196e21ba8c9cb3b88f
Author: Andrew Or <[email protected]>
Date:   2014-02-15T06:53:43Z

    Also deserialize new events
    
    This includes SparkListenerLoadEnvironment and SparkListenerStorageStatusFetch

commit d859efc34c9a5f07bae7eca7b4ab72fa19fb7e29
Author: Andrew Or <[email protected]>
Date:   2014-02-15T22:01:14Z

    BlockManagerUI: Add JSON functionality

commit 8add36bb08126fbcd02d23c446dd3ec970f1f549
Author: Andrew Or <[email protected]>
Date:   2014-02-15T22:40:49Z

    JobProgressUI: Add JSON functionality
    
    In addition, refactor FileLogger to log in one directory per logger

commit b3976b0a2eb21b4a887d01fd16869a0f37c36f8b
Author: Andrew Or <[email protected]>
Date:   2014-02-16T06:52:43Z

    Add functionality of reconstructing a persisted UI from SparkContext
    
    With this commit, any reconstructed SparkUI resides on the default port of 14040 onwards.
    Logged events are posted separately from live events, such that the live SparkListeners
    are not affected.
    
    This commit also fixes a few JSON de/serialization bugs.

commit 4dfcd224504f392302a49ac82280b294c381f381
Author: Andrew Or <[email protected]>
Date:   2014-02-17T19:14:20Z

    Merge git://git.apache.org/incubator-spark into persist-ui

commit f3fc13b53725cdfeddcecb2068ab5a533566772f
Author: Andrew Or <[email protected]>
Date:   2014-02-17T21:22:01Z

    General refactor
    
    This includes reverting previous formatting and naming changes that are irrelevant to
    this patch.

commit 3fd584e30aaf6552179bf9e9b350b130fa92d0ad
Author: Andrew Or <[email protected]>
Date:   2014-02-18T02:01:12Z

    Fix two major bugs
    
    First, JobProgressListener uses HashSets of TaskInfo and StageInfo, and relies on the equality
    of these objects to remove from the corresponding HashSets correctly. This is not a luxury that
    deserialized StageInfo's and TaskInfo's have. Instead, when removing from these collections, we
    must match by the ID rather than the object itself.
    
    Second, although SparkUI differentiates between persisted and live UI's, its children UI's and
    their corresponding listeners do not. Thus, each revived UI essentially duplicated all the logs
    that reconstructed it in the first place. Further, these zombie UI's continued to respond to
    live SparkListenerEvents. This has been fixed by requiring that revived UI's do not register
    their listeners with the current SparkContext.
    
    With the former fix, there were major incompatibility issues with the existing way UI classes
    access and mutate the collections. Formatting improvements associated with smoothing out these
    inconsistencies are included as part of this commit.
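
The first bug in the commit above (removal from a set relying on object equality that deserialized copies do not have) can be sketched as follows. This is a Python illustration only; TaskInfoSketch and remove_by_id are hypothetical names and the sketch does not mirror the actual JobProgressListener code.

```python
class TaskInfoSketch:
    """Hypothetical stand-in; equality is by object identity (Python's default)."""

    def __init__(self, task_id, host):
        self.task_id = task_id
        self.host = host


def remove_by_id(active, task_id):
    """Return a new set without the entry whose task_id matches."""
    return {t for t in active if t.task_id != task_id}


original = TaskInfoSketch(7, "host-a")
active = {original}

# A copy rebuilt from a log has the same fields but is a different object.
revived = TaskInfoSketch(7, "host-a")

active.discard(revived)  # no effect: the copy is not equal to the original
still_present = len(active) == 1

# Matching on the ID removes the entry regardless of object identity.
active = remove_by_id(active, revived.task_id)
```

Matching on the ID keeps the removal correct whether the object came from a live event or from a replayed log.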

commit 5ac906d4dfd546c5d6b6e80540c8774f3985fecc
Author: Andrew Or <[email protected]>
Date:   2014-02-18T05:38:16Z

    Mostly naming, formatting, and code style changes

commit 904c7294ac221a0cd9806af843219aaa8a847085
Author: Andrew Or <[email protected]>
Date:   2014-02-18T06:06:46Z

    Fix another major bug
    
    Previously, rendering the old, persisted UI continued to trigger load environment and
    storage status fetch events. These are now only triggered for the live UI.
    
    A related TODO: Under JobProgressUI, the total duration is inaccurate; right now it uses
    the time when the old UI is revived, rather than when it was live. This should be fixed.

commit 427301371117e9e7889f5df0f6bba51e5916e425
Author: Andrew Or <[email protected]>
Date:   2014-02-18T23:27:39Z

    Add a gateway SparkListener to simplify event logging
    
    Instead of having each SparkListener log an independent set of events, centralize event
    logging to avoid differentiating events across UI's and thus duplicating logged events.
    Also rename the "fromDisk" parameter to "live".
    
    TODO: Storage page currently still relies on the previous SparkContext and is not
    rendering correctly.

commit 64d2ce1efee3aa5a8166c5fe108932b2279217fc
Author: Andrew Or <[email protected]>
Date:   2014-02-19T02:29:21Z

    Fix BlockManagerUI bug by introducing new event
    
    Previously, the storage information of persisted RDD's continued to rely on the old SparkContext,
    which is no longer accessible if the UI is rendered from disk. This fix solves it by introducing
    an event, SparkListenerGetRDDInfo, which captures this information.
    
    Per discussion with Patrick, an alternative is to encapsulate this information within
    SparkListenerTaskEnd. This would bypass the need to create a new event, but would also require
    a non-trivial refactor of BlockManager / BlockStore.

commit 6814da0cf9af2a29810b6773463acee3b259c95f
Author: Andrew Or <[email protected]>
Date:   2014-02-19T18:36:01Z

    Explicitly register each UI listener rather than through some magic
    
    This (1) allows UISparkListener to be a simple trait and (2) is more intuitive, since it
    mirrors sc.addSparkListener(listener), for all other non-UI listeners.

commit d646df6786737d67d5ca1dbf593740a02a600991
Author: Andrew Or <[email protected]>
Date:   2014-02-20T02:47:35Z

    Completely decouple SparkUI from SparkContext
    
    This involves storing additional fields, such as the scheduling mode and the app name, into the
    new event, SparkListenerApplicationStart, since these attributes are no longer accessible without
    a SparkContext. Further, environment information is refactored to be loaded on application start
    (rather than on job start).
    
    Persisted Spark UI's can no longer be created from SparkContext. The new way of constructing them
    is through a standalone Scala program. org.apache.spark.ui.UIReloader is introduced as an example
    of how to do this.

commit e9e1c6dede36788d3cefe3c65366f5a79be97a1d
Author: Andrew Or <[email protected]>
Date:   2014-02-21T07:51:08Z

    Move all JSON de/serialization logic to JsonProtocol
    
    This makes all classes involved appear less cluttered.

commit 70e7e7acf09d8efd2c7e459ee450c1db140b8f5a
Author: Andrew Or <[email protected]>
Date:   2014-02-22T02:56:26Z

    Formatting changes

commit 6631c02a8791d0321f003bb339344445f4dd0cab
Author: Andrew Or <[email protected]>
Date:   2014-02-24T18:52:21Z

    More formatting changes, this time mainly for Json DSL

commit bbe3501c63029ffa9c1fd9053e7ab868d0f28b10
Author: Andrew Or <[email protected]>
Date:   2014-02-26T23:27:43Z

    Embed storage status and RDD info in Task events
    
    This commit achieves three main things. First and foremost, it embeds the information
    from the SparkListenerFetchStorageStatus and SparkListenerGetRDDInfo events into events
    that are more descriptive of the SparkListenerInterface. In particular, every Task now
    maintains a list of blocks whose storage statuses have been updated as a result of the task.
    Previously, this information was retrieved by fetching storage status from the driver,
    an action arbitrarily associated with a stage. This change involves keeping track of
    what blocks are dropped during each call to an RDD persist. A big TODO is to also capture
    the behavior of an RDD unpersist in a SparkListenerEvent.
    
    Second, the SparkListenerEvent interface now handles the dynamic nature of Executors.
    In particular, a new event, SparkListenerExecutorStateChange, is introduced, which triggers
    a storage status fetch from the driver. The purpose of this is mainly to decouple fetching
    storage status from the driver from the Stage. Note that storage status is not ready until
    the remote BlockManagers have been registered, so this involves attaching a registration
    listener to the BlockManagerMasterActor.
    
    Third, changes in environment properties are now supported. This accounts for the fact that
    the user can invoke sc.addFile and sc.addJar in his/her own application, which should be
    reflected appropriately on the EnvironmentUI. In the previous implementation, coupling this
    information with application start prevented this from happening.
    
    Other relatively minor changes include: 1) Refactoring BlockStatus and BlockManagerInfo to
    not be a part of the BlockManagerMasterActor object, 2) Formatting changes, especially those
    involving multi-line arguments, and 3) Making all UI widgets and listeners private[ui] instead
    of private[spark].

commit 28019caa5712b8d7f1db039dc41876d91e530998
Author: Andrew Or <[email protected]>
Date:   2014-02-27T00:47:00Z

    Merge github.com:apache/spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/CacheManager.scala
        core/src/main/scala/org/apache/spark/SparkEnv.scala
        core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala
        core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala
        core/src/main/scala/org/apache/spark/storage/BlockManager.scala
        core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
        core/src/main/scala/org/apache/spark/storage/StorageUtils.scala
        core/src/main/scala/org/apache/spark/ui/SparkUI.scala
        core/src/main/scala/org/apache/spark/ui/env/EnvironmentUI.scala
        core/src/main/scala/org/apache/spark/ui/exec/ExecutorsUI.scala
        core/src/main/scala/org/apache/spark/ui/jobs/IndexPage.scala
        core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala
        core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
        core/src/main/scala/org/apache/spark/ui/jobs/PoolPage.scala
        core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
        core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
        core/src/main/scala/org/apache/spark/ui/storage/IndexPage.scala
        core/src/main/scala/org/apache/spark/ui/storage/RDDPage.scala
        core/src/main/scala/org/apache/spark/util/TimeStampedHashMap.scala
        core/src/main/scala/org/apache/spark/util/Utils.scala
        core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala

commit d1f428591d6c33c2bb86f85468c7842b5ca00311
Author: Andrew Or <[email protected]>
Date:   2014-02-27T01:19:20Z

    Migrate from lift-json to json4s-jackson

commit 7b2f8112795a53c35b10bc3d72e5be7b699ceb65
Author: Andrew Or <[email protected]>
Date:   2014-02-27T19:24:32Z

    Guard against TaskMetrics NPE + Fix tests

commit 996d7a2f42d4e02c1e40ec22b0c4d7db86aa03e3
Author: Andrew Or <[email protected]>
Date:   2014-02-27T20:26:23Z

    Reflect RDD unpersist on UI
    
    This introduces a new event, SparkListenerUnpersistRDD.

commit 472fd8a4845e39a38f8d993a3527a7e77571ffad
Author: Andrew Or <[email protected]>
Date:   2014-02-27T23:03:59Z

    Fix a couple of tests

commit d47585f22f243fc7e840af90132edb7e84b003ed
Author: Andrew Or <[email protected]>
Date:   2014-02-28T00:15:21Z

    Clean up FileLogger

commit faa113e674a276ddf5cd7dc643c16b7bed2b5e44
Author: Andrew Or <[email protected]>
Date:   2014-02-28T01:12:19Z

    General clean up

commit 4d2fb0c3b667284af31a27698083b2074d2797dc
Author: Andrew Or <[email protected]>
Date:   2014-02-28T01:49:38Z

    Fix format fail

commit 0503e4b9e0b988d213e23d792e8f3a21415054d3
Author: Andrew Or <[email protected]>
Date:   2014-02-28T21:51:17Z

    Fix PySpark tests + remove sc.clearFiles/clearJars
    
    The reason why PySpark tests failed was because this PR previously introduced a default
    parameter to a few methods in SparkContext, and this was not understood by the Py4j
    conversion of JavaSparkContext, since Java does not have default parameters. The fix
    gets rid of the use of default parameters, which also simplifies the logic of triggering
    SparkListenerEnvironmentUpdate events.
    
    This commit also deprecates sc.clearFiles and sc.clearJars, since they achieve little
    beyond deleting a few map entries when the SparkContext is already terminating anyway.

commit 5d2cec1daaca58c8c4e09a454db9a92d8d4cc1da
Author: Andrew Or <[email protected]>
Date:   2014-02-28T23:23:35Z

    JobLogger: ID -> Id

commit 2981d611e6802d8b5a1e3ae0de2fa97de1342956
Author: Andrew Or <[email protected]>
Date:   2014-03-01T04:17:11Z

    Move SparkListenerBus out of DAGScheduler + Clean up
    
    This PR introduces new SparkListenerEvents that are generated outside of DAGScheduler.
    Instead of going through multiple layers (SparkContext -> DAGScheduler -> SparkListenerBus)
    to post the event, we post them directly to sc.listenerBus (SparkContext -> SparkListenerBus).
    This commit also cleans up the initialization order of the UI and the schedulers in SparkContext,
    as well as variable names in DAGScheduler.
    
    Further, some tests create events with null TaskInfo, which causes NPE on certain UI listeners.
    This is now fixed.

commit 2fee310538da401156c2823af61ca0e740e1b78e
Author: Andrew Or <[email protected]>
Date:   2014-03-01T05:18:48Z

    Address Patrick's comments
    
    This mainly involves (1) making event logging configurable and (2) setting the log boundary
    to be on the granularity of application rather than job. Other more minor changes include
    variable name changes, and directly assigning TaskMetrics.updatedBlocks rather than appending
    to it.

commit cceff2b71bb0a8bfae7a47939edb97d5cc13effb
Author: andrewor14 <[email protected]>
Date:   2014-03-01T05:55:11Z

    Fix 100 char format fail

commit 36b3e5da9be9c5cb0c29fca62a76ccd66fe01c1c
Author: Andrew Or <[email protected]>
Date:   2014-03-03T07:05:17Z

    Add HDFS support for event logging

commit 03eda0b267f47c539c12c2ea990836db154bd8c5
Author: Andrew Or <[email protected]>
Date:   2014-03-04T00:14:18Z

    Fix HDFS flush behavior
    
    org.apache.hadoop.fs.FSDataOutputStream only supports sync(), but not flush(). This means that flushing
    higher level streams doesn't actually do anything if the Hadoop FileSystem is used. This is now fixed.
    
    Further, this clarifies why logging local files is handled as a special case by referencing a known,
    unresolved bug in HDFS (HADOOP-7844).

commit aef411cfe1d8bbf3cdc53ae03b5cfb3401a660ba
Author: Andrew Or <[email protected]>
Date:   2014-03-04T02:08:55Z

    Fix bug: storage status was not reflected on UI in the local case
    
    This is because the executor ID from task events and that from the storage status
    list are different. The former is "localhost", but the latter is "<driver>". To be
    consistent, we will only use "<driver>".

commit bb4c503597f291fd26bd7e26639416a6faf4488a
Author: Andrew Or <[email protected]>
Date:   2014-03-04T02:36:13Z

    Use a more mnemonic path for logging
    
    Also get rid of a couple of unused vals.

commit 18b256d37c7a6335c002759a65d2df36dc8faf2e
Author: Andrew Or <[email protected]>
Date:   2014-03-05T00:27:35Z

    Refactor out event logging and replaying logic from UI
    
    This involves taking apart what was once GatewayUISparkListener, and introducing in its place
    the EventLoggingListener and SparkReplayerBus (and AppNameListener, but this one is not important).
    This allows the event logging and replaying functionalities to be used outside of the context of
    the SparkUI.
    
    This commit also ensures all file system modifications go through the Hadoop FileSystem. This
    adds the functionality of reading event logs from HDFS. This also fixes a bug in which Spark
    would attempt to create a log directory in the local file system when the path in fact refers
    to an HDFS directory.

commit e3754310b165609251251c3da8dc34478e8ad55b
Author: Andrew Or <[email protected]>
Date:   2014-03-05T01:45:56Z

    Add new constructors for SparkUI
    
    For two reasons - first, the existing way is ugly because we have to instantiate the SparkUI
    by calling new SparkUI(null). Second, this provides a way to configure the persisted port
    through SparkConf.

commit 1ba34070f4d2e463e2aa37e0d05c518bd8771da5
Author: Andrew Or <[email protected]>
Date:   2014-03-05T02:39:47Z

    Add a few configurable options to event logging
    
    This includes compression and output buffer size.

commit 291b2be0f663643121b462c5d6dcda101943b2e0
Author: Andrew Or <[email protected]>
Date:   2014-03-05T18:49:27Z

    Correct directory in log message "INFO: Logging events to <dir>"

commit 4f69c4a3cb61582eb5b82f1f341446067b97afa2
Author: Andrew Or <[email protected]>
Date:   2014-03-05T23:43:34Z

    Master UI - Rebuild SparkUI on application finish
    
    The Master UI now links to a reconstructed SparkUI when an application finishes,
    instead of keeping around a broken link. This involves exposing the path to the
    directory used for event logging.
    
    This commit also allows us to get rid of the SparkListenerApplicationStart event,
    which contains only the application name. Further, the app URL is a duplicated
    parameter in ApplicationInfo and ApplicationDescription, and the app name is
    passed as an extraneous argument to creating the backend schedulers in SparkContext.
    These are both fixed.
    
    TODO: Master currently does not know how to read compressed event logs.

commit 176e68e6c2e3dc528f51f35719fb6c3160579f2e
Author: Andrew Or <[email protected]>
Date:   2014-03-06T00:04:50Z

    Fix deprecated message for JavaSparkContext (minor)

commit ca258a44af514ac6096ce63bcb28922f8aa4d884
Author: Andrew Or <[email protected]>
Date:   2014-03-06T01:03:12Z

    Master UI - add support for reading compressed event logs
    
    In addition to passing an event log dir to ApplicationDescription, we also
    pass the compression codec we're using, if we decide to compress logged events.

commit d6e3b4aefbf80bf52ec4012b8f234a870d516196
Author: Andrew Or <[email protected]>
Date:   2014-03-07T01:19:59Z

    Merge github.com:apache/spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/CacheManager.scala
        core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala
        core/src/main/scala/org/apache/spark/storage/BlockManager.scala
        core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
        core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala

commit d59da5f87bbe860cc7273115997cf528e0be2db0
Author: Andrew Or <[email protected]>
Date:   2014-03-07T19:00:29Z

    Avoid logging all the blocks on each executor
    
    SparkListenerExecutorsStateChange is refactored into two events:
    SparkListenerBlockManagerGained and SparkListenerBlockManagerLost.
    Both of these convey the minimum amount of information needed to
    reconstruct the storage status (i.e. the BlockManagerId, and in the
    registration case, the maximum memory associated with the block
    manager).
    
    Further, each executor state change no longer involves logging
    storage statuses for ALL executors, when only one has been updated.

commit b6eaea77c52d82012fc32059940fccb845c9f04e
Author: Andrew Or <[email protected]>
Date:   2014-03-10T22:48:57Z

    Treating SparkUI as a handler of MasterUI
    
    The main purpose for this is to avoid starting a new Jetty server for
    each reconstructed SparkUI. This involves refactoring the existing way
    of organizing handlers.
    
    In particular, we currently use an immutable HandlerList to group all
    handlers belonging to the same UI. However, this commit requires us to
    attach handlers dynamically to a server after it has already started.
    A further complication is that the simple HandlerCollection, which can
    be mutable, does not perform longest prefix matching, such that a new
    context can be engulfed by an existing one (which is extremely difficult
    to debug, as the Jetty API is just absolutely superb).
    
    With this commit, attached SparkUIs no longer need to start their own
    servers, but simply reside under the /history prefix of the Master Web
    UI.
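
The prefix-matching problem mentioned in the commit above can be sketched in a simplified form. This Python illustration does not model Jetty's actual dispatch logic; first_match, longest_prefix_match, and the handler names are hypothetical and only show how a more general context can engulf a more specific one when the longest prefix is not preferred.

```python
def first_match(handlers, path):
    """Return the first registered handler whose prefix matches (registration order)."""
    for prefix, name in handlers:
        if path.startswith(prefix):
            return name
    return None


def longest_prefix_match(handlers, path):
    """Return the handler with the longest matching prefix, or None."""
    matches = [(prefix, name) for prefix, name in handlers if path.startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


# The root context is registered first; the per-application context is attached later.
handlers = [("/", "master-root"), ("/history/app-1", "app-1-ui")]
```

With these assumed handlers, first_match sends a request under /history/app-1 to the root context, whereas longest_prefix_match routes it to the later-attached application UI.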

commit 77ba28379dce8e6cfdf1064cf286debdcb385c66
Author: Andrew Or <[email protected]>
Date:   2014-03-11T01:51:48Z

    Address Kay's and Patrick's comments
    
    The biggest changes include synchronizing all methods of listeners used by
    the UI, refactoring the StorageStatusListener such that listeners no longer extend
    it, and moving the StorageStatusListener to storage.

commit dc93915ae051ffc2d855af73b5f7f174f34d56a1
Author: Andrew Or <[email protected]>
Date:   2014-03-11T05:06:41Z

    Imports, comments, and code formatting (minor)

commit d801d117f2b60787c41ddfd371afd0d81d443e0b
Author: Andrew Or <[email protected]>
Date:   2014-03-12T02:13:46Z

    Merge github.com:apache/spark (major)
    
    The recent security patch pretty much redefined the way we use Jetty
    handlers. This requires a major refactoring of our context handler
    collection interface to also use the SecurityManager.
    
    Conflicts:
        core/src/main/scala/org/apache/spark/SparkContext.scala
        core/src/main/scala/org/apache/spark/SparkEnv.scala
        core/src/main/scala/org/apache/spark/deploy/master/Master.scala
        core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
        core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerWebUI.scala
        core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala
        core/src/main/scala/org/apache/spark/storage/BlockManager.scala
        core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
        core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
        core/src/main/scala/org/apache/spark/ui/SparkUI.scala
        core/src/main/scala/org/apache/spark/ui/env/EnvironmentUI.scala
        core/src/main/scala/org/apache/spark/ui/storage/BlockManagerUI.scala
        core/src/main/scala/org/apache/spark/util/Utils.scala

commit ac69ec897ba47acc1a8ffe7a94c03d1ab185e313
Author: Andrew Or <[email protected]>
Date:   2014-03-12T03:10:48Z

    Fix test fail

commit bf80e3db07d700cdec1a6e6f00ae3a7b96c074ff
Author: Andrew Or <[email protected]>
Date:   2014-03-12T03:39:16Z

    Imports, comments, and code formatting, once again (minor)

commit 3456090b6fa959825421bbb40ad5f5b0f2e0df0a
Author: Andrew Or <[email protected]>
Date:   2014-03-13T21:32:10Z

    Address Patrick's comments

commit c5c2c8f04eda980feeacdcd345c71a145c06e8af
Author: Andrew Or <[email protected]>
Date:   2014-03-13T23:48:10Z

    Remove list of (TaskInfo, TaskMetrics) from StageInfo
    
    From an experiment, I discovered that up to 38% of the logged bytes is
    made up of StageInfo (compared to 45% for TaskEnd, which is unavoidable).
    This is because the StageInfo in the StageCompleted events currently
    store lists of (TaskInfo, TaskMetrics) objects, which are duplicated in
    TaskEnd events anyway.
    
    This commit gets rid of this list, which significantly cuts down on log
    size and thus log time.

commit 45fd84c838aab5b51ff7e6eee56670c5d1f73bea
Author: Andrew Or <[email protected]>
Date:   2014-03-14T00:03:09Z

    Remove now deprecated test

commit 650eb12c09fdf351bab7dfda91f87c7af99f74c8
Author: Andrew Or <[email protected]>
Date:   2014-03-15T04:06:40Z

    Add unit tests + Fix bugs found through tests
    
    This covers all JSON de/serialization logic and block manager
    reporting blocks with updated storage statuses during put.

commit 6740e49df5562ea990df03442c6d5fe57e450f03
Author: Andrew Or <[email protected]>
Date:   2014-03-15T04:22:59Z

    Fix comment nits

commit 9e14f9714b78eb7cfb4f6372d6d2ee76096e621c
Author: Andrew Or <[email protected]>
Date:   2014-03-15T04:54:18Z

    Moved around functionality + renamed classes per Patrick

commit f80bd31ef1446996d1be80fdae30c1a5a2eaf149
Author: Andrew Or <[email protected]>
Date:   2014-03-17T23:07:54Z

    Simplify static handler and BlockManager status update logic
    
    This commit gets rid of BlockManagerStatusListener, which is there only
    because of initialization ordering issues. The solution is to declare
    LiveListenerBus from the onset, pass this into BlockManagerMasterActor,
    and have the LiveListenerBus buffer all events until all relevant listeners
    are registered.
    
    This is done by putting the creation of the asynchronous listener thread
    into a start() method, such that all queued events are not actually
    released to registered listeners until this is called and the listener
    thread is created.
    
    This also includes a couple of smaller clean-ups suggested by @pwendell.
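
The buffering behavior described in the commit above (events are queued from the onset and only released once start() creates the listener thread) can be sketched as follows. This is a Python illustration under stated assumptions; BufferingBusSketch and its methods are hypothetical and do not reproduce the LiveListenerBus implementation.

```python
import queue
import threading


class BufferingBusSketch:
    """Hypothetical event bus: post() only enqueues; delivery begins at start()."""

    _STOP = object()  # sentinel that tells the listener thread to exit

    def __init__(self):
        self._events = queue.Queue()
        self._listeners = []
        self._thread = None

    def add_listener(self, listener):
        self._listeners.append(listener)

    def post(self, event):
        # Safe to call before start(): the event simply waits in the queue.
        self._events.put(event)

    def start(self):
        if self._thread is not None:
            raise RuntimeError("bus already started")
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        if self._thread is None:
            raise RuntimeError("bus was never started")
        self._events.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Drain the queue in order, including anything buffered before start().
        while True:
            event = self._events.get()
            if event is self._STOP:
                return
            for listener in self._listeners:
                listener(event)
```

In this sketch, a listener registered after some events were posted but before start() still receives those events, which is the ordering guarantee the commit message describes.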

commit 124429f89eb5575fed9453472fe7ec5bbcf4eb30
Author: Andrew Or <[email protected]>
Date:   2014-03-18T04:53:19Z

    Clarify LiveListenerBus behavior + Add tests for new behavior
    
    The new behavior being the buffering of events before the bus is started.

commit 222adcd92ecee8113c56f082a8e986748f6c2ae7
Author: Andrew Or <[email protected]>
Date:   2014-03-18T18:03:56Z

    Merge github.com:apache/spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/SparkEnv.scala

commit 83af656cfafdc789ee514ea7ee704e5f40e74b3c
Author: Andrew Or <[email protected]>
Date:   2014-03-19T01:42:53Z

    Scraps and pieces (no functionality change)
    
    The biggest changes would probably include
      (1) The refactoring of helper methods in StorageUtils
      (2) The renaming of SparkListenerBlockManagerLost to
          SparkListenerBlockManagerRemoved.
      (3) Rendering an empty default page if the user requests an RDD
          page for an RDD that doesn't exist (otherwise it just crashes)
    
    The rest are mostly formatting and comments.

commit b8ba8173b16e294cb728e09f7cb78f9e2e6d4b45
Author: Andrew Or <[email protected]>
Date:   2014-03-19T01:47:11Z

    Remove UI from map when removing application in Master

commit a1c5cd92487153d2e3d5b19bda19b08490d070ac
Author: Andrew Or <[email protected]>
Date:   2014-03-19T07:27:12Z

    Merge github.com:apache/spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
        core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerWebUI.scala
        core/src/main/scala/org/apache/spark/ui/SparkUI.scala

commit e5f14fa5e63636c5eee5df084c913d938fdee541
Author: Andrew Or <[email protected]>
Date:   2014-03-19T17:43:39Z

    Merge github.com:apache/spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala

commit 4de48237f741c40cc0aa6cc52167ae31e7b942b1
Author: Andrew Or <[email protected]>
Date:   2014-03-19T21:23:19Z

    Merge github.com:apache/spark

commit 98b24c5aef37f9e1dffe505ff1307897d573766d
Author: Andrew Or <[email protected]>
Date:   2014-03-20T02:25:04Z

    Do not stop rendered SparkUIs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
