GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/5765

    [TESTING ONLY] [DO] [NOT] [MERGE]

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark viz-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5765
    
----
commit 6b3403be587fce495276fcb137d3d8d7afc839a7
Author: Andrew Or <[email protected]>
Date:   2015-04-17T00:33:26Z

    Scope all RDD methods
    
    This commit provides a mechanism to set and unset the call scope
    around each RDD operation defined in RDD.scala. This is useful
    for tagging an RDD with the scope in which it is created. This
    will be extended to similar methods in SparkContext.scala and
    other relevant files in a future commit.

commit a9ed4f9e563a6b4ba4a351f0170da53b3a4c973f
Author: Andrew Or <[email protected]>
Date:   2015-04-17T00:46:19Z

    Add a few missing scopes to certain RDD methods

commit 5143523227d1dc989658f2f8a11e5fa97d8add03
Author: Andrew Or <[email protected]>
Date:   2015-04-17T01:44:08Z

    Expose the necessary information in RDDInfo
    
    This includes the scope field that we added in previous commits,
    and the parent IDs for tracking the lineage through the listener
    API.

commit 21843488193295fea8a08c3cb1556d0b62a809ba
Author: Andrew Or <[email protected]>
Date:   2015-04-17T18:00:31Z

    Translate RDD information to dot file
    
    It turns out that the previous scope information is insufficient
    for producing a valid dot file. In particular, the scope hierarchy
    was missing, yet it is crucial for differentiating between a parent
    RDD being in the same encompassing scope and it being in a
    completely distinct scope. Also, unique scope identifiers are
    needed to simplify the code significantly.
    
    This commit further adds the translation logic in a UI listener
    that converts RDDInfos to dot files.

commit f22f3379edbdb301631440d1627fb633d0da143f
Author: Andrew Or <[email protected]>
Date:   2015-04-17T20:52:17Z

    First working implementation of visualization with viz.js

commit 9fac6f37e08b74ae19fa268923d10871ffe08aed
Author: Andrew Or <[email protected]>
Date:   2015-04-22T02:23:16Z

    Re-implement scopes through annotations instead
    
    The previous "working" implementation frequently ran into
    NotSerializableExceptions. Why? ClosureCleaner doesn't like
    closures being wrapped in other closures, and these closures
    are simply not cleaned (details are intentionally omitted here).
    
    This commit reimplements scoping through annotations. All methods
    that should be scoped are now annotated with @RDDScope. Then, on
    creation, each RDD derives its scope from the stack trace, similar
    to how it derives its call site. This is the cleanest approach
    that bypasses NotSerializableExceptions with the least significant
    limitations.

commit 494d5c28b38d3d829f008a1bba406e63d4ec8680
Author: Andrew Or <[email protected]>
Date:   2015-04-22T02:39:14Z

    Revert a few unintended style changes

commit 6a7cdcaed6bb6fd856bd7e2e15b0d78cbdb0b2d1
Author: Andrew Or <[email protected]>
Date:   2015-04-22T03:00:30Z

    Move RDD scope util methods and logic to their own file
    
    Just a small code re-organization.

commit 5e22946945f683927cabafeb0ede3bc8e275e4a0
Author: Andrew Or <[email protected]>
Date:   2015-04-22T03:01:17Z

    Merge branch 'master' of github.com:apache/spark into viz

commit 205f838477de8cabd28aab6301a67fd7d07bc517
Author: Andrew Or <[email protected]>
Date:   2015-04-23T05:33:31Z

    Reimplement rendering with dagre-d3 instead of viz.js
    
    Before this commit, this patch relied on a JavaScript version of
    GraphViz that was compiled from C. Even the minified version of
    this resource was ~2.5M. The main motivation for switching away
    from this library, however, is that it is a complete black box
    over which we have absolutely no control. It is not at all
    extensible, and if something breaks we will have a hard time
    understanding why.
    
    The new library, dagre-d3, is not perfect either. It does not
    officially support clustering of nodes; for certain large graphs,
    the clusters will have a lot of unnecessary whitespace. A few in
    the dagre-d3 community are looking into a solution, but until then
    we will have to live with this (minor) inconvenience.

commit 86f78237b7623e4efa06c5feb053e0c304979c73
Author: Andrew Or <[email protected]>
Date:   2015-04-24T10:05:58Z

    Implement transitive cleaning + add missing documentation
    
    See in-code comments for more detail on what this means.

commit 2390a608ed74a9703d3763d040421dccb51242ec
Author: Andrew Or <[email protected]>
Date:   2015-04-24T10:08:11Z

    Feature flag this new behavior
    
    ... in case anything breaks, we should be able to fall back to the
    old behavior.

commit 438c68f82902c0b6899a4a8bb54783c1aef8a7dd
Author: Andrew Or <[email protected]>
Date:   2015-04-24T19:17:09Z

    Minor changes

commit a4866e3387ff5341280753909e0e1ed9a66502f2
Author: Andrew Or <[email protected]>
Date:   2015-04-24T23:18:18Z

    Add tests (still WIP)
    
    The existing ones are not passing yet because cleaning closures
    is not idempotent. This will be fixed in a future commit.

commit 06fd668eeec6ff773db8cf9e38c66937abf8ca5a
Author: Andrew Or <[email protected]>
Date:   2015-04-24T23:29:09Z

    Make closure cleaning idempotent
    
    We need this for tests because we clean the same closure many
    times there. Outside of tests this is probably not important.

commit 263593ddc9224774b7af76ceb7364d2ee82aef2c
Author: Andrew Or <[email protected]>
Date:   2015-04-24T23:58:13Z

    Finalize tests

commit 2106f125ec90f1984ea7b5ad5cb571f593ee1b5d
Author: Andrew Or <[email protected]>
Date:   2015-04-25T00:29:25Z

    Merge branch 'master' of github.com:apache/spark into closure-cleaner

commit 6d36f385a7783aea22152b9937cb685081a7c020
Author: Andrew Or <[email protected]>
Date:   2015-04-25T00:41:27Z

    Fix closure cleaner visibility

commit e6721706ac5e82b638062c3ae9f6dc35bf4e7e2d
Author: Andrew Or <[email protected]>
Date:   2015-04-25T02:31:08Z

    Guard against potential infinite cycles in method visitor
    
    Now we keep track of the methods that we visited to avoid visiting
    the same method twice.

commit a3aa465e35753e0ee9b70f97bb1f41fc61b0f5aa
Author: Andrew Or <[email protected]>
Date:   2015-04-25T08:57:48Z

    Add more tests for individual closure cleaner operations

commit eb127e54fa04ef17523806067d074b8560ea783e
Author: Andrew Or <[email protected]>
Date:   2015-04-25T09:20:47Z

    Use private method tester for a few things

commit 8b71cdb7953ce622fd94fda7e0c5daafeb145cca
Author: Andrew Or <[email protected]>
Date:   2015-04-25T21:46:47Z

    Update a few comments

commit e45e9049296e6f15a0e60febf3a5581db43c0ffb
Author: Andrew Or <[email protected]>
Date:   2015-04-25T23:37:37Z

    More minor updates (wording, renaming etc.)

commit 4aab379c0d6bc12a9e0c4a984d87bbfc21bd948b
Author: Andrew Or <[email protected]>
Date:   2015-04-25T23:38:52Z

    Merge branch 'master' of github.com:apache/spark into closure-cleaner

commit 6d4d3f1ac8da883fb814613afec35900b078b751
Author: Andrew Or <[email protected]>
Date:   2015-04-26T03:07:35Z

    Fix scala style?

commit fe7816fe25c2f68ff2eee931ebe7a95b1cc97cdf
Author: Andrew Or <[email protected]>
Date:   2015-04-27T19:37:41Z

    Merge branch 'master' of github.com:apache/spark into viz

commit 8dd5af265ee0c395c4c6d831ca697775d9e28104
Author: Andrew Or <[email protected]>
Date:   2015-04-27T21:50:45Z

    Fill in documentation + miscellaneous minor changes
    
    For instance, this adds the ability to throw away old stage graphs.

commit 71281fa15d3bebac583e93ff84c5062f760b753d
Author: Andrew Or <[email protected]>
Date:   2015-04-27T22:40:52Z

    Embed the viz in the UI in a toggleable manner

commit 09d361eb53a98d758891f3db39d8c9d4c239ee88
Author: Andrew Or <[email protected]>
Date:   2015-04-27T23:42:19Z

    Add ID to node label (minor)

commit 52187fcfaafe8d9ac4531a4a76c2c79281d43f73
Author: Andrew Or <[email protected]>
Date:   2015-04-28T00:17:09Z

    Rat excludes

----

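Commit 2184348 above notes that producing a valid dot file needs the
scope hierarchy and unique scope identifiers in addition to the parent
IDs. The sketch below shows one way such a translation could look. It
is an illustration in Python, not the listener code from this patch:
the Scope and Node records and their field names are assumptions, not
the fields of RDDInfo.

```python
from collections import namedtuple

# Hypothetical records; the field names are assumptions of this sketch.
Scope = namedtuple("Scope", "id name parent_id")
Node = namedtuple("Node", "id name scope_id parent_ids")


def to_dot(nodes, scopes):
    """Render nodes as a DOT digraph, nesting one cluster per scope."""
    lines = ["digraph G {"]

    def emit_scope(scope_id, indent):
        pad = "  " * indent
        # Nodes that belong directly to this scope (None means top level).
        for n in nodes:
            if n.scope_id == scope_id:
                lines.append('%s%d [label="%s [%d]"];' % (pad, n.id, n.name, n.id))
        # Child scopes become nested clusters, keyed by their unique id.
        for s in scopes:
            if s.parent_id == scope_id:
                lines.append("%ssubgraph cluster_%d {" % (pad, s.id))
                lines.append('%s  label="%s";' % (pad, s.name))
                emit_scope(s.id, indent + 1)
                lines.append("%s}" % pad)

    emit_scope(None, 1)
    # Edges follow the lineage: one edge from each parent to its child.
    for n in nodes:
        for p in n.parent_ids:
            lines.append("  %d -> %d;" % (p, n.id))
    lines.append("}")
    return "\n".join(lines)
```

The unique scope id is what lets two scopes with the same name map to
two distinct clusters, and the parent_id link is what places a cluster
inside its enclosing one.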

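Commit e672170 above guards the method visitor against infinite cycles
by keeping track of the methods already visited. The general technique
can be sketched as below. This is an illustration in Python, not the
ClosureCleaner code: reachable_methods and the calls mapping are
hypothetical stand-ins for whatever the visitor discovers.

```python
def reachable_methods(start, calls):
    """Return every method reachable from start, visiting each one once.

    calls maps a method name to the names of the methods it invokes
    (a hypothetical stand-in for what a bytecode visitor would find).
    """
    visited = set()
    stack = [start]
    while stack:
        method = stack.pop()
        if method in visited:
            # Already handled: skipping it is what breaks cycles.
            continue
        visited.add(method)
        stack.extend(calls.get(method, ()))
    return visited
```

Without the visited set, two methods that call each other (or a method
that calls itself) would keep the traversal running forever.
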
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---