GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/5912
[SPARK-7347] Add hover to RDDs in DAG visualization
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark viz-hover
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5912.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5912
----
commit 6b3403be587fce495276fcb137d3d8d7afc839a7
Author: Andrew Or
Date: 2015-04-17T00:33:26Z
Scope all RDD methods
This commit provides a mechanism to set and unset the call scope
around each RDD operation defined in RDD.scala. This is useful
for tagging an RDD with the scope in which it is created. This
will be extended to similar methods in SparkContext.scala and
other relevant files in a future commit.
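The set-and-unset mechanism described above is shown here as a minimal sketch. It is not the code in this commit. `CallScope`, `withScope`, and `ToyRDD` are hypothetical names. The sketch also assumes that the current scope is stored in a thread-local `scala.util.DynamicVariable`. It is written in Scala script style.

```scala
import scala.util.DynamicVariable

// Hypothetical helper, not Spark's API: tracks the name of the RDD
// operation currently executing on this thread.
object CallScope {
  private val current = new DynamicVariable[Option[String]](None)

  def get: Option[String] = current.value

  // Sets the scope for the duration of `body`, then restores the
  // previous value, even if `body` throws.
  def withScope[T](name: String)(body: => T): T =
    current.withValue(Some(name))(body)
}

// A toy RDD that remembers the scope it was created in.
final case class ToyRDD(id: Int, scope: Option[String])

def textFile(id: Int): ToyRDD = CallScope.withScope("textFile") {
  ToyRDD(id, CallScope.get)
}

val r = textFile(0)
println(r)             // ToyRDD(0,Some(textFile))
println(CallScope.get) // None: the scope was unset on exit

// Nested use: the inner scope applies only inside textFile.
val nested = CallScope.withScope("outer") {
  val inner = textFile(1)
  (inner, CallScope.get)
}
```

`DynamicVariable.withValue` restores the old value in a `finally`, so an exception thrown inside one operation cannot leak its scope into later ones.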
commit a9ed4f9e563a6b4ba4a351f0170da53b3a4c973f
Author: Andrew Or
Date: 2015-04-17T00:46:19Z
Add a few missing scopes to certain RDD methods
commit 5143523227d1dc989658f2f8a11e5fa97d8add03
Author: Andrew Or
Date: 2015-04-17T01:44:08Z
Expose the necessary information in RDDInfo
This includes the scope field that we added in previous commits,
and the parent IDs for tracking the lineage through the listener
API.
commit 21843488193295fea8a08c3cb1556d0b62a809ba
Author: Andrew Or
Date: 2015-04-17T18:00:31Z
Translate RDD information to dot file
It turns out that the previous scope information is insufficient
for producing a valid dot file. In particular, the scope hierarchy
was missing, yet it is crucial for distinguishing a parent RDD in
the same enclosing scope from one in a completely distinct scope.
Unique scope identifiers are also needed, since they simplify the
code significantly.
This commit further adds the translation logic in a UI listener
that converts RDDInfos to dot files.
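The following simplified translator from RDD metadata to a dot file shows why the hierarchy and the unique IDs matter. `ScopeInfo`, `RddNode`, and `toDot` are hypothetical stand-ins. They are not the listener or the `RDDInfo` fields in this patch. The sketch assumes that each RDD carries its enclosing scopes ordered from outermost to innermost.

Clusters are keyed by scope ID rather than by name. As a result, two separate invocations of the same operation produce two distinct clusters, and a parent in the same enclosing scope lands in the same cluster. GraphViz draws a box around any subgraph whose name starts with `cluster`.

```scala
// Hypothetical, simplified stand-ins for illustration only.
final case class ScopeInfo(id: String, name: String)
// `scopes` lists enclosing scopes from outermost to innermost.
final case class RddNode(id: Int, name: String, scopes: Seq[ScopeInfo], parentIds: Seq[Int])

def toDot(rdds: Seq[RddNode]): String = {
  val sb = new StringBuilder
  def line(indent: Int, text: String): Unit =
    sb.append("  " * indent).append(text).append('\n')

  // Emits RDDs whose scope path ends at `depth`, then one nested
  // cluster per distinct scope ID at that depth.
  def emit(members: Seq[RddNode], depth: Int): Unit = {
    val (direct, nested) = members.partition(_.scopes.length == depth)
    direct.foreach(r => line(depth + 1, s"""${r.id} [label="${r.name} [${r.id}]"];"""))
    nested.groupBy(_.scopes(depth)).toSeq.sortBy(_._1.id).foreach { case (scope, inner) =>
      line(depth + 1, s"subgraph cluster_${scope.id} {")
      line(depth + 2, s"""label="${scope.name}";""")
      emit(inner, depth + 1)
      line(depth + 1, "}")
    }
  }

  line(0, "digraph G {")
  emit(rdds, 0)
  // Lineage edges come from each RDD's parent IDs.
  for (r <- rdds; p <- r.parentIds) line(1, s"$p -> ${r.id};")
  sb.append("}").toString
}

val distinctScope = ScopeInfo("0", "distinct")
val mapScope = ScopeInfo("1", "map")
val dot = toDot(Seq(
  RddNode(0, "ParallelCollectionRDD", Seq.empty, Seq.empty),
  RddNode(1, "MapPartitionsRDD", Seq(distinctScope, mapScope), Seq(0)),
  RddNode(2, "ShuffledRDD", Seq(distinctScope), Seq(1))
))
println(dot)
```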
commit f22f3379edbdb301631440d1627fb633d0da143f
Author: Andrew Or
Date: 2015-04-17T20:52:17Z
First working implementation of visualization with vis.js
commit 9fac6f37e08b74ae19fa268923d10871ffe08aed
Author: Andrew Or
Date: 2015-04-22T02:23:16Z
Re-implement scopes through annotations instead
The previous "working" implementation frequently ran into
NotSerializableExceptions. Why? ClosureCleaner doesn't like
closures being wrapped in other closures, and these closures
are simply not cleaned (details are intentionally omitted here).
This commit reimplements scoping through annotations. All methods
that should be scoped are now annotated with @RDDScope. Then, on
creation, each RDD derives its scope from the stack trace, similar
to how it derives its call site. This is the cleanest approach
that avoids NotSerializableExceptions with the fewest significant
limitations.
commit 494d5c28b38d3d829f008a1bba406e63d4ec8680
Author: Andrew Or
Date: 2015-04-22T02:39:14Z
Revert a few unintended style changes
commit 6a7cdcaed6bb6fd856bd7e2e15b0d78cbdb0b2d1
Author: Andrew Or
Date: 2015-04-22T03:00:30Z
Move RDD scope util methods and logic to its own file
Just a small code re-organization.
commit 5e22946945f683927cabafeb0ede3bc8e275e4a0
Author: Andrew Or
Date: 2015-04-22T03:01:17Z
Merge branch 'master' of github.com:apache/spark into viz
commit 205f838477de8cabd28aab6301a67fd7d07bc517
Author: Andrew Or
Date: 2015-04-23T05:33:31Z
Reimplement rendering with dagre-d3 instead of viz.js
Before this commit, this patch relied on a JavaScript version of
GraphViz that was compiled from C. Even the minified version of
this resource was ~2.5 MB. The main motivation for switching away
from this library, however, is that it is a complete black box
over which we have absolutely no control. It is not at all
extensible, and if something breaks we will have a hard time
understanding why.
The new library, dagre-d3, is not perfect either. It does not
officially support clustering of nodes; for certain large graphs,
the clusters will have a lot of unnecessary whitespace. A few
members of the dagre-d3 community are looking into a solution, but
until then we will have to live with this (minor) inconvenience.
commit fe7816fe25c2f68ff2eee931ebe7a95b1cc97cdf
Author: Andrew Or
Date: 2015-04-27T19:37:41Z
Merge branch 'master' of github.com:apache/spark into viz
commit 8dd5af265ee0c395c4c6d831ca697775d9e28104
Author: Andrew Or
Date: 2015-04-27T21:50:45Z
Fill in documentation + miscellaneous minor changes
For instance, this adds the ability to throw away old stage graphs.
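Throwing away old graphs amounts to keeping a bounded, insertion-ordered collection. The commit does not say which retention policy or limit it uses. This minimal sketch assumes the following:

- graphs are keyed by stage ID;
- they are evicted oldest-first once a hypothetical `maxRetained` limit is exceeded.

`GraphStore` is an illustrative name.

```scala
import scala.collection.mutable

// Hypothetical bounded store that evicts the oldest entries first.
final class GraphStore[G](maxRetained: Int) {
  require(maxRetained > 0, "maxRetained must be positive")
  // LinkedHashMap iterates in insertion order, so `head` is the oldest entry.
  private val graphs = mutable.LinkedHashMap.empty[Int, G]

  def put(stageId: Int, graph: G): Unit = {
    graphs(stageId) = graph
    while (graphs.size > maxRetained) graphs.remove(graphs.head._1)
  }

  def get(stageId: Int): Option[G] = graphs.get(stageId)
}

val store = new GraphStore[String](maxRetained = 2)
store.put(1, "digraph stage1 {}")
store.put(2, "digraph stage2 {}")
store.put(3, "digraph stage3 {}")
println(store.get(1)) // None: evicted when stage 3 arrived
```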
commit 71281fa15d3bebac583e93ff84c5062f760b753d
Author: Andrew Or
Date: 2015-04-27T22:40:52Z
Embed the viz in the UI in a toggleable manner
commit 09d361eb53a98d758891f3db39d8c9d4c239ee88
Author: Andrew Or
Date: 2015-04-27T23:42:19Z
Add ID to node label (minor)
commit 52187fcfaafe8d9ac4531a4a76c2c79281d43f73
Author: Andrew Or
Date: 2015-04-28T00:17:09Z
Rat excludes
commit c3bfcae2ae12e1ebc2a817df4eb9dca8fcce463f
Author: Andrew Or
Date: 2015-04-27T23:21:04Z
Re-implement scopes using closures instead of annotations
The problem with annotations is that there is no way to associate
an RDD's scope with another's. This is because the stack trace
simply does not expose enough information for us to associate one
instance of a method invocation with another.
So, we're back to closures. Note that this still suffers from the
same NotSerializableException issue discussed previously, which is
being fixed separately in the ClosureCleaner.
commit aa868a98430fafa0c3227d34140d211c38549a1e
Author: Andrew Or
Date: 2015-04-27T23:34:31Z
Ensure that HadoopRDD is actually serializable
commit 4310271e39bb67f489a18a5070374c71b8439c37
Author: Andrew Or
Date: 2015-04-28T00:30:26Z
Merge branch 'master' of github.com:apache/spark into viz2
commit 7ef957cdfd1889f27dc9b4be81d22d15d4225eb9
Author: Andrew Or
Date: 2015-04-28T00:31:19Z
Fix scala style
commit d19c4da59f126b5ee0126fbc88f19b2055e6f359
Author: Andrew Or
Date: 2015-04-28T21:05:59Z
Merge branch 'master' of github.com:apache/spark into viz2
commit 6e2cfeae9db3b05ac836a229e888af1a54e4f9d3
Author: Andrew Or
Date: 2015-04-29T00:40:30Z
Remove all return statements in `withScope`
The closure cleaner doesn't like these statements, for a good
reason.
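In Scala, a `return` inside a closure is a non-local return. This includes a by-name block such as the body passed to a scoping helper. The compiler implements it by throwing a `scala.runtime.NonLocalReturnControl`, which is a `ControlThrowable`, through the closure back to the enclosing method. That is a plausible reading of "for a good reason": the closure ends up tied to its enclosing method's control flow.

The sketch below contrasts the two styles. Its `withScope` is hypothetical, and it only records whether such a throwable passed through it. Scala 3 deprecates non-local returns, so it warns on `signWithReturn`.

```scala
import scala.util.control.ControlThrowable

// Set when a control throwable (e.g. a non-local return) escapes a body.
var sawControlThrowable = false

// Hypothetical scoping helper, not Spark's implementation.
def withScope[T](body: => T): T =
  try body
  catch {
    case t: ControlThrowable =>
      sawControlThrowable = true
      throw t // rethrow so the non-local return still completes
  }

// `return` exits signWithReturn from inside the by-name block, which
// the compiler implements by throwing through withScope.
def signWithReturn(x: Int): String = withScope {
  if (x < 0) return "negative"
  "non-negative"
}

// Same logic as a plain expression: no control-flow exception.
def signWithoutReturn(x: Int): String = withScope {
  if (x < 0) "negative" else "non-negative"
}
```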
commit 43de96ef71eb5e6ca81102c6e5a5f75b55cdebeb
Author: Andrew Or
Date: 2015-04-29T00:42:43Z
Add parent IDs to StageInfo
commit 5e388ea6bf356c9700aeeb325429d27940788c5e
Author: Andrew Or
Date: 2015-04-29T01:03:18Z
Fix line too long
commit 5f07e9c3f1ab16f4bf89606a9e3b2633be305df7
Author: Andrew Or
Date: 2015-04-29T02:49:58Z
Remove more return statements from scopes
commit ab9141660cac4503309efa07f3b801e9216fc8b9
Author: Andrew Or
Date: 2015-05-01T04:10:39Z
Introduce visualization to the Job Page
This includes a generalization of the visualization previously
displayed on the stage page. More functionality is needed in
JavaScript to prevent the job visualization from looking too
cluttered. This is still WIP.
commit 5c7ce164f8ba820daaa5e19dbaa8be166ac90e64
Author: Andrew Or
Date: 2015-05-01T19:26:53Z
Connect RDDs across stages + update style
This requires us to track incoming and outgoing edges for each
stage on the backend, and to render the connecting edges manually
in d3.
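The backend bookkeeping described above can be sketched as partitioning RDD-to-RDD edges by the stages of their endpoints:

- Edges within one stage can be laid out by the graph library.
- Incoming and outgoing edges are the ones that must be drawn by hand between stage boxes.

`Edge`, `StageEdges`, and `classify` are hypothetical names. The sketch assumes that each RDD belongs to exactly one stage, which is a simplification.

```scala
// Hypothetical types for illustration only.
final case class Edge(from: Int, to: Int)
final case class StageEdges(internal: Seq[Edge], incoming: Seq[Edge], outgoing: Seq[Edge])

// `stageOf` maps each RDD ID to its (single, assumed) stage ID;
// it must contain every RDD referenced by `edges`.
def classify(stageOf: Map[Int, Int], edges: Seq[Edge]): Map[Int, StageEdges] =
  stageOf.values.toSet.iterator.map { s =>
    val internal = edges.filter(e => stageOf(e.from) == s && stageOf(e.to) == s)
    val incoming = edges.filter(e => stageOf(e.to) == s && stageOf(e.from) != s)
    val outgoing = edges.filter(e => stageOf(e.from) == s && stageOf(e.to) != s)
    s -> StageEdges(internal, incoming, outgoing)
  }.toMap

// RDDs 0 and 1 in stage 0, RDD 2 in stage 1.
val stageOf = Map(0 -> 0, 1 -> 0, 2 -> 1)
val byStage = classify(stageOf, Seq(Edge(0, 1), Edge(1, 2)))
println(byStage(0))
```

Here the edge `1 -> 2` appears once as outgoing from stage 0 and once as incoming to stage 1. It is a cross-stage edge that the renderer must connect itself.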
commit deb48a0d0580ecfef0ad1fb3b867ef365723785d
Author: Andrew Or
Date: 2015-05-01T20:45:09Z
Translate stage boxes taking into account the width
Previously we had many overlapping boxes for, say, ALS. This is
because we did not take into account the widths of the previous
boxes.
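The fix amounts to placing each stage box at the cumulative width of the boxes before it, rather than at a fixed step. The rendering itself lives in JavaScript/d3, but the arithmetic is language-neutral. It is sketched here with a hypothetical `padding` gap between boxes.

```scala
// x-offset of each stage box: the sum of the widths of all earlier
// boxes, plus a fixed gap after each one. Empty input yields no offsets.
def stageOffsets(widths: Seq[Double], padding: Double): Seq[Double] =
  widths.scanLeft(0.0)((x, w) => x + w + padding).init

val offsets = stageOffsets(Seq(100.0, 250.0, 80.0), padding = 20.0)
println(offsets) // List(0.0, 120.0, 390.0)
```

Compare a fixed step of 150. The second box would span 150 to 400, and the third would start at 300, inside it. This is the kind of overlap described above.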
commit 0706992a995d711c268fecab69da421b3dd12144
Author: Andrew Or
Date: 2015-05-01T20:59:28Z
Add link from jobs to stages
commit b80cc52d81e6963d78b670299b40c8fc033f40e2
Author: Andrew Or
Date: 2015-05-03T00:37:25Z
Merge branch 'master' of github.com:apache/spark into viz2
Conflicts:
core/src/main/scala/org/apache/spark/storage/RDDInfo.scala
core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala
commit f9830a2f09a2a8f55b3fd0309c46374f75a6501d
Author: Andrew Or
Date: 2015-05-03T23:23:43Z
Refactor + clean up + document JS visualization code
This commit should not introduce any substantial functionality
differences. It just cleans up the JavaScript side of this patch
such that it is easier to follow.
----