GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/1255
[SPARK-2307][Reprise] Correctly report RDD blocks on SparkUI
The existing code in `ExecutorPage.scala` requires a linear scan through
all the blocks to filter out the uncached ones. Every page refresh can
therefore be expensive when there are many blocks and many executors.
The proper semantics should be the following: `StorageStatusListener`
should contain only block statuses that are cached. This means as soon as a
block is unpersisted by any means, its status should be removed. This is
reflected in the changes made in `StorageStatusListener.scala`.
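A minimal sketch of that invariant, assuming simplified stand-in types: `Level`, `NoneLevel`, `MemoryOnly`, `BlockStatus`, and `CachedBlockTracker` are hypothetical names for illustration and are not the classes touched by this PR. It only shows the idea that an update reporting "no storage" removes the entry, so readers never have to filter.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; these are NOT Spark's actual classes.
sealed trait Level
case object NoneLevel extends Level   // plays the role of StorageLevel.NONE
case object MemoryOnly extends Level  // any level that means "cached"

final case class BlockStatus(level: Level, memSize: Long)

// Tracks block statuses per executor, holding only blocks that are cached.
class CachedBlockTracker {
  private val blocksByExecutor =
    mutable.Map.empty[String, mutable.Map[String, BlockStatus]]

  // Apply one block update: keep it if cached, otherwise remove its status.
  def update(executorId: String, blockId: String, status: BlockStatus): Unit = {
    val blocks = blocksByExecutor.getOrElseUpdate(executorId, mutable.Map.empty)
    status.level match {
      case NoneLevel => blocks.remove(blockId)
      case _         => blocks.update(blockId, status)
    }
  }

  // No scan or filter is needed: every stored block is cached by construction.
  def numBlocks(executorId: String): Int =
    blocksByExecutor.get(executorId).map(_.size).getOrElse(0)
}
```

With this shape, the per-refresh cost of counting blocks no longer depends on how many uncached blocks were ever reported.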
The `StorageTab` must also be updated, because it currently detects dropped
blocks only if their storage levels change to `StorageLevel.NONE`, which no
longer happens, because now we simply remove their statuses.
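One way to picture the new detection rule, as a sketch only: `DroppedBlocks` and its `dropped` method are hypothetical names, not part of `StorageTab`. The assumption is that a block counts as dropped when it was known before but is absent from the current status list, rather than when it shows up with `StorageLevel.NONE`.

```scala
// Hypothetical helper for illustration; not Spark's API.
object DroppedBlocks {
  // Blocks that were tracked previously but no longer appear in the
  // current storage status list are treated as dropped.
  def dropped(previous: Set[String], current: Set[String]): Set[String] =
    previous.diff(current)
}
```

For example, if the tab previously tracked `rdd_0_0` and `rdd_0_1` and only `rdd_0_1` is still reported, `rdd_0_0` is the dropped block.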
If you have been following this chain of PRs like @pwendell, you will
quickly notice that this reverts the changes in #1249, which in turn reverted
the changes in #1080. In other words, this PR adds back the changes from #1080
and fixes SPARK-2307 on top of them. Please ask questions if anything is
unclear.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark storage-ui-fix-reprise
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1255.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1255
----
commit 3afde3fff4e07296b7fc9ddab50a76e3a321bf53
Author: Andrew Or <[email protected]>
Date: 2014-06-28T06:01:25Z
Correctly report the number of blocks on SparkUI
This is actually quite tricky to get right. With this commit,
StorageStatusListener will only hold cached blocks (i.e. no blocks
with StorageLevel.NONE).
This means the StorageTab needs special handling, because it
currently relies on dropped blocks having StorageLevel.NONE, rather
than disappearing altogether from the storage status list.
----