GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/19679
[SPARK-20647][core] Port StorageTab to the new UI backend.
This required adding information about StreamBlockId to the store; that
information is not yet exposed via the public API, so an internal type was
added until there is a need to expose it in the API.
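For illustration only, a minimal sketch of what such an internal,
store-only type could look like; the class and field names here are
assumptions, not necessarily what this patch uses:

    package org.apache.spark.status

    // Hypothetical internal representation of a stream block kept in the
    // KV store but not exposed through the public REST API.
    private[spark] case class StreamBlockData(
        name: String,
        executorId: String,
        hostPort: String,
        storageLevel: String,
        useMemory: Boolean,
        useDisk: Boolean,
        deserialized: Boolean,
        memSize: Long,
        diskSize: Long)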
The UI only lists RDDs that have cached partitions, but that information
was not being correctly captured in the listener, so that is also fixed
here, along with some minor (internal) API adjustments so that the UI can
get the correct data.
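As a rough illustration of what capturing that information in the listener
involves, here is a hedged sketch, not the actual code in this change, of a
listener that tracks which partitions of each RDD are currently cached
using the public block-update event:

    import scala.collection.mutable

    import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}
    import org.apache.spark.storage.RDDBlockId

    // Illustrative only: track cached partitions per RDD from block
    // updates, so the UI can list just the RDDs that have at least one
    // cached partition.
    class CachedRddTracker extends SparkListener {
      // rddId -> indices of partitions currently cached
      private val cached = mutable.HashMap.empty[Int, mutable.Set[Int]]

      override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
        val info = event.blockUpdatedInfo
        info.blockId match {
          case RDDBlockId(rddId, split) =>
            if (info.storageLevel.isValid) {
              // Block was stored in memory and/or on disk.
              cached.getOrElseUpdate(rddId, mutable.Set.empty) += split
            } else {
              // Block was dropped; forget the RDD once nothing is cached.
              cached.get(rddId).foreach { parts =>
                parts -= split
                if (parts.isEmpty) cached.remove(rddId)
              }
            }
          case _ => // not an RDD block, ignore
        }
      }

      // What the Storage tab would list: only RDDs with cached partitions.
      def cachedRddIds: Seq[Int] = cached.keys.toSeq
    }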
Because of the way partitions are cached, some of the optimizations that
limit how often data is flushed to the store could not be applied to this
code. Instead, the data structures that track RDD blocks were reworked to
avoid expensive copies when many blocks are being updated, as sketched
below.
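To make the copy-avoidance idea concrete, a minimal hypothetical sketch
(names are made up, not the classes in this change): block state is kept
in a mutable map inside the live RDD entity, and an immutable snapshot is
built only when the entity is flushed to the store.

    import scala.collection.mutable

    // Illustrative only: avoid copying the whole partition collection on
    // every block update; pay for an immutable copy only when writing to
    // the store.
    class LiveRDDSketch(val rddId: Int) {
      // block name -> memory size, updated in place as block events arrive
      private val partitions = mutable.HashMap.empty[String, Long]

      def updateBlock(blockName: String, memSize: Long): Unit = {
        if (memSize > 0) partitions(blockName) = memSize
        else partitions.remove(blockName)
      }

      // Called only when flushing this entity to the store.
      def snapshot(): Map[String, Long] = partitions.toMap
    }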
Tested with existing and updated unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-20647
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19679.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19679
----
commit 7147bd241b8acd6a944d3bba9170f98f8233cc3b
Author: Marcelo Vanzin <[email protected]>
Date: 2017-01-30T22:48:30Z
[SPARK-20647][core] Port StorageTab to the new UI backend.
----