[2/2] spark git commit: [SPARK-20657][CORE] Speed up rendering of the stages page.

2018-01-11 Thread wenchen
[SPARK-20657][CORE] Speed up rendering of the stages page.

There are two main changes to speed up rendering of the tasks list
on the stage page.

The first one makes the code only load the tasks being shown in the
current page of the tasks table, and information related to only
those tasks. One side-effect of this change is that the graph that
shows task-related events now only shows events for the tasks in
the current page, instead of the previously hardcoded limit of "events
for the first 1000 tasks". That ends up helping with readability,
though.

To make sorting efficient when using a disk store, the task wrapper
was extended to include many new indices: one for each of the sortable
columns in the UI, and one for each metric for which quantiles are
calculated.
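
The idea of keeping one index per sortable column can be sketched as
follows. This is a minimal in-memory stand-in, not the code from this
patch: TaskRow, DurationIndex and page() are hypothetical names, and a
TreeSet plays the role that a KVStore index plays in Spark. It shows
only why an ordered index lets a page of the table be read in sorted
order without sorting every task on each request.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a stored task record; not Spark's task wrapper. */
final class TaskRow {
  final long taskId;
  final long durationMs;

  TaskRow(long taskId, long durationMs) {
    this.taskId = taskId;
    this.durationMs = durationMs;
  }
}

/** Keeps tasks ordered by one sortable column (duration), ties broken by task id. */
final class DurationIndex {
  private final TreeSet<TaskRow> byDuration = new TreeSet<>(
      Comparator.comparingLong((TaskRow t) -> t.durationMs)
          .thenComparingLong(t -> t.taskId));

  void add(TaskRow row) {
    byDuration.add(row);
  }

  /** Returns the task ids of one page, in index order, collecting only that page. */
  List<Long> page(int pageIndex, int pageSize) {
    List<Long> ids = new ArrayList<>();
    byDuration.stream()
        .skip((long) pageIndex * pageSize)   // pass over earlier pages
        .limit(pageSize)                     // stop after the requested page
        .forEach(t -> ids.add(t.taskId));
    return ids;
  }
}
```

In this sketch the ordering cost is paid once, when a row is added, so
a page request only walks the index up to the end of the requested
page.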

The second change alters how metric quantiles are calculated for stages.
Instead of using the "Distribution" class to process data for all task
metrics, which requires scanning all tasks of a stage, the code now
uses the KVStore "skip()" functionality to only read tasks that contain
interesting information for the quantiles that are desired.
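
A rough illustration of reading quantiles by skipping through a sorted
index, rather than materializing every value, is given below. It is a
sketch under stated assumptions, not the code from this patch: the
class and method names are hypothetical, a sorted List stands in for a
KVStore index, and the position rule min(floor(q * n), n - 1) is an
assumption chosen for the example.

```java
import java.util.Iterator;
import java.util.List;

final class QuantileSkipSketch {
  /**
   * Reads the requested quantiles from values that are already in index order.
   * Assumes the quantiles are in [0, 1] and given in ascending order.
   */
  static long[] quantiles(List<Long> sortedValues, double[] ascendingQuantiles) {
    int n = sortedValues.size();
    if (n == 0) {
      throw new IllegalArgumentException("no values");
    }
    long[] out = new long[ascendingQuantiles.length];
    Iterator<Long> it = sortedValues.iterator();
    long next = 0; // position of the element the iterator will return next
    for (int i = 0; i < ascendingQuantiles.length; i++) {
      // Assumed position rule for this sketch: floor(q * n), clamped to the last element.
      long target = Math.min((long) (ascendingQuantiles[i] * n), n - 1L);
      if (target < next) {
        // Same position as the previous quantile; reuse the value already read.
        out[i] = out[i - 1];
        continue;
      }
      while (next < target) {
        it.next(); // skipped entry: a disk store would not need to deserialize it
        next++;
      }
      out[i] = it.next();
      next++;
    }
    return out;
  }
}
```

Only the entries at the target positions are actually used; everything
between them is passed over, which is where a disk-backed store can
avoid deserialization.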

This is still not cheap; because there are many metrics that the UI
and API track, the code needs to scan the index for each metric to
gather the information. Savings come mainly from skipping deserialization
when using the disk store, but the in-memory code also seems to be
faster than before (most probably because of other changes in this
patch).

To make subsequent calls faster, some quantiles are cached in the
status store. This makes UIs much faster after the first time a stage
has been loaded.
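
The caching step can be pictured with the sketch below. It is only an
illustration: QuantileCache, its key layout, and the computations
counter are hypothetical, and which quantiles the status store actually
caches is not specified here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/** Hypothetical cache of computed quantile values, keyed by stage attempt, metric and quantile. */
final class QuantileCache {
  private final Map<String, Long> cache = new ConcurrentHashMap<>();
  /** Counts how often the expensive computation actually ran. */
  final AtomicInteger computations = new AtomicInteger();

  long get(int stageId, int attemptId, String metric, double quantile, LongSupplier compute) {
    String key = stageId + "/" + attemptId + "/" + metric + "/" + quantile;
    // Run the index scan only on the first request for this key.
    return cache.computeIfAbsent(key, k -> {
      computations.incrementAndGet();
      return compute.getAsLong();
    });
  }
}
```

With a cache of this shape, the first load of a stage pays for the
index scan and later loads of the same stage attempt are answered from
the map.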

With the above changes, a lot of code in the UI layer could be simplified.

Author: Marcelo Vanzin 

Closes #20013 from vanzin/SPARK-20657.

(cherry picked from commit 1c70da3bfbb4016e394de2c73eb0db7cdd9a6968)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b7813012
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b7813012
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b7813012

Branch: refs/heads/branch-2.3
Commit: b78130123baba87554503e81b8aee3121666ba91
Parents: d9a973d
Author: Marcelo Vanzin 
Authored: Thu Jan 11 19:41:48 2018 +0800
Committer: Wenchen Fan 
Committed: Thu Jan 11 19:42:19 2018 +0800

--
 .../org/apache/spark/util/kvstore/LevelDB.java  |   1 +
 .../apache/spark/status/AppStatusListener.scala |  57 +-
 .../apache/spark/status/AppStatusStore.scala    | 389 +---
 .../apache/spark/status/AppStatusUtils.scala    |  68 ++
 .../org/apache/spark/status/LiveEntity.scala    | 344 ---
 .../spark/status/api/v1/StagesResource.scala    |   3 +-
 .../org/apache/spark/status/api/v1/api.scala    |   3 +
 .../org/apache/spark/status/storeTypes.scala    | 327 ++-
 .../apache/spark/ui/jobs/ExecutorTable.scala    |   4 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |   2 +-
 .../org/apache/spark/ui/jobs/StagePage.scala    | 919 ++-
 ...summary_w__custom_quantiles_expectation.json |   3 +
 ...task_summary_w_shuffle_read_expectation.json |   3 +
 ...ask_summary_w_shuffle_write_expectation.json |   3 +
 .../spark/status/AppStatusListenerSuite.scala   | 105 ++-
 .../spark/status/AppStatusStoreSuite.scala      | 104 +++
 .../org/apache/spark/ui/StagePageSuite.scala    |  10 +-
 scalastyle-config.xml                           |   2 +-
 18 files changed, 1361 insertions(+), 986 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b7813012/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
--
diff --git a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
index 4f9e10c..0e491ef 100644
--- a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
+++ b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
@@ -83,6 +83,7 @@ public class LevelDB implements KVStore {
       if (versionData != null) {
         long version = serializer.deserializeLong(versionData);
         if (version != STORE_VERSION) {
+          close();
           throw new UnsupportedStoreVersionException();
         }
       } else {
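
The hunk above closes the store before throwing on a version mismatch.
The general pattern, releasing a resource that is already open before
failing, can be sketched in isolation as below. All names here
(FakeHandle, VersionedStore, VersionMismatchException) are hypothetical
stand-ins and do not come from the Spark code base.

```java
import java.io.Closeable;

/** Hypothetical exception standing in for an unsupported-version error. */
final class VersionMismatchException extends RuntimeException {}

/** Hypothetical resource that records whether it was released. */
final class FakeHandle implements Closeable {
  boolean closed = false;

  @Override
  public void close() {
    closed = true;
  }
}

final class VersionedStore {
  static final long STORE_VERSION = 1L;
  private final FakeHandle handle;

  VersionedStore(FakeHandle handle, long versionOnDisk) {
    this.handle = handle;
    if (versionOnDisk != STORE_VERSION) {
      // Release the already-open handle first; once the constructor throws,
      // the caller has no object on which to call close().
      handle.close();
      throw new VersionMismatchException();
    }
  }
}
```

Without the close() call, a caller that catches the exception would
have no reference through which to release the handle.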

http://git-wip-us.apache.org/repos/asf/spark/blob/b7813012/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
--
diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 88b75dd..b4edcf2 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@ 

[2/2] spark git commit: [SPARK-20657][CORE] Speed up rendering of the stages page.

2018-01-11 Thread wenchen
[SPARK-20657][CORE] Speed up rendering of the stages page.

Author: Marcelo Vanzin 

Closes #20013 from vanzin/SPARK-20657.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c70da3b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c70da3b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c70da3b

Branch: refs/heads/master
Commit: 1c70da3bfbb4016e394de2c73eb0db7cdd9a6968
Parents: 87c98de
Author: Marcelo Vanzin 
Authored: Thu Jan 11 19:41:48 2018 +0800
Committer: Wenchen Fan 
Committed: Thu Jan 11 19:41:48 2018 +0800

--
 .../org/apache/spark/util/kvstore/LevelDB.java  |   1 +
 .../apache/spark/status/AppStatusListener.scala |  57 +-
 .../apache/spark/status/AppStatusStore.scala    | 389 +---
 .../apache/spark/status/AppStatusUtils.scala    |  68 ++
 .../org/apache/spark/status/LiveEntity.scala    | 344 ---
 .../spark/status/api/v1/StagesResource.scala    |   3 +-
 .../org/apache/spark/status/api/v1/api.scala    |   3 +
 .../org/apache/spark/status/storeTypes.scala    | 327 ++-
 .../apache/spark/ui/jobs/ExecutorTable.scala    |   4 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |   2 +-
 .../org/apache/spark/ui/jobs/StagePage.scala    | 919 ++-
 ...summary_w__custom_quantiles_expectation.json |   3 +
 ...task_summary_w_shuffle_read_expectation.json |   3 +
 ...ask_summary_w_shuffle_write_expectation.json |   3 +
 .../spark/status/AppStatusListenerSuite.scala   | 105 ++-
 .../spark/status/AppStatusStoreSuite.scala      | 104 +++
 .../org/apache/spark/ui/StagePageSuite.scala    |  10 +-
 scalastyle-config.xml                           |   2 +-
 18 files changed, 1361 insertions(+), 986 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1c70da3b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
--
diff --git a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
index 4f9e10c..0e491ef 100644
--- a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
+++ b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
@@ -83,6 +83,7 @@ public class LevelDB implements KVStore {
       if (versionData != null) {
         long version = serializer.deserializeLong(versionData);
         if (version != STORE_VERSION) {
+          close();
           throw new UnsupportedStoreVersionException();
         }
       } else {

http://git-wip-us.apache.org/repos/asf/spark/blob/1c70da3b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
--
diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 88b75dd..b4edcf2 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@ -377,6 +377,10 @@ private[spark] class AppStatusListener(
 Option(liveStages.get((event.stageId, e