[2/2] spark git commit: [SPARK-20657][CORE] Speed up rendering of the stages page.
[SPARK-20657][CORE] Speed up rendering of the stages page.

There are two main changes to speed up rendering of the tasks list
when rendering the stage page.

The first one makes the code only load the tasks being shown in the
current page of the tasks table, and information related to only
those tasks. One side-effect of this change is that the graph that
shows task-related events now only shows events for the tasks in
the current page, instead of the previously hardcoded limit of "events
for the first 1000 tasks". That ends up helping with readability,
though.

To make sorting efficient when using a disk store, the task wrapper
was extended to include many new indices, one for each of the sortable
columns in the UI, and metrics for which quantiles are calculated.

The second changes the way metric quantiles are calculated for stages.
Instead of using the "Distribution" class to process data for all task
metrics, which requires scanning all tasks of a stage, the code now
uses the KVStore "skip()" functionality to only read tasks that contain
interesting information for the quantiles that are desired.

This is still not cheap; because there are many metrics that the UI
and API track, the code needs to scan the index for each metric to
gather the information. Savings come mainly from skipping deserialization
when using the disk store, but the in-memory code also seems to be
faster than before (most probably because of other changes in this
patch).

To make subsequent calls faster, some quantiles are cached in the
status store. This makes UIs much faster after the first time a stage
has been loaded.

With the above changes, a lot of code in the UI layer could be simplified.

Author: Marcelo Vanzin

Closes #20013 from vanzin/SPARK-20657.
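The quantile change described above can be illustrated with a minimal sketch. It is not the code from this patch: the class names QuantileSketch and SortedIndex, the read counter, and the rule used to map a quantile to a position are all assumptions made for illustration. A sorted array stands in for a store index ordered by one metric, and skip() here only mimics the idea of advancing past entries without reading them.

```java
import java.util.Arrays;

// Illustrative only: all names here are hypothetical, not Spark classes.
final class QuantileSketch {

    /** Stand-in for a store index already sorted by one metric. */
    static final class SortedIndex {
        private final long[] sorted;
        private int pos = 0;
        int reads = 0; // how many entries were actually read ("deserialized")

        SortedIndex(long[] values) {
            sorted = values.clone();
            Arrays.sort(sorted);
        }

        int count() {
            return sorted.length;
        }

        // Advance past n entries without reading them.
        void skip(int n) {
            pos += n;
        }

        // Read the entry at the current position and advance by one.
        long next() {
            reads++;
            return sorted[pos++];
        }
    }

    /**
     * Returns one value per requested quantile, reading only the entries at
     * the needed positions. Assumes qs is in ascending order, each in [0, 1].
     */
    static double[] quantiles(SortedIndex index, double[] qs) {
        int count = index.count();
        if (count == 0) {
            throw new IllegalArgumentException("index is empty");
        }
        double[] out = new double[qs.length];
        int nextPos = 0;   // position of the next unread entry
        int lastIdx = -1;  // position used for the previous quantile
        double lastValue = Double.NaN;
        for (int i = 0; i < qs.length; i++) {
            // Assumed mapping from quantile to position, clamped to the last entry.
            int idx = (int) Math.min((long) (qs[i] * count), count - 1L);
            if (idx < lastIdx) {
                throw new IllegalArgumentException("quantiles must be ascending");
            }
            if (idx != lastIdx) {
                index.skip(idx - nextPos); // jump over entries nobody asked for
                lastValue = index.next();
                nextPos = idx + 1;
                lastIdx = idx;
            }
            out[i] = lastValue;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] durations = new long[1000];
        for (int i = 0; i < durations.length; i++) {
            durations[i] = 999 - i;
        }
        SortedIndex index = new SortedIndex(durations);
        double[] result = quantiles(index, new double[] {0.0, 0.25, 0.5, 0.75, 1.0});
        System.out.println(Arrays.toString(result) + " using " + index.reads + " reads");
    }
}
```

In this sketch, five quantiles over 1000 entries cost five reads per metric rather than a full scan, which is the kind of saving the message attributes to skipping deserialization in the disk store.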
(cherry picked from commit 1c70da3bfbb4016e394de2c73eb0db7cdd9a6968)
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b7813012
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b7813012
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b7813012

Branch: refs/heads/branch-2.3
Commit: b78130123baba87554503e81b8aee3121666ba91
Parents: d9a973d
Author: Marcelo Vanzin
Authored: Thu Jan 11 19:41:48 2018 +0800
Committer: Wenchen Fan
Committed: Thu Jan 11 19:42:19 2018 +0800

--
 .../org/apache/spark/util/kvstore/LevelDB.java  |   1 +
 .../apache/spark/status/AppStatusListener.scala |  57 +-
 .../apache/spark/status/AppStatusStore.scala    | 389 +---
 .../apache/spark/status/AppStatusUtils.scala    |  68 ++
 .../org/apache/spark/status/LiveEntity.scala    | 344 ---
 .../spark/status/api/v1/StagesResource.scala    |   3 +-
 .../org/apache/spark/status/api/v1/api.scala    |   3 +
 .../org/apache/spark/status/storeTypes.scala    | 327 ++-
 .../apache/spark/ui/jobs/ExecutorTable.scala    |   4 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |   2 +-
 .../org/apache/spark/ui/jobs/StagePage.scala    | 919 ++-
 ...summary_w__custom_quantiles_expectation.json |   3 +
 ...task_summary_w_shuffle_read_expectation.json |   3 +
 ...ask_summary_w_shuffle_write_expectation.json |   3 +
 .../spark/status/AppStatusListenerSuite.scala   | 105 ++-
 .../spark/status/AppStatusStoreSuite.scala      | 104 +++
 .../org/apache/spark/ui/StagePageSuite.scala    |  10 +-
 scalastyle-config.xml                           |   2 +-
 18 files changed, 1361 insertions(+), 986 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/b7813012/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
--
diff --git a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
index 4f9e10c..0e491ef 100644
--- a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
+++ b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
@@ -83,6 +83,7 @@ public class LevelDB implements KVStore {
     if (versionData != null) {
       long version = serializer.deserializeLong(versionData);
       if (version != STORE_VERSION) {
+        close();
         throw new UnsupportedStoreVersionException();
       }
     } else {

http://git-wip-us.apache.org/repos/asf/spark/blob/b7813012/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
--
diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 88b75dd..b4edcf2 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@
[2/2] spark git commit: [SPARK-20657][CORE] Speed up rendering of the stages page.
[SPARK-20657][CORE] Speed up rendering of the stages page.

There are two main changes to speed up rendering of the tasks list
when rendering the stage page.

The first one makes the code only load the tasks being shown in the
current page of the tasks table, and information related to only
those tasks. One side-effect of this change is that the graph that
shows task-related events now only shows events for the tasks in
the current page, instead of the previously hardcoded limit of "events
for the first 1000 tasks". That ends up helping with readability,
though.

To make sorting efficient when using a disk store, the task wrapper
was extended to include many new indices, one for each of the sortable
columns in the UI, and metrics for which quantiles are calculated.

The second changes the way metric quantiles are calculated for stages.
Instead of using the "Distribution" class to process data for all task
metrics, which requires scanning all tasks of a stage, the code now
uses the KVStore "skip()" functionality to only read tasks that contain
interesting information for the quantiles that are desired.

This is still not cheap; because there are many metrics that the UI
and API track, the code needs to scan the index for each metric to
gather the information. Savings come mainly from skipping deserialization
when using the disk store, but the in-memory code also seems to be
faster than before (most probably because of other changes in this
patch).

To make subsequent calls faster, some quantiles are cached in the
status store. This makes UIs much faster after the first time a stage
has been loaded.

With the above changes, a lot of code in the UI layer could be simplified.

Author: Marcelo Vanzin

Closes #20013 from vanzin/SPARK-20657.
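The first change (loading only the tasks on the current page, sorted through a per-column index) can be sketched roughly as follows. This is an assumption-laden illustration, not the patch itself: TaskPageSketch, Task, the column names "taskId" and "duration", and the 1-based page numbering are all hypothetical, and in-memory sorted lists stand in for the store indices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Illustrative only: all names here are hypothetical, not Spark classes.
final class TaskPageSketch {

    static final class Task {
        final long taskId;
        final long duration;

        Task(long taskId, long duration) {
            this.taskId = taskId;
            this.duration = duration;
        }
    }

    // One pre-sorted list per sortable column, standing in for a store index.
    private final Map<String, List<Task>> indices = new HashMap<>();

    TaskPageSketch(List<Task> tasks) {
        addIndex("taskId", tasks, t -> t.taskId);
        addIndex("duration", tasks, t -> t.duration);
    }

    private void addIndex(String column, List<Task> tasks, ToLongFunction<Task> key) {
        List<Task> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingLong(key));
        indices.put(column, sorted);
    }

    /** Returns only the tasks on the requested page (pages are 1-based). */
    List<Task> page(String column, boolean descending, int page, int pageSize) {
        List<Task> index = indices.get(column);
        if (index == null) {
            throw new IllegalArgumentException("No index for column: " + column);
        }
        int size = index.size();
        int from = Math.min((page - 1) * pageSize, size); // entries to skip
        int to = Math.min(from + pageSize, size);         // cap at page size
        List<Task> result = new ArrayList<>(to - from);
        for (int i = from; i < to; i++) {
            // Walk the same index backwards for a descending sort.
            result.add(index.get(descending ? size - 1 - i : i));
        }
        return result;
    }
}
```

Because every sortable column has its own index in this sketch, changing the sort column or direction never requires re-sorting or loading the whole task list; only the entries of the requested page are touched.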
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c70da3b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c70da3b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c70da3b

Branch: refs/heads/master
Commit: 1c70da3bfbb4016e394de2c73eb0db7cdd9a6968
Parents: 87c98de
Author: Marcelo Vanzin
Authored: Thu Jan 11 19:41:48 2018 +0800
Committer: Wenchen Fan
Committed: Thu Jan 11 19:41:48 2018 +0800

--
 .../org/apache/spark/util/kvstore/LevelDB.java  |   1 +
 .../apache/spark/status/AppStatusListener.scala |  57 +-
 .../apache/spark/status/AppStatusStore.scala    | 389 +---
 .../apache/spark/status/AppStatusUtils.scala    |  68 ++
 .../org/apache/spark/status/LiveEntity.scala    | 344 ---
 .../spark/status/api/v1/StagesResource.scala    |   3 +-
 .../org/apache/spark/status/api/v1/api.scala    |   3 +
 .../org/apache/spark/status/storeTypes.scala    | 327 ++-
 .../apache/spark/ui/jobs/ExecutorTable.scala    |   4 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |   2 +-
 .../org/apache/spark/ui/jobs/StagePage.scala    | 919 ++-
 ...summary_w__custom_quantiles_expectation.json |   3 +
 ...task_summary_w_shuffle_read_expectation.json |   3 +
 ...ask_summary_w_shuffle_write_expectation.json |   3 +
 .../spark/status/AppStatusListenerSuite.scala   | 105 ++-
 .../spark/status/AppStatusStoreSuite.scala      | 104 +++
 .../org/apache/spark/ui/StagePageSuite.scala    |  10 +-
 scalastyle-config.xml                           |   2 +-
 18 files changed, 1361 insertions(+), 986 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/1c70da3b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
--
diff --git a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
index 4f9e10c..0e491ef 100644
--- a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
+++ b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
@@ -83,6 +83,7 @@ public class LevelDB implements KVStore {
     if (versionData != null) {
       long version = serializer.deserializeLong(versionData);
       if (version != STORE_VERSION) {
+        close();
         throw new UnsupportedStoreVersionException();
       }
     } else {

http://git-wip-us.apache.org/repos/asf/spark/blob/1c70da3b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
--
diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 88b75dd..b4edcf2 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@ -377,6 +377,10 @@ private[spark] class AppStatusListener(
     Option(liveStages.get((event.stageId, e