GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/21532
[SPARK-24524][SQL]Improve aggregateMetrics: reduce memory usage and number
of loops
## What changes were proposed in this pull request?
The function `aggregateMetrics` processes metrics from both the executors and
the driver, and the data can be large.
This PR improves the implementation by using a single loop (before the values
are converted to strings) and a single dynamic data structure.
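As a rough illustration of the idea (not the code in this PR, which is Scala), the single-pass approach could be sketched as below. The function name `aggregate_metrics`, the `(metric_id, value)` tuple shape, the `wanted_ids` filter, and the output string format are all assumptions made for this example.

```python
from array import array
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def aggregate_metrics(
    updates: Iterable[Tuple[int, int]],  # hypothetical (metric_id, value) pairs
    wanted_ids: Set[int],                # hypothetical set of metric ids to keep
) -> Dict[int, str]:
    # One growable, compactly stored array of 64-bit ints per metric id.
    grouped = defaultdict(lambda: array("q"))

    # Single pass over all updates (executor and driver alike): filter and
    # group in the same loop, without building intermediate collections.
    for metric_id, value in updates:
        if metric_id in wanted_ids:
            grouped[metric_id].append(value)

    # Convert to strings only after the loop has finished.
    return {
        metric_id: f"sum={sum(values)}, count={len(values)}"
        for metric_id, values in grouped.items()
    }


if __name__ == "__main__":
    sample = [(1, 10), (2, 5), (1, 7), (3, 99)]
    print(aggregate_metrics(iter(sample), {1, 2}))
```

Because the input is consumed as an iterator, the only memory that grows with the data is the per-metric arrays themselves.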
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark aggMetrics
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21532.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21532
----
commit 0ce71c09bf5593c16e0eff5ae6e4aa3bd4c6ca26
Author: Gengliang Wang <gengliang.wang@...>
Date: 2018-06-11T21:32:11Z
Improve aggregateMetrics with less memory usage and loops
----