GitHub user zhouyejoe opened a pull request:
https://github.com/apache/spark/pull/19170
[SPARK-21961][Core] Filter out BlockStatuses Accumulators during replaying
history logs in Spark History Server
## What changes were proposed in this pull request?
Filter out BlockStatus update data from the event logs during replay. Any
accumulable whose name matches an entry in accumulableBlacklist is filtered
out. As a result, the Spark History Server no longer creates BlockStatus
objects. SPARK-20084 added a function called "accumulablesToJson"; this patch
adds a matching function called "accumulablesFromJson".
Remove the manually added BlockStatus updates from the unit test so that the
test passes.
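The filtering idea can be sketched outside Spark as follows. This is a minimal illustration in Python, not the patch itself (the patch is Scala code in Spark core). The blacklist entry `internal.metrics.updatedBlockStatuses`, the JSON field names (`Task Info`, `Accumulables`, `Name`), and the helper `filter_event` are assumptions made for the example; only the names `accumulableBlacklist` and `accumulablesFromJson` come from the description above.

```python
import json

# Assumption: the blacklist holds the accumulator name used for block status
# updates. The PR description only says that matching names are dropped.
ACCUMULABLE_BLACKLIST = {"internal.metrics.updatedBlockStatuses"}


def accumulables_from_json(accumulables):
    """Return the accumulable entries whose name is not blacklisted."""
    return [a for a in accumulables if a.get("Name") not in ACCUMULABLE_BLACKLIST]


def filter_event(line):
    """Parse one event-log line and drop blacklisted accumulables, if any.

    Hypothetical helper: assumes accumulables live under
    event["Task Info"]["Accumulables"]. Other events pass through unchanged.
    """
    event = json.loads(line)
    task_info = event.get("Task Info")
    if isinstance(task_info, dict) and "Accumulables" in task_info:
        task_info["Accumulables"] = accumulables_from_json(task_info["Accumulables"])
    return event
```

Because the blacklisted entries are discarded while the log is being parsed, no objects are ever built from their payloads, which is the effect the patch aims for in the History Server.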
## How was this patch tested?
The unit tests pass after the unit test change.
Deployed in our production cluster, where memory consumption dropped
significantly. Fewer than 5 short full GCs occurred while replaying more than
4K history logs, which included logs generated by 1.6.x and 2.1.0.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhouyejoe/spark SPARK-21961
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19170.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19170
----
commit 04c1e2aa24c61f13f1df5148416bb00f0649fcaf
Author: Ye Zhou <[email protected]>
Date: 2017-09-08T23:10:38Z
[SPARK-21961][Core] Filter out BlockStatuses Accumulators during replaying
history logs in Spark History Server
----