GitHub user zhouyejoe opened a pull request:

    https://github.com/apache/spark/pull/19170

    [SPARK-21961][Core] Filter out BlockStatuses Accumulators during replaying 
history logs in Spark History Server

    ## What changes were proposed in this pull request?
    
    Filter out BlockStatus update data from the logs during replay. Any 
accumulable whose name matches the accumulableBlacklist is filtered out, so 
the Spark History Server no longer creates BlockStatus objects for it. 
SPARK-20084 added a function called "accumulablesToJson"; this patch adds a 
matching function called "accumulablesFromJson".
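
The filtering idea above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the Scala code in this patch: the helper name `accumulables_from_json`, the JSON field names ("ID", "Name", "Update", "Value"), and the single blacklist entry are assumptions made for the example.

```python
import json

# Assumption: names on this list identify block-status accumulables to drop.
# The single entry is illustrative, not taken from the patch itself.
ACCUMULABLE_BLACKLIST = frozenset({"internal.metrics.updatedBlockStatuses"})


def accumulables_from_json(raw):
    """Parse a JSON array of accumulables, skipping blacklisted names.

    Filtering is done on the parsed dictionaries, before any richer
    objects (such as block statuses) would be built from their payloads.
    """
    return [
        entry
        for entry in json.loads(raw)
        if entry.get("Name") not in ACCUMULABLE_BLACKLIST
    ]


# Hypothetical fragment of a logged event's accumulables.
event_fragment = json.dumps([
    {"ID": 1, "Name": "internal.metrics.executorRunTime", "Update": 12, "Value": 12},
    {"ID": 2, "Name": "internal.metrics.updatedBlockStatuses", "Update": [], "Value": []},
    {"ID": 3, "Update": 5, "Value": 5},  # unnamed accumulables are kept
])

kept = accumulables_from_json(event_fragment)
print([entry["ID"] for entry in kept])
```

Dropping the entry while reading, rather than after constructing objects from it, is what avoids allocating the block-status data in the first place.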
    
    Remove the manually added BlockStatus updates from the unit test so that 
it passes.
    
    ## How was this patch tested?
    The unit test passes after the unit test change described above.
    Deployed in our production cluster, where memory consumption dropped 
substantially. Fewer than 5 short-lived Full GCs occurred while replaying more 
than 4K history logs, which include logs generated by 1.6.x and 2.1.0. 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhouyejoe/spark SPARK-21961

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19170
    
----
commit 04c1e2aa24c61f13f1df5148416bb00f0649fcaf
Author: Ye Zhou <[email protected]>
Date:   2017-09-08T23:10:38Z

    [SPARK-21961][Core] Filter out BlockStatuses Accumulators during replaying 
history logs in Spark History Server

----

