[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ye Zhou resolved SPARK-21961.
-----------------------------
    Resolution: Won't Fix

> Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21961
>                 URL: https://issues.apache.org/jira/browse/SPARK-21961
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Ye Zhou
>            Priority: Major
>         Attachments: Objects_Count_in_Heap.png, One_Thread_Took_24GB.png
>
>
> As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of
> memory in the driver. Recently we noticed the same issue in the Spark History
> Server. SPARK-20084 removes those events from the history log, but multiple
> versions of Spark, including 1.6.x and 2.1.0, are deployed in our production
> cluster, and none of them include these two patches.
> As a result, those events still show up in the logs, and the Spark History
> Server replays them. The Spark History Server continuously suffers severe
> full GCs, even though we tried limiting the cache size and enlarging the
> heap to 40GB. We also tried different GC tuning parameters, such as using
> CMS or G1GC. None of them worked.
> We took a heap dump and found that the top memory consumer is BlockStatus
> objects. One thread, which was replaying a single log file, even took 23GB
> of heap.
> Since the two earlier tickets resolved the related issues in the driver and
> in writing to history logs, we should also consider adding this filter to
> the Spark History Server to reduce the memory consumed when replaying a
> history log. For use cases like ours, where multiple older versions of
> Spark are deployed, this filter should be quite useful.
> We have deployed our Spark History Server with this filter in our
> production cluster, where it works fine: it has processed thousands of logs
> with only several full GCs in total.
> !https://issues.apache.org/jira/secure/attachment/12886191/Objects_Count_in_Heap.png!
> !https://issues.apache.org/jira/secure/attachment/12886190/One_Thread_Took_24GB.png!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
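
The filtering idea described in the issue can be illustrated with a small offline sketch. This is not the patch discussed in the ticket (which is not shown here and would filter inside the History Server's replay path); it is a swapped-in, simpler technique that rewrites event-log lines before they are replayed. It assumes that event logs are JSON lines, that task-end events carry their accumulators under "Task Info" -> "Accumulables", that the block-status accumulator is named `internal.metrics.updatedBlockStatuses`, and that task metrics store the same data under "Updated Blocks". The function name `strip_block_statuses` is hypothetical.

```python
import json

# Assumption: the name Spark gives the accumulator that carries block statuses.
BLOCK_STATUS_ACCUMULATOR = "internal.metrics.updatedBlockStatuses"


def strip_block_statuses(line: str) -> str:
    """Return one event-log line with block-status data removed.

    Lines that are not task-end events are returned unchanged.
    """
    event = json.loads(line)
    if event.get("Event") != "SparkListenerTaskEnd":
        return line

    # Drop the block-status accumulator from the task's accumulables, if present.
    task_info = event.get("Task Info")
    if isinstance(task_info, dict) and "Accumulables" in task_info:
        task_info["Accumulables"] = [
            acc for acc in task_info["Accumulables"]
            if acc.get("Name") != BLOCK_STATUS_ACCUMULATOR
        ]

    # Drop the per-task list of updated blocks from the task metrics, if present.
    task_metrics = event.get("Task Metrics")
    if isinstance(task_metrics, dict):
        task_metrics.pop("Updated Blocks", None)

    return json.dumps(event)
```

Applied line by line to a copy of an event log, a filter like this would keep the remaining task metrics intact while discarding the entries that the heap dump identified as the main memory consumer. Other event types that also carry accumulables (for example stage-completion events) are left untouched in this sketch.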