GitHub user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/spark/pull/11800#issuecomment-212540659
  
    Even before this change we were getting OOM errors. The issue primarily
seems to be the creation of a lot of short-lived objects. In addition to this
fix we also moved to the G1 collector, and we are using -XX:NewRatio=1 to
allocate half the heap to the young generation.
    
    This fix has been deployed in production for a week, and we have observed
one OOM crash since. The heap dump is 12GB and I am still analyzing it, but
initial analysis again points at a lot of String and char[] instances being
created. If you are interested, I can share the heap dump.
    
    Overall, one of the big issues is that during startup the history server
tries to load all the available logs (with the default 7-day retention), which
in a large multi-tenant cluster like ours is a lot of files. Most users won't
actually click through to their applications, but deleting the event logs too
early is also not a good option. Ideally, I would propose that the history
server create simple summary files (containing only what is needed to show the
application summary in the UI), so that the next time the history server
starts it only needs to read a summary file per application rather than
process each entire event log. Only when a user clicks on an application do we
need to process its entire event log.
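
    To make the summary-file idea concrete, here is a rough sketch in Python.
It is not taken from the Spark code base. It assumes event logs are
newline-delimited JSON with an "Event" field and that the application start
and end events carry "App ID", "App Name", "User" and "Timestamp" fields. The
".summary.json" suffix and the function names are hypothetical.

```python
import json
import os

SUMMARY_SUFFIX = ".summary.json"  # hypothetical naming convention


def build_summary(event_log_path):
    """Scan an event log once, keeping only what a listing page would need."""
    summary = {"app_id": None, "app_name": None, "user": None,
               "start_time": None, "end_time": None}
    with open(event_log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            kind = event.get("Event")
            # Field names below are assumptions about the event log schema.
            if kind == "SparkListenerApplicationStart":
                summary["app_id"] = event.get("App ID")
                summary["app_name"] = event.get("App Name")
                summary["user"] = event.get("User")
                summary["start_time"] = event.get("Timestamp")
            elif kind == "SparkListenerApplicationEnd":
                summary["end_time"] = event.get("Timestamp")
    return summary


def load_summary(event_log_path):
    """Return the cached summary; rebuild it only if missing or stale."""
    summary_path = event_log_path + SUMMARY_SUFFIX
    if (os.path.exists(summary_path)
            and os.path.getmtime(summary_path) >= os.path.getmtime(event_log_path)):
        with open(summary_path, encoding="utf-8") as f:
            return json.load(f)
    # Cache miss: pay the full parsing cost once, then persist the result.
    summary = build_summary(event_log_path)
    with open(summary_path, "w", encoding="utf-8") as f:
        json.dump(summary, f)
    return summary
```

    With something like this, startup would call load_summary() for each log
and only replay the full event log when a user actually opens that
application. Comparing modification times is just one simple way to detect a
stale summary for logs that are still being written.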


