[ https://issues.apache.org/jira/browse/YARN-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208283#comment-15208283 ]
Rohith Sharma K S commented on YARN-4852:
-----------------------------------------

I was explaining how the absence of YARN-3487 might cause an OOM. There could be other reasons for the nodeUpdate queue piling up, and those need to be analysed. To rule out YARN-3487 as a suspect, apply the patch in the cluster. If the issue occurs again, it will be easier to focus on a particular area.

> Resource Manager Ran Out of Memory
> ----------------------------------
>
>                 Key: YARN-4852
>                 URL: https://issues.apache.org/jira/browse/YARN-4852
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Gokul
>         Attachments: threadDump.log
>
>
> Resource Manager ran out of memory (max heap size: 8 GB, CMS GC) and shut
> itself down.
> Heap dump analysis reveals that 1200 instances of the RMNodeImpl class hold
> 86% of memory. Digging deeper, there are around 0.5 million UpdatedContainerInfo
> objects (in the nodeUpdateQueue inside RMNodeImpl). These in turn contain
> around 1.7 million objects of YarnProtos$ContainerIdProto, ContainerStatusProto,
> ApplicationAttemptIdProto and ApplicationIdProto, each of which retains around
> 1 GB of heap.
> Full GC was triggered multiple times when the RM went OOM, but only 300 MB of
> heap was released, so all these objects appear to be live.
> The RM's usual heap usage is around 4 GB, but it suddenly spiked to 8 GB
> within 20 minutes and went OOM.
> There was no spike in job submissions or container counts at the time the
> issue occurred.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
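The pile-up discussed in this issue (entries accumulating in a per-node nodeUpdateQueue faster than they are consumed) can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the RMNodeImpl code: the class NodeUpdateQueueSketch, the UpdateBatch stand-in for UpdatedContainerInfo, and the fixed per-tick drain limit are all hypothetical simplifications. It shows only that an unbounded queue grows linearly whenever the consumer falls behind the producer, even by one entry per tick, and that every queued entry stays reachable (and therefore survives full GC) until it is drained.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of a per-node update queue. Names are illustrative and
// are NOT the actual RMNodeImpl / UpdatedContainerInfo API.
public class NodeUpdateQueueSketch {

    // Stand-in for UpdatedContainerInfo: one batch of reported container ids.
    static final class UpdateBatch {
        final List<String> containerIds;

        UpdateBatch(List<String> containerIds) {
            this.containerIds = containerIds;
        }
    }

    // Unbounded queue: nothing stops it from growing if draining falls behind,
    // and every entry remains strongly reachable until it is polled.
    private final Queue<UpdateBatch> nodeUpdateQueue = new ConcurrentLinkedQueue<>();

    // Producer side: called once per heartbeat with the node's container statuses.
    void onHeartbeat(List<String> statuses) {
        nodeUpdateQueue.add(new UpdateBatch(statuses));
    }

    // Consumer side: removes at most maxBatches entries (assumed drain limit).
    int drain(int maxBatches) {
        int drained = 0;
        while (drained < maxBatches && nodeUpdateQueue.poll() != null) {
            drained++;
        }
        return drained;
    }

    int backlog() {
        return nodeUpdateQueue.size();
    }

    // Runs `ticks` rounds: enqueue `enqueuePerTick` batches, then drain up to
    // `drainPerTick`. Returns the number of batches still queued at the end.
    public static int simulate(int ticks, int enqueuePerTick, int drainPerTick) {
        NodeUpdateQueueSketch node = new NodeUpdateQueueSketch();
        List<String> statuses = List.of("container_1", "container_2");
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < enqueuePerTick; i++) {
                node.onHeartbeat(statuses);
            }
            node.drain(drainPerTick);
        }
        return node.backlog();
    }

    public static void main(String[] args) {
        // Consumer keeps up with the producer: the backlog stays at zero.
        System.out.println("balanced backlog: " + simulate(1000, 3, 3));
        // Consumer lags by one batch per tick: the backlog grows by one per tick.
        System.out.println("lagging backlog:  " + simulate(1000, 4, 3));
    }
}
```

Running main prints a backlog of 0 for the balanced case and 1000 for the lagging case. For scale, the report's figure of about 0.5 million UpdatedContainerInfo objects spread over 1200 nodes works out to roughly 417 queued entries per node.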