[
https://issues.apache.org/jira/browse/YARN-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206873#comment-15206873
]
Jian He commented on YARN-4852:
-------------------------------
I suspect you may be running into YARN-3487, in which case the CS lock is held and
the UpdatedContainerInfo objects pile up.
[~slukog], how often do you see this? Would you like to apply the YARN-3487 patch
and give it a try?
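
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It is not RM code: the class name, the String payloads, and the tryLock-based drain are assumptions made for illustration. Only the idea comes from this issue: while the scheduler lock is held elsewhere, heartbeats keep appending to an unbounded per-node queue that nothing drains.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not the actual RMNodeImpl or scheduler classes.
public class NodeUpdateBacklogSketch {
    // Assumed stand-in for the scheduler-wide (CS) lock.
    private final ReentrantLock schedulerLock = new ReentrantLock();
    // Assumed stand-in for nodeUpdateQueue: unbounded, so it grows without limit.
    private final ConcurrentLinkedQueue<String> nodeUpdateQueue =
        new ConcurrentLinkedQueue<>();

    // A node heartbeat always succeeds in enqueueing its container updates.
    void onHeartbeat(String containerUpdate) {
        nodeUpdateQueue.add(containerUpdate);
    }

    // The scheduler can only consume updates when it is able to take the lock.
    int drainIfSchedulerFree() {
        if (!schedulerLock.tryLock()) {
            return 0; // lock held by another thread: nothing is drained
        }
        try {
            int drained = 0;
            while (nodeUpdateQueue.poll() != null) {
                drained++;
            }
            return drained;
        } finally {
            schedulerLock.unlock();
        }
    }

    int backlog() {
        return nodeUpdateQueue.size();
    }

    // Returns {backlog while lock held, drained after release, final backlog}.
    static int[] simulate(int heartbeats) {
        NodeUpdateBacklogSketch node = new NodeUpdateBacklogSketch();
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // This thread models whatever holds the scheduler lock for a long time.
        Thread holder = new Thread(() -> {
            node.schedulerLock.lock();
            try {
                lockTaken.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                node.schedulerLock.unlock();
            }
        });
        holder.start();
        try {
            lockTaken.await();
            for (int i = 0; i < heartbeats; i++) {
                node.onHeartbeat("update-" + i);
                node.drainIfSchedulerFree(); // always 0 while the lock is held
            }
            int backlogWhileHeld = node.backlog();
            release.countDown();
            holder.join();
            int drainedAfterRelease = node.drainIfSchedulerFree();
            return new int[] {backlogWhileHeld, drainedAfterRelease, node.backlog()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = simulate(1000);
        System.out.println("backlog while lock held: " + r[0]);
        System.out.println("drained after release:   " + r[1]);
        System.out.println("final backlog:           " + r[2]);
    }
}
```

In this model the backlog equals the number of heartbeats received while the lock is held and only drops once the lock is released. That would be consistent with a sudden heap spike without any increase in job submissions, though whether it is what happened here still needs to be confirmed.
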
> Resource Manager Ran Out of Memory
> ----------------------------------
>
> Key: YARN-4852
> URL: https://issues.apache.org/jira/browse/YARN-4852
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Gokul
> Attachments: rm-thread-dump.txt
>
>
> The Resource Manager ran out of memory (max heap size: 8 GB, CMS GC) and shut
> itself down.
> Heap dump analysis reveals that 1200 instances of the RMNodeImpl class hold 86%
> of memory. Digging deeper, there are around 0.5 million UpdatedContainerInfo
> objects (in the nodeUpdateQueue inside RMNodeImpl). These in turn
> contain around 1.7 million objects of YarnProtos$ContainerIdProto,
> ContainerStatusProto, ApplicationAttemptIdProto, and ApplicationIdProto, each of
> which retains around 1 GB of heap.
> Full GC was triggered multiple times when the RM went OOM, and only 300 MB of
> heap was released, so all these objects appear to be live.
> The RM's usual heap usage is around 4 GB, but it suddenly spiked to 8 GB within
> 20 minutes and went OOM.
> There was no spike in job submissions or container counts at the time the issue
> occurred.