Gokul created YARN-4852:
---------------------------
Summary: Resource Manager Ran Out of Memory
Key: YARN-4852
URL: https://issues.apache.org/jira/browse/YARN-4852
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Gokul
The Resource Manager ran out of memory (max heap size: 8 GB, CMS GC) and shut
itself down.
Heap dump analysis reveals that 1200 instances of the RMNodeImpl class hold 86%
of the heap. Digging deeper, there are around 0.5 million UpdatedContainerInfo
objects (in the nodeUpdateQueue inside RMNodeImpl). These in turn contain
around 1.7 million objects of YarnProtos$ContainerIdProto,
ContainerStatusProto, ApplicationAttemptIdProto and ApplicationIdProto, each of
which retains around 1 GB of heap.
Full GC was triggered multiple times when the RM went OOM, and only 300 MB of
heap was released, so all these objects appear to be live.
The RM's usual heap usage is around 4 GB, but it suddenly spiked to 8 GB within
20 minutes and went OOM.
There was no spike in job submissions or container counts at the time the issue
occurred.
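
For illustration only: the numbers above work out to roughly 417 queued
UpdatedContainerInfo objects per node on average (0.5 million / 1200). The
sketch below is not Hadoop code and does not claim to be the root cause. It
assumes a per-node unbounded queue that heartbeats append to and a separate
consumer drains; the class and method names (NodeUpdateQueueSketch, Node,
Update, simulate) are hypothetical stand-ins. It only shows how such a queue
keeps every entry strongly reachable, so a full GC frees nothing, if the
consumer stops draining while heartbeats continue.

```java
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of a per-node update backlog; not the real RMNodeImpl.
final class NodeUpdateQueueSketch {

    // Stand-in for UpdatedContainerInfo: a batch of container ids per heartbeat.
    static final class Update {
        final List<String> containerIds;
        Update(List<String> containerIds) { this.containerIds = containerIds; }
    }

    // Stand-in for a node holding an unbounded queue of pending updates.
    static final class Node {
        final Queue<Update> updates = new ConcurrentLinkedQueue<>();

        // Producer side: each heartbeat appends one update.
        void onHeartbeat(Update u) { updates.add(u); }

        // Consumer side: removes everything currently queued.
        int drain() {
            int drained = 0;
            while (updates.poll() != null) {
                drained++;
            }
            return drained;
        }
    }

    // Returns the total number of updates still queued across all nodes.
    static int simulate(int nodeCount, int heartbeatsPerNode, boolean consumerRunning) {
        Node[] nodes = new Node[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            nodes[i] = new Node();
        }
        for (int round = 0; round < heartbeatsPerNode; round++) {
            for (Node n : nodes) {
                n.onHeartbeat(new Update(Collections.singletonList("container_" + round)));
                if (consumerRunning) {
                    n.drain();
                }
            }
        }
        int queued = 0;
        for (Node n : nodes) {
            queued += n.updates.size();
        }
        return queued;
    }

    public static void main(String[] args) {
        // 1200 nodes as in the report; 400 heartbeats per node is an assumed value.
        System.out.println("consumer running: " + simulate(1200, 400, true));
        System.out.println("consumer stalled: " + simulate(1200, 400, false));
    }
}
```

With the consumer stalled, the backlog in this model grows linearly with the
number of heartbeats, which is at least consistent with heap growth that is
unrelated to job submissions or container counts.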
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)