[
https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360737#comment-15360737
]
Junping Du commented on YARN-5296:
----------------------------------
I ran some tests today. The first patch actually introduces another issue:
because the scheduler is static and shared by all objects, calling
{{scheduler.shutdown()}} breaks metrics for containers that start later (an
exception is thrown). The v2 patch instead cancels the individual task when
its container finishes, which does fix the earlier OOM issue according to
jmap dump analysis (screenshot attached).
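To illustrate the difference between the two approaches, here is a minimal
sketch. It is not the actual patch: the class {{SketchContainerMetrics}}, its
fields, and the use of {{ScheduledExecutorService}} are assumptions made only
for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a per-container metrics object (not the real class).
class SketchContainerMetrics {
  // One scheduler shared by every instance, as described in the comment above.
  // Daemon thread so the sketch does not keep the JVM alive.
  static final ScheduledExecutorService scheduler =
      Executors.newScheduledThreadPool(1, r -> {
        Thread t = new Thread(r, "sketch-metrics-flush");
        t.setDaemon(true);
        return t;
      });

  private final String containerId;
  private final ScheduledFuture<?> flushTask;

  SketchContainerMetrics(String containerId) {
    this.containerId = containerId;
    // Throws RejectedExecutionException if the shared scheduler was shut down.
    this.flushTask = scheduler.scheduleAtFixedRate(
        () -> { /* flush metrics for this.containerId (omitted) */ },
        1, 1, TimeUnit.SECONDS);
  }

  // v2-style cleanup: cancel only this container's task, so the shared
  // scheduler stays usable for containers that start later.
  void finished() {
    flushTask.cancel(false);
  }

  boolean isCancelled() {
    return flushTask.isCancelled();
  }
}
```

In this sketch, calling {{finished()}} on one object leaves the tasks of other
objects running, whereas calling {{scheduler.shutdown()}} would cause every
later constructor call to fail.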
> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---------------------------------------------------------------------------
>
> Key: YARN-5296
> URL: https://issues.apache.org/jira/browse/YARN-5296
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.9.0
> Reporter: Karam Singh
> Assignee: Junping Du
> Attachments: YARN-5296.patch
>
>
> Ran tests in the following manner:
> 1. Ran GridMix of 768 sequentially around 17 times to execute about 12.9K
> apps.
> 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported
> that around 96% of the heap was being used by ContainerMetrics.
> 3. Ran 7 more GridMix runs, for around 18.2K apps in total. Checked the NM
> heap again using Memory Analyser; again 96% of the heap was being used by
> ContainerMetrics.
> 4. Started one more GridMix run. While the run was going on, NMs started
> going down with OOM at around 18.7K+ apps. On analysing the NM heap using
> Memory Analyser, the OOM was caused by ContainerMetrics.