[
https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565511#comment-16565511
]
Jason Lowe commented on YARN-8609:
----------------------------------
Thanks for the report and patch!
IMHO any truncation should not be tied to recovery, as the NM could OOM just
tracking container diagnostics. Recovery involves reloading what was already
in memory before the crash/restart. If the diagnostics of a container were 27M
in the recovery file then that means it was 27M in the NM heap before it
recovered as well.
Recovery does take more memory than normal operation, and the work in
YARN-8242 will help reduce that load. Rather than forcing a rather
draconian truncation (27M down to 5000 bytes is extreme), the limit should
be a configurable setting applied when diagnostics are added to a container
rather than upon recovery; see ContainerImpl#addDiagnostics. Otherwise
reported container statuses will suddenly change when the NM restarts,
which is counter to the goals of the NM recovery feature.
> NM oom because of large container statuses
> ------------------------------------------
>
> Key: YARN-8609
> URL: https://issues.apache.org/jira/browse/YARN-8609
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Xianghao Lu
> Priority: Major
> Attachments: YARN-8609.001.patch, contain_status.jpg, oom.jpeg
>
>
> Sometimes the NodeManager sends large container statuses to the
> ResourceManager when it starts with recovery enabled; as a result, the
> NodeManager fails to start because of an OOM.
> In my case, the container statuses payload is 135M and contains 11
> container statuses. I found that the diagnostics of 5 containers are very
> large (27M), so I truncate the container diagnostics as in the patch.