[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323980#comment-14323980 ]
Junping Du commented on YARN-3194:
----------------------------------
I think the NM, once restarted, will try to relaunch these running containers
as RecoveredContainers, and if it cannot locate the pid (assuming the container
completed during NM downtime), it will report that and trigger completion of
the containers. Am I missing anything here?
[~jlowe], I remember we discussed this case in some JIRA under YARN-1336. Have
you seen this problem before?
> After NM restart,completed containers are not released which are sent during
> NM registration
> --------------------------------------------------------------------------------------------
>
> Key: YARN-3194
> URL: https://issues.apache.org/jira/browse/YARN-3194
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Environment: NM restart is enabled
> Reporter: Rohith
> Assignee: Rohith
> Attachments: 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all outstanding NMContainerStatus entries to the
> RM, but the RM processes only those in ContainerState.RUNNING. If a container
> completed while the NM was down, its resources are not released, which causes
> applications to hang.
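
To make the reported behaviour concrete, here is a minimal sketch of the
accounting problem. It is not the YARN code and not the attached patch: the
class RegistrationSketch, the types ContainerReport and NodeAccounting, and
the releaseCompleted flag are all hypothetical names invented for
illustration. It only assumes what the description states, namely that the
reports sent at registration can include completed containers and that
ignoring them leaves their resources allocated.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of RM-side bookkeeping; none of these types exist in YARN.
class RegistrationSketch {

    enum State { RUNNING, COMPLETE }

    // Stand-in for one container status reported by the NM at registration.
    static final class ContainerReport {
        final String containerId;
        final State state;

        ContainerReport(String containerId, State state) {
            this.containerId = containerId;
            this.state = state;
        }
    }

    // Tracks how much memory the RM believes is allocated on one node.
    static final class NodeAccounting {
        private final Map<String, Integer> allocatedMb = new HashMap<>();

        void allocate(String containerId, int memoryMb) {
            allocatedMb.put(containerId, memoryMb);
        }

        int totalAllocatedMb() {
            int total = 0;
            for (int mb : allocatedMb.values()) {
                total += mb;
            }
            return total;
        }

        // With releaseCompleted == false, only RUNNING reports are looked at,
        // so a container that finished during NM downtime stays allocated.
        // With releaseCompleted == true, COMPLETE reports free their resources.
        void handleRegistration(List<ContainerReport> reports, boolean releaseCompleted) {
            for (ContainerReport report : reports) {
                if (report.state == State.RUNNING) {
                    // Running containers keep their existing allocation.
                    continue;
                }
                if (releaseCompleted) {
                    allocatedMb.remove(report.containerId);
                }
            }
        }
    }
}
```

In this sketch, a node with one running and one completed container keeps
both allocations when completed reports are ignored, and keeps only the
running container's allocation once they are handled.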