[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323980#comment-14323980
 ] 

Junping Du commented on YARN-3194:
----------------------------------

I think NM after restarted will try to relaunch these running containers as 
RecoveredContainers, and if it cannot locate the pid (assume container get 
completed during NM downtime), it would report and trigger the complete of 
containers. Do I miss anything here? 
[~jlowe], I remember we discussed this case in some JIRA under YARN-1336, did 
you see this problem before?

> After NM restart,completed containers are not released which are sent during 
> NM registration
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: NM restart is enabled
>            Reporter: Rohith
>            Assignee: Rohith
>         Attachments: 0001-yarn-3194-v1.patch
>
>
> On NM restart ,NM sends all the outstanding NMContainerStatus to RM. But RM 
> process only ContainerState.RUNNING. If container is completed when NM was 
> down then those containers resources wont be release which result in 
> applications to hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to