[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325212#comment-14325212
 ] 

Junping Du commented on YARN-3194:
----------------------------------

bq. I didn't see this problem originally, but I suspect it was because there 
were two things that masked it. As mentioned above, this problem doesn't 
manifest before YARN-2997. In addition, I was testing it with MapReduce 
applications, and the MR AM will explicitly kill containers for tasks that have 
completed (as reported by the umbilical connection between the AM and tasks).
I see. I think that's why we didn't notice this issue before. However, this bug 
only manifests after YARN-2997, so we should mark the affected version as 2.7.

bq. I don't see why we would process container status sent during a reconnect 
differently than a regular status update from the NM.
I think we can do some refactoring here. However, I think two things 
could differ between a reconnect and a regular update from the NM: 1. The port 
number could change (an ephemeral port is used when NM work-preserving restart 
is disabled); 2. The NM's resource could have changed (assuming the NM's 
resource can be updated before it reconnects). Is that right?
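
To make the two differences concrete, here is a minimal sketch of how a 
reconnecting node could be compared against the node the RM already knows. 
NodeInfo, Change and classify are hypothetical names made up for this 
illustration; they are not the actual RM classes or the code in the patch.

```java
// Illustrative only: NodeInfo, Change and classify are hypothetical names,
// not part of the real ResourceManager code.
class ReconnectDiffSketch {

    // Simplified stand-in for what the RM remembers about a NodeManager.
    static final class NodeInfo {
        final String host;
        final int port;
        final int memoryMb;
        final int vcores;

        NodeInfo(String host, int port, int memoryMb, int vcores) {
            this.host = host;
            this.port = port;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    enum Change { NONE, PORT_CHANGED, RESOURCE_CHANGED }

    // Classify what differs between the known node and the reconnecting one.
    // A port change is checked first because it is the more disruptive case
    // (assumption: the node is then effectively a new endpoint).
    static Change classify(NodeInfo known, NodeInfo reconnecting) {
        if (known.port != reconnecting.port) {
            return Change.PORT_CHANGED;
        }
        if (known.memoryMb != reconnecting.memoryMb
                || known.vcores != reconnecting.vcores) {
            return Change.RESOURCE_CHANGED;
        }
        return Change.NONE;
    }
}
```

If neither differs, the container statuses carried by the reconnect could in 
principle go through the same path as a regular status update.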

> After NM restart,completed containers are not released by RM which are sent 
> during NM registration
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: NM restart is enabled
>            Reporter: Rohith
>            Assignee: Rohith
>         Attachments: 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all outstanding NMContainerStatus to the RM, but 
> the RM processes only those in ContainerState.RUNNING. If a container 
> completed while the NM was down, its resources won't be released, which 
> causes applications to hang.
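
The behaviour described above can be sketched roughly as follows. This is a 
simplified illustration, not the attached patch: State, Status, Result and 
process are hypothetical stand-ins, and the only point is that statuses 
reported as COMPLETE at registration are handled instead of being skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these types are hypothetical stand-ins for
// ContainerState / NMContainerStatus, not the real YARN classes.
class RegistrationStatusSketch {

    enum State { RUNNING, COMPLETE }

    static final class Status {
        final String containerId;
        final State state;

        Status(String containerId, State state) {
            this.containerId = containerId;
            this.state = state;
        }
    }

    static final class Result {
        final List<String> recovered = new ArrayList<>(); // still running
        final List<String> released = new ArrayList<>();  // resources freed
    }

    // Walk every status sent at registration. Handling only RUNNING (the
    // behaviour reported in this issue) would leave COMPLETE containers
    // holding their resources; here they are released instead.
    static Result process(List<Status> reported) {
        Result result = new Result();
        for (Status status : reported) {
            if (status.state == State.RUNNING) {
                result.recovered.add(status.containerId);
            } else if (status.state == State.COMPLETE) {
                result.released.add(status.containerId);
            }
        }
        return result;
    }
}
```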



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
