[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3194:
-----------------------------
    Fix Version/s: 2.6.2

I committed this to branch-2.6 as well.

> RM should handle NMContainerStatuses sent by NM while registering if NM is 
> Reconnected node
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: NM restart is enabled
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Blocker
>             Fix For: 2.7.0, 2.6.2
>
>         Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all outstanding NMContainerStatus reports to the 
> RM during registration. The RM can treat the registration as coming from 
> either a new node or a reconnecting node, and it triggers the corresponding 
> event (node added or node reconnected). 
> # Node added event: two scenarios can occur here 
> ## A new node is registering with a different ip:port – NOT A PROBLEM
> ## An old node is re-registering because of a RESYNC command from the RM 
> after an RM restart – NOT A PROBLEM
> # Node reconnected event: 
> ## An existing node is re-registering, i.e. the RM treats it as a 
> reconnecting node when the RM has not been restarted 
> ### NM RESTART NOT Enabled – NOT A PROBLEM
> ### NM RESTART is Enabled 
> #### Some applications are running on this node – *Problem is here*
> #### Zero applications are running on this node – NOT A PROBLEM
> Since the NMContainerStatuses are not handled, the RM never learns about the 
> completed containers and never releases the resources held by them. The RM 
> will not allocate new containers for pending resource requests until the 
> completedContainer event is triggered. As a result, applications wait 
> indefinitely because their pending container requests are never served by 
> the RM.
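
The scenario list in the description above can be read as a small decision tree. The following is only an illustrative sketch of that tree, not code from the attached patches or from the YARN code base: the class RegistrationScenario, the enum NodeEvent, and the method isAffected are hypothetical names chosen for this example.

```java
// Illustrative only: models the scenario list from the issue description.
// None of these names come from Hadoop YARN; they are assumptions made
// for the sake of the example.
final class RegistrationScenario {

    /** The two events the description says the RM can trigger on registration. */
    enum NodeEvent { NODE_ADDED, NODE_RECONNECTED }

    private RegistrationScenario() {
        // static helper only
    }

    /**
     * Returns true only for the single case the description marks as
     * "Problem is here": a reconnecting node, with NM restart enabled,
     * that still has applications running on it.
     */
    static boolean isAffected(NodeEvent event,
                              boolean nmRestartEnabled,
                              int runningApplications) {
        if (event == NodeEvent.NODE_ADDED) {
            // New node, or old node re-registering after an RM restart.
            return false;
        }
        if (!nmRestartEnabled) {
            // Reconnected node, but NM restart is not enabled.
            return false;
        }
        // Reconnected node with NM restart enabled: affected only when
        // applications are running on the node.
        return runningApplications > 0;
    }
}
```

Under these assumptions, isAffected(NodeEvent.NODE_RECONNECTED, true, 2) evaluates to true, and every other branch of the list evaluates to false.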



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
