[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3194:
-------------------------
    Description: 
On NM restart, the NM sends all outstanding NMContainerStatus reports to the RM 
during registration. The RM can treat the registration as either a new node or a 
reconnecting node, and it triggers the corresponding event (node added or node 
reconnected). 
# Node added event: two scenarios can occur here 
## A new node registers with a different ip:port – NOT A PROBLEM
## An old node re-registers because of a RESYNC command sent by the RM after an 
RM restart – NOT A PROBLEM

# Node reconnected event: 
## An existing node re-registers, i.e. the RM treats it as a reconnecting node 
because the RM has not been restarted 
### NM restart is NOT enabled – NOT A PROBLEM
### NM restart is enabled 
#### Some applications are running on this node – *Problem is here*
#### Zero applications are running on this node – NOT A PROBLEM
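
The scenario tree above can be condensed into a single predicate. The sketch below is illustrative only: the class, enum, and method names are hypothetical and do not come from the YARN code base or from the attached patch. It simply encodes which leaf of the tree is marked as the problem case.

```java
/** Hypothetical sketch of the scenario tree; not actual YARN code. */
final class RegistrationScenarios {

    /** Event the RM raises for a registering node, as described in the issue. */
    enum NodeEvent { NODE_ADDED, NODE_RECONNECTED }

    /**
     * Returns true only for the leaf marked "Problem is here":
     * a reconnected node, with NM restart enabled, that still has
     * applications running on it.
     */
    static boolean isProblematic(NodeEvent event, boolean nmRestartEnabled, int runningApps) {
        if (event == NodeEvent.NODE_ADDED) {
            // New ip:port, or re-registration after RESYNC: both fine.
            return false;
        }
        // Node reconnected: only NM restart plus running applications is a problem.
        return nmRestartEnabled && runningApps > 0;
    }

    public static void main(String[] args) {
        System.out.println(isProblematic(NodeEvent.NODE_ADDED, true, 3));        // false
        System.out.println(isProblematic(NodeEvent.NODE_RECONNECTED, false, 3)); // false
        System.out.println(isProblematic(NodeEvent.NODE_RECONNECTED, true, 0));  // false
        System.out.println(isProblematic(NodeEvent.NODE_RECONNECTED, true, 3));  // true
    }
}
```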

  was:On NM restart, the NM sends all outstanding NMContainerStatus reports to 
the RM, but the RM processes only those in ContainerState.RUNNING. If a 
container completed while the NM was down, its resources are not released, 
which causes applications to hang.
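
The failure mode in the earlier description can be illustrated with a toy ledger. This is a minimal sketch under stated assumptions, not the RM's implementation and not the fix in the attached patch: `ReconnectLedgerSketch`, `Report`, `State`, and the `handleCompleted` flag are all hypothetical stand-ins. It shows that if completed containers reported at re-registration are ignored, their memory stays counted as allocated.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of RM-side bookkeeping; not actual YARN code. */
final class ReconnectLedgerSketch {

    /** Stand-in for the container states mentioned in the issue. */
    enum State { RUNNING, COMPLETE }

    /** Stand-in for one container status reported by the NM at registration. */
    static final class Report {
        final String containerId;
        final State state;

        Report(String containerId, State state) {
            this.containerId = containerId;
            this.state = state;
        }
    }

    // Memory (MB) the RM believes is allocated, keyed by container id.
    private final Map<String, Integer> allocatedMb = new HashMap<>();

    void allocate(String containerId, int memoryMb) {
        allocatedMb.put(containerId, memoryMb);
    }

    /** With handleCompleted == false, COMPLETE reports are ignored and nothing is released. */
    void onNodeReconnected(List<Report> reports, boolean handleCompleted) {
        for (Report r : reports) {
            if (r.state == State.COMPLETE && handleCompleted) {
                allocatedMb.remove(r.containerId); // release the finished container
            }
            // RUNNING containers keep their allocation either way.
        }
    }

    int totalAllocatedMb() {
        int total = 0;
        for (int mb : allocatedMb.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        ReconnectLedgerSketch rm = new ReconnectLedgerSketch();
        rm.allocate("c1", 1024);
        rm.allocate("c2", 2048);
        List<Report> reports = Arrays.asList(
                new Report("c1", State.RUNNING),
                new Report("c2", State.COMPLETE)); // finished while the NM was down

        rm.onNodeReconnected(reports, false);
        System.out.println(rm.totalAllocatedMb()); // 3072: c2 is never released

        rm.onNodeReconnected(reports, true);
        System.out.println(rm.totalAllocatedMb()); // 1024: only c1 remains
    }
}
```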


> After NM restart, RM should handle NMContainerStatuses sent by NM while 
> registering if NM is Reconnected node
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: NM restart is enabled
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Blocker
>         Attachments: 0001-yarn-3194-v1.patch
>


