[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-3194:
-------------------------
    Attachment: 0001-YARN-3194.patch

> After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: NM restart is enabled
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Blocker
>         Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all outstanding NMContainerStatus reports to the RM during registration. The RM can treat the registration as coming from either a new node or a reconnecting node, and it triggers the corresponding event (node added or node reconnected).
> # Node added event: two scenarios can occur here
> ## A new node is registering with a different ip:port – NOT A PROBLEM
> ## An old node is re-registering because of a RESYNC command from the RM after an RM restart – NOT A PROBLEM
> # Node reconnected event:
> ## An existing node is re-registering, i.e. the RM treats it as a reconnecting node when the RM has not been restarted
> ### NM RESTART is NOT enabled – NOT A PROBLEM
> ### NM RESTART is enabled
> #### Some applications are running on this node – *Problem is here*
> #### Zero applications are running on this node – NOT A PROBLEM
> Since the NMContainerStatus reports are not handled in this case, the RM never learns about the completed containers and never releases the resources held by them. The RM will not allocate new containers for pending resource requests until the completedContainer event is triggered. As a result, applications wait indefinitely because their pending container requests are never served by the RM.
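
The case analysis in the description can be sketched as a small decision function. This is only an illustration of the scenarios listed in the report, not the attached patch and not actual YARN code: the class `ReconnectSketch`, the enum `Registration`, and the method `mustHandleContainerStatuses` are hypothetical names chosen for this example.

```java
// Illustrative sketch only: all names here are hypothetical and are not
// part of the Hadoop YARN code base or of the attached patch.
class ReconnectSketch {

    /** How the RM classified an incoming NM registration (assumed model). */
    enum Registration { NODE_ADDED, NODE_RECONNECTED }

    /**
     * Returns true for the one scenario the report flags as the problem:
     * the RM sees a reconnecting node, NM restart is enabled, and at least
     * one application is running on that node. In that case the container
     * statuses sent at registration need to be processed so that completed
     * containers release their resources.
     */
    static boolean mustHandleContainerStatuses(Registration kind,
                                               boolean nmRestartEnabled,
                                               int runningApps) {
        if (kind == Registration.NODE_ADDED) {
            // New ip:port, or re-registration after RESYNC: not a problem.
            return false;
        }
        if (!nmRestartEnabled) {
            // Reconnected node without NM restart: not a problem.
            return false;
        }
        // Reconnected node with NM restart: a problem only if apps are running.
        return runningApps > 0;
    }
}
```

Under these assumptions, only the combination "reconnected, NM restart enabled, applications running" returns true; every other branch corresponds to a "NOT A PROBLEM" entry in the list.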