[jira] [Updated] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node

2015-10-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3194:
-
Fix Version/s: 2.6.2

I committed this to branch-2.6 as well.

> RM should handle NMContainerStatuses sent by NM while registering if NM is 
> Reconnected node
> ---
>
> Key: YARN-3194
> URL: https://issues.apache.org/jira/browse/YARN-3194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: NM restart is enabled
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Fix For: 2.7.0, 2.6.2
>
> Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch
>
>
> On NM restart ,NM sends all the outstanding NMContainerStatus to RM during 
> registration. The registration can be treated by RM as New node or 
> Reconnecting node. RM triggers corresponding event on the basis of node added 
> or node reconnected state. 
> # Node added event : Again here 2 scenario's can occur 
> ## New node is registering with different ip:port – NOT A PROBLEM
> ## Old node is re-registering because of RESYNC command from RM when RM 
> restart – NOT A PROBLEM
> # Node reconnected event : 
> ## Existing node is re-registering i.e RM treat it as reconnecting node when 
> RM is not restarted 
> ### NM RESTART NOT Enabled – NOT A PROBLEM
> ### NM RESTART is Enabled 
>  Some applications are running on this node – *Problem is here*
>  Zero applications are running on this node – NOT A PROBLEM
> Since NMContainerStatus are not handled, RM never get to know about 
> completedContainer and never release resource held be containers. RM will not 
> allocate new containers for pending resource request as long as the 
> completedContainer event is triggered. This results in applications to wait 
> indefinitly because of pending containers are not served by RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node

2015-02-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3194:
-
Summary: RM should handle NMContainerStatuses sent by NM while registering 
if NM is Reconnected node  (was: After NM restart, RM should handle 
NMCotainerStatuses sent by NM while registering if NM is Reconnected node)

 RM should handle NMContainerStatuses sent by NM while registering if NM is 
 Reconnected node
 ---

 Key: YARN-3194
 URL: https://issues.apache.org/jira/browse/YARN-3194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: NM restart is enabled
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch


 On NM restart ,NM sends all the outstanding NMContainerStatus to RM during 
 registration. The registration can be treated by RM as New node or 
 Reconnecting node. RM triggers corresponding event on the basis of node added 
 or node reconnected state. 
 # Node added event : Again here 2 scenario's can occur 
 ## New node is registering with different ip:port – NOT A PROBLEM
 ## Old node is re-registering because of RESYNC command from RM when RM 
 restart – NOT A PROBLEM
 # Node reconnected event : 
 ## Existing node is re-registering i.e RM treat it as reconnecting node when 
 RM is not restarted 
 ### NM RESTART NOT Enabled – NOT A PROBLEM
 ### NM RESTART is Enabled 
  Some applications are running on this node – *Problem is here*
  Zero applications are running on this node – NOT A PROBLEM
 Since NMContainerStatus are not handled, RM never get to know about 
 completedContainer and never release resource held be containers. RM will not 
 allocate new containers for pending resource request as long as the 
 completedContainer event is triggered. This results in applications to wait 
 indefinitly because of pending containers are not served by RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)