[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001800#comment-15001800
 ] 

zhihai xu commented on YARN-4344:
---------------------------------

Thanks for reporting this issue [~vvasudev]! Thanks for the review [~Jason 
Lowe]! 
[~rohithsharma] tried to clean up the code at YARN-3286. Based on the following 
comment from [~jianhe] at YARN-3286,
{code}
I think this has changed the behavior that without any RM/NM restart features 
enabled, earlier restarting a node will trigger RM to kill all the containers 
on this node, but now it won't ?
{code}
The patch may cause compatibility issue. Maybe we can merge the case 
{{rmNode.getHttpPort() == newNode.getHttpPort()}} with {{rmNode.getHttpPort() 
!= newNode.getHttpPort()}} for noRunningApps.
Thoughts?

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4344
>                 URL: https://issues.apache.org/jira/browse/YARN-4344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>            Priority: Critical
>         Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> there can arise situations where the overall cluster resource calculation for 
> the cluster will be incorrect leading to inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to