zhihai xu commented on YARN-4344:

Thanks for reporting this issue [~vvasudev]! Thanks for the review [~Jason 
[~rohithsharma] tried to clean up the code at YARN-3286. Based on the following 
comment from [~jianhe] at YARN-3286,
I think this has changed the behavior that without any RM/NM restart features 
enabled, earlier restarting a node will trigger RM to kill all the containers 
on this node, but now it won't?
The patch may cause a compatibility issue. Maybe we can merge the case
{{rmNode.getHttpPort() == newNode.getHttpPort()}} with {{rmNode.getHttpPort()
!= newNode.getHttpPort()}} for the {{noRunningApps}} case.
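A minimal sketch of what merging the two http-port cases could look like, under a simplified model. The class `ReconnectSketch`, the `Node` record, `onReconnect`, and the "REMOVE"/"ADD" action strings are all hypothetical illustrations, not the ResourceManager's actual classes or events. The assumption is that, when no apps are running on the reconnecting node, the RM removes the old node (which on the scheduler side releases any containers it still tracks there) and then adds the reconnected node with its newly reported capability, whether or not the http port changed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the reconnect decision; none of these names are YARN APIs.
class ReconnectSketch {
    // Hypothetical stand-in for a node record: id, http port, memory in MB.
    record Node(String id, int httpPort, int memoryMb) {}

    // Returns the scheduler-facing actions taken for a reconnect, as readable strings.
    static List<String> onReconnect(Node oldNode, Node newNode, List<String> runningApps) {
        List<String> actions = new ArrayList<>();
        boolean noRunningApps = runningApps == null || runningApps.isEmpty();
        if (noRunningApps) {
            // Merged case: the http-port comparison no longer selects a different path.
            // Remove the old node first, then add the reconnected node with its new capability.
            actions.add("REMOVE " + oldNode.id() + " " + oldNode.memoryMb());
            actions.add("ADD " + newNode.id() + " " + newNode.memoryMb());
        }
        // The running-apps path is outside this proposal; the sketch emits no actions for it.
        return actions;
    }

    public static void main(String[] args) {
        Node before = new Node("host1:8041", 8042, 8192);
        Node samePort = new Node("host1:8041", 8042, 4096);
        Node newPort = new Node("host1:8041", 9042, 4096);
        // Both calls print the same remove-then-add sequence.
        System.out.println(onReconnect(before, samePort, List.of()));
        System.out.println(onReconnect(before, newPort, List.of()));
    }
}
```

With the cases merged, a restarted node with no running apps always goes through the same remove-then-add sequence. That keeps the "kill all containers on this node" behavior described above independent of whether the NM came back on the same http port.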

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> ------------------------------------------------------------------------------------------
>                 Key: YARN-4344
>                 URL: https://issues.apache.org/jira/browse/YARN-4344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>            Priority: Critical
>         Attachments: YARN-4344.001.patch
>
> After YARN-3802, if an NM reconnects to the RM with changed capabilities,
> the overall cluster resource calculation can become incorrect, leading to
> inconsistencies in scheduling.
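To make the failure mode concrete, here is a hypothetical bookkeeping model, assuming the cluster total is kept as a running sum that is updated when nodes are added and removed. `ClusterResourceSketch` and its methods are illustrative names only, not the scheduler's real data structures. The invariant is that the running total must equal the sum of the capabilities currently registered. On a reconnect with changed capability, the amount subtracted must be what was counted for the old registration, not what the node reports now.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of cluster-resource bookkeeping; not YARN's actual scheduler code.
class ClusterResourceSketch {
    private final Map<String, Integer> nodeMemoryMb = new HashMap<>();
    private long clusterMemoryMb = 0;

    void addNode(String nodeId, int memoryMb) {
        if (nodeMemoryMb.putIfAbsent(nodeId, memoryMb) != null) {
            throw new IllegalStateException("node already registered: " + nodeId);
        }
        clusterMemoryMb += memoryMb;
    }

    void removeNode(String nodeId) {
        Integer counted = nodeMemoryMb.remove(nodeId);
        if (counted != null) {
            // Subtract what was actually counted for this node, not its newly reported value.
            clusterMemoryMb -= counted;
        }
    }

    // A reconnect with a possibly changed capability is handled as remove-then-add.
    void reconnect(String nodeId, int newMemoryMb) {
        removeNode(nodeId);
        addNode(nodeId, newMemoryMb);
    }

    long clusterMemoryMb() {
        return clusterMemoryMb;
    }

    // Invariant check: the running total equals the sum over registered nodes.
    boolean isConsistent() {
        long sum = 0;
        for (int mb : nodeMemoryMb.values()) {
            sum += mb;
        }
        return sum == clusterMemoryMb;
    }

    public static void main(String[] args) {
        ClusterResourceSketch rm = new ClusterResourceSketch();
        rm.addNode("host1", 8192);
        rm.addNode("host2", 8192);
        rm.reconnect("host1", 4096); // host1 comes back with half its memory
        System.out.println(rm.clusterMemoryMb() + " consistent=" + rm.isConsistent());
    }
}
```

Running `main` prints `12288 consistent=true`. If the reconnect path instead subtracted the new 4096 MB, or added the new node without removing the old one, the total would drift to 16384 or 20480 MB while the nodes still sum to 12288.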

This message was sent by Atlassian JIRA
