[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001800#comment-15001800 ]
zhihai xu commented on YARN-4344:
---------------------------------

Thanks for reporting this issue [~vvasudev]! Thanks for the review [~Jason Lowe]!
[~rohithsharma] tried to clean up the code in YARN-3286.
Based on the following comment from [~jianhe] on YARN-3286:
{code}
I think this has changed the behavior that without any RM/NM restart features enabled, earlier restarting a node will trigger RM to kill all the containers on this node, but now it won't ?
{code}
the patch may cause a compatibility issue. Maybe we can merge the case {{rmNode.getHttpPort() == newNode.getHttpPort()}} with the case {{rmNode.getHttpPort() != newNode.getHttpPort()}} for noRunningApps.
Thoughts?

> NMs reconnecting with changed capabilities can lead to wrong cluster resource
> calculations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4344
>                 URL: https://issues.apache.org/jira/browse/YARN-4344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>            Priority: Critical
>         Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities,
> there can arise situations where the overall cluster resource calculation for
> the cluster will be incorrect leading to inconsistencies in scheduling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)