[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170235#comment-14170235 ]
Hudson commented on YARN-2641:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6254 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6254/])
YARN-2641. Decommission nodes on -refreshNodes instead of next NM-RM heartbeat. (Zhihai Xu via kasha) (kasha: rev da709a2eac7110026169ed3fc4d0eaf85488d3ef)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java


> Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
> -------------------------------------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.7.0
>
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch,
> YARN-2641.002.patch, YARN-2641.003.patch
>
>
> Improve node decommission latency in RM.
> Currently, node decommission only happens after RM receives a nodeHeartbeat
> from the Node Manager. The node heartbeat interval is configurable; the
> default value is 1 second.
> It would be better to do the decommission during RM refresh (NodesListManager)
> instead of during nodeHeartbeat (ResourceTrackerService).
> There is also a much more serious issue:
> after RM is refreshed (refreshNodes), if the NM to be decommissioned is
> killed before it sends a heartbeat to RM, the RMNode will never be
> decommissioned in RM.
> The RMNode will only expire in RM after the
> "yarn.nm.liveness-monitor.expiry-interval-ms" interval (default value 10 minutes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
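
The difference between decommissioning on the next heartbeat and decommissioning during refresh can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class DecommissionSketch and its methods (register, refreshNodesLazy, refreshNodesEager, heartbeat, stateOf) are hypothetical names invented for illustration and are not the code in NodesListManager or ResourceTrackerService.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, not a Hadoop class or API.
class DecommissionSketch {
    enum State { RUNNING, DECOMMISSIONED }

    private final Map<String, State> nodes = new LinkedHashMap<>();
    private final Set<String> excluded = new HashSet<>();

    void register(String host) {
        nodes.put(host, State.RUNNING);
    }

    State stateOf(String host) {
        return nodes.get(host);
    }

    // Heartbeat-driven variant: refresh only records the new exclude list.
    void refreshNodesLazy(Set<String> newExcludes) {
        excluded.clear();
        excluded.addAll(newExcludes);
    }

    // Heartbeat-driven variant: the exclude list is checked when a heartbeat arrives.
    void heartbeat(String host) {
        if (excluded.contains(host) && nodes.get(host) == State.RUNNING) {
            nodes.put(host, State.DECOMMISSIONED);
        }
    }

    // Refresh-driven variant: excluded nodes are decommissioned immediately.
    void refreshNodesEager(Set<String> newExcludes) {
        refreshNodesLazy(newExcludes);
        for (Map.Entry<String, State> e : nodes.entrySet()) {
            if (excluded.contains(e.getKey())) {
                e.setValue(State.DECOMMISSIONED);
            }
        }
    }

    public static void main(String[] args) {
        // Case 1: the node is excluded, then killed before it heartbeats again.
        DecommissionSketch lazy = new DecommissionSketch();
        lazy.register("nm1");
        lazy.refreshNodesLazy(Collections.singleton("nm1"));
        System.out.println("heartbeat-driven: " + lazy.stateOf("nm1"));

        // Case 2: the same scenario, but refresh decommissions directly.
        DecommissionSketch eager = new DecommissionSketch();
        eager.register("nm1");
        eager.refreshNodesEager(Collections.singleton("nm1"));
        System.out.println("refresh-driven:   " + eager.stateOf("nm1"));
    }
}
```

In the heartbeat-driven case the node stays RUNNING in this model until a heartbeat arrives, which never happens if the NM has been killed; in the refresh-driven case the state changes as part of the refresh itself.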