[ https://issues.apache.org/jira/browse/YARN-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820113#comment-13820113 ]
Junping Du commented on YARN-558:
---------------------------------

+1 on the use case in cloud services. I think one feasible (although not convenient) way to achieve this now is:
- First, decommission the nodes by putting them on the decommission list and calling refreshNodes().
- Then, wait for at least one heartbeat() from each node to make sure the decommissioned nodes are cleared.
- Finally, remove the nodes from the decommission list and call refreshNodes() again.

We do need something simpler.

> Add ability to completely remove nodemanager from resourcemanager.
> ------------------------------------------------------------------
>
>                 Key: YARN-558
>                 URL: https://issues.apache.org/jira/browse/YARN-558
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Garth Goodson
>            Priority: Minor
>              Labels: feature
>
> I would like to add the ability to completely remove a nodemanager from the
> resourcemanager's state.
> I run a cloud service where I want to dynamically bring up nodes to act as
> nodemanagers and then bring them down again when not needed. These nodes
> have dynamically assigned IPs, thus the alternative of decommissioning them
> via an excludes file leads to a large (unbounded) list of decommissioned
> nodes that may never be commissioned again. I would like the ability to move
> a node from a decommissioned state to completely removing it from the
> resource manager.
> I have thought of two ways of implementing this.
> 1) Add an optional timeout between the decommission state -> being removed
> from the nodemanager.
> 2) Add an explicit RPC to remove a node that is decommissioned.
> Any additional thoughts/discussion are welcome.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
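
The three-step workaround described in the comment could be scripted roughly as below. This is only a sketch under stated assumptions, not an official tool: `purge_nodes`, its parameters, and the injected `refresh_nodes` callable are hypothetical names, the exclude file is assumed to hold one hostname per line (the file the resourcemanager reads via `yarn.resourcemanager.nodes.exclude-path`), and a fixed sleep stands in for "at least one heartbeat from each node". `refresh_via_cli` assumes the `yarn` command is on the PATH.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable


def refresh_via_cli() -> None:
    # Shells out to the admin command that triggers refreshNodes() on the RM.
    # Assumption: `yarn` is on the PATH of the machine running this script.
    subprocess.run(["yarn", "rmadmin", "-refreshNodes"], check=True)


def purge_nodes(
    exclude_file: Path,
    hosts: Iterable[str],
    refresh_nodes: Callable[[], None],
    heartbeat_wait_s: float,
) -> None:
    """Decommission `hosts`, wait, then drop them from the exclude file again."""
    hosts = set(hosts)
    original = exclude_file.read_text().split() if exclude_file.exists() else []

    # Step 1: put the hosts on the decommission (exclude) list and refresh.
    with_hosts = original + sorted(hosts - set(original))
    exclude_file.write_text("".join(h + "\n" for h in with_hosts))
    refresh_nodes()

    # Step 2: wait long enough for at least one heartbeat from each node.
    # A fixed sleep is a crude stand-in; the caller must pick a safe value.
    time.sleep(heartbeat_wait_s)

    # Step 3: remove the hosts from the list and refresh again.
    remaining = [h for h in original if h not in hosts]
    exclude_file.write_text("".join(h + "\n" for h in remaining))
    refresh_nodes()
```

A call might look like `purge_nodes(Path("/path/to/excludes"), ["nm-host-1"], refresh_via_cli, 10.0)`, where the path, hostname, and wait time are placeholders. Passing the refresh step in as a callable keeps the file handling testable without a running cluster.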