[
https://issues.apache.org/jira/browse/YARN-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820113#comment-13820113
]
Junping Du commented on YARN-558:
---------------------------------
+1 on the use case in cloud services. I think one feasible (although
inconvenient) way to achieve this now is:
- first, decommission the nodes by putting them on the decommission list and
calling refreshNodes().
- then, wait for at least one heartbeat from each node to make sure the
decommissioned nodes are cleared.
- finally, remove the nodes from the decommission list and call refreshNodes()
again.
We do need something simpler.
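For reference, the three steps above could be scripted roughly as follows. This
is only a sketch under assumptions: the decommission list is taken to be the
plain-text exclude file that yarn.resourcemanager.nodes.exclude-path points to,
the refresh is done through the "yarn rmadmin -refreshNodes" CLI, and the
function name, file path, and wait time are hypothetical values an operator
would choose. It does not check that the RM has really dropped the nodes.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable, List


def refresh_nodes() -> None:
    """Ask the ResourceManager to re-read its include/exclude host files."""
    subprocess.run(["yarn", "rmadmin", "-refreshNodes"], check=True)


def read_hosts(path: Path) -> List[str]:
    """Return the non-empty host entries of an exclude file (one per line)."""
    if not path.exists():
        return []
    return [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]


def write_hosts(path: Path, hosts: Iterable[str]) -> None:
    """Write one host per line."""
    path.write_text("".join(f"{h}\n" for h in hosts))


def decommission_and_forget(
    exclude_file: Path,
    hosts: Iterable[str],
    wait_seconds: float,
    refresh: Callable[[], None] = refresh_nodes,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical helper mirroring the three manual steps in the comment."""
    targets = set(hosts)
    existing = read_hosts(exclude_file)

    # Step 1: put the nodes on the decommission (exclude) list and refresh.
    to_add = [h for h in sorted(targets) if h not in existing]
    write_hosts(exclude_file, existing + to_add)
    refresh()

    # Step 2: wait long enough for each node to heartbeat at least once.
    # wait_seconds is an operator-chosen assumption, not a YARN constant.
    sleep(wait_seconds)

    # Step 3: take the nodes off the list again and refresh once more.
    # Note: this also removes a target host that was already on the list.
    remaining = [h for h in read_hosts(exclude_file) if h not in targets]
    write_hosts(exclude_file, remaining)
    refresh()
```

A call might look like decommission_and_forget(Path("/etc/hadoop/yarn.exclude"),
["node-a", "node-b"], 30), where the path and the 30 seconds are placeholders.
The refresh and sleep parameters are injectable only so the sequence can be
dry-run without a live cluster.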
> Add ability to completely remove nodemanager from resourcemanager.
> ------------------------------------------------------------------
>
> Key: YARN-558
> URL: https://issues.apache.org/jira/browse/YARN-558
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Garth Goodson
> Priority: Minor
> Labels: feature
>
> I would like to add the ability to completely remove a nodemanager from the
> resourcemanager's state.
> I run a cloud service where I want to dynamically bring up nodes to act as
> nodemanagers and then bring them down again when not needed. These nodes
> have dynamically assigned IPs, thus the alternative of decommissioning them
> via an excludes file leads to a large (unbounded) list of decommissioned
> nodes that may never be commissioned again. I would like the ability to move
> a node from a decommissioned state to completely removing it from the
> resource manager.
> I have thought of two ways of implementing this.
> 1) Add an optional timeout between the decommission state -> being removed
> from the resourcemanager.
> 2) Add an explicit RPC to remove a node that is decommissioned.
> Any additional thoughts/discussion are welcome.
--
This message was sent by Atlassian JIRA
(v6.1#6144)