[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Zhi updated YARN-4676:
-----------------------------
    Description: 
YARN-4676 implements an automatic, asynchronous, and flexible mechanism to
gracefully decommission YARN nodes. After a user issues the refreshNodes
request, the ResourceManager automatically evaluates the status of all
affected nodes and kicks off decommission or recommission actions. The RM
asynchronously tracks container and application status related to
DECOMMISSIONING nodes so that it can decommission each node as soon as it is
ready to be decommissioned. The decommissioning timeout is supported at
per-node granularity and can be updated dynamically. The mechanism naturally
supports multiple independent graceful decommissioning “sessions”, each
involving a different set of nodes with different timeout settings. Such
support is ideal, and necessary, for graceful decommission requests issued by
external cluster management software rather than by a human.
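
The per-node tracking described above could be sketched roughly as follows.
This is a toy model for illustration only, not the code in the attached
patches: the class ToyDecommissionTracker and all of its methods are
hypothetical names, time is passed in explicitly, and application status is
left out so that only running containers and the per-node timeout decide when
a node moves to DECOMMISSIONED.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-node graceful decommissioning; not the YARN-4676 code. */
class ToyDecommissionTracker {
    enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    private static final class Entry {
        State state = State.RUNNING;
        int runningContainers;
        long deadlineMillis;
    }

    private final Map<String, Entry> nodes = new HashMap<>();

    /** Registers a node or updates its running-container count. */
    void reportContainers(String node, int running) {
        nodes.computeIfAbsent(node, k -> new Entry()).runningContainers = running;
    }

    /**
     * Starts graceful decommissioning of one node with its own timeout.
     * Calling it again on a DECOMMISSIONING node replaces the deadline,
     * which models a dynamically updated timeout.
     */
    void startDecommission(String node, long nowMillis, long timeoutMillis) {
        Entry e = nodes.computeIfAbsent(node, k -> new Entry());
        if (e.state != State.DECOMMISSIONED) {
            e.state = State.DECOMMISSIONING;
            e.deadlineMillis = nowMillis + timeoutMillis;
        }
    }

    /** Periodic check: finish nodes that are idle or whose timeout expired. */
    void poll(long nowMillis) {
        for (Entry e : nodes.values()) {
            boolean idle = e.runningContainers == 0;
            boolean timedOut = nowMillis >= e.deadlineMillis;
            if (e.state == State.DECOMMISSIONING && (idle || timedOut)) {
                e.state = State.DECOMMISSIONED;
            }
        }
    }

    /** Returns the tracked state, or null for an unknown node. */
    State stateOf(String node) {
        Entry e = nodes.get(node);
        return e == null ? null : e.state;
    }
}
```

Because each node carries its own deadline, two requests with different node
sets and different timeouts behave as independent “sessions” in this sketch
without any extra bookkeeping.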

DecommissioningNodeWatcher, inside ResourceTrackingService, tracks the status
of DECOMMISSIONING nodes automatically and asynchronously after a client or
admin makes the graceful decommission request. It uses that status to decide
when a node, after all running containers on it have completed, is
transitioned into the DECOMMISSIONED state. NodesListManager detects and
handles include and exclude list changes to kick off decommission or
recommission as necessary.
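
As a rough illustration of the exclude-list handling mentioned above, the
decision can be viewed as a set difference between the previous and the
refreshed exclude lists. The sketch below is an assumption about the general
idea, not the NodesListManager implementation; ToyExcludeListDiff and its
methods are hypothetical names, and include-list handling is omitted.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy diff of exclude lists; not the actual NodesListManager logic. */
class ToyExcludeListDiff {
    /** Hosts that newly appear in the exclude list: decommission candidates. */
    static Set<String> toDecommission(Set<String> oldExcludes, Set<String> newExcludes) {
        Set<String> result = new TreeSet<>(newExcludes);
        result.removeAll(oldExcludes);
        return result;
    }

    /** Hosts that left the exclude list: recommission candidates. */
    static Set<String> toRecommission(Set<String> oldExcludes, Set<String> newExcludes) {
        Set<String> result = new TreeSet<>(oldExcludes);
        result.removeAll(newExcludes);
        return result;
    }
}
```

Hosts present in both lists are left alone in this sketch, so a repeated
refresh with an unchanged list triggers no action.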

  was:
YARN-4676 implements an automatic, asynchronous and flexible mechanism to
graceful decommission YARN nodes. After user issues the refreshNodes request,
ResourceManager automatically evaluates status of all affected nodes to kick
out decommission or recommission actions. RM asynchronously tracks container
and application status related to DECOMMISSIONING nodes to decommission the
nodes immediately after there are ready to be decommissioned. Decommissioning
timeout at individual nodes granularity is supported and is dynamically
updatable. The logic naturally supports multiple independent graceful
decommissioning “sessions” where each one involves different sets of nodes
with different timeout settings. Such support is ideal and necessary for
graceful decommission request issued by external cluster management software
instead of human.

DecommissioningNodeWatcher inside ResourceTrackingService tracks 
DECOMMISSIONING nodes status automatically and asynchronously after 
client/admin made the graceful decommission request. It tracks DECOMMISSIONING 
nodes status to decide when, after all running containers on the node have 
completed, will be transitioned into DECOMMISSIONED state. NodesListManager 
detect and handle include and exclude list changes to kick out decommission or 
recommission as necessary.


> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
