[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256516#comment-15256516 ]

Junping Du commented on YARN-4676:
----------------------------------

Hi [~danzhi], thanks for the patch update. Sorry for coming late to the review. I 
just went through the patch quickly and have some high-level comments so far:
1. In the discussion of YARN-914 (the umbrella for graceful decommission), we 
have proposed two ways to track a node's decommission timeout: one on the 
command-line side (YARN-3225) and the other on the RM side (the patch here). I 
think both ways have pros and cons and can be useful in different situations. 
However, your patch appears to remove the client-side tracking of the timeout 
value. Could we instead keep both code paths and add a configuration option that 
selects which side tracks the timeout, CLI or RM?
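A minimal sketch of the configuration switch suggested in point 1, in Java since 
YARN is written in Java. The property name 
yarn.resourcemanager.decommissioning-timeout.tracking, the enum 
DecommissionTimeoutTracking, the class TrackingModeConfig, and the choice of 
CLIENT as the default are all hypothetical; a plain Map stands in for Hadoop's 
Configuration object:

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical: neither this enum nor the property below exists in YARN.
enum DecommissionTimeoutTracking { CLIENT, RM }

class TrackingModeConfig {
  // Hypothetical property name, used only for illustration.
  static final String TRACKING_MODE_KEY =
      "yarn.resourcemanager.decommissioning-timeout.tracking";

  // Assumed default: keep the existing client-side (YARN-3225) behavior.
  static final DecommissionTimeoutTracking DEFAULT_MODE =
      DecommissionTimeoutTracking.CLIENT;

  // Resolves the mode from a key/value map standing in for Hadoop's
  // Configuration; a missing or unrecognized value falls back to the default.
  static DecommissionTimeoutTracking resolve(Map<String, String> conf) {
    String raw = conf.get(TRACKING_MODE_KEY);
    if (raw == null) {
      return DEFAULT_MODE;
    }
    try {
      return DecommissionTimeoutTracking.valueOf(
          raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return DEFAULT_MODE;
    }
  }

  // Exactly one side should enforce the timeout: the RM-side watcher
  // checks this, and the CLI-side wait loop checks the negation.
  static boolean rmEnforcesTimeout(Map<String, String> conf) {
    return resolve(conf) == DecommissionTimeoutTracking.RM;
  }
}
```

For example, a CLI-side wait loop could skip its own timer whenever 
rmEnforcesTimeout(conf) returns true, so the timeout is never enforced twice.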

2. If we track the node decommissioning progress on the RM side, we need to 
make sure the tracking state is not lost across a work-preserving RM restart. I 
didn't see related code in your patch or the attached proposal, so we need to 
propose a solution for that. (On the CLI side, RMProxy handles this case.)
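One possible shape for point 2, sketched under assumptions: the RM persists an 
absolute deadline per decommissioning node and recomputes the remaining time on 
recovery. DecommissionStateStore, InMemoryDecommissionStateStore, and 
DecommissionTimeoutTracker are hypothetical names, not existing YARN classes; a 
real version would go through the RM state store, and wall-clock deadlines 
assume reasonably synchronized clocks across RM instances in an HA setup:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical persistence hook; not an existing YARN API.
interface DecommissionStateStore {
  void save(String nodeId, long deadlineMillis);
  void remove(String nodeId);
  Map<String, Long> loadAll();
}

// In-memory stand-in so the sketch runs; a real store must survive restarts.
class InMemoryDecommissionStateStore implements DecommissionStateStore {
  private final Map<String, Long> data = new HashMap<>();
  public void save(String nodeId, long deadlineMillis) { data.put(nodeId, deadlineMillis); }
  public void remove(String nodeId) { data.remove(nodeId); }
  public Map<String, Long> loadAll() { return new HashMap<>(data); }
}

class DecommissionTimeoutTracker {
  private final DecommissionStateStore store;
  private final Map<String, Long> deadlines = new HashMap<>();

  DecommissionTimeoutTracker(DecommissionStateStore store) {
    this.store = store;
  }

  // Persist an absolute wall-clock deadline rather than a relative timeout,
  // so a restarted RM can work out how much time is left.
  void startTracking(String nodeId, long nowMillis, long timeoutMillis) {
    long deadline = nowMillis + timeoutMillis;
    deadlines.put(nodeId, deadline);
    store.save(nodeId, deadline);
  }

  // Forget the node once it is DECOMMISSIONED or recommissioned.
  void stopTracking(String nodeId) {
    deadlines.remove(nodeId);
    store.remove(nodeId);
  }

  // Called during RM recovery: reload every persisted deadline.
  void recover() {
    deadlines.clear();
    deadlines.putAll(store.loadAll());
  }

  // Remaining time clamped at zero; empty if the node is not tracked.
  Optional<Long> remainingMillis(String nodeId, long nowMillis) {
    Long deadline = deadlines.get(nodeId);
    if (deadline == null) {
      return Optional.empty();
    }
    return Optional.of(Math.max(0L, deadline - nowMillis));
  }
}
```

Persisting the deadline rather than the original timeout value means a 
restarted RM does not grant the node a fresh full timeout.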

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, 
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of 
> DECOMMISSIONING nodes automatically and asynchronously after a client/admin 
> has made the graceful decommission request. It uses that status to decide 
> when a node, once all running containers on it have completed, will be 
> transitioned into the DECOMMISSIONED state. NodesListManager detects and 
> handles changes to the include and exclude lists to trigger decommissioning 
> or recommissioning as necessary.
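A simplified, hypothetical sketch of the per-node decision described in the 
issue summary: a node is ready for DECOMMISSIONED once its running containers 
reach zero, or, if an RM-side timeout is configured, when the deadline passes 
first. SimpleDecommissioningWatcher and WatchedNodeStatus are illustrative 
names, not the classes in the patch, whose watcher likely tracks more state:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative status values; hypothetical, not the patch's enum.
enum WatchedNodeStatus { WAIT_CONTAINER, TIMEOUT, READY }

class SimpleDecommissioningWatcher {
  private static final class Entry {
    final long deadlineMillis; // Long.MAX_VALUE means no RM-side timeout
    int runningContainers;

    Entry(long deadlineMillis, int runningContainers) {
      this.deadlineMillis = deadlineMillis;
      this.runningContainers = runningContainers;
    }
  }

  private final Map<String, Entry> nodes = new HashMap<>();

  // Start watching after an admin requests graceful decommission.
  // timeoutMillis may be null when no RM-side timeout applies.
  void watch(String nodeId, long nowMillis, int runningContainers, Long timeoutMillis) {
    long deadline = (timeoutMillis == null) ? Long.MAX_VALUE : nowMillis + timeoutMillis;
    nodes.put(nodeId, new Entry(deadline, runningContainers));
  }

  // Record the container count reported in the node's latest heartbeat.
  void update(String nodeId, int runningContainers) {
    Entry e = nodes.get(nodeId);
    if (e != null) {
      e.runningContainers = runningContainers;
    }
  }

  // READY once every container has finished; TIMEOUT if the deadline
  // passes first; otherwise keep waiting.
  WatchedNodeStatus check(String nodeId, long nowMillis) {
    Entry e = nodes.get(nodeId);
    if (e == null) {
      throw new IllegalArgumentException("node is not being watched: " + nodeId);
    }
    if (e.runningContainers == 0) {
      return WatchedNodeStatus.READY;
    }
    if (nowMillis >= e.deadlineMillis) {
      return WatchedNodeStatus.TIMEOUT;
    }
    return WatchedNodeStatus.WAIT_CONTAINER;
  }

  // Stop watching once the node has been transitioned.
  void unwatch(String nodeId) {
    nodes.remove(nodeId);
  }
}
```

In a heartbeat handler one would call update(...) and then check(...), moving 
the node to DECOMMISSIONED when the result is READY or TIMEOUT.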



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
