[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238478#comment-15238478
 ] 

Daniel Zhi commented on YARN-4676:
----------------------------------

1. I don't expect it to disappear by the next patch, but I will focus on other 
issues first.
2. I will revert these two files (I didn't notice them because my local diff 
tool skipped empty changes).
3. I will restore the resolve() (its removal was due to my manual merge).
4. Yes, it will simplify the code.
5. refreshNodes(long timeout) basically remains unchanged. The client enforces 
a timeout that is not fully integrated with the automatic logic on the RM side 
(NodesListManager uses the internal default timeout of 3600 seconds). Given 
that the code checks status every second, it likely expected a smaller timeout 
from the command line, so the command-line timeout experience would be the same 
as before. A deeper integration would pass the timeout through 
RefreshNodesRequest to NodesListManager so that it is honored. The client-side 
wait-and-check can stay, but it no longer needs to issue a FORCEFUL 
decommission, since that is supposed to happen automatically.
6. I am surprised that update() no longer throws an exception (maybe the code 
has evolved since the original version). So I will remove updateNoThrow() (and 
will log the full exception in readDecommissioningTimeout).
7. I will add synchronized. It will be called by every node on every 
heartbeat, but the implementation is efficient enough that synchronized should 
not cause contention.
8. Is there a list of what "docs" includes?
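
To make the client-side wait-and-check in item 5 concrete, here is a minimal 
sketch. It is not the code in the patch: the class DecommissionWait, the 
Outcome enum, and the allDecommissioned callback are hypothetical names used 
only for illustration. It polls a status check until it succeeds or the 
client timeout expires, and on timeout it only reports the result instead of 
issuing a FORCEFUL decommission, on the assumption that the RM side handles 
that automatically.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of YARN: illustrates a client-side
// wait-and-check loop with a timeout and a fixed polling interval.
final class DecommissionWait {

    /** Result of the client-side wait. */
    enum Outcome { COMPLETED, TIMED_OUT, INTERRUPTED }

    private DecommissionWait() {}

    /**
     * Polls allDecommissioned until it returns true or the timeout elapses.
     *
     * @param allDecommissioned assumed callback reporting whether every
     *                          DECOMMISSIONING node has finished
     * @param timeoutSeconds    client-side timeout in seconds
     * @param pollMillis        delay between status checks in milliseconds
     */
    static Outcome waitForDecommission(BooleanSupplier allDecommissioned,
                                       long timeoutSeconds,
                                       long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        while (true) {
            if (allDecommissioned.getAsBoolean()) {
                return Outcome.COMPLETED;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                // No forceful action here: the server side is assumed to
                // enforce its own timeout.
                return Outcome.TIMED_OUT;
            }
            long remainingMillis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return Outcome.INTERRUPTED;
            }
        }
    }
}
```

With a one-second poll interval, as in the existing client code, the call 
would look like waitForDecommission(check, timeout, 1000).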


> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status 
> of DECOMMISSIONING nodes automatically and asynchronously after a 
> client/admin makes the graceful decommission request. It uses that status to 
> decide when a node, after all running containers on it have completed, will 
> be transitioned into the DECOMMISSIONED state. NodesListManager detects and 
> handles include and exclude list changes to kick off decommission or 
> recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
