[
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238478#comment-15238478
]
Daniel Zhi commented on YARN-4676:
----------------------------------
1. I don't expect it to disappear in the next patch, but I will focus on other
issues first.
2. I will revert these two files (I didn't notice them because my local diff
tool skipped empty changes).
3. I will restore resolve() (the change was due to my manual merge).
4. Yes, it will simplify the code.
5. refreshNodes(long timeout) basically remains unchanged. The client enforces
a timeout that is not fully integrated with the automatic logic on the RM side
(NodesListManager uses the internal default timeout of 3600 seconds). Given
that the code checks status every second, it likely expects a smaller timeout
from the command line. So the command-line timeout experience would be the
same as before. A deeper integration would pass the timeout through
RefreshNodesRequest to NodesListManager so that it is honored there. The
client-side wait-and-check can stay, but it no longer needs to issue a FORCEFUL
decommission, as that is supposed to happen automatically.
6. I am surprised that update() no longer throws an exception (maybe the code
has evolved since the original version). So I will remove updateNoThrow() (and
will log the full exception in readDecommissioningTimeout).
7. I will add synchronized. The method will be called by every node during
every heartbeat, but the implementation is efficient enough that synchronized
should not cause contention.
8. Is there a list of what "docs" includes?
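
To make the client-side wait-and-check from point 5 concrete, here is a minimal
sketch of such a loop. It is not the actual client code: the class name
DecommissionWaiter, the method waitForDecommission, and the BooleanSupplier
status check are all hypothetical, and pollMillis is assumed to be positive.
The loop only waits and reports; it issues no FORCEFUL decommission, on the
assumption that the RM side handles the timeout automatically.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a client-side wait-and-check loop (not YARN code). */
final class DecommissionWaiter {
    private DecommissionWaiter() {}

    /**
     * Polls the status check every pollMillis until it reports completion or
     * timeoutMillis has elapsed.
     *
     * @return true if the check succeeded before the timeout; false on
     *         timeout or if the waiting thread was interrupted
     */
    static boolean waitForDecommission(BooleanSupplier allDecommissioned,
                                       long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (allDecommissioned.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                // Timed out: just report it; no forced action is taken here.
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would pass the command-line timeout and a one-second poll interval,
for example waitForDecommission(check, timeoutSeconds * 1000L, 1000L), where
check is whatever status query the client already performs.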
> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Zhi
> Assignee: Daniel Zhi
> Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch, YARN-4676.009.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status
> of DECOMMISSIONING nodes automatically and asynchronously after a
> client/admin makes the graceful decommission request. It tracks each
> DECOMMISSIONING node's status to decide when, after all running containers on
> the node have completed, the node will be transitioned into the
> DECOMMISSIONED state. NodesListManager detects and handles include and
> exclude list changes to kick off decommission or recommission as necessary.
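
The tracking described above, together with the synchronized per-heartbeat
update from point 7, could be sketched roughly as follows. This is not the
DecommissioningNodeWatcher from the patch: the class NodeWatcherSketch, its
methods, and the Status enum are hypothetical names. The sketch also assumes
that a node becomes DECOMMISSIONED either when it reports zero running
containers or when the RM-side timeout expires.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of heartbeat-driven decommission tracking. */
final class NodeWatcherSketch {
    enum Status { NOT_TRACKED, DECOMMISSIONING, DECOMMISSIONED }

    // nodeId -> time (ms) at which graceful decommission was requested
    private final Map<String, Long> startTimes = new HashMap<>();
    private final long timeoutMillis;

    NodeWatcherSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Starts tracking a node that entered the DECOMMISSIONING state. */
    synchronized void startDecommission(String nodeId, long nowMillis) {
        startTimes.put(nodeId, nowMillis);
    }

    /**
     * Called on every heartbeat of every node. The body is a map lookup and
     * two comparisons, so the lock is held only briefly.
     */
    synchronized Status update(String nodeId, int runningContainers,
                               long nowMillis) {
        Long start = startTimes.get(nodeId);
        if (start == null) {
            return Status.NOT_TRACKED;
        }
        boolean drained = runningContainers == 0;
        boolean timedOut = nowMillis - start >= timeoutMillis;
        if (drained || timedOut) {
            startTimes.remove(nodeId); // stop tracking once decommissioned
            return Status.DECOMMISSIONED;
        }
        return Status.DECOMMISSIONING;
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a
real watcher would more likely read a clock itself.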