[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Zhi updated YARN-4676:
-----------------------------
Attachment: YARN-4676.009.patch
Added YARN-4676.009.patch, which contains the following changes:
1. Use a single decommissioning list in NodesListManager.
2. Fix hadoop.yarn.server.resourcemanager.TestRMNodeTransitions.
3. Use the new property name yarn.resourcemanager.decommissioning.default.timeout.
4. Define and use NM_EXIT_WAIT_MS.
5. Fix a ConcurrentModificationException.
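Items 1 and 5 above mention a single decommissioning list and a ConcurrentModificationException fix without showing code. The Java sketch below is a hypothetical illustration of the general technique, not the code in YARN-4676.009.patch: `DecommissioningNodeList` and its methods are invented names, and tracking each node as a node ID mapped to a running-container count is an assumption. Pruning drained nodes through `Iterator.remove()` is the standard way to remove entries from a `HashMap` while iterating over it without triggering a ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one map of DECOMMISSIONING node IDs -> running container count.
class DecommissioningNodeList {
    private final Map<String, Integer> runningContainers = new HashMap<>();

    // Start tracking a node that entered DECOMMISSIONING.
    void track(String nodeId, int containers) {
        runningContainers.put(nodeId, containers);
    }

    // Update the container count of a node that is already tracked; unknown nodes are ignored.
    void update(String nodeId, int containers) {
        runningContainers.computeIfPresent(nodeId, (id, old) -> containers);
    }

    // Removes and returns every tracked node that has no running containers left.
    // Calling runningContainers.remove(...) inside a for-each loop over this map
    // can throw ConcurrentModificationException; Iterator.remove() is safe.
    List<String> pollDrainedNodes() {
        List<String> drained = new ArrayList<>();
        Iterator<Map.Entry<String, Integer>> it = runningContainers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() == 0) {
                drained.add(entry.getKey());
                it.remove();
            }
        }
        return drained;
    }

    int size() {
        return runningContainers.size();
    }
}
```

Collecting the drained node IDs first and acting on them after the loop also keeps any state transitions out of the iteration itself.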
The other unit test failures do not appear to be related to my patch. The
following are the local unit test results with (first column) and without
(second column) YARN-4676.009.patch:
WITH  WITHOUT  TEST
PASS  PASS     hadoop.fs.shell.find.TestIname
PASS  PASS     hadoop.yarn.server.resourcemanager.TestRMNodeTransitions (FIXED)
PASS  PASS     hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
FAIL  FAIL     hadoop.yarn.server.resourcemanager.TestClientRMTokens
PASS  PASS     hadoop.yarn.server.resourcemanager.TestAMAuthorization
PASS  PASS     hadoop.yarn.client.TestGetGroups
PASS  PASS     org.apache.hadoop.util.TestNativeLibraryChecker
PASS  PASS     org.apache.hadoop.yarn.client.cli.TestYarnCLI
FAIL  FAIL     org.apache.hadoop.yarn.client.api.impl.TestYarnClient
PASS  PASS     org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
PASS  PASS     org.apache.hadoop.yarn.client.api.impl.TestNMClient
> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Zhi
> Assignee: Daniel Zhi
> Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch, YARN-4676.009.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of
> DECOMMISSIONING nodes automatically and asynchronously after a client or admin
> has made a graceful decommission request. It tracks the status of each
> DECOMMISSIONING node to decide when the node, once all of its running
> containers have completed, will be transitioned into the DECOMMISSIONED state.
> NodesListManager detects and handles include and exclude list changes to kick
> off decommissioning or recommissioning as necessary.
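The description above says a DECOMMISSIONING node moves to DECOMMISSIONED once its running containers complete, and that include/exclude list changes trigger decommissioning or recommissioning. As a minimal sketch, assuming a per-node running-container count and a decommissioning start time are available, the decision and an exclude-list diff could look like the Java below. `DecommissionPolicy`, `Decision`, and all method names are hypothetical, not classes from the patch; the timeout branch is an assumption inferred from the yarn.resourcemanager.decommissioning.default.timeout property name, and include-list handling is omitted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not code from YARN-4676: per-node decision plus exclude-list diff.
final class DecommissionPolicy {
    enum Decision { KEEP_DECOMMISSIONING, DECOMMISSIONED_DRAINED, DECOMMISSIONED_TIMEOUT }

    private DecommissionPolicy() {}

    // A DECOMMISSIONING node becomes DECOMMISSIONED once all of its containers finish,
    // or (assumption) once the decommissioning timeout has elapsed.
    static Decision decide(int runningContainers, long decomStartMs, long nowMs, long timeoutMs) {
        if (runningContainers == 0) {
            return Decision.DECOMMISSIONED_DRAINED;
        }
        if (nowMs - decomStartMs >= timeoutMs) {
            return Decision.DECOMMISSIONED_TIMEOUT;
        }
        return Decision.KEEP_DECOMMISSIONING;
    }

    // Nodes newly added to the exclude list should start decommissioning.
    static Set<String> toDecommission(Set<String> oldExcludes, Set<String> newExcludes) {
        Set<String> result = new HashSet<>(newExcludes);
        result.removeAll(oldExcludes);
        return result;
    }

    // Nodes dropped from the exclude list should be recommissioned.
    static Set<String> toRecommission(Set<String> oldExcludes, Set<String> newExcludes) {
        Set<String> result = new HashSet<>(oldExcludes);
        result.removeAll(newExcludes);
        return result;
    }
}
```

Keeping the decision a pure function of counts and timestamps lets it be unit tested without a running ResourceManager; a watcher could call it on each node status update.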
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)