[
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379860#comment-14379860
]
Junping Du commented on YARN-3225:
----------------------------------
Thanks [~devaraj.k] for updating the patch!
Sorry for coming in late with some major comments:
In ResourceManagerAdministrationProtocol.java,
{code}
+
+ public RefreshNodesGracefullyResponse refreshNodesGracefully(
+ RefreshNodesGracefullyRequest refreshNodesGracefullyRequest)
+ throws YarnException, IOException;
+
+ @Public
+ @Evolving
+ @Idempotent
+ public RefreshNodesForcefullyResponse refreshNodesForcefully(
+ RefreshNodesForcefullyRequest refreshNodesForcefullyRequest)
+ throws YarnException, IOException;
{code}
I think we don't have to add new APIs here and can reuse the existing
refreshNodes() instead: we can add an optional field (such as a boolean) to
RefreshNodesRequest to differentiate between decommissioning immediately and
decommissioning with a delay (gracefully). There should be no difference
between forceful decommission and the previous decommission behavior, since
decommissioning an already decommissioned node should have no side effects
(the API is marked Idempotent). That would keep the API much simpler.
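A minimal sketch of that idea, not the actual patch: the classes below are hypothetical stand-ins (RefreshNodesRequestSketch, AdminServiceSketch, NodeStateSketch and the isGraceful() accessor are invented for illustration and are not the real Hadoop types). It only shows one request type carrying an optional flag, with a forceful refresh remaining idempotent.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical node states, for illustration only.
enum NodeStateSketch { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

// Hypothetical stand-in for RefreshNodesRequest with one optional flag.
final class RefreshNodesRequestSketch {
    private final boolean graceful;

    private RefreshNodesRequestSketch(boolean graceful) {
        this.graceful = graceful;
    }

    // Existing callers pass no flag and keep the immediate behavior.
    static RefreshNodesRequestSketch newInstance() {
        return new RefreshNodesRequestSketch(false);
    }

    static RefreshNodesRequestSketch newInstance(boolean graceful) {
        return new RefreshNodesRequestSketch(graceful);
    }

    boolean isGraceful() {
        return graceful;
    }
}

// Hypothetical stand-in for the RM admin service; a single refreshNodes() entry point.
final class AdminServiceSketch {
    private final Map<String, NodeStateSketch> nodes = new LinkedHashMap<>();
    private final Set<String> excluded = new LinkedHashSet<>();

    void addNode(String host) { nodes.put(host, NodeStateSketch.RUNNING); }

    void exclude(String host) { excluded.add(host); }

    NodeStateSketch stateOf(String host) { return nodes.get(host); }

    void refreshNodes(RefreshNodesRequestSketch request) {
        for (String host : excluded) {
            NodeStateSketch current = nodes.get(host);
            // Unknown or already decommissioned nodes are left alone,
            // so repeating the call has no side effects.
            if (current == null || current == NodeStateSketch.DECOMMISSIONED) {
                continue;
            }
            nodes.put(host, request.isGraceful()
                    ? NodeStateSketch.DECOMMISSIONING
                    : NodeStateSketch.DECOMMISSIONED);
        }
    }
}
```

With this shape, a graceful request followed by a forceful one moves an excluded node from DECOMMISSIONING to DECOMMISSIONED, and repeating the forceful request changes nothing.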
{code}
+ public CheckForDecommissioningNodesResponse checkForDecommissioningNodes(
+ CheckForDecommissioningNodesRequest checkForDecommissioningNodesRequest)
+ throws YarnException, IOException;
{code}
Maybe it is better to add getDecommissioningNodes(), which returns a list of
decommissioning nodes, instead of returning a boolean value here? When the
timeout is hit, we can print the decommissioning nodes that haven't finished
(or a subset of them if the list is large). That would help the admin
understand what is going on.
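A rough sketch of how the CLI side could use such a call, under stated assumptions: DecommissionWaitSketch, its method names, the poll-count limit (used here in place of a wall-clock timeout to keep the example deterministic) and the message wording are all hypothetical. The Supplier stands in for an RPC to a getDecommissioningNodes() that returns host names.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical CLI-side helper; not part of the patch.
final class DecommissionWaitSketch {

    // Polls until no nodes are decommissioning or maxPolls calls have been made.
    // Returns the nodes still decommissioning at that point (empty on success).
    static List<String> waitForDecommission(
            Supplier<List<String>> getDecommissioningNodes, int maxPolls) {
        List<String> pending = getDecommissioningNodes.get();
        for (int i = 1; i < maxPolls && !pending.isEmpty(); i++) {
            // A real client would sleep between polls; omitted in this sketch.
            pending = getDecommissioningNodes.get();
        }
        return pending;
    }

    // Builds the message shown to the admin, listing at most maxReported nodes.
    static String summarize(List<String> pending, int maxReported) {
        if (pending.isEmpty()) {
            return "All nodes decommissioned";
        }
        int shown = Math.min(pending.size(), maxReported);
        StringBuilder sb = new StringBuilder("Timed out; still decommissioning: ");
        sb.append(String.join(", ", pending.subList(0, shown)));
        if (pending.size() > shown) {
            sb.append(" ... and ").append(pending.size() - shown).append(" more");
        }
        return sb.toString();
    }
}
```

Returning the list rather than a boolean lets the same response drive both the "are we done" check (an empty list) and the report printed at timeout.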
> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> -----------------------------------------------------------------------
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Junping Du
> Assignee: Devaraj K
> Attachments: YARN-3225-1.patch, YARN-3225.patch, YARN-914.patch
>
>
> A new CLI (or the existing CLI with new parameters) should move each node on
> the decommission list to decommissioning status and track a timeout, after
> which nodes that haven't finished are terminated.