Junping Du commented on YARN-3225:

Thanks [~devaraj.k] for updating the patch!
Sorry for the late review; I have some major comments.
In ResourceManagerAdministrationProtocol.java:
+  public RefreshNodesGracefullyResponse refreshNodesGracefully(
+      RefreshNodesGracefullyRequest refreshNodesGracefullyRequest)
+      throws YarnException, IOException;
+  @Public
+  @Evolving
+  @Idempotent
+  public RefreshNodesForcefullyResponse refreshNodesForcefully(
+      RefreshNodesForcefullyRequest refreshNodesForcefullyRequest)
+      throws YarnException, IOException;
I think we don't have to add new APIs here; we can reuse the existing 
refreshNodes() and add an optional field (such as a boolean) to 
RefreshNodesRequest to distinguish immediate decommission from delayed 
(graceful) decommission. There should be no difference between forceful 
decommission and the previous decommission behavior, since decommissioning an 
already decommissioned node should have no side effects (the API is marked 
Idempotent). That could keep API much 
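
To make the suggestion concrete, here is a minimal sketch of a single refreshNodes() entry point driven by an optional flag on the request. The class names, the isGraceful() accessor, and the returned state strings are hypothetical stand-ins for illustration, not the actual YARN records or the patch's code.

```java
// Hypothetical sketch only: these types stand in for the real YARN records.
class RefreshNodesRequestSketch {
    // Assumed optional field; false keeps the existing "decommission now" behavior.
    private final boolean graceful;

    private RefreshNodesRequestSketch(boolean graceful) {
        this.graceful = graceful;
    }

    // Existing callers that pass no flag get an immediate decommission.
    static RefreshNodesRequestSketch newInstance() {
        return new RefreshNodesRequestSketch(false);
    }

    static RefreshNodesRequestSketch newInstance(boolean graceful) {
        return new RefreshNodesRequestSketch(graceful);
    }

    boolean isGraceful() {
        return graceful;
    }
}

class AdminProtocolSketch {
    // One entry point instead of separate graceful/forceful methods.
    // Returns the state the listed nodes would move to (illustrative strings).
    String refreshNodes(RefreshNodesRequestSketch request) {
        return request.isGraceful() ? "DECOMMISSIONING" : "DECOMMISSIONED";
    }
}

class RefreshNodesSketchDemo {
    public static void main(String[] args) {
        AdminProtocolSketch admin = new AdminProtocolSketch();
        System.out.println(admin.refreshNodes(RefreshNodesRequestSketch.newInstance()));
        System.out.println(admin.refreshNodes(RefreshNodesRequestSketch.newInstance(true)));
    }
}
```

With a default of false, callers that never set the field would see the same behavior as before, which is the main reason to prefer an optional field over new methods.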

+  public CheckForDecommissioningNodesResponse checkForDecommissioningNodes(
+      CheckForDecommissioningNodesRequest checkForDecommissioningNodesRequest)
+      throws YarnException, IOException;
Maybe it would be better to add getDecommissioningNodes(), returning a list of 
decommissioning nodes, instead of returning a boolean value here? When the 
timeout is hit, we could print the decommissioning nodes that haven't finished 
(or a subset of them if the list is large). That would help the admin 
understand what is going on.
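
A rough sketch of the timeout report this would enable, assuming the CLI already holds the list returned by the proposed getDecommissioningNodes(). The class name, the summarize() helper, the message wording, and the host names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: formats the nodes still decommissioning when the timeout hits.
class DecommissioningReportSketch {
    // Lists at most maxShown nodes and summarizes the rest as a count.
    static String summarize(List<String> nodes, int maxShown) {
        if (nodes.isEmpty()) {
            return "All nodes finished decommissioning.";
        }
        int shown = Math.max(0, Math.min(maxShown, nodes.size()));
        StringBuilder sb = new StringBuilder("Timed out; still decommissioning: ");
        sb.append(String.join(", ", nodes.subList(0, shown)));
        if (shown < nodes.size()) {
            sb.append(" ... and ").append(nodes.size() - shown).append(" more");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("host1:8041", "host2:8041", "host3:8041");
        System.out.println(summarize(pending, 2));
    }
}
```

A boolean response could only say that something is still pending; a list lets the CLI name the nodes involved.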

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> -----------------------------------------------------------------------
>                 Key: YARN-3225
>                 URL: https://issues.apache.org/jira/browse/YARN-3225
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Junping Du
>            Assignee: Devaraj K
>         Attachments: YARN-3225-1.patch, YARN-3225.patch, YARN-914.patch
> New CLI (or existing CLI with parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to 
> terminate the nodes that haven't finished.
