Devaraj K commented on YARN-3225:

Thanks [~djp] for the explanation.

If there are long-running containers on the NM and the RMAdmin CLI gets 
terminated before issuing the forceful decommission, then the NM could remain 
in the “DECOMMISSIONING” state irrespective of the timeout. Am I missing anything?

bq. AMs who has running containers on this NM will get noticed with preemption 
framework (with timeout), especially AM itself run against on this node. 
If we don't pass the timeout to the RM, how are we going to achieve this? Do you 
mean this will be handled later, once the basic things are done?

bq. the timeout can be updated to shorter or longer.
With the RMAdmin CLI handling the timeout, if we want to make the timeout 
shorter, we can issue the command from a new or the same CLI with the shorter 
timeout, and that would be fine. To make the timeout longer, if we use a new 
CLI, there is a chance of the forceful decommission still happening at the old 
CLI's timeout. Is there a constraint that this needs to be done from the same CLI?
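
To illustrate the concern, here is a minimal sketch, not YARN code. It assumes 
each CLI process tracks its own timeout on the client side and issues the 
forceful decommission when its own deadline passes. The names CliSession and 
forced_decommission_time are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CliSession:
    """Hypothetical stand-in for one RMAdmin CLI process tracking its own timeout."""
    start: int          # time the graceful decommission command was issued (seconds)
    timeout: int        # timeout given to that CLI invocation (seconds)
    alive: bool = True  # False if the CLI process was terminated before its deadline

    @property
    def deadline(self) -> int:
        return self.start + self.timeout


def forced_decommission_time(sessions: List[CliSession]) -> Optional[int]:
    """Earliest time at which any surviving CLI would force the decommission.

    Returns None when no CLI is left alive, i.e. nothing ever forces the node
    out of the DECOMMISSIONING state.
    """
    deadlines = [s.deadline for s in sessions if s.alive]
    return min(deadlines) if deadlines else None


# Shortening from a new CLI works: the new, earlier deadline fires first.
shorter = forced_decommission_time([CliSession(0, 60), CliSession(10, 20)])

# Lengthening from a new CLI does not: the old CLI still fires at its deadline.
longer = forced_decommission_time([CliSession(0, 60), CliSession(10, 300)])

# If the only CLI is terminated, nobody issues the forceful decommission.
orphaned = forced_decommission_time([CliSession(0, 60, alive=False)])
```

Under these assumptions, lengthening the timeout only takes effect if the old 
CLI is stopped or reused, and a terminated CLI leaves the node without any 
deadline at all.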

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> -----------------------------------------------------------------------
>                 Key: YARN-3225
>                 URL: https://issues.apache.org/jira/browse/YARN-3225
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Junping Du
>            Assignee: Devaraj K
>         Attachments: YARN-3225.patch, YARN-914.patch
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.

