[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359239#comment-14359239
 ] 

Junping Du commented on YARN-3225:
----------------------------------

Nice discussion, [~devaraj.k]!
bq. If there are some long running containers in the NM and RMAdmin CLI gets 
terminated before issuing forceful decommission then the NM could remain in the 
“DECOMMISSIONING” state irrespective of timeout. Am I missing anything?
If users terminate the blocking/pending CLI, it only means they want to 
track the timeout themselves, or they want to move the timeout earlier or 
later. In this case, the decommissioning nodes either get decommissioned when 
the apps finish (a clean quit), or wait for the user to decommission them again 
later. We can add some alert messages later if some nodes stay in the 
decommissioning state for a really long time. The basic idea is that we agree 
not to track the timeout on the RM side for each individual node.
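
To make the CLI-side tracking concrete, here is a minimal sketch of the idea, 
not the actual patch. Every name in it (DecommissionClient, InMemoryRm, 
GracefulDecommissionTracker) is hypothetical and stands in for whatever admin 
calls the CLI ends up using. It assumes the CLI polls node state, exits cleanly 
once all nodes are decommissioned, forces the remaining nodes when its own 
timeout expires, and leaves nodes in DECOMMISSIONING if it is terminated early.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical node states, named after the ones discussed in this thread. */
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

/** Hypothetical stand-in for the RM admin calls; not a real YARN interface. */
interface DecommissionClient {
    void startGracefulDecommission(List<String> nodes);
    NodeState getState(String node);
    void forceDecommission(List<String> nodes);
}

/** In-memory fake RM: idle nodes finish at once, busy nodes keep running. */
final class InMemoryRm implements DecommissionClient {
    private final Map<String, NodeState> states = new HashMap<>();
    private final Set<String> busyNodes;

    InMemoryRm(Set<String> busyNodes) { this.busyNodes = busyNodes; }

    public void startGracefulDecommission(List<String> nodes) {
        for (String n : nodes) {
            states.put(n, busyNodes.contains(n)
                    ? NodeState.DECOMMISSIONING : NodeState.DECOMMISSIONED);
        }
    }

    public NodeState getState(String node) {
        return states.getOrDefault(node, NodeState.RUNNING);
    }

    public void forceDecommission(List<String> nodes) {
        for (String n : nodes) states.put(n, NodeState.DECOMMISSIONED);
    }
}

final class GracefulDecommissionTracker {
    /**
     * Tracks the timeout on the client side; the RM keeps no per-node timer.
     * Returns the nodes that had to be forcefully decommissioned.
     */
    static List<String> decommission(DecommissionClient rm, List<String> nodes,
                                     long timeoutMillis, long pollMillis) {
        rm.startGracefulDecommission(nodes);
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<String> pending = new ArrayList<>(nodes);
        while (true) {
            pending.removeIf(n -> rm.getState(n) == NodeState.DECOMMISSIONED);
            if (pending.isEmpty()) {
                return pending; // clean quit: nothing left to force
            }
            if (System.nanoTime() - deadline >= 0) {
                break; // timeout expired with nodes still pending
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // CLI terminated early: stop tracking and force nothing, so
                // pending nodes simply stay in DECOMMISSIONING.
                Thread.currentThread().interrupt();
                return new ArrayList<>();
            }
        }
        rm.forceDecommission(pending);
        return pending;
    }
}
```

With this shape, adjusting the timeout is just a matter of terminating the 
running tracker and starting a new one with a different timeoutMillis.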

bq. If we don't pass timeout to RM then how are we going to achieve this? Do 
you mean this will be handled later, once the basic things are done?
You are right that the timeout value could be useful to pass down to the AM 
for preempting containers (however, it has no effect on terminating nodes). 
Let's keep it here and we can leverage it later when we are notifying the AM.

bq. For making timeout longer, if we use new CLI then there is a chance of 
forceful decommission happening with the old CLI timeout. Is there any 
constraint that this needs to be done with the same CLI?
I don't quite understand the case described here. Users should terminate the 
current CLI and launch a new CLI with the adjusted timeout value if they want 
to wait for a shorter or longer time. If the previous timeout has already 
passed, the current CLI should already have quit with all nodes decommissioned. 
Am I missing something here?

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> -----------------------------------------------------------------------
>
>                 Key: YARN-3225
>                 URL: https://issues.apache.org/jira/browse/YARN-3225
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Junping Du
>            Assignee: Devaraj K
>         Attachments: YARN-3225.patch, YARN-914.patch
>
>
> New CLI (or existing CLI with parameters) should put each node on the 
> decommission list into decommissioning status and track the timeout to 
> terminate the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
