[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

Arun Suresh (JIRA) Fri, 29 Sep 2017 13:57:18 -0700

     [ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Arun Suresh updated YARN-5465:
------------------------------
    Target Version/s: 3.1.0  (was: 2.9.0)

> Server-Side NM Graceful Decommissioning subsequent call behavior
> ----------------------------------------------------------------
>
>                 Key: YARN-5465
>                 URL: https://issues.apache.org/jira/browse/YARN-5465
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful
>            Reporter: Robert Kanter
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts, though it does let you easily update the timeout of currently 
> decommissioning nodes.
> An alternative behavior we could implement is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.
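
To make the difference between the two behaviors concrete, here is a minimal Python sketch of the deadline bookkeeping only. It is not the YARN implementation: the DecommissionTracker class, the shared_timeout flag, and the refresh_nodes method are hypothetical names invented for this illustration, and the sketch assumes that a node is recommissioned by removing it from the exclude file and refreshing again.

```python
from dataclasses import dataclass, field


@dataclass
class DecommissionTracker:
    """Toy model of graceful-decommission deadlines (hypothetical, not YARN code)."""

    # True models the current behavior: every call resets the deadline of
    # every node in the exclude file. False models the alternative: a node
    # keeps the deadline from the call that first decommissioned it.
    shared_timeout: bool
    # node name -> absolute shutdown time, in seconds since the start
    deadlines: dict = field(default_factory=dict)

    def refresh_nodes(self, now, exclude_file, timeout):
        """Model one `-refreshNodes -g <timeout> -server` call made at time `now`."""
        # Assumption: a node that is no longer excluded is recommissioned.
        for node in list(self.deadlines):
            if node not in exclude_file:
                del self.deadlines[node]
        for node in exclude_file:
            if self.shared_timeout or node not in self.deadlines:
                self.deadlines[node] = now + timeout


# The scenario from the description, replayed under both behaviors.
current = DecommissionTracker(shared_timeout=True)
current.refresh_nodes(0, ["nodeA"], 120)
current.refresh_nodes(30, ["nodeA", "nodeB"], 30)
print(current.deadlines)      # {'nodeA': 60, 'nodeB': 60}

alternative = DecommissionTracker(shared_timeout=False)
alternative.refresh_nodes(0, ["nodeA"], 120)
alternative.refresh_nodes(30, ["nodeA", "nodeB"], 30)
print(alternative.deadlines)  # {'nodeA': 120, 'nodeB': 60}
```

In the alternative model, changing nodeA's timeout takes two further calls: one with nodeA removed from the exclude list (recommission), then one with nodeA added back and the new timeout.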



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

