[ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5465:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues to 3.4.0. Please move this 
issue back if it is a blocker.

> Server-Side NM Graceful Decommissioning subsequent call behavior
> ----------------------------------------------------------------
>
>                 Key: YARN-5465
>                 URL: https://issues.apache.org/jira/browse/YARN-5465
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful
>            Reporter: Robert Kanter
>            Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts, though it does let you easily update the timeout of currently 
> decommissioning nodes.
> An alternative behavior we could implement is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.
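
The two behaviors described above can be compared with a small toy model. This is only a sketch of the timeline arithmetic, not YARN code: the class `DecommissionModel`, its `refresh_nodes` method, and the `independent` flag are hypothetical names made up for illustration, and times are plain seconds counted from the first call.

```python
from dataclasses import dataclass, field


@dataclass
class DecommissionModel:
    """Toy model of graceful decommission deadlines (hypothetical, not YARN code)."""

    # False: a later call resets the deadline of nodes already decommissioning.
    # True: a later call leaves already-decommissioning nodes untouched.
    independent: bool
    # Maps node name -> absolute shutdown time in seconds.
    deadlines: dict = field(default_factory=dict)

    def refresh_nodes(self, now, timeout, excluded):
        """Model one graceful refresh at time `now` with the given timeout."""
        for node in excluded:
            if self.independent and node in self.deadlines:
                continue  # keep the deadline set by the earlier call
            self.deadlines[node] = now + timeout


def scenario(independent):
    model = DecommissionModel(independent=independent)
    # t=0: nodeA is in the exclude file, timeout 120 seconds.
    model.refresh_nodes(now=0, timeout=120, excluded=["nodeA"])
    # t=30: nodeB is added to the exclude file, timeout 30 seconds.
    model.refresh_nodes(now=30, timeout=30, excluded=["nodeA", "nodeB"])
    return model.deadlines


current = scenario(independent=False)
proposed = scenario(independent=True)
print(current)   # {'nodeA': 60, 'nodeB': 60}
print(proposed)  # {'nodeA': 120, 'nodeB': 60}
```

In the first result both nodes stop at t=60, which is 30 seconds after the second call. In the second result nodeB stops at t=60 and nodeA keeps its original deadline of t=120, 60 seconds later. Under the second policy the only way to change nodeA's deadline in this model would be to drop its entry (recommission) and add it again.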
