[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun Suresh updated YARN-5465:
------------------------------
    Target Version/s: 3.1.0  (was: 2.9.0)

> Server-Side NM Graceful Decommissioning subsequent call behavior
> ----------------------------------------------------------------
>
>                 Key: YARN-5465
>                 URL: https://issues.apache.org/jira/browse/YARN-5465
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful
>            Reporter: Robert Kanter
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes
> updates the timeout for any currently decommissioning nodes. This makes it
> impossible to gracefully decommission different sets of nodes with different
> timeouts, though it does let you easily update the timeout of currently
> decommissioning nodes.
> Another possible behavior is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes
> independent. You can now have different sets of decommissioning nodes with
> different timeouts.
> However, to update the timeout of a currently decommissioning node, you'd
> have to first recommission it and then decommission it again.
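
The two behaviors described in the issue can be compared with a small timeline model. This is only an illustrative sketch, not YARN code: `RefreshCall` and `shutdown_times` are hypothetical names, times are seconds counted from the first call, and it assumes the long-running job outlasts every timeout, so each node is shut down exactly when its deadline expires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshCall:
    """One hypothetical `-refreshNodes -g <timeout> -server` invocation."""
    at: int              # seconds since the first call
    timeout: int         # graceful timeout passed with -g, in seconds
    excluded: frozenset  # nodes listed in the exclude file at call time


def shutdown_times(calls, independent):
    """Return {node: forced shutdown time in seconds} for a sequence of calls.

    independent=False models the behavior reported in the issue: every call
    resets the deadline of every node in the exclude file.
    independent=True models the alternative: a call only sets the deadline of
    nodes that are not already decommissioning.
    """
    deadlines = {}
    for call in sorted(calls, key=lambda c: c.at):
        for node in sorted(call.excluded):
            if independent and node in deadlines:
                continue  # keep the timeout from the earlier call
            deadlines[node] = call.at + call.timeout
    return deadlines


# The scenario from the issue: nodeA at t=0 with 120s, nodeB added at t=30 with 30s.
calls = [
    RefreshCall(at=0, timeout=120, excluded=frozenset({"nodeA"})),
    RefreshCall(at=30, timeout=30, excluded=frozenset({"nodeA", "nodeB"})),
]

print(shutdown_times(calls, independent=False))  # {'nodeA': 60, 'nodeB': 60}
print(shutdown_times(calls, independent=True))   # {'nodeA': 120, 'nodeB': 60}
```

In the first result both nodes stop 30 seconds after the second call. In the second, nodeB stops at t=60 and nodeA 60 seconds later, at its original deadline of t=120. The model deliberately ignores recommissioning and nodes that finish draining early.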