[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-5465:
-----------------------------
    Target Version/s: 3.5.0  (was: 3.4.0)

> Server-Side NM Graceful Decommissioning subsequent call behavior
> ----------------------------------------------------------------
>
>                 Key: YARN-5465
>                 URL: https://issues.apache.org/jira/browse/YARN-5465
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful
>            Reporter: Robert Kanter
>            Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
>
> In a nutshell, issuing a subsequent call to gracefully decommission nodes
> updates the timeout for any currently decommissioning nodes. This makes it
> impossible to gracefully decommission different sets of nodes with different
> timeouts, though it does let you easily update the timeout of currently
> decommissioning nodes.
>
> An alternative behavior would be:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
>
> This keeps the nodes affected by each call to gracefully decommission nodes
> independent. You can now have different sets of decommissioning nodes with
> different timeouts. However, to update the timeout of a currently
> decommissioning node, you'd have to first recommission it and then
> decommission it again.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
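The difference between the two semantics can be sketched with a toy model. This is illustrative code only, not YARN's actual decommissioning implementation; the `DecommissionTracker` class and its methods are hypothetical, and each `refresh_nodes` call stands in for one `-refreshNodes -g <timeout> -server` invocation with the exclude file's current contents.

```python
class DecommissionTracker:
    """Toy model contrasting the two timeout semantics (not YARN code)."""

    def __init__(self, shared_timeout: bool):
        self.shared_timeout = shared_timeout   # True models the current behavior
        self.deadlines = {}                    # node -> absolute shutdown time (s)

    def refresh_nodes(self, now, excluded_nodes, timeout):
        """Model one `-refreshNodes -g <timeout> -server` call at time `now`."""
        for node in excluded_nodes:
            if self.shared_timeout:
                # Current behavior: every decommissioning node, whether it is
                # new to this call or not, gets the latest call's timeout.
                self.deadlines[node] = now + timeout
            else:
                # Alternative behavior: only nodes new to this call get the
                # timeout; already-decommissioning nodes keep their deadline.
                self.deadlines.setdefault(node, now + timeout)


# Replay the steps from the description: -g 120 at t=0 for nodeA,
# then -g 30 at t=30 with both nodes in the exclude file.
current = DecommissionTracker(shared_timeout=True)
current.refresh_nodes(0, {"nodeA"}, 120)
current.refresh_nodes(30, {"nodeA", "nodeB"}, 30)
# Both nodes shut down 30s after the second call (t=60):
assert current.deadlines == {"nodeA": 60, "nodeB": 60}

proposed = DecommissionTracker(shared_timeout=False)
proposed.refresh_nodes(0, {"nodeA"}, 120)
proposed.refresh_nodes(30, {"nodeA", "nodeB"}, 30)
# nodeB shuts down at t=60; nodeA keeps its original t=120 deadline,
# i.e. 60 more seconds after nodeB:
assert proposed.deadlines == {"nodeA": 120, "nodeB": 60}
```

The `setdefault` line is the whole behavioral difference: preserving an existing deadline keeps each call's node set independent, at the cost of making a deadline update require a recommission/decommission round trip.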
[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated YARN-5465:
---------------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.
[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-5465:
---------------------------------
    Target Version/s: 3.3.0  (was: 3.2.0)

Bulk update: moved all 3.2.0 non-blocker issues, please move back if it is a blocker.
[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun Suresh updated YARN-5465:
------------------------------
    Target Version/s: 3.1.0  (was: 2.9.0)