[ https://issues.apache.org/jira/browse/YARN-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prabhu Joseph updated YARN-11466:
---------------------------------
Description:
Currently, YARN Graceful Decommission waits for the completion of both the running containers and the running applications (https://issues.apache.org/jira/browse/YARN-9608) whose containers were launched on the node under decommission. This adds an unnecessarily large cost for users on cloud deployments, as most of the nodes under decommission are idle, waiting only for the running applications to complete.

This feature aims to improve the Graceful Decommission logic by waiting for the actual shuffle data to be consumed by dependent tasks, rather than waiting for the entire application. Below is the high-level design I have in mind.

Add a new interface (say, AuxiliaryShuffleService extends AuxiliaryService) through which the workloads' (Spark, Tez, MapReduce) ShuffleHandler exposes shuffle data metrics (such as whether shuffle data is still present). The NodeManager periodically collects the shuffle data metrics from the configured AuxiliaryShuffleServices and sends them along with the heartbeat to the ResourceManager. The graceful decommission logic, which runs inside the ResourceManager, waits until the shuffle data is consumed, with a maximum wait time up to the configured graceful decommission timeout.

was:
Currently, YARN Graceful Decommission waits for the completion of both the running containers and the running applications whose containers were launched on the node under decommission. This adds unnecessary cost for users on cloud deployments.

This feature aims to improve the Graceful Decommission logic by waiting for the actual shuffle data to be consumed by dependent tasks, rather than waiting for the entire application. Below is the high-level design I have in mind.

Add a new interface (say, AuxiliaryShuffleService extends AuxiliaryService) through which the workloads' (Spark, Tez, MapReduce) ShuffleHandler exposes shuffle data metrics (such as whether shuffle data is still present).
The NodeManager periodically collects the shuffle data metrics from the configured AuxiliaryShuffleServices and sends them along with the heartbeat to the ResourceManager. The graceful decommission logic, which runs inside the ResourceManager, waits until the shuffle data is consumed, with a maximum wait time up to the configured graceful decommission timeout.

> Graceful Decommission for Shuffle Services
> ------------------------------------------
>
>                 Key: YARN-11466
>                 URL: https://issues.apache.org/jira/browse/YARN-11466
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> Currently, YARN Graceful Decommission waits for the completion of both the
> running containers and the running applications
> (https://issues.apache.org/jira/browse/YARN-9608) whose containers were
> launched on the node under decommission. This adds an unnecessarily large
> cost for users on cloud deployments, as most of the nodes under
> decommission are idle, waiting only for the running applications to
> complete.
> This feature aims to improve the Graceful Decommission logic by waiting for
> the actual shuffle data to be consumed by dependent tasks, rather than
> waiting for the entire application. Below is the high-level design I have
> in mind.
> Add a new interface (say, AuxiliaryShuffleService extends AuxiliaryService)
> through which the workloads' (Spark, Tez, MapReduce) ShuffleHandler exposes
> shuffle data metrics (such as whether shuffle data is still present). The
> NodeManager periodically collects the shuffle data metrics from the
> configured AuxiliaryShuffleServices and sends them along with the
> heartbeat to the ResourceManager. The graceful decommission logic, which
> runs inside the ResourceManager, waits until the shuffle data is consumed,
> with a maximum wait time up to the configured graceful decommission
> timeout.
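As a minimal sketch of the proposal above: the names AuxiliaryShuffleService.hasPendingShuffleData and ShuffleDecommissionMonitor below are illustrative assumptions, not a committed YARN API, and the sketch models only the decision the ResourceManager would make from the heartbeated metrics (decommission once no shuffle data remains, or once the graceful decommission timeout elapses).

```java
import java.util.List;

// Hypothetical per-shuffle-service metric, as would be exposed by a
// ShuffleHandler (Spark, Tez, MapReduce) and heartbeated by the NodeManager.
interface AuxiliaryShuffleService {
    // true while shuffle data on this node is still awaited by dependent tasks
    boolean hasPendingShuffleData();
}

class ShuffleDecommissionMonitor {
    // Returns true when the node can complete graceful decommission:
    // either no configured shuffle service still holds pending shuffle
    // data, or the configured graceful decommission timeout has elapsed.
    static boolean canDecommission(List<AuxiliaryShuffleService> services,
                                   long elapsedMillis, long timeoutMillis) {
        if (elapsedMillis >= timeoutMillis) {
            return true; // maximum wait reached; decommission anyway
        }
        for (AuxiliaryShuffleService s : services) {
            if (s.hasPendingShuffleData()) {
                return false; // keep the node until this data is consumed
            }
        }
        return true; // all shuffle data consumed
    }
}
```

In this sketch the ResourceManager would re-evaluate canDecommission on every NodeManager heartbeat, so an idle node is released as soon as its last shuffle output is fetched instead of waiting for the whole application.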
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org