[ 
https://issues.apache.org/jira/browse/YARN-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-11466:
---------------------------------
    Description: 
Currently, YARN Graceful Decommission waits for the completion of both the running 
containers and the running applications 
(https://issues.apache.org/jira/browse/YARN-9608) whose containers were launched 
on the node under decommission. This adds a huge, unnecessary cost for users on 
cloud deployments, as most of the nodes under decommission sit idle, waiting only 
for the running applications to complete.

This feature aims to improve the Graceful Decommission logic so that it waits only 
until the node's shuffle data has actually been consumed by dependent tasks, rather 
than until the entire application completes. Below is the high-level design I have 
in mind.

Add a new interface (say AuxiliaryShuffleService extends AuxiliaryService) 
through which the workloads' (Spark, Tez, MapReduce) ShuffleHandlers expose 
shuffle data metrics (such as whether shuffle data is still present). The 
NodeManager periodically collects the shuffle data metrics from the configured 
AuxiliaryShuffleServices and sends them along with its heartbeat to the 
ResourceManager. The graceful decommission logic running inside the 
ResourceManager then waits until the shuffle data is consumed, with a maximum 
wait time up to the configured graceful decommission timeout.
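
As a rough sketch of the idea (the class shape and the hasPendingShuffleData 
method below are illustrative assumptions, not a committed API), the new 
extension point could look something like this:

package org.apache.hadoop.yarn.server.api;

/**
 * Hypothetical extension point: a shuffle handler (Spark, Tez, MapReduce)
 * extends this to report whether it still holds shuffle data that
 * dependent tasks have not yet consumed.
 */
public abstract class AuxiliaryShuffleService extends AuxiliaryService {

  protected AuxiliaryShuffleService(String name) {
    super(name);
  }

  /**
   * Returns true while this node still holds unconsumed shuffle data.
   * The NodeManager would poll this periodically and piggyback the
   * result on its heartbeat to the ResourceManager.
   */
  public abstract boolean hasPendingShuffleData();
}

On the ResourceManager side, the decommissioning monitor would then keep a 
DECOMMISSIONING node alive only while the last heartbeat reported pending 
shuffle data and the graceful decommission timeout has not yet expired, e.g.:

  // Sketch of the RM-side condition (assumed helper, not existing code):
  // decommission once no shuffle data remains or the timeout has elapsed.
  boolean readyToDecommission(boolean shuffleDataPending,
                              long elapsedMs, long timeoutMs) {
    return !shuffleDataPending || elapsedMs >= timeoutMs;
  }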

  was:
Currently, YARN Graceful Decommission waits for the completion of both the 
running containers and the running applications whose containers were launched 
on the node under decommission. This adds unnecessary cost for users on cloud 
deployments. This feature aims to improve the Graceful Decommission logic so 
that it waits only until the node's shuffle data has actually been consumed by 
dependent tasks, rather than until the entire application completes.

Below is the high-level design I have in mind.

Add a new interface (say AuxiliaryShuffleService extends AuxiliaryService) 
through which the workloads' (Spark, Tez, MapReduce) ShuffleHandlers expose 
shuffle data metrics (such as whether shuffle data is still present). The 
NodeManager periodically collects the shuffle data metrics from the configured 
AuxiliaryShuffleServices and sends them along with its heartbeat to the 
ResourceManager. The graceful decommission logic running inside the 
ResourceManager then waits until the shuffle data is consumed, with a maximum 
wait time up to the configured graceful decommission timeout.




> Graceful Decommission for Shuffle Services
> ------------------------------------------
>
>                 Key: YARN-11466
>                 URL: https://issues.apache.org/jira/browse/YARN-11466
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> Currently, YARN Graceful Decommission waits for the completion of both the 
> running containers and the running applications 
> (https://issues.apache.org/jira/browse/YARN-9608) whose containers were 
> launched on the node under decommission. This adds a huge, unnecessary cost 
> for users on cloud deployments, as most of the nodes under decommission sit 
> idle, waiting only for the running applications to complete.
> This feature aims to improve the Graceful Decommission logic so that it waits 
> only until the node's shuffle data has actually been consumed by dependent 
> tasks, rather than until the entire application completes. Below is the 
> high-level design I have in mind.
> Add a new interface (say AuxiliaryShuffleService extends AuxiliaryService) 
> through which the workloads' (Spark, Tez, MapReduce) ShuffleHandlers expose 
> shuffle data metrics (such as whether shuffle data is still present). The 
> NodeManager periodically collects the shuffle data metrics from the configured 
> AuxiliaryShuffleServices and sends them along with its heartbeat to the 
> ResourceManager. The graceful decommission logic running inside the 
> ResourceManager then waits until the shuffle data is consumed, with a maximum 
> wait time up to the configured graceful decommission timeout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
