[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carlo Curino updated YARN-567:
------------------------------

Description:

A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this.

The FairScheduler leverages task killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy at the cost of wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but at the risk of leaving the cluster underutilized or keeping jobs waiting to obtain their rightful capacity.

By introducing the notion of work-preserving preemption, we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in separate JIRAs YARN-568 and YARN-569).

The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and mostly consist of propagating preemption decisions through the ApplicationMasterService.

was:

A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this.

The FairScheduler leverages task killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy at the cost of wasting useful work.
The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but at the risk of leaving the cluster underutilized or keeping jobs waiting to obtain their rightful capacity.

By introducing the notion of work-preserving preemption, we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in separate JIRAs).

The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and mostly consist of propagating preemption decisions through the ApplicationMasterService.

> RM changes to support preemption for FairScheduler and CapacityScheduler
> ------------------------------------------------------------------------
>
>                 Key: YARN-567
>                 URL: https://issues.apache.org/jira/browse/YARN-567
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino