[
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661410#comment-13661410
]
Carlo Curino commented on YARN-569:
---
The findbugs warnings are on accesses of a ResourceCalculator and
minAllocation, so not really concerning.
CapacityScheduler: support for preemption (using a capacity monitor)
Key: YARN-569
URL: https://issues.apache.org/jira/browse/YARN-569
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf,
preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.patch,
YARN-569.patch
There is a tension between the fast-paced, reactive role of the
CapacityScheduler, which needs to respond quickly to
applications' resource requests and node updates, and the more introspective,
time-based considerations
needed to observe and correct for capacity balance. To this purpose, instead
of hacking the delicate
mechanisms of the CapacityScheduler directly, we opted to add support for
preemption by means of a Capacity Monitor,
which can optionally be run as a separate service (much like the
NMLivelinessMonitor).
The capacity monitor (similar to equivalent functionality in the fair
scheduler) runs at regular intervals
(e.g., every 3 seconds), observes the state of the assignment of resources to
queues in the capacity scheduler,
performs an off-line computation to determine whether preemption is needed and
how best to edit the current schedule to
improve capacity balance, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the
policy to use them as desired to achieve its rebalancing goals.
Note that, due to the lag in the effect of these actions, the policy should
operate at a macroscopic level (e.g., preempt tens of containers
from a queue) and not try to tightly and consistently micromanage
container allocations.
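To make the division of labor concrete, the following is a minimal sketch of how such a periodic monitor could be wired in; the interface and class names (CapacityMonitorPolicy, CapacityMonitor, editSchedule) are hypothetical placeholders for illustration and not necessarily the ones used in the attached patches.
{code:java}
// Hypothetical pluggable policy: one off-line pass over scheduler state.
interface CapacityMonitorPolicy {
  // Observe queue capacities and emit de-reservation / preemption / kill events.
  void editSchedule();
}

// Hypothetical driver that runs the policy on a fixed interval, decoupled
// from the scheduler's event-driven allocation path.
class CapacityMonitor implements Runnable {
  private final CapacityMonitorPolicy policy;
  private final long intervalMs; // e.g., 3000 ms

  CapacityMonitor(CapacityMonitorPolicy policy, long intervalMs) {
    this.policy = policy;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      policy.editSchedule(); // off-line pass; the scheduler keeps serving requests
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}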
- Preemption policy (ProportionalCapacityPreemptionPolicy):
-
Preemption policies are by design pluggable; in the following we present an
initial policy (ProportionalCapacityPreemptionPolicy) we have been
experimenting with. The ProportionalCapacityPreemptionPolicy behaves as
follows:
# it gathers from the scheduler the state of the queues, in particular, their
current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity it
computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule
and achieve capacity balance (accounting for natural completion rates, and
respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the
last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount
of resource to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same
application (in reverse chronological order, last assigned container first),
again until the target is reached or until no containers except the AM
container are left,
# (if not enough) it moves on to unreserve and preempt from the next
application.
# containers that have been marked for preemption are tracked across
executions. If a container remains marked for preemption for more than a
certain time, it is moved to the list of containers to be forcibly
killed.
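As a rough illustration of steps 4-8, the sketch below walks one over-capacity queue and applies the unreserve, preempt, and kill escalation in that order; the data model (AppState, container ids as strings, fixed 1 GB containers) is a simplifying assumption, not the representation used by the actual patch.
{code:java}
import java.util.*;

// Hypothetical per-application view, ordered so the most recently assigned
// reservation/container sits at the tail of each deque.
class AppState {
  Deque<Long> reservations = new ArrayDeque<>(); // reserved MB, most recent last
  Deque<String> containers = new ArrayDeque<>(); // container ids, most recent last
  String amContainer;                            // never preempted here
}

class ProportionalPreemptionSketch {
  // Containers already asked to vacate, with the time they were first marked.
  private final Map<String, Long> markedForPreemption = new HashMap<>();
  private final long maxWaitBeforeKillMs;

  ProportionalPreemptionSketch(long maxWaitBeforeKillMs) {
    this.maxWaitBeforeKillMs = maxWaitBeforeKillMs;
  }

  /** Select actions for one over-capacity queue; fifoApps is in FIFO order. */
  void reclaim(List<AppState> fifoApps, long toReclaimMb, long now,
               List<String> toPreempt, List<String> toKill) {
    // Step 4: walk applications from the last-submitted one backwards.
    for (int i = fifoApps.size() - 1; i >= 0 && toReclaimMb > 0; i--) {
      AppState app = fifoApps.get(i);
      // Step 5: drop reservations, most recently assigned first.
      while (toReclaimMb > 0 && !app.reservations.isEmpty()) {
        toReclaimMb -= app.reservations.pollLast();
      }
      // Step 6: preempt live containers, last assigned first, sparing the AM.
      Iterator<String> it = app.containers.descendingIterator();
      while (toReclaimMb > 0 && it.hasNext()) {
        String c = it.next();
        if (c.equals(app.amContainer)) {
          continue;
        }
        toPreempt.add(c);
        toReclaimMb -= 1024; // assume 1 GB containers in this sketch
        // Step 8: escalate to kill if the container ignored earlier requests.
        long firstMarked = markedForPreemption.merge(c, now, Math::min);
        if (now - firstMarked > maxWaitBeforeKillMs) {
          toKill.add(c);
        }
      }
      // Step 7: if still not enough, the outer loop moves to the next app.
    }
  }
}
{code}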
Notes:
(*) at the moment, in order to avoid double-counting of the requests, we only
look at the ANY part of pending resource requests, which means we might not
preempt on behalf of AMs that ask only for specific locations but not ANY.
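To illustrate why summing all locality levels would over-count, the snippet below uses the YARN records factory methods (shown purely for illustration): a single ask for one container on a host is normally expressed at node, rack, and ANY level, so the policy reads only the ANY ("*") level.
{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class AnyLevelExample {
  public static void main(String[] args) {
    // One ask for a single 1 GB container on host "h1" typically appears at
    // three locality levels; summing them would triple-count the demand.
    Resource oneGb = Resource.newInstance(1024, 1);
    Priority pri = Priority.newInstance(1);
    ResourceRequest node = ResourceRequest.newInstance(pri, "h1", oneGb, 1);
    ResourceRequest rack = ResourceRequest.newInstance(pri, "/rack1", oneGb, 1);
    ResourceRequest any =
        ResourceRequest.newInstance(pri, ResourceRequest.ANY, oneGb, 1);
    // The policy only counts the ANY level (1 container, not 3).
    System.out.println("Pending at ANY level: " + any.getNumContainers());
  }
}
{code}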
(**) The ideal balanced state is one in which each queue receives at least its
guaranteed capacity, and the spare capacity is distributed among the queues
(that want some) as a weighted fair share, where the weighting is based on the
guaranteed capacity of a queue and the computation runs to a fixed point.
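A minimal sketch of this fixed-point computation, assuming a single resource dimension (MB) and hypothetical class/field names: each queue first gets min(guarantee, demand), and the leftover is then repeatedly split among still-unsatisfied queues in proportion to their guarantees until nothing changes.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-queue view: guaranteed capacity and total demand
// (current usage + pending), both in MB.
class QueueView {
  final long guaranteedMb;
  final long demandMb;
  long idealMb; // output: ideal balanced assignment

  QueueView(long guaranteedMb, long demandMb) {
    this.guaranteedMb = guaranteedMb;
    this.demandMb = demandMb;
  }
}

class IdealBalanceSketch {
  static void compute(List<QueueView> queues, long clusterMb) {
    long remaining = clusterMb;
    // Everyone gets at least min(guarantee, demand).
    for (QueueView q : queues) {
      q.idealMb = Math.min(q.guaranteedMb, q.demandMb);
      remaining -= q.idealMb;
    }
    // Distribute the spare capacity as a guarantee-weighted fair share,
    // iterating until a fixed point (no spare left, or nobody wants more).
    while (remaining > 0) {
      List<QueueView> hungry = new ArrayList<>();
      long totalWeight = 0;
      for (QueueView q : queues) {
        if (q.idealMb < q.demandMb) {
          hungry.add(q);
          totalWeight += q.guaranteedMb;
        }
      }
      if (hungry.isEmpty() || totalWeight == 0) {
        break;
      }
      long distributed = 0;
      for (QueueView q : hungry) {
        long share = remaining * q.guaranteedMb / totalWeight;
        long grant = Math.min(share, q.demandMb - q.idealMb);
        q.idealMb += grant;
        distributed += grant;
      }
      if (distributed == 0) {
        break; // shares round down to zero; stop rather than loop forever
      }
      remaining -= distributed;
    }
  }
}
{code}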
Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as
read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# what fraction of the containers I would like to obtain should I preempt
(this accounts for the natural rate at which containers are returned)
# deadzone size, i.e., how far over its target capacity a queue can be before
we consider preempting from it (to avoid reacting to small, natural
oscillations)
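To make the knobs above concrete, here is a hedged configuration sketch in Java; the property keys are illustrative guesses mapped to the tunables listed above (plus the per-round preemption bound mentioned earlier) and may not match the names used by the attached patches.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class PreemptionTunablesExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Illustrative property keys only; not necessarily the patch's names.
    // observe-only mode: log intended actions without taking them.
    conf.setBoolean("yarn.resourcemanager.monitor.capacity.preemption.observe_only", false);
    // how frequently to run the policy (ms).
    conf.setLong("yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval", 3000L);
    // how long to wait between asking a container to vacate and killing it (ms).
    conf.setLong("yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill", 15000L);
    // fraction of the missing resources to actually preempt, accounting for
    // the natural rate at which containers complete and are returned.
    conf.setFloat("yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor", 0.2f);
    // deadzone: how far over its target a queue may be before we preempt.
    conf.setFloat("yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity", 0.1f);
    // bound on the total amount of preemption allowed in a single round.
    conf.setFloat("yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round", 0.1f);
  }
}
{code}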