[
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101166#comment-15101166
]
Wangda Tan commented on YARN-4108:
----------------------------------
Thanks for looking at this, [~eepayne].
bq. In the lazy preemption case, PCPP will send an event to the scheduler to
mark a container killable. Can PCPP check if it's already been marked before
sending, so that maybe event traffic will be less in the RM?
Agree, we can create a killable-map similar to the preempted-map in PCPP.
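For illustration only, a minimal self-contained sketch of what such a de-duplication map could look like (the class and method names here, e.g. KillableTracker and sendMarkKillable, are made up for the example and are not from the patch):
{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified sketch (not the actual PCPP code): remember which
// containers have already been marked killable so the scheduler is not
// flooded with duplicate events on every PCPP run.
class KillableTracker {
  // IDs of containers we have already asked the scheduler to mark killable.
  private final Set<String> alreadyMarked = new HashSet<String>();

  // Stand-in for the RM scheduler event handler, just for this example.
  interface SchedulerEvents {
    void sendMarkKillable(String containerId);
  }

  void markKillable(String containerId, SchedulerEvents scheduler) {
    // Only emit the event the first time this container is selected.
    if (alreadyMarked.add(containerId)) {
      scheduler.sendMarkKillable(containerId);
    }
  }

  void unmarkKillable(String containerId) {
    // e.g. when the preemption decision is cancelled or the container exits.
    alreadyMarked.remove(containerId);
  }
}
{code}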
bq. Currently, if both queueA and queueB are over their guaranteed capacity,
preemption will still occur if queueA is more over capacity than queueB. I
think it is probably important to preserve this behavior (YARN-2592).
Thanks for pointing me to this patch; I took a quick read of the comments on YARN-2592. I think we
can still keep the same behavior in the new proposal: currently I assume only a
queue with usage less than guaranteed can preempt containers from others, but we
can relax this limit to: a queue that doesn't have to-be-preempted containers could
preempt from others.
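Just to put the two rules side by side, a tiny hypothetical sketch (names are made up, not from the patch; memory-only):
{code}
// Hypothetical sketch of the current vs. relaxed eligibility check.
class PreemptionEligibilitySketch {
  // Current assumption: only under-guaranteed queues may preempt from others.
  static boolean canPreemptCurrent(long usedMB, long guaranteedMB) {
    return usedMB < guaranteedMB;
  }

  // Relaxed rule: any queue that has no containers marked to-be-preempted
  // may preempt from others.
  static boolean canPreemptRelaxed(int numToBePreemptedContainers) {
    return numToBePreemptedContainers == 0;
  }
}
{code}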
However, I think allowing two over-satisfied queues to shoot at each other may
not be reasonable. If we have 3 queues configured to a=10, b=20, c=70, and c
uses nothing, we cannot simply interpret a's new capacity as 33 and b's new
capacity as 66 (a:b = 10:20). Since the admin only configured the capacities of a/b to
10/20, we should strictly follow what the admin configured.
bq. don't see anyplace where ResourceLimits#isAllowPreemption is called. But,
if it is, Will the following code in LeafQueue change preemption behavior?...
Yes, LeafQueue decides whether an app can kill containers or not, and the app will use it
in
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer}}
when deciding {{toKillContainers}}.
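To illustrate the idea, here is a simplified, hypothetical sketch (not the actual assignContainer code; it only considers memory and ignores vcores, labels and reservations):
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: if the node's free space is not enough for the
// request, pick killable containers on that node until the request would
// fit, and record them as the containers to kill for this assignment.
class ToKillSelector {
  static class KillableContainer {
    final String id;
    final long memoryMB;
    KillableContainer(String id, long memoryMB) {
      this.id = id;
      this.memoryMB = memoryMB;
    }
  }

  static List<String> selectToKill(long requestMB, long nodeAvailableMB,
      List<KillableContainer> killableOnNode) {
    List<String> toKill = new ArrayList<String>();
    long freed = nodeAvailableMB;
    for (KillableContainer c : killableOnNode) {
      if (freed >= requestMB) {
        break;                      // request already fits, stop selecting
      }
      toKill.add(c.id);
      freed += c.memoryMB;          // space reclaimed if this one is killed
    }
    // If even killing everything on the node is not enough, kill nothing.
    return freed >= requestMB ? toKill : new ArrayList<String>();
  }
}
{code}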
bq. I'm just trying to understand how things will be affected when headroom for
a parent queue is (limit - used) + killable. Doesn't that say that a parent
queue has more headroom than it's already actually using? Is it relying on this
behavior so that the assignment code will determine that it has more headroom
when there are killable containers, and then rely on the leafqueue to kill
those containers?
I'm not sure if I understand your question properly; let me try to explain
this behavior:
ParentQueue will add its own killable containers to headroom
(getTotalKillableResource is a bad name, it should be
{{getTotalKillableResourceForThisQueue}}). Since these containers all
belong to the parent queue, it has the right to kill all of them to satisfy
max-queue-capacity.
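To make the headroom calculation concrete, a hypothetical memory-only sketch (not the real ResourceLimits/ParentQueue code):
{code}
// Hypothetical sketch of the headroom idea: the parent queue may count its
// own killable containers as reclaimable space, because it is allowed to
// kill them to satisfy max-queue-capacity.
class HeadroomSketch {
  static long headroomMB(long limitMB, long usedMB, long killableMB) {
    // killableMB is the resource held by containers already marked killable
    // under this queue; it can be reclaimed, so it adds to headroom.
    long headroom = (limitMB - usedMB) + killableMB;
    return Math.max(headroom, 0);
  }
}
{code}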
Killable containers will actually be killed in two cases:
- An under-satisfied leaf queue tries to allocate on a node, but the node
doesn't have enough resources, so it will kill containers *on the node* to
allocate the new container.
- A queue is using more than max-capacity and it has killable containers; we
will try to kill containers for such queues to make sure they don't violate
max-capacity. You can check the following code in ParentQueue#allocateResource
(a rough sketch of the enforcement step follows after the code block):
{code}
// check if we need to kill (killable) containers if maximum resource violated.
if (getQueueCapacities().getAbsoluteMaximumCapacity(nodePartition)
    < getQueueCapacities().getAbsoluteUsedCapacity(nodePartition)) {
  killContainersToEnforceMaxQueueCapacity(nodePartition, clusterResource);
}
{code}
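For illustration, a rough, hypothetical sketch of what that enforcement step could look like (not the actual killContainersToEnforceMaxQueueCapacity implementation; memory-only and ignores partitions):
{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep selecting this queue's killable containers
// until used capacity drops back under the configured maximum.
class MaxCapacityEnforcerSketch {
  // killable maps containerId -> memory (MB) for killable containers
  // belonging to this queue, in the order we would pick victims.
  static List<String> containersToKill(long usedMB, long maxMB,
      LinkedHashMap<String, Long> killable) {
    List<String> victims = new ArrayList<String>();
    long used = usedMB;
    for (Map.Entry<String, Long> e : killable.entrySet()) {
      if (used <= maxMB) {
        break;                      // no longer violating max-capacity
      }
      victims.add(e.getKey());
      used -= e.getValue();         // resource freed once it is killed
    }
    return victims;
  }
}
{code}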
bq. NPE if getChildQueues() returns null
Nice catch, updated locally.
bq. CSAssignment#toKillContainers: I would call them containersToKill
Agree, updated locally
bq. It would be interesting to know what your thoughts are on making further
modifications to PCPP to make more informed choices about which containers to
kill.
I don't have a clear idea for this. A rough idea in my mind is: we could add
some field to the scheduler to indicate that some special request (e.g.
large / hard-locality, etc.) is starving and head-of-line (HOL), and do a
background scan in PCPP; after PCPP marks containers to-be-preempted, we can
leverage the marked starving-and-HOL requests to adjust the existing marked
to-be-preempted containers.
Again, this is rough thinking; I'm not sure if it is doable.
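Purely to illustrate that rough idea (hypothetical, not part of any patch): if a starving HOL request has, say, a hard rack constraint, PCPP could prefer victims that would actually satisfy it:
{code}
import java.util.ArrayList;
import java.util.List;

// Very rough hypothetical sketch: among containers already marked
// to-be-preempted, prefer those on the rack the starving HOL request wants,
// so killing them can actually satisfy that request.
class StarvingHolSketch {
  static class Candidate {
    final String containerId;
    final String rack;
    Candidate(String containerId, String rack) {
      this.containerId = containerId;
      this.rack = rack;
    }
  }

  static List<String> pickVictims(String requestedRack,
      List<Candidate> markedToBePreempted) {
    List<String> matching = new ArrayList<String>();
    for (Candidate c : markedToBePreempted) {
      if (c.rack.equals(requestedRack)) {
        matching.add(c.containerId);  // this victim helps the HOL request
      }
    }
    return matching;
  }
}
{code}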
> CapacityScheduler: Improve preemption to preempt only those containers that
> would satisfy the incoming request
> --------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf,
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf,
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption
> is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113),
> cross application preemption (such as priority-based (YARN-1963) /
> fairness-based (YARN-3319)).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)