[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101166#comment-15101166 ]

Wangda Tan commented on YARN-4108:
----------------------------------

Thanks for looking at this, [~eepayne].

bq. In the lazy preemption case, PCPP will send an event to the scheduler to 
mark a container killable. Can PCPP check if it's already been marked before 
sending, so that maybe event traffic will be less in the RM?
Agree, we can create a killable map in PCPP, similar to the existing preempted map.
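To illustrate (a minimal sketch only; the class and method names below are made up for illustration, they are not from the patch), the check just needs to consult the map before dispatching the event:
{code}
// Minimal sketch, illustrative types only (not the real PCPP/scheduler classes):
// remember which containers were already marked killable so the event is sent
// at most once per container across repeated PCPP passes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class KillableMarkTracker {
  /** Stand-in for the scheduler event path that marks a container killable. */
  interface KillableEventSender {
    void sendMarkKillable(String containerId);
  }

  private final Map<String, Boolean> alreadyMarked = new ConcurrentHashMap<>();
  private final KillableEventSender sender;

  KillableMarkTracker(KillableEventSender sender) {
    this.sender = sender;
  }

  void markKillable(String containerId) {
    // putIfAbsent returns null only for the first caller, so the event fires once.
    if (alreadyMarked.putIfAbsent(containerId, Boolean.TRUE) == null) {
      sender.sendMarkKillable(containerId);
    }
  }
}
{code}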

bq. Currently, if both queueA and queueB are over their guaranteed capacity, 
preemption will still occur if queueA is more over capacity than queueB. I 
think it is probably important to preserve this behavior (YARN-2592).
Thanks for pointing me to this patch; I took a quick read through the comments on 
YARN-2592. I think we can still keep the same behavior in the new proposal: 
currently I assume only a queue with usage less than its guaranteed capacity can 
preempt containers from others, but we can relax this limit to: a queue that 
doesn't have to-be-preempted containers can preempt from others.
However, I think allowing two over-satisfied queues to shoot at each other may not 
be reasonable. If we have 3 queues configured as a=10, b=20, c=70, then when c 
uses nothing, we cannot simply interpret a's new capacity as 33 and b's new 
capacity as 66 (a:b = 10:20). Since the admin only configured the capacities of 
a/b to 10/20, we should strictly follow what the admin configured.

bq. don't see anyplace where ResourceLimits#isAllowPreemption is called. But, 
if it is, Will the following code in LeafQueue change preemption behavior?...
Yes, LeafQueue decides whether an app can kill containers or not, and the app will 
use it in 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer}}
 to decide {{toKillContainers}}.
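To make the flow concrete, here is a simplified sketch with made-up types (not the actual ResourceLimits/LeafQueue/allocator code): the queue's allow-preemption decision travels down with the limits, and the allocator only proposes containers to kill when the flag is set.
{code}
// Simplified sketch with made-up types (not the real ResourceLimits/allocator code):
// the allow-preemption decision flows from the queue into the allocator, which only
// proposes containers to kill when the flag is set.
import java.util.Collections;
import java.util.List;

class PreemptionFlowSketch {
  static final class Limits {
    final boolean allowPreemption;
    Limits(boolean allowPreemption) { this.allowPreemption = allowPreemption; }
  }

  // Stand-in for the allocator deciding toKillContainers for one node.
  static List<String> decideToKillContainers(Limits limits, List<String> killableOnNode) {
    if (!limits.allowPreemption) {
      return Collections.emptyList(); // queue said no: never kill to place this app
    }
    return killableOnNode; // candidates are the killable containers on this node
  }
}
{code}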

bq. I'm just trying to understand how things will be affected when headroom for 
a parent queue is (limit - used) + killable. Doesn't that say that a parent 
queue has more headroom than it's already acutally using? Is it relying on this 
behavior so that the assignment code will determine that it has more headroom 
when there are killable containers, and then rely on the leafqueue to kill 
those containers?
I'm not sure if I understand your question properly, so let me try to explain this 
behavior: 
ParentQueue will add its own killable containers to headroom 
(getTotalKillableResource is bad naming; it should be 
{{getTotalKillableResourceForThisQueue}}). Since these containers all belong to 
the parent queue, it has the right to kill all of them to satisfy 
max-queue-capacity.
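In other words, with a simplified single-dimension sketch (not the real Resource/ResourceLimits arithmetic), the headroom the parent queue reports is roughly:
{code}
// Simplified, single-dimension sketch (not the real Resource/ResourceLimits code):
// killable containers belonging to this parent queue are counted as reclaimable,
// so they are added back on top of (limit - used).
class HeadroomSketch {
  static long headroom(long limit, long used, long killableForThisQueue) {
    return (limit - used) + killableForThisQueue;
  }

  public static void main(String[] args) {
    // limit=100, used=100, killable=20 -> headroom 20: the assignment code can keep
    // trying to allocate under this queue, relying on the killable containers being
    // killed to make room when the new container is actually placed.
    System.out.println(headroom(100, 100, 20));
  }
}
{code}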
Killable containers will actually be killed in two cases:
- An under-satisfied leaf queue tries to allocate on a node, but the node doesn't 
have enough resources, so it will kill containers *on the node* to allocate the 
new container (see the sketch after the code block below).
- A queue is using more than max-capacity and has killable containers; we will try 
to kill containers for such queues to make sure they don't violate max-capacity. 
You can check the following code in ParentQueue#allocateResource:
{code}
    // check if we need to kill (killable) containers if maximum resource violated.
    if (getQueueCapacities().getAbsoluteMaximumCapacity(nodePartition)
        < getQueueCapacities().getAbsoluteUsedCapacity(nodePartition)) {
      killContainersToEnforceMaxQueueCapacity(nodePartition, clusterResource);
    }
{code} 
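For the first case above, a rough sketch of the selection logic with made-up types (not the actual {{RegularContainerAllocator#assignContainer}} code):
{code}
// Rough sketch for case 1 with made-up types (not the real assignContainer code):
// when the node is short on resources, pick killable containers on that node until
// the request fits; those become the containers to kill for this allocation.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NodeKillSelectorSketch {
  static List<String> selectToKill(long requested, long nodeAvailable,
      Map<String, Long> killableOnNode /* containerId -> resource on this node */) {
    List<String> toKill = new ArrayList<>();
    long freeable = nodeAvailable;
    if (freeable >= requested) {
      return toKill; // the node already has room; nothing to kill
    }
    for (Map.Entry<String, Long> e : killableOnNode.entrySet()) {
      toKill.add(e.getKey());
      freeable += e.getValue();
      if (freeable >= requested) {
        return toKill; // enough resource can be freed on this node
      }
    }
    return null; // request cannot fit here even after killing all killable containers
  }
}
{code}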
 
bq. NPE if getChildQueues() returns null
Nice catch, updated locally.

bq. CSAssignment#toKillContainers: I would call them containersToKill
Agree, updated locally.

bq. It would be interesting to know what your thoughts are on making further 
modifications to PCPP to make more informed choices about which containers to 
kill.
I don't have a clear idea for this yet. A rough thought is: we could add some 
field to the scheduler to indicate that some special request (e.g. 
large/hard-locality, etc.) is starving and head-of-line (HOL), and run a 
background scan in PCPP; after PCPP marks containers to be preempted, we can 
leverage the marked starving-and-HOL requests to adjust the existing marked 
to-be-preempted containers (see the sketch below).
Again, this is rough thinking, and I'm not sure if it is doable.
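Purely to illustrate that rough idea (everything below is hypothetical, names included; nothing like this exists in PCPP today):
{code}
// Hypothetical sketch of the rough idea only: after the normal pass has marked
// containers to preempt, keep just the marks that would actually help some
// starving, head-of-line (HOL) request.
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

class RetargetSketch {
  static <C, R> Set<C> retarget(Set<C> markedToPreempt, List<R> starvingHolRequests,
      BiPredicate<C, R> wouldSatisfy) {
    Set<C> kept = new LinkedHashSet<>();
    for (C container : markedToPreempt) {
      for (R request : starvingHolRequests) {
        if (wouldSatisfy.test(container, request)) {
          kept.add(container); // this mark helps a starving HOL request; keep it
          break;
        }
      }
    }
    return kept;
  }
}
{code}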

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).


