[
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192324#comment-16192324
]
Steven Rand commented on YARN-7290:
-----------------------------------
An additional problem is that we call {{app.trackContainerForPreemption}} in
{{preemptContainers}}, so after {{identifyContainersToPreempt}} has returned.
Therefore after we've finished iterating through one container in the value of
{{rr.getNumContainers()}}, we will have added some containers to
{{containersToPreempt}}, but {{resourcesToBePreempted}} will not have been
updated for any app. This allows subsequent calls to
{{canContainerBePreempted}} in the same for loop to return {{true}}
incorrectly, since we've already decided to preempt some containers, but the
apps aren't aware of it yet.
> canContainerBePreempted can return true when it shouldn't
> ---------------------------------------------------------
>
> Key: YARN-7290
> URL: https://issues.apache.org/jira/browse/YARN-7290
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 3.0.0-beta1
> Reporter: Steven Rand
> Assignee: Steven Rand
> Attachments: YARN-7290-failing-test.patch
>
>
> In FSAppAttempt#canContainerBePreempted, we make sure that preempting the
> given container would not put the app below its fair share:
> {code}
> // Check if the app's allocation will be over its fairshare even
> // after preempting this container
> Resource usageAfterPreemption = Resources.clone(getResourceUsage());
> // Subtract resources of containers already queued for preemption
> synchronized (preemptionVariablesLock) {
> Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
> }
> // Subtract this container's allocation to compute usage after preemption
> Resources.subtractFrom(
> usageAfterPreemption, container.getAllocatedResource());
> return !isUsageBelowShare(usageAfterPreemption, getFairShare());
> {code}
> However, this only considers one container in isolation, and fails to
> consider containers for the same app that we already added to
> {{preemptableContainers}} in
> FSPreemptionThread#identifyContainersToPreemptOnNode. Therefore we can have a
> case where we preempt multiple containers from the same app, none of which by
> itself puts the app below fair share, but which cumulatively do so.
> I've attached a patch with a test to show this behavior. The flow is:
> 1. Initially greedyApp runs in {{root.preemptable.child-1}} and is allocated
> all the resources (8g and 8vcores)
> 2. Then starvingApp runs in {{root.preemptable.child-2}} and requests 2
> containers, each of which is 3g and 3vcores in size. At this point both
> greedyApp and starvingApp have a fair share of 4g (with DRF not in use).
> 3. For the first container requested by starvedApp, we (correctly) preempt 3
> containers from greedyApp, each of which is 1g and 1vcore.
> 4. For the second container requested by starvedApp, we again (this time
> incorrectly) preempt 3 containers from greedyApp. This puts greedyApp below
> its fair share, but happens anyway because all six times that we call
> {{return !isUsageBelowShare(usageAfterPreemption, getFairShare());}}, the
> value of {{usageAfterPreemption}} is 7g and 7vcores (confirmed using
> debugger).
> So in addition to accounting for {{resourcesToBePreempted}}, we also need to
> account for containers that we're already planning on preempting in
> FSPreemptionThread#identifyContainersToPreemptOnNode.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]