[ 
https://issues.apache.org/jira/browse/YARN-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7469:
-----------------------------
    Attachment: YARN-7469.001.patch

Attaching a proposal for a patch to fix this problem.

Proposed fix: In {{calculateToBePreemptedResourcePerApp}}, if the 
{{USERLIMIT_FIRST}} policy is set, subtract off the minimum container size. 
The code in {{skipContainerBasedOnIntraQueuePolicy}} skips a container if 
preempting it would bring the user's usage down to the user limit. That check 
matters here because the capacity scheduler assigns one container more than 
the user limit.
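
To illustrate the interaction described above, here is a minimal sketch, not the actual patch. It uses plain megabyte values instead of YARN {{Resource}} objects, and every name in it ({{IntraQueueSketch}}, {{skipContainer}}, {{preemptableMb}}) is hypothetical. The comment above does not say which quantity the minimum container size is subtracted from; the sketch assumes it is the amount considered preemptable from an app.

```java
// Hypothetical sketch; names and signatures are NOT from the Hadoop code base.
class IntraQueueSketch {

    // Stand-in for the skip check: the container is skipped when preempting
    // it would bring the user's usage down to (or below) the user limit.
    static boolean skipContainer(long userUsedMb, long containerMb, long userLimitMb) {
        return userUsedMb - containerMb <= userLimitMb;
    }

    // Assumed reading of the proposed fix: under USERLIMIT_FIRST, hold back
    // one minimum-size container from what may be preempted from the app.
    static long preemptableMb(long appUsedMb, long minContainerMb, boolean userLimitFirst) {
        long preemptable = appUsedMb;
        if (userLimitFirst) {
            preemptable = Math.max(0L, preemptable - minContainerMb);
        }
        return preemptable;
    }

    public static void main(String[] args) {
        long userLimitMb = 5120;   // illustrative user limit (5GB)
        long minContainerMb = 512; // 0.5GB minimum container

        // A user sitting exactly one minimum container above the limit:
        // preempting that container lands on the limit, so it is skipped.
        System.out.println(skipContainer(userLimitMb + minContainerMb, minContainerMb, userLimitMb));

        // Two containers above the limit: the first one is not skipped.
        System.out.println(skipContainer(userLimitMb + 2 * minContainerMb, minContainerMb, userLimitMb));

        System.out.println(preemptableMb(4096, minContainerMb, true));
    }
}
```

With these illustrative numbers, the first call prints {{true}} (the container is skipped), the second prints {{false}}, and the third prints {{3584}}.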

Also, in 2.8 this fix suffers from oscillation because the user limit is 
calculated differently in 2.8 than in later releases. Ignoring ULF, MULP, and 
possibly other factors, the calculation in 2.8 is {{total used resources / 
number of active users}}, while in later releases it is {{total active 
resources / number of active users}}. With this fix in 2.8, the value of 
{{getResourceLimitForAllUsers}} (used by the preemption monitor) would be 
greater than the {{getHeadroom}} value used by the leaf queue, which would 
cause more preemption than necessary.
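
The difference between the two simplified formulas can be seen with a small sketch. The class and method names are hypothetical, the numbers are made up for illustration, and "total active resources" is assumed here to mean the resources used by active users. ULF, MULP, and any other factors are ignored, as in the description above.

```java
// Hypothetical sketch of the two simplified user-limit formulas.
class UserLimitSketch {

    // 2.8-style: total used resources / number of active users.
    static long userLimit28(long totalUsedMb, int activeUsers) {
        requirePositive(activeUsers);
        return totalUsedMb / activeUsers;
    }

    // Later releases: total active resources / number of active users
    // (assumed here to be the resources used by active users only).
    static long userLimitLater(long activeUsedMb, int activeUsers) {
        requirePositive(activeUsers);
        return activeUsedMb / activeUsers;
    }

    private static void requirePositive(int activeUsers) {
        if (activeUsers <= 0) {
            throw new IllegalArgumentException("activeUsers must be positive");
        }
    }

    public static void main(String[] args) {
        long totalUsedMb = 20480;  // illustrative: 20GB used in the queue
        long activeUsedMb = 12288; // illustrative: 12GB of that held by active users
        int activeUsers = 2;

        System.out.println(userLimit28(totalUsedMb, activeUsers));     // 10240
        System.out.println(userLimitLater(activeUsedMb, activeUsers)); // 6144
    }
}
```

Whenever some of the queue's usage belongs to users that are no longer active, the two formulas diverge, so a limit computed one way can exceed a limit computed the other way for the same queue state.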

Bottom line is that I'm still working on a 2.8 solution.

> Capacity Scheduler Intra-queue preemption: User can starve if newest app is 
> exactly at user limit
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7469
>                 URL: https://issues.apache.org/jira/browse/YARN-7469
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, yarn
>    Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: UnitTestToShowStarvedUser.patch, YARN-7469.001.patch
>
>
> Queue Configuration:
> - Total Memory: 20GB
> - 2 Queues
> -- Queue1
> --- Memory: 10GB
> --- MULP: 10%
> --- ULF: 2.0
> - Minimum Container Size: 0.5GB
> Use Case:
> - User1 submits app1 to Queue1 and consumes 20GB
> - User2 submits app2 to Queue1 and requests 7.5GB
> - Preemption monitor preempts 7.5GB from app1. Capacity Scheduler gives those 
> resources to User2
> - User3 submits app3 to Queue1. Initially, app3 requests only 1 container, 
> for the AM.
> - Preemption monitor never preempts a container.



