[ 
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345266#comment-15345266
 ] 

Jason Lowe commented on YARN-4280:
----------------------------------

bq.  IIRC the headroom is a combination of the user limits and the queue limits.

OK, it looks like this has changed a bit since I last got deep into it.  I 
think the headroom is the queue headroom with no user-specific stuff in it, so 
I think we're OK there.  Thinking about this more, I believe we can simplify 
the blockedResource field added to CSAssignment and instead just track this 
with a boolean flag, or expand the existing CSAssignment skipped boolean into 
an enumeration of skip types.  When we flag the assignment as 
"queue-limit-skipped", the parent queue knows that the 
allocation/reservation wasn't made solely due to insufficient free resources 
within the child queue's limits.  It can then lower its own limit by the 
child's limit, effectively blocking sibling queues from using the remaining 
resources for that child queue until the allocation can be made, the ask is 
removed, a higher-priority app starts asking, the queues are re-sorted, etc.
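
To make the idea concrete, here is a rough sketch of what the enumeration and 
the parent-side adjustment could look like.  Everything in it (the class name, 
the SkipType values, limitForSiblings, the use of plain MB counts instead of 
Resource objects) is made up for illustration and is not taken from any of the 
attached patches or from the real CSAssignment class.

```java
// Hypothetical sketch only: these names do not come from the YARN-4280 patches.
final class QueueLimitSkipSketch {

    /** Possible replacement for a plain "skipped" boolean on the assignment. */
    enum SkipType {
        NONE,                 // an allocation or reservation was made
        QUEUE_LIMIT_SKIPPED,  // skipped solely because the child queue's limit was too small
        OTHER                 // skipped for any other reason
    }

    /**
     * Returns the limit (in MB) that the parent would offer to the remaining
     * sibling queues after seeing one child's result.
     *
     * If the child was skipped only because of its queue limit, the parent
     * lowers its own limit by the child's limit so that siblings cannot
     * consume the space the child is waiting for.
     */
    static long limitForSiblings(long parentLimitMb, long childLimitMb, SkipType childResult) {
        if (childResult == SkipType.QUEUE_LIMIT_SKIPPED) {
            // Never report a negative limit to the siblings.
            return Math.max(0L, parentLimitMb - childLimitMb);
        }
        return parentLimitMb;
    }
}
```

With a parent limit of 8192 MB and a child limit of 2048 MB, a 
queue-limit-skipped child would leave 6144 MB for its siblings in this sketch, 
while any other result leaves the parent limit untouched.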

> CapacityScheduler reservations may not prevent indefinite postponement on a 
> busy cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4280.001.patch, YARN-4280.002.patch, 
> YARN-4280.003.patch, YARN-4280.004.patch
>
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run 
> at total cluster capacity. There are 2 applications: appX runs on queue A, 
> always asking for 1 GB containers (non-AM), and appY runs on queue B, asking 
> for 2 GB containers.
> The user limit is high enough for an application to reach 100% of the 
> cluster resource.
> appX is running at total cluster capacity, full of 1 GB containers, releasing 
> only one container at a time. appY comes in with a request for a 2 GB 
> container but only 1 GB is free. Ideally, since appY is in the underserved 
> queue, it has higher priority and should reserve for its 2 GB request. Since 
> this request puts alloc+reserve above the total capacity of the cluster, the 
> reservation is not made. appX comes in with a 1 GB request and, since 1 GB is 
> still available, the request is allocated.
> This can continue indefinitely, causing priority inversion.
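
The scenario quoted above can be illustrated with a toy simulation.  This is 
only a sketch under simplifying assumptions: a single pool of memory counted 
in whole GB, one 1 GB release per scheduling round, and a made-up method name 
(roundsUntilAppYRuns).  It does not model nodes, queues, or the actual 
CapacityScheduler code paths.

```java
// Toy model of the quoted scenario; all names here are hypothetical.
final class PostponementSketch {

    /**
     * Returns the 1-based round in which appY's 2 GB request first fits,
     * or -1 if it is still waiting after maxRounds.
     *
     * holdFreedSpaceForAppY = true models a reservation that keeps freed
     * space away from appX; false models the reported behaviour, where
     * appX immediately reuses every freed 1 GB.
     */
    static int roundsUntilAppYRuns(boolean holdFreedSpaceForAppY, int maxRounds) {
        final int askXGb = 1;
        final int askYGb = 2;
        int freeGb = 0; // appX starts out occupying the whole cluster

        for (int round = 1; round <= maxRounds; round++) {
            freeGb += 1; // appX releases one 1 GB container

            if (freeGb >= askYGb) {
                return round; // appY finally fits
            }
            if (!holdFreedSpaceForAppY && freeGb >= askXGb) {
                freeGb -= askXGb; // appX grabs the freed space again
            }
        }
        return -1;
    }
}
```

In this model, calling roundsUntilAppYRuns(false, 1000) never lets appY run 
within the 1000 rounds, whereas roundsUntilAppYRuns(true, 1000) lets it run as 
soon as two containers have been released.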



