[ 
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345203#comment-15345203
 ] 

Jason Lowe commented on YARN-4280:
----------------------------------

Thanks for updating the patch, Kuhu!

I'm confused by the nullAssignment changes, especially in LeafQueue.  
NULL_ASSIGNMENT was replaced with an explicit object creation but otherwise is 
the same.  NULL_ASSIGNMENT should continue to be used for the early-out cases 
and normal non-assignments to avoid unnecessary object creation, and that will 
also make the patch substantially smaller.  If we're creating a new object 
because the blocked resource might be set on it, then we should only create it 
when we actually need to set the blocked resource.  If we're creating it 
because the caller might modify the returned assignment, then the caller needs 
to be fixed to make a copy, since a constant assignment object can be returned 
sometimes.
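
Roughly the pattern I have in mind, as a self-contained sketch (the class and 
method names here are illustrative stand-ins, not the actual 
LeafQueue/CSAssignment code):
{code}
// Illustrative stand-ins only: reuse a shared "null" assignment for the no-op
// paths and only construct a new object when there is per-call state (the
// blocked resource) to attach.
class Assignment {
  static final Assignment NULL_ASSIGNMENT = new Assignment();
  private long blockedMemoryMb;              // per-call state added by the patch
  void setBlockedMemoryMb(long mb) { blockedMemoryMb = mb; }
  long getBlockedMemoryMb() { return blockedMemoryMb; }
}

class QueueSketch {
  Assignment assignContainers(boolean earlyOut, long blockedMb) {
    if (earlyOut) {
      return Assignment.NULL_ASSIGNMENT;     // nothing to record, no allocation
    }
    if (blockedMb > 0) {
      Assignment assignment = new Assignment();   // only here do we need a mutable object
      assignment.setBlockedMemoryMb(blockedMb);
      return assignment;
    }
    return Assignment.NULL_ASSIGNMENT;       // normal non-assignment
  }
}
{code}
The shared constant covers every path that has no per-call state to carry, and 
the new object only shows up on the path that actually sets the blocked 
resource.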

Does this need to be using Resources.componentwiseMax instead of Resources.max?
{code}
      childLimits.setLimit(Resources.max(
          resourceCalculator, cluster,
          Resources.subtract(childLimits.getLimit(), finalBlockedLimits.getLimit()),
          Resources.none()));
{code}
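
For reference, the distinction I'm asking about, shown with a toy 
two-dimensional resource rather than the real Resources/ResourceCalculator API: 
Resources.max picks one operand wholesale based on the calculator's single 
comparison, while componentwiseMax clamps each dimension independently, which 
matters if the subtraction above can go negative in only one dimension.
{code}
// Toy illustration only; not the real Resources/ResourceCalculator API.
class ToyResource {
  final long memoryMb, vcores;
  ToyResource(long memoryMb, long vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }

  // Like Resources.max with a memory-only calculator: one operand wins wholesale.
  static ToyResource max(ToyResource a, ToyResource b) {
    return a.memoryMb >= b.memoryMb ? a : b;
  }

  // Like componentwiseMax: each dimension is clamped independently.
  static ToyResource componentwiseMax(ToyResource a, ToyResource b) {
    return new ToyResource(Math.max(a.memoryMb, b.memoryMb), Math.max(a.vcores, b.vcores));
  }
}
// max(<4096, -2>, <0, 0>)              -> <4096, -2>  (negative vcores survive)
// componentwiseMax(<4096, -2>, <0, 0>) -> <4096, 0>
{code}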

Rather than subtracting the blocked resources from each result of 
getResourceLimitsOfChild, I think it would be better to adjust the parent 
limits that are already passed to getResourceLimitsOfChild if a child reports 
blocked resources.
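
In other words, something along these lines (a sketch with toy types, not the 
actual ParentQueue/ResourceLimits code): accumulate what the children report as 
blocked and shrink the limit handed to the later children, instead of 
post-subtracting from each child's limit.
{code}
// Toy sketch only, not the real ParentQueue/ResourceLimits code.
import java.util.List;

class ParentQueueSketch {
  void assignToChildren(long parentLimitMb, List<ChildQueueSketch> children) {
    long blockedSoFarMb = 0;
    for (ChildQueueSketch child : children) {
      // The limit passed down already reflects what earlier children blocked.
      long childLimitMb = Math.max(parentLimitMb - blockedSoFarMb, 0);
      blockedSoFarMb += child.assignAndReportBlocked(childLimitMb);
    }
  }
}

class ChildQueueSketch {
  /** @return MB of blocked resources this child reports for this pass. */
  long assignAndReportBlocked(long limitMb) {
    return 0; // placeholder
  }
}
{code}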

finalBlockedLimits should just be a Resource instead of a ResourceLimits.  In 
practice only the Resource inside it is ever used, and it is just a Resource 
total anyway.

Some debug logs when we're asking for blocked resources in the assignment or 
applying them to parent limits would be helpful for analysis and debugging.
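
For example, something like this at both spots (sketch only; the class and 
variable names are placeholders for the real queue state):
{code}
// Sketch only; names are placeholders for the real queue state.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class BlockedResourceLogging {
  private static final Log LOG = LogFactory.getLog(BlockedResourceLogging.class);

  void logBlocked(String queueName, Object blocked, Object queueLimit, Object adjustedLimit) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Queue " + queueName + " reported blocked resources " + blocked
          + "; queue limit " + queueLimit + " adjusted to " + adjustedLimit);
    }
  }
}
{code}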

I'm confused why we're checking the headroom to determine the amount of blocked 
resources.  IIRC the headroom is a combination of the user limits and the queue 
limits.  We only want to report blocked resources when we are blocked by the 
queue limits.  If the user cannot make a reservation only due to the user's own 
limits then we don't want to report any blocked resources.  We only want to 
report resources when we would have either allocated or made a reservation but 
the queue's limits prevent the full allocation.  Then, and only then, we want 
to report the blocked resources as the amount remaining available in the queue 
so those resources are reserved relative to other queues until we are able to 
make the full allocation or reservation.
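
To pin the condition down, here is roughly what I mean as a toy sketch 
(memory-only amounts and hypothetical names, not the real headroom/limit 
calculation):
{code}
// Toy sketch with memory-only amounts and hypothetical names; the real check
// would live in the leaf queue's allocate/reserve path.
class BlockedReportSketch {
  /** @return MB to report as blocked, or 0 if nothing should be reported. */
  long blockedToReport(long requestMb, long userLimitHeadroomMb,
                       long queueLimitMb, long queueUsedMb) {
    if (requestMb > userLimitHeadroomMb) {
      // blocked by the user's own limit: report nothing, other queues may proceed
      return 0;
    }
    long availableInQueueMb = Math.max(queueLimitMb - queueUsedMb, 0);
    if (requestMb > availableInQueueMb) {
      // we would have allocated or reserved, but the queue limit is in the way:
      // report what the queue still has available so it is held for this request
      return availableInQueueMb;
    }
    return 0; // the request fits: nothing to report
  }
}
{code}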

On a related note, I'm confused about why LeafQueue is subtracting the headroom 
from the blocked resources.  What does this represent?  It seems like this could 
report more blocked resources than the queue has available, which would allow 
the queue to influence more capacity than its configured max.


bq. With this approach, I think allocations will be skipped for other queues 
until this 8GB is served.

If I understand Sunil's question properly, then yes it will block other queues 
under the parent queue until that 8GB is served, and that is exactly what is 
needed to solve the problem.  Let me restate the scenario to make sure I am 
understanding it properly.  By "One queue is under served and it has a single 
pending demand for 8GB" then I assume you mean a leaf queue that wants to 
allocate 8GB, and the leaf queue would normally be able to allocate it but the 
usage of the parent is such that there's less than 8GB available in the parent.  
In other words, this is a failure to reserve due to parental limits.  In this 
case, if we fail to block the other sibling queues from allocating their 
smaller 2GB requests then we have the same type of scenario as in the JIRA 
description -- a higher priority queue that is indefinitely starved by lower 
priority queues because it can't reserve the remaining resources.  So yes, we 
need the other queues to stop allocating until the higher-priority queue's 
allocation is satisfied; otherwise we get a priority inversion and indefinite 
postponement.

> CapacityScheduler reservations may not prevent indefinite postponement on a 
> busy cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4280.001.patch, YARN-4280.002.patch, 
> YARN-4280.003.patch, YARN-4280.004.patch
>
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at 
> total cluster capacity. There are 2 applications: appX runs on Queue A, always 
> asking for 1G containers (non-AM), and appY runs on Queue B asking for 2 GB 
> containers.
> The user limit is high enough for the application to reach 100% of the 
> cluster resource. 
> appX is running at total cluster capacity, full with 1G containers releasing 
> only one container at a time. appY comes in with a request of 2GB container 
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it 
> has higher priority and should reserve for its 2 GB request. Since this 
> request puts the alloc+reserve above total capacity of the cluster, 
> reservation is not made. appX comes in with a 1GB request and since 1GB is 
> still available, the request is allocated. 
> This can continue indefinitely causing priority inversion.


