[
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342666#comment-15342666
]
Jason Lowe commented on YARN-4280:
----------------------------------
I haven't had a chance to look at the patch yet, but I'm not thrilled with the
thought of storing this state somewhere beyond the allocate call or using the
existing reserved container logic for this. I think it's going to add
special-case logic to the handling of reserved containers in all sorts of
places. In addition, if we cannot actually make the reservation when the parent
queue's max resource would be violated, then that doesn't solve the original
problem. We need the parent queue to completely stop assigning to other leaf
queues if the only reason we cannot reserve is that we would exceed our
parent's max capacity by making the reservation.
My thinking is that the algorithm implementation would have no extra state
stored with a queue. As soon as we do that, we will have all sorts of cleanup
cases that have to be handled, like the app-finished case Wangda mentioned
above, etc. I see the same issue if we go with actual reserved containers,
since we need to make sure those get cleaned up in similar situations;
otherwise we will "leak" reservations. For example, another app in the leaf
queue could become higher priority, or a higher-priority app that wasn't
asking could start asking again. If we made the "semi-reserved" container from
a previous call,
we would need to release that container to allow the new, higher-priority app
to get its resource. In short, I think it will be messy to get it to work
correctly in all cases.
Instead, I saw the algorithm as simply having the leaf queue return the
amount of space it is trying to consume when this situation occurs, capped by
its max capacity. The parent queue would deduct that amount from the local
variable tracking the limit it passes to the child queues. If, after
deducting, the limit is non-zero, then the parent can continue calling other
child queues with the reduced parent limit (to keep other children from eating
into the space needed by the higher-priority queue that is trying to fulfil a
reservation). If, after deducting, the limit is zero, then the parent can
return early to its own parent with a similar deduction, and so on. The amount
we're holding back while trying to fulfil the reservation is only ever stored
in local variables, so it never persists beyond the current scheduler allocate
call. Therefore there's nothing to clean up -- it automatically cleans up on
the next scheduler allocate call, whether we are finally able to
allocate/reserve a real container, some higher-priority app starts asking, the
app terminates, etc.
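For what it's worth, here is a minimal, self-contained sketch (not actual
CapacityScheduler code) of that deduction idea. All class and method names
below are hypothetical stand-ins, and resources are reduced to a single long
(GB of memory) to keep it short:

import java.util.List;

class Assignment {
  long assigned;        // space actually allocated or reserved in this pass
  long heldForPending;  // space a child is trying to consume but could not
                        // reserve because the parent's max capacity would be
                        // exceeded
}

interface ChildQueue {
  Assignment assignContainers(long limit);
}

class ParentQueueSketch {
  private final List<ChildQueue> childrenInPriorityOrder;

  ParentQueueSketch(List<ChildQueue> childrenInPriorityOrder) {
    this.childrenInPriorityOrder = childrenInPriorityOrder;
  }

  Assignment assignContainers(long parentLimit) {
    Assignment result = new Assignment();
    long remainingLimit = parentLimit;  // local variable only, never stored

    for (ChildQueue child : childrenInPriorityOrder) {
      Assignment childResult = child.assignContainers(remainingLimit);
      result.assigned += childResult.assigned;
      result.heldForPending += childResult.heldForPending;

      // Deduct what the child consumed plus what it is holding back, so
      // lower-priority siblings cannot eat into the space the
      // higher-priority queue needs for its reservation.
      remainingLimit -= childResult.assigned + childResult.heldForPending;

      if (remainingLimit <= 0) {
        // Nothing left to offer other children; return early so our own
        // parent can make the same deduction one level up.
        break;
      }
    }
    // Nothing persists beyond this call, so there is nothing to clean up on
    // the next scheduler allocate pass.
    return result;
  }
}

Since everything lives in locals of a single allocate pass, the held-back
amount simply disappears when the call returns, which is what avoids the
cleanup cases above.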
> CapacityScheduler reservations may not prevent indefinite postponement on a
> busy cluster
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.6.1, 2.8.0, 2.7.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: YARN-4280.001.patch, YARN-4280.002.patch
>
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%), both of which
> can run at total cluster capacity. There are 2 applications: appX, which
> runs on Queue A and always asks for 1 GB containers (non-AM), and appY,
> which runs on Queue B and asks for 2 GB containers.
> The user limit is high enough for the applications to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, filling it with 1 GB containers
> and releasing only one container at a time. appY comes in with a request for
> a 2 GB container, but only 1 GB is free. Ideally, since appY is in the
> underserved queue, it has higher priority and should get a reservation for
> its 2 GB request. Since this request puts alloc+reserve above the total
> capacity of the cluster, the reservation is not made. appX comes in with a
> 1 GB request, and since 1 GB is still available, the request is allocated.
> This can continue indefinitely, causing priority inversion.
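A tiny simulation of the postponement loop described above, using assumed
numbers (the report does not fix a cluster size): a 10 GB cluster, appX
holding 1 GB containers, and appY wanting a single 2 GB container. The
reservation is skipped whenever allocated + reserved would exceed the cluster
total, mirroring the reported behavior:

public class PostponementSketch {
  public static void main(String[] args) {
    final long clusterGB = 10;     // assumed cluster size
    long appXAllocatedGB = 9;      // appX has just released one 1 GB container
    boolean appYReserved = false;

    for (int cycle = 1; cycle <= 5; cycle++) {
      long freeGB = clusterGB - appXAllocatedGB;   // 1 GB free each cycle

      // appY (underserved queue B) wants 2 GB, but the current logic refuses
      // to reserve because allocated + reserved would exceed the cluster.
      if (appXAllocatedGB + 2 <= clusterGB) {
        appYReserved = true;
      }

      // appX (queue A) asks for 1 GB; since 1 GB is free and nothing was
      // reserved, it is allocated immediately.
      if (!appYReserved && freeGB >= 1) {
        appXAllocatedGB += 1;
      }

      // appX later releases one container, and the cycle repeats.
      appXAllocatedGB -= 1;

      System.out.printf("cycle %d: appX=%d GB, appY reserved=%b%n",
          cycle, appXAllocatedGB, appYReserved);
    }
    // appY never gets its reservation: priority inversion.
  }
}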
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)