[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487802#comment-14487802 ]

Wangda Tan commented on YARN-3434:
----------------------------------

[~tgraves], you're right, but I'm wondering how this could happen:

When continuous reservation is enabled, assignContainer performs this check:
{code}
        if (reservationsContinueLooking && rmContainer == null) {
          // We could possibly be ignoring parent queue capacity limits when
          // reservationsContinueLooking is set.
          // If we're trying to reserve a container here, no container will be
          // unreserved in order to reserve the new one, so check the limits
          // again before reserving the new container.
          if (!checkLimitsToReserve(clusterResource, 
              application, capability)) {
            return Resources.none();
          }
        }
{code}

When continuous reservation is disabled, assignContainers ensures that the user
limit is not violated.

My point is, *both the user limit and the queue max capacity are checked before
reserving a new container*. And allocating from a reserved container unreserves
it before continuing. So I think in your case
(https://issues.apache.org/jira/browse/YARN-3434?focusedCommentId=14485834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14485834),
job-2 cannot reserve 25 * 12 GB containers. Did I miss anything?
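To make that argument concrete, here is a hypothetical, simplified Java model (memory only, in MB; the class and method names are made up and this is not the actual LeafQueue code). It assumes that a reservation counts toward both the user's usage and the queue's usage, and that both limits are checked before every new reservation. Under those assumptions a user can only keep reserving while both checks pass, so the number of reservations is bounded by the user limit.

```java
// Hypothetical, simplified model (memory only, in MB); not CapacityScheduler code.
public class ReserveCheckSketch {

    /** True if reserving requiredMb keeps the user limit and queue max capacity satisfied. */
    static boolean canReserve(long userUsedMb, long userLimitMb,
                              long queueUsedMb, long queueMaxMb,
                              long requiredMb) {
        return userUsedMb + requiredMb <= userLimitMb
                && queueUsedMb + requiredMb <= queueMaxMb;
    }

    /** Keeps reserving containerMb-sized containers until one of the checks fails. */
    static int reserveUntilRefused(long userUsedMb, long userLimitMb,
                                   long queueUsedMb, long queueMaxMb,
                                   long containerMb) {
        int reserved = 0;
        while (canReserve(userUsedMb, userLimitMb, queueUsedMb, queueMaxMb, containerMb)) {
            // Assumption: a reservation counts toward both user and queue usage.
            userUsedMb += containerMb;
            queueUsedMb += containerMb;
            reserved++;
        }
        return reserved;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: user limit 48 GB, queue max 100 GB, 12 GB containers.
        int n = reserveUntilRefused(0, 49_152, 0, 102_400, 12_288);
        System.out.println(n); // 4
    }
}
```

With a 48 GB user limit the loop stops after four 12 GB reservations, which is why, under these assumptions, reserving 25 * 12 GB containers should not be possible.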

I also have a question about the continuous reservation checking behavior, which
may or may not be related to this issue: currently it will try to unreserve all
containers under a user, but it will actually unreserve at most one container to
allocate a new container. Do you think it is fine to change the logic to the following:

When (continuousReservation enabled) && (user.usage + required -
min(max-allocation, user.total-reserved) <= user.limit), assignContainers will
continue. This prevents attempting impossible allocations when a user has
reserved lots of containers (the same as the queue reservation checking).
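A hypothetical Java sketch of the proposed condition next to the current behavior as described above. All names are illustrative, quantities are memory in GB, and userUsage is assumed to already include the user's reserved containers. The "current" check is modeled only from the description in this comment (treating every reservation as releasable); the real code may differ.

```java
/**
 * Hypothetical sketch (not Hadoop code) comparing two user-limit checks used
 * when continuous reservation is enabled. Quantities are memory in GB and
 * userUsage is assumed to include the user's reserved containers.
 */
public class ContinuousReservationCheckSketch {

    /** Current behavior as described: assumes all of the user's reservations can be released. */
    static boolean currentCheck(long userUsage, long required,
                                long userTotalReserved, long userLimit) {
        return userUsage + required - userTotalReserved <= userLimit;
    }

    /** Proposed: at most one reservation (no larger than maxAllocation) is released. */
    static boolean proposedCheck(long userUsage, long required, long maxAllocation,
                                 long userTotalReserved, long userLimit) {
        return userUsage + required
                - Math.min(maxAllocation, userTotalReserved) <= userLimit;
    }

    public static void main(String[] args) {
        long userLimit = 100;     // GB
        long maxAllocation = 8;   // GB, largest single container
        long required = 8;        // GB, the new container

        // 120 GB used, of which 40 GB (5 x 8 GB) is reserved.
        System.out.println(currentCheck(120, required, 40, userLimit));                 // true
        System.out.println(proposedCheck(120, required, maxAllocation, 40, userLimit)); // false

        // 96 GB used, of which 16 GB is reserved: both checks pass.
        System.out.println(currentCheck(96, required, 16, userLimit));                  // true
        System.out.println(proposedCheck(96, required, maxAllocation, 16, userLimit));  // true
    }
}
```

In the first case the current-style check lets assignContainers continue (120 + 8 - 40 = 88 <= 100) even though releasing a single 8 GB reservation can never bring the user back under 100 GB; the proposed check (120 + 8 - 8 = 120 > 100) stops early.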

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3434
>                 URL: https://issues.apache.org/jira/browse/YARN-3434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000
> containers, 8G each, within about 5 seconds. I think this allowed the
> logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)