[ https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung updated YARN-6818:
--------------------------------
    Labels: release-blocker  (was: blocker)

> User limit per partition is not honored in branch-2.7 >=
> --------------------------------------------------------
>
>                 Key: YARN-6818
>                 URL: https://issues.apache.org/jira/browse/YARN-6818
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.4
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>              Labels: release-blocker
>         Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where the user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, the user's used resources in the default 
> partition are 0, and their used resources in partition X are at the 
> partition's user limit. This is the problematic code in LeafQueue.java, as 
> far as I can tell:
> {noformat}
>     if (Resources
>         .greaterThan(resourceCalculator, clusterResource,
>             user.getUsed(label),
>             limit)) {
>       // if enabled, check to see if could we potentially use this node instead
>       // of a reserved node if the application has reserved containers
>       if (this.reservationsContinueLooking) {
>         if (Resources.lessThanOrEqual(
>             resourceCalculator,
>             clusterResource,
>             Resources.subtract(user.getUsed(),
>                 application.getCurrentReservation()),
>             limit)) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("User " + userName + " in queue " + getQueueName()
>                 + " will exceed limit based on reservations - " + " consumed: "
>                 + user.getUsed() + " reserved: "
>                 + application.getCurrentReservation() + " limit: " + limit);
>           }
>           Resource amountNeededToUnreserve =
>               Resources.subtract(user.getUsed(label), limit);
>           // we can only acquire a new container if we unreserve first since we
>           // ignored the user limit. Choose the max of user limit or what was
>           // previously set by max capacity.
>           currentResoureLimits.setAmountNeededUnreserve(
>               Resources.max(resourceCalculator, clusterResource,
>                   currentResoureLimits.getAmountNeededUnreserve(),
>                   amountNeededToUnreserve));
>           return true;
>         }
>       }
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("User " + userName + " in queue " + getQueueName()
>             + " will exceed limit - " + " consumed: "
>             + user.getUsed() + " limit: " + limit);
>       }
>       return false;
>     }
> {noformat}
> First it sees that the used resources in partition X are greater than the 
> partition's user limit. Then the reservation check also succeeds, because it 
> checks {{user.getUsed() - application.getCurrentReservation() <= limit}} and 
> {{user.getUsed()}} (with no label) is the default partition's usage, which is 
> 0 here, so the method returns true.
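> To make the arithmetic concrete, here is a small standalone sketch of the two 
> comparisons with made-up numbers (plain megabyte longs stand in for 
> {{Resource}} objects; none of the numbers come from a real cluster):
> {noformat}
> // Hypothetical illustration of the two checks in canAssignToUser for the
> // scenario above; plain longs (MB) stand in for Resource objects.
> public class UserLimitCheckSketch {
>   public static void main(String[] args) {
>     long limit = 10240;         // user limit in partition X (made up)
>     long usedInX = 11264;       // user's usage in partition X, over the limit
>     long usedInDefault = 0;     // user's usage in the default partition
>     long reserved = 1024;       // application's current reservation
>
>     // user.getUsed(label) > limit  -> true, the user is over the limit in X
>     boolean overLimitInX = usedInX > limit;
>
>     // Reservation check as written: user.getUsed() - reservation <= limit.
>     // user.getUsed() with no label is the default-partition usage (0 here),
>     // so the check passes and canAssignToUser returns true anyway.
>     boolean reservationCheckPasses = (usedInDefault - reserved) <= limit;
>
>     System.out.println("over limit in X:          " + overLimitInX);           // true
>     System.out.println("reservation check passes: " + reservationCheckPasses); // true
>   }
> }
> {noformat}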
> One fix is to simply change {{Resources.subtract(user.getUsed(), 
> application.getCurrentReservation())}} to 
> {{Resources.subtract(user.getUsed(label), 
> application.getCurrentReservation())}}; see the sketch below.
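> With the labeled usage, the same made-up numbers behave as intended (again 
> just a sketch of the substitution described above, not the attached patch):
> {noformat}
> // Same hypothetical numbers as above, but using the partition X usage in the
> // reservation check, as the suggested change does.
> public class UserLimitFixSketch {
>   public static void main(String[] args) {
>     long limit = 10240;
>     long usedInX = 11264;
>     long reserved = 1024;
>
>     // user.getUsed(label) - reservation <= limit: passes only if releasing
>     // the reserved containers would bring the user back under the limit in X.
>     boolean passesWithReservation = (usedInX - reserved) <= limit; // 10240 <= 10240 -> true
>
>     // With nothing reserved, the shortcut no longer lets the user through.
>     boolean passesWithoutReservation = (usedInX - 0) <= limit;     // 11264 <= 10240 -> false
>
>     System.out.println("passes with reservation:    " + passesWithReservation);
>     System.out.println("passes without reservation: " + passesWithoutReservation);
>   }
> }
> {noformat}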
> This doesn't seem to be a problem in branch-2.8 and higher, since YARN-3356 
> introduces this check:
> {noformat}
>       if (this.reservationsContinueLooking && checkReservations
>           && label.equals(CommonNodeLabelsManager.NO_LABEL)) {
> {noformat}
> so in that case using the used resources in the default partition is correct.
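> As a quick sketch of why that guard helps (assuming, as in trunk, that 
> {{CommonNodeLabelsManager.NO_LABEL}} is the empty string): for partition X 
> the label is not NO_LABEL, so the reservation shortcut is skipped and only 
> the plain per-partition user limit check applies:
> {noformat}
> // Hypothetical illustration of the branch-2.8 guard for the scenario above.
> public class NoLabelGuardSketch {
>   public static void main(String[] args) {
>     String NO_LABEL = "";        // stand-in for CommonNodeLabelsManager.NO_LABEL
>     String label = "X";          // partition being allocated in
>     boolean reservationsContinueLooking = true;
>     boolean checkReservations = true;
>
>     // The shortcut that compares the unlabeled usage is only taken for the
>     // default partition, where the unlabeled usage is the right quantity.
>     boolean takeShortcut = reservationsContinueLooking && checkReservations
>         && label.equals(NO_LABEL);
>
>     System.out.println("take reservation shortcut: " + takeShortcut); // false for partition X
>   }
> }
> {noformat}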



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

