[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487802#comment-14487802 ]
Wangda Tan commented on YARN-3434:
----------------------------------

[~tgraves], you're right. But I'm wondering how this could happen.

When continuous reservation is enabled, assignContainer performs this check:

{code}
if (reservationsContinueLooking && rmContainer == null) {
  // we could possibly ignoring parent queue capacity limits when
  // reservationsContinueLooking is set.
  // If we're trying to reserve a container here, not container will be
  // unreserved for reserving the new one. Check limits again before
  // reserve the new container
  if (!checkLimitsToReserve(clusterResource, application, capability)) {
    return Resources.none();
  }
}
{code}

When continuous reservation is disabled, assignContainers ensures that the user-limit is not violated.

My point is, *user-limit and queue max capacity are both checked before reserving a new container*. And an allocation from a reserved container unreserves it before continuing. So I think in your case, https://issues.apache.org/jira/browse/YARN-3434?focusedCommentId=14485834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14485834: job-2 cannot reserve 25 * 12 GB containers. Did I miss anything?

I also have a question about the continuous reservation checking behavior, which may or may not be related to this issue: it currently tries to unreserve all containers under a user, but it will actually unreserve at most one container to allocate a new container. Do you think it would be fine to change the logic to the following?

When (continuousReservation-enabled) && (user.usage + required - min(max-allocation, user.total-reserved) <= user.limit), assignContainers will continue.

This would prevent attempting impossible allocations when a user has reserved lots of containers (the same as the queue reservation check).
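
To make the proposed condition concrete, here is a minimal sketch of it as a standalone check. It is not CapacityScheduler code: the class name, the method name, the parameter names, and the use of plain megabyte counts instead of Resource objects are all assumptions made for illustration. The fallback for the disabled case (a plain usage + required <= limit check) is also an assumption, since the comment only says that assignContainers enforces the user-limit there.

```java
/**
 * Hypothetical sketch of the proposed user-limit check.
 * Not taken from Hadoop; all names here are illustrative assumptions.
 */
final class UserLimitReservationCheck {
    private UserLimitReservationCheck() {}

    /**
     * Returns true if assignment should continue for this user.
     * All amounts are memory in MB (an assumption; the scheduler uses Resource).
     */
    static boolean canContinue(boolean continuousReservationEnabled,
                               long userUsageMb,
                               long requiredMb,
                               long maxAllocationMb,
                               long userTotalReservedMb,
                               long userLimitMb) {
        if (!continuousReservationEnabled) {
            // Assumed fallback: nothing can be unreserved, so use the plain check.
            return userUsageMb + requiredMb <= userLimitMb;
        }
        // At most one container is unreserved per allocation, so the most that
        // can be given back is one max-size container, capped by what the user
        // actually has reserved.
        long reclaimableMb = Math.min(maxAllocationMb, userTotalReservedMb);
        return userUsageMb + requiredMb - reclaimableMb <= userLimitMb;
    }
}
```

With made-up numbers: a user at 120 GB usage with 96 GB reserved, a 100 GB limit, and a 12 GB request with a 12 GB max-allocation gives 120 + 12 - 12 = 120 GB, which exceeds the limit, so the check stops. Subtracting the full 96 GB reserved instead would give 36 GB and let the attempt proceed even though one unreserve cannot free that much.
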

> Interaction between reservations and userlimit can result in significant ULF violation
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3434
>                 URL: https://issues.apache.org/jira/browse/YARN-3434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000
> containers, 8G each, within about 5 seconds. I think this allowed the
> logic in assignToUser() to allow the userlimit to be surpassed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)