[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358708#comment-14358708 ]

Nathan Roberts commented on YARN-3298:
--------------------------------------

I agree. Let's not change anything for the time being. If YARN-2113 requires some tweaking in this area, we can do it at that time.

User-limit should be enforced in CapacityScheduler
--------------------------------------------------

Key: YARN-3298
URL: https://issues.apache.org/jira/browse/YARN-3298
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler, yarn
Reporter: Wangda Tan
Assignee: Wangda Tan

User-limit is currently not treated as a hard limit: it does not consider the required resource (the resource of the request being allocated). Also, when a user's used resource equals the user-limit, allocation still continues. This will cause jitter once we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container to the same user soon after).

The expected behavior should be the same as for the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers.

(1) The user-limit mentioned here is computed as follows:
{code}
current-capacity = queue.used + now-required (when queue.used > queue.capacity)
                   queue.capacity (when queue.used < queue.capacity)
user-limit = min(max(current-capacity / #active-users, current-capacity * user-limit / 100), queue-capacity * user-limit-factor)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357564#comment-14357564 ]

Wangda Tan commented on YARN-3298:
----------------------------------

[~nroberts],
I think I got your point now. Yes, as you said, if we enforce the limit (used + required <= user-limit) and don't change the user-limit computation, the queue cannot go over its configured capacity.

Originally, this ticket was trying to solve the jitter problem that arises with YARN-2069. However, YARN-2069 only takes effect when the queue becomes over-satisfied, and at that point CS will not give the queue more resources, so the jitter won't actually happen. Jitter will happen once we have YARN-2113 (preemption to balance usage between users while the queue is not over its capacity); at that time, user-limit enforcement should be done.

Basically, I agree with your method, which is {{current_capacity = max(queue.used,queue.capacity)+now_required}}. It solves the problem of the queue being unable to go over its configured capacity, but it does not seem necessary, at least for now. We can delay this change until YARN-2113 requires it.

Thoughts?

Thanks,
Wangda
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355030#comment-14355030 ]

Nathan Roberts commented on YARN-3298:
--------------------------------------

If you have a prototype patch, please post it since that will make the proposal crystal clear.

The issue I raised at https://issues.apache.org/jira/browse/YARN-3298?focusedCommentId=14353053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353053 doesn't happen today. I think it will happen if we try to be precisely strict with user-limit. It's also not about a small amount of resource that cannot be used in a queue. It's everything between capacity and max-capacity, which can be a large percentage of the cluster.

In my mind:
- I would be ok with changing to current-capacity = max(queue.used,queue.capacity)+now-required; because I think it's more consistent. Not strictly necessary though, just an improvement.
- I don't see an overwhelming reason to make user-limit a precisely enforced hard limit. Currently, users can't get beyond it by very much, and that seems ok to me.
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353053#comment-14353053 ]

Nathan Roberts commented on YARN-3298:
--------------------------------------

Thanks [~leftnoteasy] for the additional detail. Maybe I should just wait for the patch, but here's the case I'm worried about.
- queue.used is just under queue.capacity, so current-capacity = queue.capacity.
- There are two users in the queue, and both have the same used resources.
- user-limit will be slightly less than (queue-capacity/2), so user-limit can be extremely close to user.usage.
- user.usage + required might now be slightly greater than user-limit.

If that happens, it seems like we'll be unable to cross the capacity threshold. Once above capacity, I think it will work, but crossing that threshold might be hard.

Seems like current-capacity should be calculated as:
{code}
current-capacity = max(queue.used,queue.capacity)+now-required;
{code}
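The threshold case described above can be sketched numerically. This is a minimal illustration, not CapacityScheduler code: the class and method names are hypothetical, resources are reduced to a single scalar, and the numbers (capacity 100, two users at 49 each, a request of 2, user-limit 50%, user-limit-factor 5) are assumed for the example.

```java
public class ThresholdSketch {
    // current-capacity as given in the issue description.
    static long currentCapacityDescribed(long queueUsed, long queueCapacity, long required) {
        return queueUsed > queueCapacity ? queueUsed + required : queueCapacity;
    }

    // Alternative suggested in the comment: max(queue.used, queue.capacity) + now-required.
    static long currentCapacityProposed(long queueUsed, long queueCapacity, long required) {
        return Math.max(queueUsed, queueCapacity) + required;
    }

    // user-limit formula from the issue description (scalar resources, integer math).
    static long userLimit(long currentCapacity, int activeUsers, long userLimitPercent,
                          long queueCapacity, long userLimitFactor) {
        long share = Math.max(currentCapacity / activeUsers,
                              currentCapacity * userLimitPercent / 100);
        return Math.min(share, queueCapacity * userLimitFactor);
    }

    // Strict enforcement: allocate only when used + required <= limit.
    static boolean strictCheckAllows(long userUsed, long required, long limit) {
        return userUsed + required <= limit;
    }

    public static void main(String[] args) {
        long capacity = 100, queueUsed = 98, userUsed = 49, required = 2;

        long described = userLimit(
            currentCapacityDescribed(queueUsed, capacity, required), 2, 50, capacity, 5);
        long proposed = userLimit(
            currentCapacityProposed(queueUsed, capacity, required), 2, 50, capacity, 5);

        // Described formula: limit 50, and 49 + 2 > 50, so the queue stays below capacity.
        System.out.println(described + " -> " + strictCheckAllows(userUsed, required, described));
        // Proposed formula: limit 51, and 49 + 2 <= 51, so the threshold can be crossed.
        System.out.println(proposed + " -> " + strictCheckAllows(userUsed, required, proposed));
    }
}
```

Under these assumed numbers, adding now-required to current-capacity is what lets a strictly enforced limit still admit the request that takes the queue past its configured capacity.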
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353553#comment-14353553 ]

Wangda Tan commented on YARN-3298:
----------------------------------

Hi [~nroberts],
If I understand you correctly, maybe we can just relax the check to user.used < user.limit (instead of user.used + now_required <= user.limit), which can solve the problem you mentioned.
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353852#comment-14353852 ]

Wangda Tan commented on YARN-3298:
----------------------------------

[~nroberts],
As you mentioned, that is mostly the same as what we have today, and I think it cannot solve the jitter problem. What I really want is to enforce the limit.

As for the problem you mentioned in https://issues.apache.org/jira/browse/YARN-3298?focusedCommentId=14353053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353053, where a small amount of resource cannot be used in a queue, setting user-limit a little bit higher (for example, from 50 to 51) should also solve it.

Sounds like a plan?
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353833#comment-14353833 ]

Nathan Roberts commented on YARN-3298:
--------------------------------------

[~leftnoteasy], won't that be extremely close to what it is today? If so, then does it really solve the jitter issue you originally cited?

Just to make sure I'm in sync with your proposed direction, this is the code you're thinking about modifying, correct?
{code}
    // Note: We aren't considering the current request since there is a fixed
    // overhead of the AM, but it's a > check, not a >= check, so...
    if (Resources
        .greaterThan(resourceCalculator, clusterResource,
            user.getConsumedResourceByLabel(label),
            limit)) {
{code}
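The difference between the quoted check and the strict check from the issue description can be sketched with scalar resources. This is only an illustration: the class and method names are hypothetical, and the real code compares Resource objects through Resources.greaterThan rather than plain numbers.

```java
public class UserLimitCheckSketch {
    // Mirrors the quoted check: the pending request is ignored, and the user is
    // blocked only once consumption is strictly greater than the limit.
    static boolean currentCheckAllows(long used, long limit) {
        return !(used > limit);
    }

    // Strict variant from the issue description: used + required <= limit.
    static boolean strictCheckAllows(long used, long required, long limit) {
        return used + required <= limit;
    }

    public static void main(String[] args) {
        // Assumed numbers: a user sitting exactly at the limit asks for one more container.
        long limit = 50, used = 50, required = 2;

        // Allowed today, so the user ends up at 52, one container over the limit.
        System.out.println("current check allows: " + currentCheckAllows(used, limit));
        // Rejected under strict enforcement, so the user never exceeds 50.
        System.out.println("strict check allows:  " + strictCheckAllows(used, required, limit));
    }
}
```

With the current check a user can overshoot the limit by at most one container, which is the slack the thread debates removing.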
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351171#comment-14351171 ]

Wangda Tan commented on YARN-3298:
----------------------------------

Hi [~nroberts],
In your example, each user can still get 5x (after my proposal).

*According to how user-limit gets computed:* (my proposal doesn't change this part)
current-capacity = queue.used + now-required (assume queue's usage is more than queue's capacity)
user-limit = min(max(current-capacity / #active-users, current-capacity * user-limit / 100), queue-capacity * user-limit-factor)

I realized that maybe you understood user-limit to mean the user-limit option only, but what I actually meant is the above equation :).

bq. What I see user limit doing is controlling which of the actively requesting applications are getting newly available resources. Basically, making it so that the queue can grow to 10x in the above example while trying to make sure that each of the users within the queue are getting equal shares of capacity.

This will be enforced: each user will be allowed to use more than the queue's minimum share, and can grow to an equal share of capacity when user-limit and user-limit-factor are properly set.

The only difference is that, in the past, each user could get (5x + 1 container resource), but after this patch each user can get <= 5x resource.

Does this make sense to you?
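To make the 5x figure concrete, the formula above can be evaluated for the configuration under discussion (user-limit 50%, user-limit-factor 5, two active users). The sketch below is an illustration only: the names are hypothetical, resources are a single scalar, and x = 100 is an assumed value.

```java
public class UserLimitFormulaSketch {
    // current-capacity per the issue description: queue.used + now-required once the
    // queue is above its capacity, otherwise the configured capacity.
    static long currentCapacity(long queueUsed, long queueCapacity, long required) {
        return queueUsed > queueCapacity ? queueUsed + required : queueCapacity;
    }

    // user-limit = min(max(cc / #active-users, cc * user-limit / 100),
    //                  queue-capacity * user-limit-factor)
    static long userLimit(long currentCapacity, int activeUsers, long userLimitPercent,
                          long queueCapacity, long userLimitFactor) {
        long share = Math.max(currentCapacity / activeUsers,
                              currentCapacity * userLimitPercent / 100);
        return Math.min(share, queueCapacity * userLimitFactor);
    }

    public static void main(String[] args) {
        long x = 100;      // assumed queue capacity
        long required = 2; // assumed size of the pending request

        // Below capacity the limit is computed from the configured capacity.
        System.out.println(userLimit(currentCapacity(60, x, required), 2, 50, x, 5));   // 50
        // Above capacity the limit grows with queue usage.
        System.out.println(userLimit(currentCapacity(600, x, required), 2, 50, x, 5));  // 301
        // At 10x usage the user-limit-factor caps each user at 5x.
        System.out.println(userLimit(currentCapacity(1000, x, required), 2, 50, x, 5)); // 500
    }
}
```

The cap in the last case comes from queue-capacity * user-limit-factor, which is why the per-user ceiling is 5x regardless of how the limit itself is enforced.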
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351178#comment-14351178 ]

Wangda Tan commented on YARN-3298:
----------------------------------

Updated the description a little bit to make it less confusing.
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350614#comment-14350614 ]

Nathan Roberts commented on YARN-3298:
--------------------------------------

Hi Wangda. I'm a little concerned about this proposal. I think userlimit has been acting this way for a long time, so a change could have a very significant impact on how queues behave.

If I'm understanding the proposal correctly, a queue that is configured with minimum_user_limit_percent=50, capacity=x, max_capacity=10x, user_limit_factor=5, and that has 2 active users, would not be able to get above x. Please correct me if that's not the case.

Assuming that is the case, I'm not sure that's what we want. What I see user limit doing is controlling which of the actively requesting applications are getting newly available resources. Basically, making it so that the queue can grow to 10x in the above example while trying to make sure that each of the users within the queue are getting equal shares of capacity.