[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068147#comment-14068147 ]
Wangda Tan commented on YARN-1198:
----------------------------------

I've just looked at all the sub-tasks of this JIRA, and I'm wondering whether we should first define what "headroom" means. In previous YARN versions, including YARN-1198, headroom is defined as "the maximum resource an application can get". In YARN-2008, headroom is defined as "the available resource an application can get", because it already takes the resources used by sibling queues into account.

I wonder whether we need to add a new field for an application, such as "guaranteed headroom", which considers its absolute capacity (not its maximum capacity), user limits, etc. We may want to keep both, because:
- The maximum resource is not always achievable, since the sum of the maximum resources of the leaf queues may exceed the cluster resource.
- With preemption, resources beyond the guaranteed resource are likely to be preempted, so they should be considered temporary.

With both fields, an AM can:
- Use "guaranteed headroom" to allocate resources that will not be preempted.
- Use "maximum headroom" to try to allocate resources beyond its guaranteed headroom.

In my humble opinion, "the available resource an application can get" doesn't make much sense here, and it may cause backward-compatibility problems as well. In a dynamic cluster the number can change rapidly: another application may fill up the cluster just one second after the AM gets the "available headroom". This field also cannot solve the deadlock problem: a malicious application can ask for far more resources than this value, or a careless developer can ignore the field entirely. The only valid solution I can think of is to put such logic on the scheduler side and enforce resource usage through the preemption policy. Any thoughts?
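The distinction between "guaranteed headroom" and "maximum headroom" can be sketched as below. This is a hypothetical, simplified model, not CapacityScheduler code: the class `HeadroomSketch` and its methods are made up for illustration, resources are memory in MB only, queue capacities are integer percentages of the cluster, and the user limit is passed in as a precomputed number. Guaranteed headroom is bounded by the queue's absolute capacity, maximum headroom by its absolute maximum capacity, and both by the user limit.

```java
// Hypothetical, simplified model (not CapacityScheduler code): memory-only
// resources in MB, queue capacities as integer percentages of the cluster.
final class HeadroomSketch {
    private HeadroomSketch() {}

    // Headroom under a queue ceiling (a percentage of the cluster), further
    // capped by the user limit and never negative.
    static long headroom(long clusterMb, long capacityPercent, long userLimitMb, long userUsedMb) {
        long queueCeilingMb = clusterMb * capacityPercent / 100;
        long ceilingMb = Math.min(queueCeilingMb, userLimitMb);
        return Math.max(0L, ceilingMb - userUsedMb);
    }

    // "Guaranteed headroom": bounded by the queue's absolute capacity; under the
    // proposal, allocations within it should not be preempted.
    static long guaranteedHeadroom(long clusterMb, long absCapacityPercent, long userLimitMb, long userUsedMb) {
        return headroom(clusterMb, absCapacityPercent, userLimitMb, userUsedMb);
    }

    // "Maximum headroom": bounded by the queue's absolute maximum capacity; it may
    // be unreachable, and anything above the guarantee is preemptable.
    static long maximumHeadroom(long clusterMb, long absMaxCapacityPercent, long userLimitMb, long userUsedMb) {
        return headroom(clusterMb, absMaxCapacityPercent, userLimitMb, userUsedMb);
    }

    public static void main(String[] args) {
        long clusterMb = 100_000, userLimitMb = 60_000, usedMb = 10_000;
        System.out.println(guaranteedHeadroom(clusterMb, 30, userLimitMb, usedMb)); // 20000
        System.out.println(maximumHeadroom(clusterMb, 80, userLimitMb, usedMb));    // 50000
    }
}
```

With a 100,000 MB cluster, a queue at 30% capacity and 80% maximum capacity, a 60,000 MB user limit and 10,000 MB already in use, the AM would see 20,000 MB it can rely on and up to 50,000 MB it may try for but could lose to preemption.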
[~jlowe], [~cwelch]

Thanks,
Wangda

> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>
>                 Key: YARN-1198
>                 URL: https://issues.apache.org/jira/browse/YARN-1198
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1198.1.patch
>
>
> Today the headroom calculation (for an app) takes place only when:
> * A new node is added to or removed from the cluster.
> * A new container is assigned to the application.
> However, there are potentially many situations that are not considered in
> this calculation:
> * If a container finishes, the headroom for that application changes and the
> AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the
> same queue, then:
> ** If app1's container finishes, not only app1's AM but also app2's AM
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either app1 or app2, both AMs
> should be notified about their headroom.
> ** To simplify the whole communication process, it is ideal to keep headroom
> per user per LeafQueue, so that everyone (apps belonging to the same user and
> submitted to the same queue) gets the same picture.
> * If a new user submits an application to the queue, then all applications
> submitted by all users in that queue should be notified of the headroom
> change.
> * Also, today headroom is an absolute number (I think it should be normalized,
> but that would not be backward compatible...).
> * Also, when an admin user refreshes queues, headroom has to be updated.
> These are all potential bugs in the headroom calculation.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
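The trigger list in the issue description above, together with its suggestion to keep headroom per user per LeafQueue, can be sketched as an event-driven cache. This is a hypothetical illustration, not the actual LeafQueue implementation: `LeafQueueHeadroomSketch` and its methods are made-up names, resources are memory in MB only, and the user limit is simplified to the queue's capacity split equally among its active users. That simplification is why a new user changes every user's headroom, as the description expects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not CapacityScheduler code: memory-only resources and a
// simplified user limit (the queue's capacity split equally among active users).
final class LeafQueueHeadroomSketch {
    private long queueCapacityMb;                                     // changes on node add/remove or queue refresh
    private final Map<String, String> userByApp = new HashMap<>();    // appId -> user
    private final Map<String, Long> usedByUser = new HashMap<>();     // user -> allocated MB
    private final Map<String, Long> headroomByUser = new HashMap<>(); // user -> cached headroom MB

    LeafQueueHeadroomSketch(long queueCapacityMb) {
        this.queueCapacityMb = queueCapacityMb;
    }

    // Every event below ends in the same full recomputation, so every app of
    // every user in the queue sees an up-to-date, shared per-user value.
    private void recompute() {
        if (usedByUser.isEmpty()) return;
        long userLimitMb = queueCapacityMb / usedByUser.size();
        for (Map.Entry<String, Long> e : usedByUser.entrySet()) {
            headroomByUser.put(e.getKey(), Math.max(0L, userLimitMb - e.getValue()));
        }
    }

    // Node added/removed, or an admin refreshes queues.
    void setQueueCapacity(long mb) {
        queueCapacityMb = mb;
        recompute();
    }

    // A new user shrinks the simplified user limit for everyone in the queue.
    void submitApp(String appId, String user) {
        userByApp.put(appId, user);
        usedByUser.putIfAbsent(user, 0L);
        recompute();
    }

    void containerAssigned(String appId, long mb) {
        usedByUser.merge(userByApp.get(appId), mb, Long::sum);
        recompute();
    }

    void containerFinished(String appId, long mb) {
        usedByUser.merge(userByApp.get(appId), -mb, Long::sum);
        recompute();
    }

    // The value the AM of appId would be given on its next heartbeat.
    long headroomFor(String appId) {
        return headroomByUser.get(userByApp.get(appId));
    }

    public static void main(String[] args) {
        LeafQueueHeadroomSketch q = new LeafQueueHeadroomSketch(60_000);
        q.submitApp("app1", "alice");
        q.submitApp("app2", "alice");
        q.containerAssigned("app1", 10_000);
        System.out.println(q.headroomFor("app2")); // 50000: app2 sees app1's allocation
        q.submitApp("app3", "bob");
        System.out.println(q.headroomFor("app1")); // 20000: limit drops to 30000 per user
        q.containerFinished("app1", 10_000);
        System.out.println(q.headroomFor("app2")); // 30000
    }
}
```

Recomputing everything on each event keeps the sketch short; a real scheduler would likely want to limit the work to the affected queue and users.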