Wangda Tan commented on YARN-1198:

I've just taken a look at all the sub-tasks of this JIRA, and I'm wondering whether we 
should first define what "headroom" is.
In previous YARN, including YARN-1198, the headroom is defined as "the maximum 
resource an application can get".
In YARN-2008, the headroom is defined as "the available resource an 
application can get", because we already considered used resource of sibling 

I wonder whether we need to add a new field such as "guaranteed headroom", which for an 
application considers its absolute capacity (not maximum capacity), 
user-limits, etc. We may keep both of them because:
- The maximum resource is not always achievable, because the sum of the maximum resources 
of the leaf queues may exceed the cluster resource.
- With preemption, resource beyond the guaranteed resource is likely to be 
preempted. It should be considered a temporary resource.
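
To make the distinction concrete, here is a minimal sketch of the two numbers, assuming a single memory dimension and a user limit that is already computed. All class, method, and parameter names are hypothetical and are not taken from the CapacityScheduler code; the real calculation involves more inputs.

```java
// Hypothetical sketch only: not the CapacityScheduler implementation.
// All quantities are memory in MB; capacities are fractions of the cluster.
final class HeadroomSketch {

    // Headroom under a given capacity fraction, capped by the user limit
    // and never negative.
    static long headroom(long clusterMb, double capacityFraction,
                         long userLimitMb, long usedMb) {
        long queueLimitMb = (long) (clusterMb * capacityFraction);
        long limitMb = Math.min(queueLimitMb, userLimitMb);
        return Math.max(0L, limitMb - usedMb);
    }

    // "Guaranteed headroom": measured against the queue's absolute capacity.
    static long guaranteedHeadroom(long clusterMb, double absoluteCapacity,
                                   long userLimitMb, long usedMb) {
        return headroom(clusterMb, absoluteCapacity, userLimitMb, usedMb);
    }

    // "Maximum headroom": measured against the queue's absolute maximum capacity.
    static long maximumHeadroom(long clusterMb, double absoluteMaxCapacity,
                                long userLimitMb, long usedMb) {
        return headroom(clusterMb, absoluteMaxCapacity, userLimitMb, usedMb);
    }
}
```

Under these assumptions, once usage exceeds the guaranteed share the guaranteed headroom drops to zero while the maximum headroom can still be positive, which is exactly the range that preemption may reclaim.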

And with this, the AM can:
- Use "guaranteed headroom" to allocate resource which will not be preempted.
- Use "maximum headroom" to try to allocate resource beyond its guaranteed 

In my humble opinion, "the available resource an application can get" 
doesn't make a lot of sense here, and may cause some backward-compatibility 
problems as well. In a dynamic cluster this number can change rapidly: 
the cluster may be filled up by another application just 
one second after the AM got the "available headroom".
Also, this field cannot solve the deadlock problem: a malicious 
application can ask for much more resource than this, and a careless developer may 
ignore the field entirely. The only valid solution I can think of is to put such logic 
on the scheduler side and enforce resource usage through the preemption policy.
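
As a rough illustration of why enforcement belongs on the scheduler side, the sketch below clamps a grant against the limit at allocation time, whatever the AM asked for. It covers only the allocation-time half of the idea and does not model preemption; the class and method names are hypothetical.

```java
// Hypothetical sketch only: scheduler-side cap applied at allocation time.
// All quantities are memory in MB.
final class SchedulerSideCap {

    // Returns the amount actually granted: never more than what is free
    // under the limit right now, regardless of the size of the request.
    static long grant(long requestedMb, long limitMb, long usedMb) {
        long freeMb = Math.max(0L, limitMb - usedMb);
        return Math.min(Math.max(0L, requestedMb), freeMb);
    }
}
```

Because the free amount is recomputed at the moment of allocation, a stale or ignored headroom value on the AM side cannot cause an over-allocation in this model.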

Any thoughts? [~jlowe], [~cwelch]


> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>                 Key: YARN-1198
>                 URL: https://issues.apache.org/jira/browse/YARN-1198
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1198.1.patch
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However, there are potentially a lot of situations which are not considered for 
> this calculation
> * If a container finishes then headroom for that application will change and 
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1 or app2) then 
> both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom 
> per User per LeafQueue so that everyone gets the same picture (apps belonging 
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but then this would not be backward compatible).
> * Also, when an admin user refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.
