Craig Welch commented on YARN-2925:

So, a few things:

I can see how we could get stale values for usedResources during the 
getHeadroom call - the usedResources component of the headroom calculation was 
added after the initial change to do final computation on demand, and the 
potential impact of that clearly wasn't thought through all the way.  A 
comment there to the effect that there is an additional thread-wise entry 
point to consider would have helped (& I think we should add one with this 
change).

There is synchronization on each active application during computeUserLimit*, 
and updates to usedResources occur there; the getHeadroom call is also 
synchronized on the application, so there is a high likelihood of reasonable 
consistency in practice, but no guarantee as far as I can see.  
Fortunately, the low-level value fields of Resource are "ints", so the worst 
that can happen is that we are working with stale values (not "values which 
were never set/random int values"...).  But we don't want to be working with 
stale values either, as that means we could get incorrect results on occasion.
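To illustrate the "stale but never torn" point: the JLS guarantees that reads and writes of 32-bit ints are atomic, so a racing reader can only ever observe a value some writer actually wrote - possibly stale, never random bits. A minimal standalone sketch (not YARN code, just the language-level guarantee):

```java
// Sketch: an unsynchronized int field raced by a writer thread.  Per the
// Java Language Specification, int accesses are atomic, so the reader can
// only ever see a value that was actually written (1024 or 2048 here) -
// it may be stale, but it is never a torn/garbage value.
public class AtomicIntRead {
    static int memory = 1024;  // stands in for a Resource int field

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                memory = (i % 2 == 0) ? 1024 : 2048;
            }
        });
        writer.start();
        for (int i = 0; i < 100_000; i++) {
            int seen = memory;                   // unsynchronized read
            if (seen != 1024 && seen != 2048) {  // would indicate a torn read
                throw new AssertionError("torn read: " + seen);
            }
        }
        writer.join();
        System.out.println("only 1024/2048 observed");
    }
}
```

Note this guarantee would not hold for long/double fields, and it says nothing about freshness - which is exactly the staleness problem described above.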

At the moment the consistent lock point between the allocate() call where 
headroom is retrieved and the compute* in the queue is only the application, 
which is not sufficient to cover some items which are shared more broadly - 
usedResources (which you were looking at), the user's resource 
(consumedResource on the User object associated with the application), and 
clusterResource (which needs to be dealt with too as far as I can see - though 
it is probably rarely an issue in practice, we should fix it now too).

All that said, some thoughts on the particulars of your patch:

1.  I think we should combine the QueueHeadroomInfo and the QueueResourceInfo 
types - I didn't get to it before you posted your patch, but the purpose of 
QueueHeadroomInfo was already to provide a finer-grained lock point for the 
very reasons outlined here, and it is doing so for certain items, but it 
doesn't catch them all or do it in a way which is 100% safe, as you've noted.  
I don't think it matters which one we end up with, but we don't need both and 
they are both trying to accomplish the same thing, so we should combine them.

2.  I don't think the patch as it stands will resolve the issue, as it is just 
asserting consistency for access to the resource members of QueueResourceInfo - 
the problem is the calculation of headroom which makes use of their values, 
which could occur concurrently with the addTo or subtractFrom operations after 
the references have been pulled from the accessor.  To be effective, the lock 
consistency will need to also encompass the calculation which occurs in 
getHeadroom(User user, Resource queueMaxCap, Resource clusterResource, 
Resource userLimit) in LeafQueue.

Also, "usedResources" does not appear to be used in either AbstractCSQueue or 
ParentQueue for anything except debug logging - and as applications can only 
run in leaf queues, I believe it should be relocated back to be a member of 
LeafQueue.  Given that, I think that the QueueHeadroomXYZ class should live in 
the LeafQueue for simplicity/encapsulation.  Finally, the clusterResource also 
needs to be handled here, as well as the user's consumed resource values.

I also don't think that the read/write lock complexity is called for here - I 
don't think it's helping anything, and it will make the code (and 
understanding the code) more complex.  We should just synchronize on the 
QueueXYZInfo instance which survives the combination of QHeadroomInfo and 
QResourceInfo.

Given all that, I think the full fix would be:

1. Combine QHeadroom* and QueueResource* into one class whose per-queue 
instance lives in LeafQueue
2. Relocate usedResources to LeafQueue (in the QHead/Resource* instance)
3. Move the final getHeadroom calculation into that same class, and 
synchronize it
4. Move the add/subtract to/from for usedResources into a synchronized method 
of that class (much as you have), and do the same for the user's consumed 
resource values (and in a sense clusterResource, see below)
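A self-contained sketch of the shape this combined class might take - the names and the headroom formula here are simplified stand-ins for the YARN types (Resource is reduced to a single int), not the actual patch:

```java
// Simplified stand-in for YARN's Resource (memory only, for illustration).
class Resource {
    int memory;
    Resource(int memory) { this.memory = memory; }
}

// Hypothetical combined QueueHeadroomInfo/QueueResourceInfo: one per-LeafQueue
// object owns usedResources AND the final headroom math, so the updates and
// the calculation share a single lock and can never interleave.
class QueueResourceInfo {
    private final Resource used = new Resource(0);

    synchronized void addUsed(Resource r)      { used.memory += r.memory; }
    synchronized void subtractUsed(Resource r) { used.memory -= r.memory; }

    // The final computation runs under the same lock as addUsed/subtractUsed,
    // so it always sees a consistent usedResources value.  The formula is a
    // simplified approximation of the LeafQueue headroom calculation:
    // min(userLimit - userConsumed, queueMaxCap - used).
    synchronized int getHeadroom(int userConsumed, int queueMaxCap,
                                 int userLimit) {
        return Math.min(userLimit - userConsumed, queueMaxCap - used.memory);
    }
}

public class QueueResourceInfoSketch {
    public static void main(String[] args) {
        QueueResourceInfo info = new QueueResourceInfo();
        info.addUsed(new Resource(4096));
        // min(8192 - 2048, 10240 - 4096) = 6144
        System.out.println(info.getHeadroom(2048, 10240, 8192));
    }
}
```

The point of the design is that there is exactly one lock (the instance monitor) guarding both the mutation of the shared values and every read that combines them.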

clusterResource - 

I think this one also needs to be handled.  The instance which is used should 
be copied when it is set, inside a synchronized method of the 
QHeadroom|Resource* class which we end up with.  Setting of the 
clusterResource appears to happen consistently when it changes (via 
updateClusterResource), so by copying it we keep it sufficiently up to date 
with the cluster resource, and we don't have to try to lock the scheduler - 
which is where it is modified, and which would be too intrusive.
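The copy-on-set idea in miniature - a hypothetical sketch (the class and field names are illustrative) showing why snapshotting the value beats holding the scheduler's mutable reference:

```java
// Simplified two-field stand-in for YARN's Resource.
class Resource {
    int memory, vcores;
    Resource(int memory, int vcores) { this.memory = memory; this.vcores = vcores; }
}

// Hypothetical holder: the setter copies the incoming Resource inside the
// synchronized method, so later mutation of the scheduler's shared instance
// cannot change what readers here observe.
class ClusterResourceHolder {
    private Resource snapshot = new Resource(0, 0);

    synchronized void set(Resource shared) {
        // copy the values, not the reference
        snapshot = new Resource(shared.memory, shared.vcores);
    }
    synchronized Resource get() { return snapshot; }
}

public class CopyOnSetDemo {
    public static void main(String[] args) {
        Resource shared = new Resource(1024, 1);  // the scheduler's instance
        ClusterResourceHolder h = new ClusterResourceHolder();
        h.set(shared);
        shared.memory = 9999;                     // scheduler mutates it later
        System.out.println(h.get().memory);       // snapshot is unaffected
    }
}
```

Had set() stored the reference instead of a copy, the final read would see 9999 - i.e. a value changing underneath the headroom calculation without any lock being taken.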

[~leftnoteasy] does this make sense?

> Internal fields in LeafQueue access should be protected when accessed from 
> FiCaSchedulerApp to calculate Headroom
> -----------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2925
>                 URL: https://issues.apache.org/jira/browse/YARN-2925
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-2925.1.patch
> Upon YARN-2644, FiCaScheduler will calculate up-to-date headroom before 
> sending back the Allocation response to the AM.
> Headroom calculation happens on the LeafQueue side, using fields like used 
> resource, etc. But it is not protected by any lock of LeafQueue, so it might 
> be corrupted if someone else is editing it.

This message was sent by Atlassian JIRA
