[
https://issues.apache.org/jira/browse/YARN-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236461#comment-14236461
]
Craig Welch commented on YARN-2925:
-----------------------------------
So, a few things things:
I can see how we could get stale values for usedResources during the
getHeadroom call - the usedResources component of headroom calculation was
added after the initial change to do final computation on demand and the
potential impact of that not thought through all the way, clearly. A comment
there to the effect that there is an additional entry point thread wise to
consider would have helped (& I think we should add it with this change for the
future...).
There is synchronization on each active application during computeUserLimit*,
and updates to usedResources occur there, and the getHeadroom call is also
synchronized on the application, so there is a high degree of likelihood of
reasonable consistency in practice, but no guarantee as far as I can see.
Fortunately, the low level value fields of Resource are "ints", so the worst
that can happen is that we are working with stale values (not "values which
were never set/random int values"...), but we don't want to be working with
stale values as that will mean we could get incorrect results on occasion.
At the moment the consistent lock point between the allocate() where headroom
is retrieved and the compute... in the queue is only the application, which is
not sufficient to cover those some items which are further shared -
usedResource (which you were looking at), the user's resource (consumedResource
on the User object associated with the application) and clusterResource (which
need to be dealt with too as far as I can see - though it is probably rarely an
issue in practice, should fix it now too).
All that said, some thoughts on the particulars of your patch:
1. I think we should combine the QueueHeadroomInfo and the QueueResourceInfo
types - I didn't get to it before you posted your patch, but the purpose of
QueueHeadroomInfo was already to provide a finer grained lock point for the
very reasons outlined here, and it is doing so for certain items, but it
doesn't catch them all or do it in a way which is 100% safe as you've noted. I
don't think it matters which one we end up but we don't need both and they are
both trying to accomplish the same thing, so we should combine them.
2. I don't think the patch as it stands will resolve the issue as it is just
asserting consistency for access to the resource members of QueueResourceInfo -
the problem is the calculation of headroom which makes use of their values
which could occur concurrently with the addTo or subtractFrom operations after
the references have been pulled from the accessor. The lock consistency will
need to also encompass the calculation which occurs in getHeadroom(User user,
Resource queueMaxCap,
Resource clusterResource, Resource userLimit) in LeafQueue to be
successful
Also, "usedResources" does not appear to be used in either AbstractCSQueue or
ParentQueue for anything except debug logging - and as applications can only
run in leaf queues, I believe it should be relocated back to be a member of
LeafQueue. Given that, I think that the QueueHeadroomXYZ class should live in
the LeafQueue for simplicity/encapsulation. Finally, the clusterResource also
needs to be handled here, as well as the user's consumed resource
(user.getTotalConsumedResources())
I also don't think that the read/write lock complexity is called for here, I
don't think it's helping anything and will make the code/understanding the code
more complex, we should just synchronize on the QueueXYZInfo instance which
survives the combination of QHeadroomInfo and QResourceInfo
Given all that, I think the full fix would be:
Combine QHeadroom* and QueueResource* into one class who's per-queue- instance
lives in LeafQueue
relocate UsedResources toLeafQueue (in the QHead/Resource* instance)
move the final getHeadroom calculation into that same class, synchronize it.
move the add/subtract to/from for usedResources into a synchronized method of
that class (much as you have), and do the same for the user consumed resource
values (and in a sense cluster, see below)
clusterResource -
I think this one also needs to be handled - the instance of this which is used
should be copied when it is set inside a synchronized method of the
QHeadroom|Resource* class which we end up with - setting of the clusterResource
appears to happen consistently when it changes (via updateClusterResource so it
should be kept sufficiently up to date with the cluster resource and by copying
it and then we don't have to try to lock the scheduler which is where it is be
modified and which would be too intrusive.
[~leftnoteasy] does this make sense?
> Internal fields in LeafQueue access should be protected when accessed from
> FiCaSchedulerApp to calculate Headroom
> -----------------------------------------------------------------------------------------------------------------
>
> Key: YARN-2925
> URL: https://issues.apache.org/jira/browse/YARN-2925
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-2925.1.patch
>
>
> Upon YARN-2644, FiCaScheduler will calculation up-to-date headroom before
> sending back Allocation response to AM.
> Headroom calculation is happened in LeafQueue side, uses fields like used
> resource, etc. But it is not protected by any lock of LeafQueue, so it might
> be corrupted is someone else is editing it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)