[
https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242298#comment-15242298
]
Tao Jie commented on YARN-3126:
-------------------------------
I think this issue is quite common; we have met the same problem.
The root cause is that the max-limit check at assignment time should compare
*current usage* + *resource to assign* against the *max resource limit*.
However, when we pick a queue to assign to, we only know the *current resource
usage* and the *max resource limit*; the *resource to assign* is not known
until we actually assign a container to an appAttempt.
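In other words, the check is only possible once the container size is known. A minimal sketch of what it would look like at that point, assuming the usual Schedulable getters and Resources helpers (fitsUnderMax is just an illustrative name, not part of the attached patch):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch only: once the container size is known, verify that
// *current usage* + *resource to assign* still fits under the max share.
boolean fitsUnderMax(FSQueue queue, Resource toAssign) {
  Resource wouldBe = Resources.add(queue.getResourceUsage(), toAssign);
  return Resources.fitsIn(wouldBe, queue.getMaxShare());
}
{code}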
This patch seems to add an additional check (checkQueueResourceLimit) on the
*leaf queue* before assigning to an AppAttempt, but the *parent queue*'s
resource usage may still go over its max resource limit.
Also, we already have *FSQueue.assignContainerPreCheck* for the max resource
limit; if we add the new check, the former one seems to become unnecessary.
To cover parents as well, the check would have to walk the whole queue
hierarchy, as sketched below.
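A rough sketch of a hierarchy-wide check (again illustrative only; fitsUnderMaxHierarchy is a made-up name, and I am assuming FSQueue.getParent()/getMaxShare() as in the FairScheduler code):
{code:java}
// Sketch only: enforce the limit at every level, not just the leaf queue.
boolean fitsUnderMaxHierarchy(FSQueue queue, Resource toAssign) {
  for (FSQueue q = queue; q != null; q = q.getParent()) {
    Resource wouldBe = Resources.add(q.getResourceUsage(), toAssign);
    if (!Resources.fitsIn(wouldBe, q.getMaxShare())) {
      return false; // assigning this container would exceed q's max share
    }
  }
  return true;
}
{code}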
[~kasha], I would like to hear your thoughts.
> FairScheduler: queue's usedResource is always more than the maxResource limit
> -----------------------------------------------------------------------------
>
> Key: YARN-3126
> URL: https://issues.apache.org/jira/browse/YARN-3126
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.3.0
> Environment: hadoop2.3.0. fair scheduler. spark 1.1.0.
> Reporter: Xia Hu
> Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources
> Fix For: trunk-win
>
> Attachments: resourcelimit-02.patch, resourcelimit-test.patch,
> resourcelimit.patch
>
>
> When submitting a Spark application (in both spark-on-yarn-cluster and
> spark-on-yarn-client mode), the queue's usedResources assigned by the fair
> scheduler can exceed the queue's maxResources limit.
> From reading the fair scheduler code, I believe this happens because the
> requested resources are not checked when assigning a container.
> Here are the details:
> 1. Choose a queue. In this step, assignContainerPreCheck verifies that the
> queue's usedResource is not already bigger than its max.
> 2. Choose an app in that queue.
> 3. Choose a container. Here is the problem: there is no check whether this
> container would push the queue's resources over its max limit. If a queue's
> usedResource is 13G and its maxResource limit is 16G, a container asking for
> 4G may still be assigned successfully.
> This problem readily occurs with Spark applications, since different
> applications can request different container resource sizes.
> By the way, I have already applied the patch from YARN-2083.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)