[
https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602326#comment-16602326
]
Wangda Tan commented on YARN-8513:
----------------------------------
And btw, I found a comment in LeafQueue:
{code:java}
private void updateCurrentResourceLimits(
    ResourceLimits currentResourceLimits, Resource clusterResource) {
  // TODO: need consider non-empty node labels when resource limits supports
  // node labels
  // Even if ParentQueue will set limits respect child's max queue capacity,
  // but when allocating reserved container, CapacityScheduler doesn't do
  // this. So need cap limits by queue's max capacity here.
  this.cachedResourceLimitsForHeadroom =
      new ResourceLimits(currentResourceLimits.getLimit());
  Resource queueMaxResource = getEffectiveMaxCapacityDown(
      RMNodeLabelsManager.NO_LABEL, minimumAllocation);
  this.cachedResourceLimitsForHeadroom.setLimit(Resources.min(
      resourceCalculator, clusterResource, queueMaxResource,
      currentResourceLimits.getLimit()));
}{code}
I can remember a little bit of when I wrote this code: YARN-3243 fixed an issue
where ParentQueue's max capacity could be violated. I didn't consider node-label
max capacity because, at that time, per-queue per-label capacity support had
some issues. I believe the issue should have been fixed in later patches, but it
is worth checking whether we need any other fixes.
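For illustration only, here is a rough, untested sketch of how the cap could respect the partition being allocated against instead of always using NO_LABEL. The extra {{nodePartition}} parameter is hypothetical and would have to be threaded through from the callers; this is not a proposed patch:
{code:java}
// Hypothetical sketch: cap the headroom limit by the effective max capacity of
// the partition being allocated against, rather than only NO_LABEL.
private void updateCurrentResourceLimits(ResourceLimits currentResourceLimits,
    Resource clusterResource, String nodePartition) {
  this.cachedResourceLimitsForHeadroom =
      new ResourceLimits(currentResourceLimits.getLimit());
  Resource queueMaxResource = getEffectiveMaxCapacityDown(
      nodePartition, minimumAllocation);
  this.cachedResourceLimitsForHeadroom.setLimit(Resources.min(
      resourceCalculator, clusterResource, queueMaxResource,
      currentResourceLimits.getLimit()));
}{code}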
[~Card], does this happen when node labels are being used, or not?
> CapacityScheduler infinite loop when queue is near fully utilized
> -----------------------------------------------------------------
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, yarn
> Affects Versions: 3.1.0, 2.9.1
> Environment: Ubuntu 14.04.5 and 16.04.4
> YARN is configured with one label and 5 queues.
> Reporter: Chen Yufei
> Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log,
> jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log,
> yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log,
> yarn3-resourcemanager.log, yarn3-top
>
>
> Sometimes the ResourceManager does not respond to any request when a queue is
> nearly fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can.
> After the RM restarts, it recovers running jobs and starts accepting new ones.
>
> It seems that CapacityScheduler gets stuck in an infinite loop, printing out
> the following log messages (more than 25,000 lines per second):
>
> {{2018-07-10 17:16:29,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> assignedContainer queue=root usedCapacity=0.99816763
> absoluteUsedCapacity=0.99816763 used=<memory:16170624, vCores:1577>
> cluster=<memory:29441544, vCores:5792>}}
> {{2018-07-10 17:16:29,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
> assignedContainer application attempt=appattempt_1530619767030_1652_000001
> container=null
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943
> clusterResource=<memory:29441544, vCores:5792> type=NODE_LOCAL
> requestedPartition=}}
>
> I have encountered this problem several times after upgrading to YARN 2.9.1,
> while the same configuration works fine under version 2.7.3.
>
> YARN-4477 is an infinite loop bug in FairScheduler; I'm not sure whether this
> is a similar problem.
>