Tamas Domok created YARN-11641:
----------------------------------
Summary: Can't update a queue hierarchy in absolute mode when the configured capacities are zero
Key: YARN-11641
URL: https://issues.apache.org/jira/browse/YARN-11641
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.4.0
Reporter: Tamas Domok
Assignee: Tamas Domok
h2. Error symptoms
A queue hierarchy in absolute mode cannot be modified when the parent, or every child queue of the parent, has a zero minimum resource configured (for example [memory=0, vcores=0]). The update is rejected with:
{noformat}
2024-01-05 15:38:59,016 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: root.a.c
2024-01-05 15:38:59,016 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
java.io.IOException: Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
{noformat}
h2. Reproduction
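Start with the following capacity-scheduler.xml, in which root.a and both of its children (root.a.b and root.a.c) have a zero absolute minimum capacity: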
{code:xml}
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,a</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.capacity</name>
<value>[memory=40960, vcores=16]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>[memory=1024, vcores=1]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>[memory=1024, vcores=1]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.capacity</name>
<value>[memory=0, vcores=0]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
<value>[memory=39936, vcores=15]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.queues</name>
<value>b,c</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.b.capacity</name>
<value>[memory=0, vcores=0]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.b.maximum-capacity</name>
<value>[memory=39936, vcores=15]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.c.capacity</name>
<value>[memory=0, vcores=0]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.c.maximum-capacity</name>
<value>[memory=39936, vcores=15]</value>
</property>
</configuration>
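{code}
Then send the following queue update (updatequeue.xml), which gives root.a a non-zero minimum capacity:
{code:xml}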
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sched-conf>
<update-queue>
<queue-name>root.a</queue-name>
<params>
<entry>
<key>capacity</key>
<value>[memory=1024,vcores=1]</value>
</entry>
<entry>
<key>maximum-capacity</key>
<value>[memory=39936,vcores=15]</value>
</entry>
</params>
</update-queue>
</sched-conf>
{code}
{code}
$ curl -X PUT -H 'Content-Type: application/xml' -d @updatequeue.xml \
    'http://localhost:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
{code}
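For scripted reproduction, the same request can also be built with the JDK's java.net.http client (Java 11+). This is a minimal sketch, not part of Hadoop: SchedulerConfUpdate and buildRequest are hypothetical names, and it assumes a ResourceManager at http://localhost:8088 whose scheduler configuration can be changed through the scheduler-conf REST endpoint.

{code:java}
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Hadoop) that mirrors the curl reproduction.
final class SchedulerConfUpdate {

  // Contents of updatequeue.xml: give root.a a non-zero absolute min resource.
  static final String UPDATE_QUEUE_XML =
      "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
          + "<sched-conf><update-queue><queue-name>root.a</queue-name><params>"
          + "<entry><key>capacity</key><value>[memory=1024,vcores=1]</value></entry>"
          + "<entry><key>maximum-capacity</key>"
          + "<value>[memory=39936,vcores=15]</value></entry>"
          + "</params></update-queue></sched-conf>";

  private SchedulerConfUpdate() {}

  // Builds the same PUT request as the curl command. rmAddress is assumed to be
  // the ResourceManager web address, e.g. "http://localhost:8088".
  static HttpRequest buildRequest(String rmAddress, String user) {
    URI uri = URI.create(
        rmAddress + "/ws/v1/cluster/scheduler-conf?user.name=" + user);
    return HttpRequest.newBuilder(uri)
        .header("Content-Type", "application/xml")
        .PUT(HttpRequest.BodyPublishers.ofString(
            UPDATE_QUEUE_XML, StandardCharsets.UTF_8))
        .build();
  }
}
{code}

Sending the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against an affected ResourceManager is expected to return the same error text as the curl command.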
h2. Root cause
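setChildQueues is called during queue reinitialization. In legacy queue mode it rejects any mix of absolute and non-absolute capacity types between a parent and its children: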
{code:java}
void setChildQueues(Collection<CSQueue> childQueues) throws IOException {
  writeLock.lock();
  try {
    boolean isLegacyQueueMode =
        queueContext.getConfiguration().isLegacyQueueMode();
    if (isLegacyQueueMode) {
      QueueCapacityType childrenCapacityType =
          getCapacityConfigurationTypeForQueues(childQueues);
      QueueCapacityType parentCapacityType =
          getCapacityConfigurationTypeForQueues(ImmutableList.of(this));
      if (childrenCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE
          || parentCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE) {
        // We don't allow any mixed absolute + {weight, percentage} between
        // children and parent
        if (childrenCapacityType != parentCapacityType && !this.getQueuePath()
            .equals(CapacitySchedulerConfiguration.ROOT)) {
          throw new IOException("Parent=" + this.getQueuePath()
              + ": When absolute minResource is used, we must make sure both "
              + "parent and child all use absolute minResource");
        }
        // ... (remainder of the check elided)
      }
    }
    // ... (remainder of setChildQueues elided)
  } finally {
    writeLock.unlock();
  }
}
{code}
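getCapacityConfigurationTypeForQueues treats a queue as absolute only if its configured minimum resource differs from Resources.none(), so a queue configured as [memory=0, vcores=0] is classified as percentage mode instead. In the reproduction, root.a, root.a.b and root.a.c all start at zero, so parent and children agree and the initial configuration is accepted. After the update, root.a has a non-zero minimum resource and is classified as absolute, while root.a.b and root.a.c are still classified as percentage mode, so the check above throws. The detection logic in getCapacityConfigurationTypeForQueues: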
{code:java}
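if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
    .equals(Resources.none())) {
  // A configured [memory=0, vcores=0] equals Resources.none(), so this
  // branch is skipped and the queue is not detected as absolute.
  absoluteMinResSet = true;
  // ...
}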
{code}
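The failure can be illustrated without a running cluster using a simplified, self-contained model of the legacy check. This is a sketch under stated assumptions, not Hadoop code: LegacyCapacityCheck, QueueConf, classify and checkChildren are hypothetical names, only memory and vcores are modelled, node labels and weights are ignored, and an unchecked exception stands in for the IOException thrown by setChildQueues.

{code:java}
import java.util.List;

// Hypothetical, simplified model of the legacy-mode check (not Hadoop code).
final class LegacyCapacityCheck {

  enum CapacityType { PERCENTAGE, ABSOLUTE_RESOURCE }

  // A queue reduced to its configured absolute minimum resource.
  record QueueConf(String path, long minMemoryMb, int minVcores) {}

  private LegacyCapacityCheck() {}

  // Mirrors the faulty heuristic: absolute mode is inferred only from a
  // non-zero configured minimum resource, so [memory=0, vcores=0] falls
  // through to percentage mode.
  static CapacityType classify(List<QueueConf> queues) {
    for (QueueConf q : queues) {
      if (q.minMemoryMb() != 0 || q.minVcores() != 0) {
        return CapacityType.ABSOLUTE_RESOURCE;
      }
    }
    return CapacityType.PERCENTAGE;
  }

  // Mirrors the parent/children comparison in setChildQueues (root is exempt).
  static void checkChildren(QueueConf parent, List<QueueConf> children) {
    CapacityType childrenType = classify(children);
    CapacityType parentType = classify(List.of(parent));
    boolean anyAbsolute = childrenType == CapacityType.ABSOLUTE_RESOURCE
        || parentType == CapacityType.ABSOLUTE_RESOURCE;
    if (anyAbsolute && childrenType != parentType
        && !parent.path().equals("root")) {
      throw new IllegalStateException("Parent=" + parent.path()
          + ": When absolute minResource is used, we must make sure both "
          + "parent and child all use absolute minResource");
    }
  }
}
{code}

In this model the all-zero initial hierarchy passes, because both sides are classified as percentage mode, while raising root.a to [memory=1024, vcores=1] with zero-capacity children throws, matching the reproduction.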
h2. Possible fixes
One possible fix is to derive the capacity type in AbstractParentQueue.getCapacityConfigurationTypeForQueues from the configured capacity vector instead of from the configured minimum resource:
{code:java}
for (CSQueue queue : queues) {
  for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
    // The capacity vector records which unit types were configured, so a
    // zero absolute value such as [memory=0, vcores=0] is still ABSOLUTE.
    Set<QueueCapacityVector.ResourceUnitCapacityType> definedCapacityTypes =
        queue.getConfiguredCapacityVector(nodeLabel).getDefinedCapacityTypes();
    if (definedCapacityTypes.size() == 1) {
      QueueCapacityVector.ResourceUnitCapacityType next =
          Objects.requireNonNull(definedCapacityTypes.iterator().next());
      if (next == QueueCapacityVector.ResourceUnitCapacityType.PERCENTAGE) {
        percentageIsSet = true;
        diagMsg.append("{Queue=").append(queue.getQueuePath())
            .append(", label=").append(nodeLabel)
            .append(" uses percentage mode}. ");
      } else if (next == QueueCapacityVector.ResourceUnitCapacityType.ABSOLUTE) {
        absoluteMinResSet = true;
        diagMsg.append("{Queue=").append(queue.getQueuePath())
            .append(", label=").append(nodeLabel)
            .append(" uses absolute mode}. ");
      } else if (next == QueueCapacityVector.ResourceUnitCapacityType.WEIGHT) {
        weightIsSet = true;
        diagMsg.append("{Queue=").append(queue.getQueuePath())
            .append(", label=").append(nodeLabel)
            .append(" uses weight mode}. ");
      }
    } else if (definedCapacityTypes.size() > 1) {
      // Different unit types are mixed within a single capacity vector.
      mixedIsSet = true;
      diagMsg.append("{Queue=").append(queue.getQueuePath())
          .append(", label=").append(nodeLabel)
          .append(" uses mixed mode}. ");
    }
  }
}
{code}
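The key property of this fix is that the type comes from how the capacity is written, not from its value. The following self-contained sketch illustrates that with a toy classifier over capacity strings. It is a simplified stand-in, not Hadoop's QueueCapacityVector parser: CapacityVectorTypes, UnitType and definedTypes are hypothetical, and the suffix rules ("%" for percentage, "w" for weight, a bare number for absolute) are assumptions made for illustration.

{code:java}
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified classifier (not Hadoop's QueueCapacityVector parser).
final class CapacityVectorTypes {

  enum UnitType { PERCENTAGE, ABSOLUTE, WEIGHT }

  private CapacityVectorTypes() {}

  // Collects the unit types used by a capacity string, in the spirit of
  // getDefinedCapacityTypes. Assumed rules: "%" suffix means percentage,
  // "w" suffix means weight, anything else inside [...] means absolute.
  static Set<UnitType> definedTypes(String capacity) {
    String value = capacity.trim();
    Set<UnitType> types = EnumSet.noneOf(UnitType.class);
    if (value.startsWith("[") && value.endsWith("]")) {
      for (String entry : value.substring(1, value.length() - 1).split(",")) {
        if (entry.isBlank()) {
          continue;
        }
        String[] keyAndAmount = entry.split("=", 2);
        String amount = keyAndAmount.length == 2 ? keyAndAmount[1].trim() : "";
        if (amount.endsWith("%")) {
          types.add(UnitType.PERCENTAGE);
        } else if (amount.endsWith("w")) {
          types.add(UnitType.WEIGHT);
        } else {
          // A plain number, including 0, counts as an absolute resource.
          types.add(UnitType.ABSOLUTE);
        }
      }
    } else if (value.endsWith("w")) {
      types.add(UnitType.WEIGHT);
    } else {
      types.add(UnitType.PERCENTAGE);
    }
    return types;
  }
}
{code}

Under this classification [memory=0, vcores=0] yields only ABSOLUTE, so root.a.b and root.a.c would be treated as absolute both before and after the update, and a single-element set maps directly onto one branch of the loop above.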
Alternatively, for code that predates the capacity vector, checkConfigTypeIsAbsoluteResource could be used instead, e.g.:
{code:java}
- if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
-     .equals(Resources.none())) {
+ if (checkConfigTypeIsAbsoluteResource(queue.getQueuePath(),
+     nodeLabel)) {
{code}
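Presumably this alternative works because absolute mode would then be decided from how the capacity is configured rather than from the parsed minimum resource. The sketch below illustrates that assumption over a plain Map of capacity-scheduler properties. It does not reproduce checkConfigTypeIsAbsoluteResource, whose implementation is not shown here; RawCapacityConfig and isAbsolute are hypothetical names, and only the bracketed [resource=value, ...] syntax used in the reproduction is recognised.

{code:java}
import java.util.Map;

// Hypothetical stand-in for a configuration-based check (not Hadoop code).
final class RawCapacityConfig {

  private static final String PREFIX = "yarn.scheduler.capacity.";

  private RawCapacityConfig() {}

  // True when the queue's configured capacity uses the bracketed absolute
  // syntax, regardless of whether the configured values are zero.
  static boolean isAbsolute(Map<String, String> conf, String queuePath) {
    String value = conf.get(PREFIX + queuePath + ".capacity");
    if (value == null) {
      return false;
    }
    String trimmed = value.trim();
    return trimmed.startsWith("[") && trimmed.endsWith("]");
  }
}
{code}

With the reproduction's properties after the update, root.a, root.a.b and root.a.c would all be reported as absolute, so parent and children would agree and the mismatch check would no longer fire.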