[ 
https://issues.apache.org/jira/browse/YARN-11641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Domok updated YARN-11641:
-------------------------------
    Description: 
h2. Error symptoms

It is not possible to modify a queue hierarchy in absolute mode when the parent 
or every child queue of the parent has a minimum resource of zero configured 
(e.g. [memory=0, vcores=0]).

{noformat}
2024-01-05 15:38:59,016 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: root.a.c
2024-01-05 15:38:59,016 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
java.io.IOException: Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
{noformat}

h2. Reproduction

capacity-scheduler.xml
{code:xml}
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,a</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>[memory=40960, vcores=16]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>[memory=1024, vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>[memory=1024, vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.capacity</name>
    <value>[memory=0, vcores=0]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
    <value>[memory=39936, vcores=15]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.queues</name>
    <value>b,c</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.b.capacity</name>
    <value>[memory=0, vcores=0]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.b.maximum-capacity</name>
    <value>[memory=39936, vcores=15]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.c.capacity</name>
    <value>[memory=0, vcores=0]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.a.c.maximum-capacity</name>
    <value>[memory=39936, vcores=15]</value>
  </property>
</configuration>
{code}

updatequeue.xml
{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sched-conf>
<update-queue>
  <queue-name>root.a</queue-name>
  <params>
    <entry>
      <key>capacity</key>
      <value>[memory=1024,vcores=1]</value>
    </entry>
    <entry>
      <key>maximum-capacity</key>
      <value>[memory=39936,vcores=15]</value>
    </entry>
  </params>
</update-queue>
</sched-conf>
{code}

{code}
$ curl -X PUT -H 'Content-Type: application/xml' -d @updatequeue.xml http://localhost:8088/ws/v1/cluster/scheduler-conf\?user.name\=yarn
Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
{code}

h2. Root cause

setChildQueues is called during queue re-initialization, where the following check runs:

{code:java}
  void setChildQueues(Collection<CSQueue> childQueues) throws IOException {
    writeLock.lock();
    try {
      boolean isLegacyQueueMode = queueContext.getConfiguration().isLegacyQueueMode();
      if (isLegacyQueueMode) {
        QueueCapacityType childrenCapacityType =
            getCapacityConfigurationTypeForQueues(childQueues);
        QueueCapacityType parentCapacityType =
            getCapacityConfigurationTypeForQueues(ImmutableList.of(this));

        if (childrenCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE
            || parentCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE) {
          // We don't allow any mixed absolute + {weight, percentage} between
          // children and parent
          if (childrenCapacityType != parentCapacityType && !this.getQueuePath()
              .equals(CapacitySchedulerConfiguration.ROOT)) {
            throw new IOException("Parent=" + this.getQueuePath()
                + ": When absolute minResource is used, we must make sure both "
                + "parent and child all use absolute minResource");
          }
{code}

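The parent's or the children's capacity type is then treated as PERCENTAGE, 
because getCapacityConfigurationTypeForQueues fails to detect absolute mode when 
the configured min resource is zero: a zero resource such as [memory=0, vcores=0] 
compares equal to Resources.none(), so absoluteMinResSet is never set here: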

{code:java}
        if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
            .equals(Resources.none())) {
          absoluteMinResSet = true;
{code}

(It only happens in legacy queue mode.)
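
To illustrate why a zero capacity slips past the value-based check, here is a minimal standalone sketch. It does not use the Hadoop classes: Res, NONE and detectedAsAbsolute are hypothetical stand-ins for Resource, Resources.none() and the comparison quoted above, and it assumes that a parsed [memory=0, vcores=0] compares equal to Resources.none().

```java
import java.util.Objects;

public class ZeroMinResourceDemo {

  /** Hypothetical stand-in for a YARN Resource (memory in MB, vcores). */
  static final class Res {
    final long memory;
    final int vcores;

    Res(long memory, int vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Res)) {
        return false;
      }
      Res other = (Res) o;
      return memory == other.memory && vcores == other.vcores;
    }

    @Override
    public int hashCode() {
      return Objects.hash(memory, vcores);
    }
  }

  /** Stand-in for Resources.none(): every resource value is zero. */
  static final Res NONE = new Res(0, 0);

  /** Value-based detection: absolute mode is only seen for a non-zero min resource. */
  static boolean detectedAsAbsolute(Res configuredMin) {
    return !configuredMin.equals(NONE);
  }

  public static void main(String[] args) {
    // Models [memory=1024, vcores=1]: recognised as absolute.
    System.out.println(detectedAsAbsolute(new Res(1024, 1)));
    // Models [memory=0, vcores=0]: indistinguishable from "nothing configured".
    System.out.println(detectedAsAbsolute(new Res(0, 0)));
  }
}
```

Under these assumptions the first call yields true and the second false, which is the situation in which root.a and its children fall back to PERCENTAGE in the check above.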

h2. Possible fixes

Possible fix in AbstractParentQueue.getCapacityConfigurationTypeForQueues using 
the capacityVector:
{code:java}
    for (CSQueue queue : queues) {
      for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
        Set<QueueCapacityVector.ResourceUnitCapacityType> definedCapacityTypes =
            queue.getConfiguredCapacityVector(nodeLabel).getDefinedCapacityTypes();
        if (definedCapacityTypes.size() == 1) {
          QueueCapacityVector.ResourceUnitCapacityType next = definedCapacityTypes.iterator().next();
          if (Objects.requireNonNull(next) == PERCENTAGE) {
            percentageIsSet = true;
            diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                .append(" uses percentage mode}. ");
          } else if (next == QueueCapacityVector.ResourceUnitCapacityType.ABSOLUTE) {
            absoluteMinResSet = true;
            diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                .append(" uses absolute mode}. ");
          } else if (next == QueueCapacityVector.ResourceUnitCapacityType.WEIGHT) {
            weightIsSet = true;
            diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
                .append(" uses weight mode}. ");
          }
        } else if (definedCapacityTypes.size() > 1) {
          mixedIsSet = true;
          diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
              .append(" uses mixed mode}. ");
        }
      }
    }
{code}

On versions that predate capacityVector, checkConfigTypeIsAbsoluteResource could be used instead, e.g.:
{code:java}
-        if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
-            .equals(Resources.none())) {
+        if (checkConfigTypeIsAbsoluteResource(queue.getQueuePath(), nodeLabel)) {
{code}
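
Both proposals decide the mode from how the capacity was written rather than from its numeric value. The idea can be sketched standalone as below. This is not the Hadoop implementation: CapacitySyntaxDemo, its regular expression and the classify helper are hypothetical, and the pattern is deliberately simplified (plain integer values only, no units or dotted resource names).

```java
import java.util.regex.Pattern;

public class CapacitySyntaxDemo {

  enum CapacityType { ABSOLUTE, WEIGHT, PERCENTAGE }

  // Hypothetical, simplified pattern: "[name=number, name=number, ...]".
  private static final Pattern ABSOLUTE_SYNTAX = Pattern.compile(
      "^\\[\\s*\\w+\\s*=\\s*\\d+\\s*(,\\s*\\w+\\s*=\\s*\\d+\\s*)*\\]$");

  /** Classifies a raw capacity string by its syntax, ignoring its numeric value. */
  static CapacityType classify(String raw) {
    String value = raw.trim();
    if (ABSOLUTE_SYNTAX.matcher(value).matches()) {
      return CapacityType.ABSOLUTE;
    }
    if (value.endsWith("w")) {
      // Assumes weights are written with a trailing "w", e.g. "2w".
      return CapacityType.WEIGHT;
    }
    return CapacityType.PERCENTAGE;
  }

  public static void main(String[] args) {
    // A zero-valued vector is still absolute, because only the syntax is inspected.
    System.out.println(classify("[memory=0, vcores=0]"));
    System.out.println(classify("[memory=1024,vcores=1]"));
    System.out.println(classify("50"));
    System.out.println(classify("2w"));
  }
}
```

With a syntax-based check of this kind, root.a, root.a.b and root.a.c from the reproduction would all be classified as absolute, so the parent/child comparison in setChildQueues would no longer see a mismatch.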


> Can't update a queue hierarchy in absolute mode when the configured 
> capacities are zero
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-11641
>                 URL: https://issues.apache.org/jira/browse/YARN-11641
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.4.0
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>


