[jira] [Created] (YARN-11608) QueueCapacityVectorInfo NPE when accesible labels config is used

Benjamin Teke (Jira) Fri, 03 Nov 2023 12:02:13 -0700

Benjamin Teke created YARN-11608:
------------------------------------

             Summary: QueueCapacityVectorInfo NPE when accesible labels config is used
                 Key: YARN-11608
                 URL: https://issues.apache.org/jira/browse/YARN-11608
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.4.0
            Reporter: Benjamin Teke
            Assignee: Benjamin Teke



YARN-11514 extended the REST API to include CapacityVectors for each configured
node label. There is an edge case, however: during initialization, each
queue's capacities map is filled with 0 capacities for labels that are
unconfigured but accessible (labels with no configured capacity that the queue
can still access through the accessible-node-labels property). A very basic
example configuration for this is the following:

{code:java}
"yarn.scheduler.capacity.root.queues": "a, b"
"yarn.scheduler.capacity.root.a.capacity": "50"
"yarn.scheduler.capacity.root.a.accessible-node-labels": "root-a-default-label"
"yarn.scheduler.capacity.root.a.maximum-capacity": "50"
"yarn.scheduler.capacity.root.b.capacity": "50"
{code}
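To make the edge case concrete, the sketch below reads the example properties
from a plain Map (a stand-in for the real scheduler configuration, which is an
assumption of this sketch) and separates the labels root.a can access from the
labels that have an "accessible-node-labels.<label>.capacity" entry. The class
and method names are hypothetical and are not Hadoop APIs.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: plain maps stand in for the scheduler configuration.
// Only the property names are taken from the issue.
public class AccessibleLabelsSketch {
  static final String PREFIX = "yarn.scheduler.capacity.";

  // Labels listed in <queue>.accessible-node-labels (comma-separated).
  static Set<String> accessibleLabels(Map<String, String> conf, String queue) {
    Set<String> labels = new TreeSet<>();
    String value = conf.get(PREFIX + queue + ".accessible-node-labels");
    if (value != null) {
      for (String label : value.split(",")) {
        if (!label.trim().isEmpty()) {
          labels.add(label.trim());
        }
      }
    }
    return labels;
  }

  // Accessible labels that also have <queue>.accessible-node-labels.<label>.capacity set.
  static Set<String> labelsWithConfiguredCapacity(Map<String, String> conf, String queue) {
    Set<String> configured = new TreeSet<>();
    for (String label : accessibleLabels(conf, queue)) {
      if (conf.containsKey(PREFIX + queue + ".accessible-node-labels." + label + ".capacity")) {
        configured.add(label);
      }
    }
    return configured;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(PREFIX + "root.queues", "a, b");
    conf.put(PREFIX + "root.a.capacity", "50");
    conf.put(PREFIX + "root.a.accessible-node-labels", "root-a-default-label");
    conf.put(PREFIX + "root.a.maximum-capacity", "50");
    conf.put(PREFIX + "root.b.capacity", "50");

    System.out.println(accessibleLabels(conf, "root.a"));             // [root-a-default-label]
    System.out.println(labelsWithConfiguredCapacity(conf, "root.a")); // []
  }
}
{code}

The label is accessible but has no configured capacity, which is exactly the
combination that later produces an entry in the capacities map without a
matching capacity vector.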

root.a has access to root-a-default-label, but there is no configured capacity
for it. The capacityVectors are parsed from the configuredCapacity map (built
from the "accessible-node-labels.<label>.capacity" configs). When the scheduler
info is requested, the capacityVectors are collected per label, and the labels
iterated over are the keySet of the capacities map, which also contains the
unconfigured labels:

{code:java}
    for (String partitionName : capacities.getExistingNodeLabels()) {
      QueueCapacityVector queueCapacityVector = 
          queue.getConfiguredCapacityVector(partitionName);
      queueCapacityVectorInfo = queueCapacityVector == null ?
          new QueueCapacityVectorInfo(new QueueCapacityVector()) :
          new QueueCapacityVectorInfo(queue.getConfiguredCapacityVector(partitionName));
      // ...
    }
{code}

{code:java}
  public Set<String> getExistingNodeLabels() {
    readLock.lock();
    try {
      return new HashSet<String>(capacitiesMap.keySet());
    } finally {
      readLock.unlock();
    }
  }
{code}
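The following is a simplified, hypothetical stand-in for a queue's capacities
map, not the Hadoop QueueCapacities class. It mirrors the pattern shown above
(labels are the map's keys, copied under a read lock) and assumes, as the issue
describes, that initialization adds a 0 entry for every accessible label. Here
the empty string represents the default partition, which is also an assumption
of the sketch.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a per-queue capacities map; not a Hadoop API.
public class CapacitiesMapSketch {
  private final Map<String, Float> capacitiesMap = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public void setCapacity(String label, float capacity) {
    lock.writeLock().lock();
    try {
      capacitiesMap.put(label, capacity);
    } finally {
      lock.writeLock().unlock();
    }
  }

  public Set<String> getExistingNodeLabels() {
    lock.readLock().lock();
    try {
      // Defensive copy so callers can iterate without holding the lock.
      return new HashSet<>(capacitiesMap.keySet());
    } finally {
      lock.readLock().unlock();
    }
  }

  // Assumed initialization: every accessible label gets an entry,
  // falling back to 0 when no capacity is configured for it.
  public static CapacitiesMapSketch initialize(Set<String> accessibleLabels,
                                               Map<String, Float> configured) {
    CapacitiesMapSketch capacities = new CapacitiesMapSketch();
    for (String label : accessibleLabels) {
      capacities.setCapacity(label, configured.getOrDefault(label, 0f));
    }
    return capacities;
  }

  public static void main(String[] args) {
    Set<String> accessible = Set.of("", "root-a-default-label");
    Map<String, Float> configured = Map.of("", 50f);
    CapacitiesMapSketch capacities = initialize(accessible, configured);
    // The unconfigured label is still reported as an "existing" label.
    System.out.println(capacities.getExistingNodeLabels().contains("root-a-default-label")); // true
  }
}
{code}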

If capacitiesMap contains entries for labels without a configured capacity,
this results in an NPE that breaks both the UI and the REST API:

{code:java}
INTERNAL_SERVER_ERROR
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacityVectorInfo.<init>(QueueCapacityVectorInfo.java:39)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacitiesInfo.<init>(QueueCapacitiesInfo.java:61)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.populateQueueCapacities(CapacitySchedulerLeafQueueInfo.java:108)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerQueueInfo.<init>(CapacitySchedulerQueueInfo.java:137)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.<init>(CapacitySchedulerLeafQueueInfo.java:66)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.getQueues(CapacitySchedulerInfo.java:197)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:94)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:399)
{code}

There is no need to create capacityVectors for the unconfigured labels, so a 
null check should solve this issue on the API side.
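A minimal sketch of the proposed null check, written against a hypothetical
model rather than the actual QueueCapacitiesInfo code: labels come from the
capacities map, capacity vectors exist only for configured labels, and the
checked loop skips labels whose lookup returns null. The class, method names,
and the "50%" vector description are illustrative assumptions.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the per-label loop behind the scheduler REST endpoint.
public class CapacityVectorNullCheckSketch {
  // Stand-in for a configured capacity vector.
  static final class CapacityVector {
    private final String description;

    CapacityVector(String description) {
      this.description = description;
    }

    String describe() {
      return description;
    }
  }

  // Failure mode: every existing label is looked up and the result is
  // dereferenced, so an accessible-but-unconfigured label yields an NPE.
  static List<String> collectUnchecked(Set<String> existingLabels,
                                       Map<String, CapacityVector> configured) {
    List<String> infos = new ArrayList<>();
    for (String label : new TreeSet<>(existingLabels)) {
      infos.add(configured.get(label).describe());
    }
    return infos;
  }

  // Null check: labels without a configured vector are skipped.
  static List<String> collectChecked(Set<String> existingLabels,
                                     Map<String, CapacityVector> configured) {
    List<String> infos = new ArrayList<>();
    for (String label : new TreeSet<>(existingLabels)) {
      CapacityVector vector = configured.get(label);
      if (vector == null) {
        continue; // unconfigured label: no capacity vector to report
      }
      infos.add(vector.describe());
    }
    return infos;
  }

  public static void main(String[] args) {
    // "" stands for the default partition; root-a-default-label is accessible
    // to root.a but has no configured capacity.
    Set<String> existing = Set.of("", "root-a-default-label");
    Map<String, CapacityVector> configured = Map.of("", new CapacityVector("50%"));

    System.out.println(collectChecked(existing, configured)); // [50%]
    try {
      collectUnchecked(existing, configured);
    } catch (NullPointerException e) {
      System.out.println("NPE for an unconfigured label");
    }
  }
}
{code}

Skipping, rather than substituting an empty vector, follows the reasoning
above: unconfigured labels have no capacity vector worth reporting.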




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
