Benjamin Teke created YARN-11608:
------------------------------------

             Summary: QueueCapacityVectorInfo NPE when accessible labels config is used
                 Key: YARN-11608
                 URL: https://issues.apache.org/jira/browse/YARN-11608
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.4.0
            Reporter: Benjamin Teke
            Assignee: Benjamin Teke
YARN-11514 extended the REST API to include CapacityVectors for each configured node label. There is an edge case, however: during initialization, each queue's capacities map is filled with 0 capacities for labels that are unconfigured but accessible (labels with no configured capacity that the queue can still access through the accessible-node-labels property). A very basic example configuration for this is the following:

{code:java}
"yarn.scheduler.capacity.root.queues": "a, b"
"yarn.scheduler.capacity.root.a.capacity": "50"
"yarn.scheduler.capacity.root.a.accessible-node-labels": "root-a-default-label"
"yarn.scheduler.capacity.root.a.maximum-capacity": "50"
"yarn.scheduler.capacity.root.b.capacity": "50"
{code}

root.a has access to root-a-default-label, but there is no configured capacity for it. The capacityVectors are parsed from the configuredCapacity map (built from the "accessible-node-labels.<label>.capacity" configs). When the scheduler info is requested, the capacityVectors are collected per label, and the labels used for this are the keySet of the capacities map:

{code:java}
for (String partitionName : capacities.getExistingNodeLabels()) {
  QueueCapacityVector queueCapacityVector =
      queue.getConfiguredCapacityVector(partitionName);
  queueCapacityVectorInfo = queueCapacityVector == null
      ? new QueueCapacityVectorInfo(new QueueCapacityVector())
      : new QueueCapacityVectorInfo(queue.getConfiguredCapacityVector(partitionName));
  // ...
}
{code}

{code:java}
public Set<String> getExistingNodeLabels() {
  readLock.lock();
  try {
    return new HashSet<String>(capacitiesMap.keySet());
  } finally {
    readLock.unlock();
  }
}
{code}

If the capacitiesMap contains entries that are not "configured", this results in an NPE, which breaks both the UI and the REST API:

{code:java}
INTERNAL_SERVER_ERROR
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacityVectorInfo.<init>(QueueCapacityVectorInfo.java:39)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacitiesInfo.<init>(QueueCapacitiesInfo.java:61)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.populateQueueCapacities(CapacitySchedulerLeafQueueInfo.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerQueueInfo.<init>(CapacitySchedulerQueueInfo.java:137)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.<init>(CapacitySchedulerLeafQueueInfo.java:66)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.getQueues(CapacitySchedulerInfo.java:197)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:94)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:399)
{code}

There is no need to create capacityVectors for the unconfigured labels, so a null check should solve this issue on the API side.
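The proposed null check can be illustrated in isolation. This is a minimal, self-contained sketch, not the actual Hadoop patch: CapacityVector, CapacityVectorInfo and collectVectorInfos are hypothetical stand-ins for QueueCapacityVector, QueueCapacityVectorInfo and the per-label loop in QueueCapacitiesInfo, and the default partition is assumed to be keyed by the empty string. The loop still iterates over every existing label, but only builds an info object when a configured vector exists, so an accessible-but-unconfigured label such as root-a-default-label is skipped instead of being dereferenced.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for the YARN classes; NOT the real Hadoop API.
public class CapacityVectorNullCheckSketch {

  /** Hypothetical stand-in for QueueCapacityVector. */
  record CapacityVector(String description) {}

  /** Hypothetical stand-in for QueueCapacityVectorInfo. */
  record CapacityVectorInfo(String description) {
    static CapacityVectorInfo of(CapacityVector vector) {
      // Dereferencing a null vector throws NullPointerException, which is
      // the same kind of failure seen in QueueCapacityVectorInfo.<init>.
      return new CapacityVectorInfo(vector.description());
    }
  }

  /**
   * Labels come from the capacities map (which may include accessible but
   * unconfigured labels); vectors come from the configured map only.
   */
  static Map<String, CapacityVectorInfo> collectVectorInfos(
      Set<String> existingLabels, Map<String, CapacityVector> configuredVectors) {
    Map<String, CapacityVectorInfo> result = new LinkedHashMap<>();
    for (String label : existingLabels) {
      CapacityVector vector = configuredVectors.get(label);
      if (vector == null) {
        // Accessible but unconfigured label: nothing to report, skip it.
        continue;
      }
      result.put(label, CapacityVectorInfo.of(vector));
    }
    return result;
  }

  public static void main(String[] args) {
    // Mirrors root.a from the example: the default partition ("" here, an
    // assumption) has a configured capacity, root-a-default-label does not.
    Set<String> existingLabels = Set.of("", "root-a-default-label");
    Map<String, CapacityVector> configured = new HashMap<>();
    configured.put("", new CapacityVector("50%"));

    Map<String, CapacityVectorInfo> infos =
        collectVectorInfos(existingLabels, configured);
    System.out.println("default partition reported: " + infos.containsKey(""));
    System.out.println("root-a-default-label reported: "
        + infos.containsKey("root-a-default-label"));
  }
}
```

Skipping the label (rather than substituting an empty vector) follows the reasoning in the issue that no capacityVector is needed for unconfigured labels; the REST response simply omits them.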