[
https://issues.apache.org/jira/browse/YARN-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shilun Fan updated YARN-11608:
------------------------------
Target Version/s: 3.4.0
> QueueCapacityVectorInfo NPE when accessible labels config is used
> -----------------------------------------------------------------
>
> Key: YARN-11608
> URL: https://issues.apache.org/jira/browse/YARN-11608
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.4.0
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
>
> YARN-11514 extended the REST API to contain CapacityVectors for each
> configured node label. There is an edge case, however: during
> initialization, each queue's capacities map is filled with 0
> capacities for the unconfigured but accessible labels (labels that have no
> configured capacity, but that the queue can access based
> on the accessible-node-labels property). A very basic example configuration
> for this is the following:
> {code:java}
> "yarn.scheduler.capacity.root.queues": "a, b"
> "yarn.scheduler.capacity.root.a.capacity": "50"
> "yarn.scheduler.capacity.root.a.accessible-node-labels": "root-a-default-label"
> "yarn.scheduler.capacity.root.a.maximum-capacity": "50"
> "yarn.scheduler.capacity.root.b.capacity": "50"
> {code}
> root.a has access to root-a-default-label, but there is no configured
> capacity for it. The capacityVectors are parsed based on the
> configuredCapacity map (created from the
> "accessible-node-labels.<label>.capacity" configs). When the scheduler info
> is requested, the capacityVectors are collected per label, and the labels used
> for this are the keySet of the capacities map:
> {code:java}
> for (String partitionName : capacities.getExistingNodeLabels()) {
>   QueueCapacityVector queueCapacityVector =
>       queue.getConfiguredCapacityVector(partitionName);
>   queueCapacityVectorInfo = queueCapacityVector == null ?
>       new QueueCapacityVectorInfo(new QueueCapacityVector()) :
>       new QueueCapacityVectorInfo(queue.getConfiguredCapacityVector(partitionName));
> {code}
> {code:java}
> public Set<String> getExistingNodeLabels() {
>   readLock.lock();
>   try {
>     return new HashSet<String>(capacitiesMap.keySet());
>   } finally {
>     readLock.unlock();
>   }
> }
> {code}
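The mismatch between the two label sets can be illustrated with a minimal, self-contained sketch. It does not use any YARN classes: `UnconfiguredLabels` and `unconfiguredLabels` are hypothetical names, the configured capacity vectors are represented by plain strings, and the default partition is assumed to be the empty-string label.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class UnconfiguredLabels {

  // Hypothetical helper (not part of YARN): returns the labels that appear in
  // the capacities key set but have no entry in the configured-vector map.
  static Set<String> unconfiguredLabels(Set<String> existingNodeLabels,
      Map<String, String> configuredCapacityVectors) {
    Set<String> result = new TreeSet<>(existingNodeLabels);
    result.removeAll(configuredCapacityVectors.keySet());
    return result;
  }

  public static void main(String[] args) {
    // Mirrors the example configuration for root.a: the queue can access
    // root-a-default-label, but only the default partition has a capacity.
    Set<String> existing =
        new HashSet<>(Arrays.asList("", "root-a-default-label"));
    Map<String, String> configured = new HashMap<>();
    configured.put("", "50"); // stand-in for a parsed capacity vector

    // Any label printed here would produce a null lookup during collection.
    System.out.println(unconfiguredLabels(existing, configured));
  }
}
```

Every label returned by the helper is one for which a per-label lookup in the configured map would return null.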
> If the capacitiesMap contains entries that are not "configured", this will
> result in an NPE, breaking the UI and the REST API:
> {code:java}
> INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacityVectorInfo.<init>(QueueCapacityVectorInfo.java:39)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacitiesInfo.<init>(QueueCapacitiesInfo.java:61)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.populateQueueCapacities(CapacitySchedulerLeafQueueInfo.java:108)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerQueueInfo.<init>(CapacitySchedulerQueueInfo.java:137)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.<init>(CapacitySchedulerLeafQueueInfo.java:66)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.getQueues(CapacitySchedulerInfo.java:197)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:94)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:399)
> {code}
> There is no need to create capacityVectors for the unconfigured labels, so a
> null check should solve this issue on the API side.
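One way to read the proposed null check is sketched below. This is an illustration under stated assumptions, not the actual patch: `NullSafeVectorCollection` and `collect` are hypothetical names, capacity vectors are represented by plain strings, and unconfigured labels are simply skipped rather than reported.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class NullSafeVectorCollection {

  // Hypothetical stand-in for the per-label collection loop: iterates over
  // the capacities key set and keeps only labels with a configured vector.
  static Map<String, String> collect(Set<String> existingNodeLabels,
      Map<String, String> configuredVectors) {
    Map<String, String> infos = new TreeMap<>();
    for (String label : existingNodeLabels) {
      String vector = configuredVectors.get(label);
      if (vector == null) {
        // Accessible but unconfigured label: nothing to report, so skip it
        // instead of passing null further down.
        continue;
      }
      infos.put(label, vector);
    }
    return infos;
  }

  public static void main(String[] args) {
    Set<String> existing =
        new HashSet<>(Arrays.asList("", "root-a-default-label"));
    Map<String, String> configured = new HashMap<>();
    configured.put("", "50"); // only the default partition is configured

    // Only the configured default partition survives the null check.
    System.out.println(collect(existing, configured).keySet());
  }
}
```

Skipping the label keeps the response limited to configured capacities; substituting an empty default vector would be an alternative if every accessible label needs to appear in the output.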
--
This message was sent by Atlassian Jira
(v8.20.10#820010)