Sunil Kalva created YARN-11449:
----------------------------------

             Summary: Fix Intermittent NPE while getting node labels for queue
                 Key: YARN-11449
                 URL: https://issues.apache.org/jira/browse/YARN-11449
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
            Reporter: Sunil Kalva
            Assignee: Sunil Kalva


A NullPointerException (NPE) is intermittently thrown to the YARN client when it checks queue status.



Partial stack trace:

Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getNodeLabelsForQueue(AbstractCSQueue.java:961)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getQueueConfigurations(AbstractCSQueue.java:528)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getQueueInfo(AbstractCSQueue.java:494)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueueInfo(LeafQueue.java:472)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:1048)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getResourceUsageReport(FiCaSchedulerApp.java:1041)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:408)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:142)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:954)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:761)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:396)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:224)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:529)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:507)



The issue originates at

[https://git.corp.linkedin.com:1367/plugins/gitiles/hadoop/hadoop/+/li-2.10.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java#960]

because there is no null check on the value returned by this.getAccessibleNodeLabels(),
which can be null because of

[https://git.corp.linkedin.com:1367/plugins/gitiles/hadoop/hadoop/+/li-2.10.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#598]

when CapacitySchedulerConfiguration is read and the following property is not
set:

yarn.scheduler.capacity.<queuename>.accessible-node-labels

In such cases, the RM usually falls back to the same property on the queue's
parent. In the cases above, however, the parent's property was unset as well.
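
A minimal sketch of the kind of null guard that would avoid the NPE is shown below. It is not the actual AbstractCSQueue code: the class name, the method signature, and the NO_LABEL constant are assumptions made for illustration, and the accessible labels are passed in as a parameter rather than read from the queue.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the label-collection logic; not the real Hadoop class.
class NodeLabelNullCheckSketch {
    // Assumed placeholder for the default (unlabeled) partition.
    static final String NO_LABEL = "";

    // accessibleNodeLabels may be null when neither the queue nor its parent
    // sets yarn.scheduler.capacity.<queuename>.accessible-node-labels.
    static Set<String> nodeLabelsForQueue(Set<String> accessibleNodeLabels) {
        Set<String> nodeLabels = new HashSet<>();
        // Guard: HashSet.addAll(null) throws NullPointerException
        // (via AbstractCollection.addAll, as in the stack trace above).
        if (accessibleNodeLabels != null) {
            nodeLabels.addAll(accessibleNodeLabels);
        }
        // Always include the default partition.
        nodeLabels.add(NO_LABEL);
        return nodeLabels;
    }

    public static void main(String[] args) {
        Set<String> configured = new HashSet<>();
        configured.add("gpu");
        // Unset labels no longer throw; only the default partition is returned.
        System.out.println(nodeLabelsForQueue(null).size());
        System.out.println(nodeLabelsForQueue(configured).size());
    }
}
```

With the guard in place, a queue whose accessible node labels are unset is treated as having access to the default partition only, instead of failing the whole application report.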



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
