[
https://issues.apache.org/jira/browse/YARN-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704057#comment-17704057
]
Tamas Domok commented on YARN-11461:
------------------------------------
Before the fix:
{code}
2023-03-23 10:10:29,507 WARN org.apache.hadoop.ipc.Server: IPC Server handler
40 on 8032, call Call#15 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.submitApplication from
172.27.19.2:55866
java.lang.NullPointerException
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.determineMissingParents(CapacitySchedulerQueueManager.java:574)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getPermissionsForDynamicQueue(CapacitySchedulerQueueManager.java:612)
at
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:425)
at
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:334)
at
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:666)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:290)
at
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:611)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
{code}
After the fix:
{code}
2023-03-23 10:22:41,998 ERROR
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Could not auto-create leaf queue ABC.ag due to :
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerDynamicEditException:
Could not auto-create queue ABC.ag parent queue does not exist.
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.determineMissingParents(CapacitySchedulerQueueManager.java:588)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createAutoQueue(CapacitySchedulerQueueManager.java:675)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:511)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:898)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:962)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1920)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:170)
at
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
2023-03-23 10:22:42,000 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating
application application_1679566843032_0002 with final state: FAILED
{code}
{code}
23/03/23 10:22:42 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException:
Failed to submit application_1679566843032_0002 to YARN : Application
application_1679566843032_0002 submitted by user systest to unknown queue:
ABC.ag
{code}
> NPE in determineMissingParents when the queue is invalid
> --------------------------------------------------------
>
> Key: YARN-11461
> URL: https://issues.apache.org/jira/browse/YARN-11461
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.4.0
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Major
>
> When trying to submit an app to an invalid queue (e.g.: invalid.queue) the RM
> crashes with a NPE.
> Reproduction:
> {code}
> spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 1G
> --queue invalid.queue spark-examples.jar 100
> {code}
> Stacktrace:
> {code}
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.determineMissingParents(CapacitySchedulerQueueManager.java:574)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getPermissionsForDynamicQueue(CapacitySchedulerQueueManager.java:612)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:425)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:334)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:666)
> at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:290)
> at
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:611)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]