[ https://issues.apache.org/jira/browse/YARN-11776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan resolved YARN-11776.
-------------------------------
       Fix Version/s: 3.5.0
        Hadoop Flags: Reviewed
    Target Version/s: 3.5.0, 3.4.2  (was: 3.4.2)
          Resolution: Fixed

> Handle NPE in the RMDelegationTokenIdentifier if localServiceAddress is null
> ----------------------------------------------------------------------------
>
>                 Key: YARN-11776
>                 URL: https://issues.apache.org/jira/browse/YARN-11776
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.4.1
>            Reporter: Abhey Rana
>            Assignee: Abhey Rana
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> We observed in our production environment that jobs submitted with an RM
> delegation token kept failing after an RM failover took place.
> Upon further investigation, we identified the following stack trace as the
> culprit:
> {code:java}
> 2025-02-24 11:23:21,511 WARN [DelegationTokenRenewer #400699] security.DelegationTokenRenewer - Unable to add the application to the delegation token renewer.
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:144)
>     at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:97)
>     at org.apache.hadoop.security.token.Token.renew(Token.java:500)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:661)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:658)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:657)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:519)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {code}
> We suspect there is an issue with the way localServiceAddress is
> instantiated, caused by an internal network issue.
> However, in our humble opinion, we should add a null check for this variable.
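
For illustration only, the null check proposed above could look roughly like the sketch below. It is not the committed patch and not the actual Hadoop code: the class `LocalTokenCheck` and the method `isLocalToken` are hypothetical stand-ins, and treating a null `localServiceAddress` as "not the local RM" is an assumed design choice (throwing a descriptive IOException would be another option).

```java
import java.net.InetSocketAddress;

// Hypothetical, simplified stand-in for the "is this our own token?" check.
// Not the real RMDelegationTokenIdentifier code and not the YARN-11776 patch.
final class LocalTokenCheck {

  private LocalTokenCheck() {
  }

  /**
   * Returns true only when the token's service address equals the local
   * service address. A null address on either side is treated as
   * "not local" (assumption) instead of being dereferenced.
   */
  static boolean isLocalToken(InetSocketAddress tokenAddr,
      InetSocketAddress localServiceAddress) {
    if (tokenAddr == null || localServiceAddress == null) {
      // Guard: without it, a null localServiceAddress would be dereferenced
      // by any later method call on it and raise a NullPointerException.
      return false;
    }
    return tokenAddr.equals(localServiceAddress);
  }

  public static void main(String[] args) {
    // Literal IP addresses are used so that no DNS lookup is needed.
    InetSocketAddress tokenAddr = new InetSocketAddress("127.0.0.1", 8032);
    // Local address never initialized: no NPE, prints "false".
    System.out.println(isLocalToken(tokenAddr, null));
    // Same host and port: prints "true".
    System.out.println(
        isLocalToken(tokenAddr, new InetSocketAddress("127.0.0.1", 8032)));
  }
}
```

With a guard of this kind, a renewer whose local address was never set would treat the token as belonging to a remote RM rather than failing the application submission with an NPE. Whether that fallback is appropriate depends on why the address was null in the first place.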