[ 
https://issues.apache.org/jira/browse/YARN-11776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11776.
-------------------------------
       Fix Version/s: 3.5.0
        Hadoop Flags: Reviewed
    Target Version/s: 3.5.0, 3.4.2  (was: 3.4.2)
          Resolution: Fixed

> Handle NPE in the RMDelegationTokenIdentifier if localServiceAddress is null
> ----------------------------------------------------------------------------
>
>                 Key: YARN-11776
>                 URL: https://issues.apache.org/jira/browse/YARN-11776
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.4.1
>            Reporter: Abhey Rana
>            Assignee: Abhey Rana
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> We observed in our production environment that jobs submitted with an RM 
> delegation token kept failing after an RM failover.
> On further investigation, we identified the following stack trace as the 
> culprit:
> {code:java}
> 2025-02-24 11:23:21,511 WARN  [DelegationTokenRenewer #400699] security.DelegationTokenRenewer - Unable to add the application to the delegation token renewer.
> java.lang.NullPointerException
>       at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:144)
>       at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:97)
>       at org.apache.hadoop.security.token.Token.renew(Token.java:500)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:661)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:658)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:657)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:519)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
>       at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:750)
> {code}
> We suspect that localServiceAddress is not being initialized correctly 
> because of an internal network issue.
> However, in our humble opinion, we should add a null check for this variable.
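
The null check proposed above could look roughly like the following. This is a minimal, self-contained sketch and not the patch that was merged for this issue: the class LocalServiceAddressGuard and the method isLocalService are hypothetical names, and the comparison logic is a simplified assumption about how a renewer might decide whether a token belongs to the local RM. Only localServiceAddress is a name taken from the report. The sketch also guards against an unresolved address, since InetSocketAddress.getAddress() returns null in that case.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration only; not Hadoop's actual code.
final class LocalServiceAddressGuard {

    private LocalServiceAddressGuard() {
    }

    /**
     * Returns true only when both addresses are present, the local address
     * is resolved, and the two addresses are equal. A null or unresolved
     * localServiceAddress yields false instead of a NullPointerException.
     */
    static boolean isLocalService(InetSocketAddress localServiceAddress,
                                  InetSocketAddress tokenService) {
        // Guard 1: the field may never have been initialized.
        if (localServiceAddress == null || tokenService == null) {
            return false;
        }
        // Guard 2: getAddress() is null when the host name did not resolve.
        if (localServiceAddress.getAddress() == null) {
            return false;
        }
        return tokenService.equals(localServiceAddress);
    }

    public static void main(String[] args) {
        // Literal IP addresses are used so that no DNS lookup is needed.
        InetSocketAddress token = new InetSocketAddress("127.0.0.1", 8032);
        InetSocketAddress sameLocal = new InetSocketAddress("127.0.0.1", 8032);
        InetSocketAddress unresolved =
            InetSocketAddress.createUnresolved("rm.invalid", 8032);

        System.out.println(isLocalService(null, token));
        System.out.println(isLocalService(unresolved, token));
        System.out.println(isLocalService(sameLocal, token));
    }
}
```

With a guard like this, a caller that finds no usable local address would fall through to whatever remote renewal path it already has, rather than failing the application submission with an NPE. Whether that fallback is the right behavior after a failover is a design decision for the actual fix.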



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
