[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312904#comment-17312904 ]
Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
-------------------------------------------------------

cc [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case: it sets isServiceStarted to false while holding the write lock.
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
Before YARN-7962, the race condition could happen in processDelegationTokenRenewerEvent, which now checks isServiceStarted under the read lock:
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

// Methods of the inner class DelegationTokenRenewerRunnable:
@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
Now the race condition no longer happens, and neither does the NullPointerException, which I had also seen on my cluster. I think we can close this now. Thanks.
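The locking pattern described in this comment can be modeled in isolation. The following is a minimal, hypothetical sketch, not the actual Hadoop DelegationTokenRenewer code: events are plain Strings, "processing" an event only records it, and the class RenewerStateGuard and its methods handle(), start(), stop(), processed() and pendingCount() are invented names. It shows why flipping isServiceStarted under the write lock matters. handle() checks the flag and submits under the read lock, so stop() cannot shut the executor down between that check and the submit() call, which would otherwise make submit() throw RejectedExecutionException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical, simplified model of the start/stop locking around
 * isServiceStarted. Not the Hadoop DelegationTokenRenewer implementation.
 */
final class RenewerStateGuard {
  private final ReadWriteLock serviceStateLock = new ReentrantReadWriteLock();
  private final ExecutorService renewerService = Executors.newFixedThreadPool(2);
  private final ConcurrentLinkedQueue<String> pendingEventQueue =
      new ConcurrentLinkedQueue<>();
  private final List<String> processed =
      Collections.synchronizedList(new ArrayList<>());
  // Only read or written while serviceStateLock is held.
  private boolean isServiceStarted = false;

  /** Submits the event if the service is running, otherwise queues it. */
  void handle(String evt) {
    serviceStateLock.readLock().lock();
    try {
      if (isServiceStarted) {
        // stop() needs the write lock, so it cannot shut the executor down
        // between the flag check above and this submit().
        renewerService.submit(() -> {
          processed.add(evt);
        });
      } else {
        pendingEventQueue.add(evt);
      }
    } finally {
      serviceStateLock.readLock().unlock();
    }
  }

  /** Marks the service as started, then replays events queued before start. */
  void start() {
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = true;
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    String evt;
    while ((evt = pendingEventQueue.poll()) != null) {
      handle(evt);
    }
  }

  /** Clears the flag and shuts the executor down while no handle() is mid-submit. */
  void stop() {
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = false;
      renewerService.shutdown();
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    try {
      // Let events submitted before shutdown() finish.
      renewerService.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Snapshot of the events processed so far. */
  List<String> processed() {
    return new ArrayList<>(processed);
  }

  /** Number of events waiting for the service to start. */
  int pendingCount() {
    return pendingEventQueue.size();
  }
}
```

In use, an event passed to handle() before start() waits in the queue, start() replays it, and after stop() new events are queued again instead of being submitted to the shut-down executor. In this sketch the drain loop in start() runs outside the write lock, so events arriving concurrently during the drain may be processed out of order.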
> YARN RM fails to add the application to the delegation token renewer on recovery
> --------------------------------------------------------------------------------
>
>                 Key: YARN-8631
>                 URL: https://issues.apache.org/jira/browse/YARN-8631
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.1.0
>            Reporter: Sanjay Divgi
>            Assignee: Umesh Mittal
>            Priority: Blocker
>         Attachments: YARN-8631.001.patch, hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-000004.log
>
>
> On an HA cluster, we have observed that the YARN ResourceManager fails to add the application to the delegation token renewer on recovery.
> Below is the error:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)