[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery
[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312904#comment-17312904 ]

Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
---

cc [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case: it sets isServiceStarted to false while holding the write lock.
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
Before YARN-7962, a race condition could occur in processDelegationTokenRenewerEvent:
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

// DelegationTokenRenewerRunnable.run(), executed on a renewerService thread
@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
With that fix in place, the race condition no longer occurs. The NullPointerException, which I had also seen on my cluster, no longer occurs either. I think we can close this now.

Thanks.
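The following is a minimal, self-contained sketch of the read/write-lock pattern quoted in the comment above. It is offered only to make the argument concrete. The class name GuardedRenewerSketch and its methods are hypothetical, and the real DelegationTokenRenewer has far more state and lifecycle logic. In the sketch, process() checks the flag and submits the event under a single read lock. stop() flips the flag and shuts the executor down under the write lock. Because of this, an event that arrives after shutdown is parked in the pending queue. Without the lock it could be handed to a shut-down executor, whose default policy rejects new tasks with RejectedExecutionException.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the locking pattern quoted above.
// It is not the actual DelegationTokenRenewer implementation.
public class GuardedRenewerSketch {
  private final ReadWriteLock serviceStateLock = new ReentrantReadWriteLock();
  private final ExecutorService renewerService = Executors.newFixedThreadPool(2);
  // Events that arrive after shutdown are parked here instead of submitted.
  private final Queue<String> pendingEventQueue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger submitted = new AtomicInteger();
  private boolean isServiceStarted = true; // guarded by serviceStateLock

  // Same shape as processDelegationTokenRenewerEvent: the flag check and the
  // submit happen under one read lock, so stop() cannot run in between.
  public void process(String evt) {
    serviceStateLock.readLock().lock();
    try {
      if (isServiceStarted) {
        renewerService.submit(() -> { /* renew the tokens for evt here */ });
        submitted.incrementAndGet();
      } else {
        pendingEventQueue.add(evt);
      }
    } finally {
      serviceStateLock.readLock().unlock();
    }
  }

  // Same shape as the shutdown block above: the write lock waits for any
  // in-flight process() call to release its read lock before flipping the flag.
  public void stop() {
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = false;
      renewerService.shutdown();
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    try {
      renewerService.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public int submittedCount() { return submitted.get(); }

  public int pendingCount() { return pendingEventQueue.size(); }

  public static void main(String[] args) {
    GuardedRenewerSketch renewer = new GuardedRenewerSketch();
    renewer.process("application_1"); // service running: submitted
    renewer.stop();
    renewer.process("application_2"); // service stopped: queued, not rejected
    System.out.println(renewer.submittedCount() + " submitted, "
        + renewer.pendingCount() + " pending"); // prints "1 submitted, 1 pending"
  }
}
```

The sketch simply leaves parked events in the queue. What the real service does with pendingEventQueue when it starts is outside its scope.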
[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery
[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099756#comment-17099756 ]

Szilard Nemeth edited comment on YARN-8631 at 5/5/20, 10:24 AM:

Hi [~umittal],

I can't see the JUnit test attached, only the log and the 001 patch, which contains the production changes. It's fine to include the unit tests in a YARN-8631.002.patch file.

> YARN RM fails to add the application to the delegation token renewer on
> recovery
>
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.0
> Reporter: Sanjay Divgi
> Assignee: Umesh Mittal
> Priority: Blocker
> Attachments: YARN-8631.001.patch,
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On HA cluster we have observed that yarn resource manager fails to add the
> application to the delegation token renewer on recovery.
> Below is the error:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery
[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098157#comment-17098157 ]

Umesh Mittal edited comment on YARN-8631 at 5/3/20, 11:25 PM:
--

Hi [~snemeth]

Thanks for looking into this. I have attached a JUnit test that stops the service in the middle of the renewal process. This then causes the NullPointerException described by the user. At this stage, however, the JUnit test fails.
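One way to make a test stop the service "in the middle of the renewal process" deterministically is to force the interleaving with latches rather than relying on timing. The sketch below is a hypothetical toy and is not the attached test. MidRenewalStopSketch, appTokens and reproduce() are invented names. It also assumes, purely for illustration, that stopping the service clears state the recovery task still dereferences. This thread does not establish what actually is null at DelegationTokenRenewer.java:522. The first latch signals that the task is running. The second latch holds the task until stop() has completed. The NullPointerException thrown inside the task then surfaces as the cause of an ExecutionException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical toy model, not Hadoop code: stop() releases state that an
// in-flight recovery task still dereferences.
public class MidRenewalStopSketch {
  // Assumed stand-in for whatever renewer state a shutdown tears down.
  private volatile Map<String, String> appTokens = new ConcurrentHashMap<>();

  void stop() {
    appTokens = null;
  }

  // Forces the "stop in the middle of renewal" interleaving with two latches
  // and returns what the recovery task threw (null if it succeeded).
  static Throwable reproduce() {
    MidRenewalStopSketch renewer = new MidRenewalStopSketch();
    CountDownLatch taskStarted = new CountDownLatch(1);
    CountDownLatch stopDone = new CountDownLatch(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<?> recovery = pool.submit(() -> {
        taskStarted.countDown(); // renewal is now in progress
        stopDone.await();        // park until the service has been stopped
        renewer.appTokens.put("application_1", "token"); // map is null here
        return null;
      });
      taskStarted.await();
      renewer.stop();            // stop the service mid-renewal
      stopDone.countDown();
      recovery.get();
      return null;               // the task finished without error
    } catch (ExecutionException e) {
      return e.getCause();       // the exception thrown inside the task
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return e;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Throwable failure = reproduce();
    // prints "java.lang.NullPointerException"
    System.out.println(failure == null ? "no failure" : failure.getClass().getName());
  }
}
```

In a real JUnit test the same latch handshake would wrap the renewer's own recovery path. Asserting on the exception's cause keeps the test independent of thread scheduling.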
[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery
[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095214#comment-17095214 ]

Szilard Nemeth edited comment on YARN-8631 at 4/29/20, 7:45 AM:

Hi [~umittal],

Thanks for working on this. Your analysis makes sense. As a first step, can you also post your JUnit test code along with its logs?

Thanks