[ 
https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312904#comment-17312904
 ] 

Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
-------------------------------------------------------

cc [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case:

isServiceStarted is now set to false while holding the write lock:

 
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
Before YARN-7962, a race condition could occur in 
processDelegationTokenRenewerEvent:

 

 
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
With that fix, the race condition no longer occurs, and neither does the null 
pointer error, which I also hit on my cluster.
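
As a minimal standalone sketch of why taking the write lock closes the window (this is not the DelegationTokenRenewer code; GuardedSubmitter, process, stop and pendingCount are hypothetical names chosen for illustration): a submitter holding the read lock can never observe the flag as true and then hit an executor that has already been shut down, because stop() has to wait for all readers before it flips the flag.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the renewer; it only mirrors the locking pattern.
class GuardedSubmitter {
  private final ReentrantReadWriteLock serviceStateLock =
      new ReentrantReadWriteLock();
  private final ExecutorService pool = Executors.newSingleThreadExecutor();
  private final LinkedBlockingQueue<Runnable> pending =
      new LinkedBlockingQueue<>();
  // Guarded by serviceStateLock.
  private boolean started = true;

  /** Returns true if the task was submitted, false if it was parked. */
  boolean process(Runnable task) {
    serviceStateLock.readLock().lock();
    try {
      if (started) {
        // stop() cannot run between the check and the submit,
        // because it needs the write lock.
        pool.submit(task);
        return true;
      }
      pending.add(task);
      return false;
    } finally {
      serviceStateLock.readLock().unlock();
    }
  }

  /** Flips the flag and shuts the pool down as one atomic step. */
  void stop() {
    serviceStateLock.writeLock().lock();
    try {
      started = false;
      pool.shutdown();
    } finally {
      serviceStateLock.writeLock().unlock();
    }
  }

  int pendingCount() {
    return pending.size();
  }
}
```

If stop() changed the flag without the write lock, a thread could pass the started check, lose the CPU, and then submit to a pool that had been shut down in the meantime.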

I think we can close this now.

Thanks.

 



> YARN RM fails to add the application to the delegation token renewer on 
> recovery
> --------------------------------------------------------------------------------
>
>                 Key: YARN-8631
>                 URL: https://issues.apache.org/jira/browse/YARN-8631
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.1.0
>            Reporter: Sanjay Divgi
>            Assignee: Umesh Mittal
>            Priority: Blocker
>         Attachments: YARN-8631.001.patch, 
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-000004.log
>
>
> On an HA cluster, we observed that the YARN ResourceManager fails to add the 
> application to the delegation token renewer on recovery.
> The error is shown below:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}


