[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2021-04-01 Thread Qi Zhu (Jira)


[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312904#comment-17312904 ]

Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
---

cc [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case.

isServiceStarted is now set to false while holding the write lock:

{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
Before YARN-7962, a race condition could occur in
processDelegationTokenRenewerEvent:

{code:java}
// DelegationTokenRenewer: submit the event, or queue it, under the read lock.
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

// DelegationTokenRenewerRunnable: dispatch on the event type.
@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
With YARN-7962 in place, the race condition no longer occurs, and neither does
the NullPointerException, which also happened on my cluster.
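
For readers who want to try the locking pattern outside Hadoop, here is a minimal, self-contained sketch. It is not the DelegationTokenRenewer code: the class GuardedSubmitter and its methods are hypothetical names chosen for illustration. It only shows the idea discussed above: the stop path flips the flag and shuts the pool down under the write lock, while the submit path checks the flag under the read lock, so a task is either handed to a live pool or parked in a pending queue.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the renewer service; not the Hadoop class. */
class GuardedSubmitter {
  private final ReadWriteLock stateLock = new ReentrantReadWriteLock();
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
  private boolean started = true; // guarded by stateLock

  /** Returns true if the task reached the pool, false if it was queued. */
  boolean submit(Runnable task) {
    stateLock.readLock().lock();
    try {
      if (started) {
        // stop() needs the write lock, so the pool cannot be shut down
        // between the check above and this call.
        pool.submit(task);
        return true;
      }
      pending.add(task);
      return false;
    } finally {
      stateLock.readLock().unlock();
    }
  }

  /** Flips the flag and shuts the pool down as one step under the write lock. */
  void stop() {
    stateLock.writeLock().lock();
    try {
      started = false;
      pool.shutdown();
    } finally {
      stateLock.writeLock().unlock();
    }
  }

  int pendingCount() {
    return pending.size();
  }

  /** Waits for tasks submitted before stop() to finish. */
  boolean awaitIdle() throws InterruptedException {
    return pool.awaitTermination(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    GuardedSubmitter s = new GuardedSubmitter();
    AtomicInteger ran = new AtomicInteger();
    boolean first = s.submit(ran::incrementAndGet);  // pool is live
    s.stop();
    boolean second = s.submit(ran::incrementAndGet); // queued, not submitted
    s.awaitIdle();
    System.out.println(first + " " + second
        + " ran=" + ran.get() + " pending=" + s.pendingCount());
    // prints: true false ran=1 pending=1
  }
}
```

Without the lock, a submitter could observe the flag as true, lose the CPU while the service stops, and then hand a task to a pool that has already been shut down.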

I think we can close this now.

Thanks.

 



> YARN RM fails to add the application to the delegation token renewer on recovery
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.0
> Reporter: Sanjay Divgi
> Assignee: Umesh Mittal
> Priority: Blocker
> Attachments: YARN-8631.001.patch, hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On HA 

[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2020-05-05 Thread Szilard Nemeth (Jira)


[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099756#comment-17099756 ]

Szilard Nemeth edited comment on YARN-8631 at 5/5/20, 10:24 AM:


Hi [~umittal],
I can't see the JUnit test attached, just the log and the 001 patch that 
contains the production changes.
It's okay to attach the unit tests as part of a YARN-8631.002.patch file.


was (Author: snemeth):
Hi [~umittal],
I can't see the JUnit test attached, just the log and the 001 patch that 
contains the production changes.

> YARN RM fails to add the application to the delegation token renewer on recovery
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.0
> Reporter: Sanjay Divgi
> Assignee: Umesh Mittal
> Priority: Blocker
> Attachments: YARN-8631.001.patch, hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On an HA cluster, we have observed that the YARN ResourceManager fails to add
> the application to the delegation token renewer on recovery.
> Below is the error:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2020-05-03 Thread Umesh Mittal (Jira)


[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098157#comment-17098157 ]

Umesh Mittal edited comment on YARN-8631 at 5/3/20, 11:25 PM:
--

Hi [~snemeth]

Thanks for looking into this.

I have attached a JUnit test that stops the service in the middle of the 
renewal process, which later causes the NullPointerException described by the 
reporter.

However, at this stage the JUnit test will fail.

 

 


was (Author: umittal):
Hi [~snemeth]

Thanks for looking into this.

I have attached a JUnit test that stops the service in the middle of the 
renewal process, which later causes the NullPointerException described by the 
reporter.

 

 




[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2020-04-29 Thread Szilard Nemeth (Jira)


[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095214#comment-17095214 ]

Szilard Nemeth edited comment on YARN-8631 at 4/29/20, 7:45 AM:


Hi [~umittal],
Thanks for working on this. 
Your analysis makes sense.

As a first step, can you also post your JUnit test code along with its logs? 
Thanks


was (Author: snemeth):
Hi [~umittal],
Thanks for working on this. 
Your analysis makes sense.

Can you also post your JUnit test code along with its logs? 
Thanks

> YARN RM fails to add the application to the delegation token renewer on recovery
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.0
> Reporter: Sanjay Divgi
> Assignee: Umesh Mittal
> Priority: Blocker
> Attachments: hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log