[
https://issues.apache.org/jira/browse/YARN-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885577#comment-17885577
]
ASF GitHub Bot commented on YARN-11719:
---------------------------------------
hadoop-yetus commented on PR #7077:
URL: https://github.com/apache/hadoop/pull/7077#issuecomment-2380640345
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 16m 57s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 48s | | trunk passed |
| +1 :green_heart: | compile | 1m 3s | | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | compile | 0m 56s | | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 :green_heart: | checkstyle | 0m 56s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 2s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 0s | | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 :green_heart: | spotbugs | 2m 0s | | trunk passed |
| +1 :green_heart: | shadedclient | 36m 26s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 50s | | the patch passed |
| +1 :green_heart: | compile | 0m 55s | | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | javac | 0m 55s | | the patch passed |
| +1 :green_heart: | compile | 0m 48s | | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 :green_heart: | javac | 0m 48s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 41s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7077/1/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 32 unchanged - 0 fixed = 33 total (was 32) |
| +1 :green_heart: | mvnsite | 0m 50s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 45s | | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | javadoc | 0m 42s | | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 :green_heart: | spotbugs | 2m 0s | | the patch passed |
| +1 :green_heart: | shadedclient | 35m 39s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 109m 46s | | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 259m 19s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7077/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/7077 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c52bb8302dea 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 79eb1bd99287b784ec6b7cc44cf9fa22c1cea2bb |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7077/1/testReport/ |
| Max. process+thread count | 924 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7077/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> The job is stuck in the new state.
> ----------------------------------
>
> Key: YARN-11719
> URL: https://issues.apache.org/jira/browse/YARN-11719
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.3.1
> Reporter: zeekling
> Priority: Major
> Labels: pull-request-available
>
> After I restarted the Router in the production environment, several jobs
> remained in the new state. I found the following related log:
>
> {code:java}
> 2024-08-30 00:12:41,380 | WARN | DelegationTokenRenewer #667 | Unable to add the application to the delegation token renewer. | DelegationTokenRenewer.java:1215
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nsfed, Ident: (token for admintest: HDFS_DELEGATION_TOKEN owner=admintest@9FCE074E_691F_480F_98F5_58C1CA310829.COM, renewer=mapred, realUser=, issueDate=1724947875776, maxDate=1725552675776, sequenceNumber=156, masterKeyId=116)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:641)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$2200(DelegationTokenRenewer.java:86)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1211)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1188)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.io.InterruptedIOException: Retry interrupted
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:141)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:112)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
>     at com.sun.proxy.$Proxy96.renewDelegationToken(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:849)
>     at org.apache.hadoop.security.token.Token.renew(Token.java:498)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:771)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:768)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1890)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:767)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:627)
>     ... 8 more
> Caused by: java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:135)
>     ... 20 more
> 2024-08-30 00:12:41,380 | WARN | DelegationTokenRenewer #667 | AsyncDispatcher thread interrupted | AsyncDispatcher.java:437
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1233)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:434)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1221)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1188)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> 2024-08-30 00:12:41,381 | WARN | DelegationTokenRenewer #667 | Caught exception in thread DelegationTokenRenewer #667: | ExecutorHelper.java:63
> java.util.concurrent.CancellationException
>     at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:48)
>     at org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor.afterExecute(HadoopThreadPoolExecutor.java:90)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
> {code}
>
> Parameters:
> yarn.resourcemanager.delegation-token-renewer.thread-timeout=80S
> dfs.client.socket-timeout=60S
>
> The RM was renewing the token while the Router was being restarted. The
> renewal is retried several times, with a sleep between retries. Once the
> renewal has been running for more than 80 seconds, the token renewal thread
> is interrupted by the following code:
>
> {code:java}
> DelegationTokenRenewerEvent evt = dtrf.getEvt();
> Future<?> future = dtrf.getFuture();
> try {
>   future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
> } catch (TimeoutException e) {
>   // Cancel thread and retry the same event in case of timeout.
>   if (!future.isDone() && !future.isCancelled()) {
>     future.cancel(true);
>     if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>       renewalTimer.schedule(
>           getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>           tokenRenewerThreadRetryInterval);
>     } else {
>       LOG.info(
>           "Exhausted max retry attempts {} in token renewer "
>               + "thread for {}",
>           tokenRenewerThreadRetryMaxAttempts, evt.getApplicationId());
>     }
>   }
> } catch (Exception e) {
>   LOG.info("Problem in submitting renew tasks in token renewer "
>       + "thread.", e);
> }
> {code}
> The interruption is caught by the following code, which sets the thread's
> interrupt flag again and then throws an InterruptedIOException. The token
> renewal therefore fails, and the state machine of the job needs to change
> from new to rejected.
> {code:java}
> try {
>   Thread.sleep(retryInfo.delay);
> } catch (InterruptedException e) {
>   Thread.currentThread().interrupt();
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Interrupted while waiting to retry", e);
>   }
>   InterruptedIOException intIOE = new InterruptedIOException(
>       "Retry interrupted");
>   intIOE.initCause(e);
>   throw intIOE;
> }
> {code}
> However, because the interrupt flag has been set again, eventQueue.put() in
> the following AsyncDispatcher.java code throws an InterruptedException as
> soon as it is called. The event is never enqueued, so the state transition
> fails and the job stays in the new state.
> {code:java}
> try {
>   eventQueue.put(event);
> } catch (InterruptedException e) {
>   if (!stopped) {
>     LOG.warn("AsyncDispatcher thread interrupted", e);
>   }
>   // Need to reset drained flag to true if event queue is empty,
>   // otherwise dispatcher will hang on stop.
>   drained = eventQueue.isEmpty();
>   throw new YarnRuntimeException(e);
> }
> {code}
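
The sequence described in the issue can be reproduced outside Hadoop with plain java.util.concurrent classes. The following is a minimal sketch, not the Hadoop code: the class name InterruptFlagDemo, the method enqueueFailsAfterCancel, the 200 ms timeout, and the "APP_REJECTED" string are all stand-ins chosen for illustration. It shows that once a task restores its interrupt flag after future.cancel(true), a later LinkedBlockingQueue.put() on the same thread fails immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptFlagDemo {

    /** Returns true if the enqueue was rejected because of the pending interrupt. */
    static boolean enqueueFailsAfterCancel() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for the dispatcher's event queue (hypothetical, unbounded).
        LinkedBlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
        AtomicBoolean putInterrupted = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);

        Future<?> future = pool.submit(() -> {
            started.countDown();
            try {
                try {
                    // Stand-in for the sleep between renewal retries.
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // Same pattern as the retry handler: restore the interrupt flag.
                    Thread.currentThread().interrupt();
                }
                try {
                    // Stand-in for handing the "rejected" event to the dispatcher.
                    eventQueue.put("APP_REJECTED");
                } catch (InterruptedException e) {
                    // put() acquires its lock interruptibly, so the pending
                    // interrupt makes it throw before anything is enqueued.
                    putInterrupted.set(true);
                }
            } finally {
                finished.countDown();
            }
        });

        started.await();
        try {
            // Stand-in for the renewer thread timeout (80 s in the report).
            future.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
        }
        finished.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return putInterrupted.get() && eventQueue.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("enqueue failed: " + enqueueFailsAfterCancel());
    }
}
```

In this sketch the queue stays empty after the cancel, which mirrors the report: the event that should move the job out of the new state is never delivered.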
--
This message was sent by Atlassian Jira
(v8.20.10#820010)