[
https://issues.apache.org/jira/browse/YARN-11819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019264#comment-18019264
]
ASF GitHub Bot commented on YARN-11819:
---------------------------------------
github-actions[bot] commented on PR #7682:
URL: https://github.com/apache/hadoop/pull/7682#issuecomment-3272752049
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Request a HDFS delegation token refresh even at
> DelegationTokenRenewerAppSubmitEvent
> ------------------------------------------------------------------------------------
>
> Key: YARN-11819
> URL: https://issues.apache.org/jira/browse/YARN-11819
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.3.6, 3.5.0, 3.4.2
> Reporter: Abhey Rana
> Assignee: Abhey Rana
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> We observed in our production environment that jobs submitted with an RM
> delegation token were repeatedly failing to start.
> On further investigation, we traced the failures to the following stack
> trace:
> {code:java}
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN,
> Service: ha-hdfs:prod-EMPTY-hbase4a, Ident: (token for xyz:
> HDFS_DELEGATION_TOKEN
> owner=hbase/[email protected], renewer=xyz,
> realUser=, issueDate=1744651400720, maxDate=1745256200720,
> sequenceNumber=2575348, masterKeyId=790)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:533)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750){code}
> Looking at the code, we found that handleAppSubmitEvent does catch the
> IOException and request a delegation token refresh, but only when the
> event is a DelegationTokenRenewerAppRecoverEvent.
> Code pointer:
> {code:java}
> if (ioe instanceof SecretManager.InvalidToken
>     && dttr.maxDate < Time.now()
>     && evt instanceof DelegationTokenRenewerAppRecoverEvent
>     && token.getKind().equals(HDFS_DELEGATION_KIND)) {
>   LOG.info("Failed to renew hdfs token " + dttr
>       + " on recovery as it expired, requesting new hdfs token for "
>       + applicationId + ", user=" + evt.getUser(), ioe);
>   requestNewHdfsDelegationTokenAsProxyUser(
>       Arrays.asList(applicationId), evt.getUser(),
>       evt.shouldCancelAtEnd());
>   continue;
> }{code}
> The idea is to add an OR condition to the event check, parenthesized so
> that it combines correctly with the surrounding && conditions:
> (evt instanceof DelegationTokenRenewerAppRecoverEvent
> || evt instanceof DelegationTokenRenewerAppSubmitEvent)
>
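The proposed change to the event check can be illustrated with a minimal, standalone sketch. It is not the DelegationTokenRenewer code: the class TokenRefreshCheckSketch, the nested AppSubmitEvent and AppRecoverEvent types, and both method names are hypothetical stand-ins, and the two event types are modeled as unrelated classes, which may not match the real class hierarchy. The maxDate value is the one shown in the stack trace above.

```java
public class TokenRefreshCheckSketch {
    // Hypothetical stand-ins for the event classes named in the issue.
    // Assumption: modeled as unrelated types; the real hierarchy may differ.
    public interface Event {}
    public static final class AppSubmitEvent implements Event {}
    public static final class AppRecoverEvent implements Event {}

    // Mirrors the shape of the existing condition: only recover events qualify.
    public static boolean currentCheck(boolean invalidToken, long maxDate, long now,
                                       Event evt, boolean isHdfsToken) {
        return invalidToken
            && maxDate < now
            && evt instanceof AppRecoverEvent
            && isHdfsToken;
    }

    // Proposed shape: recover OR submit events qualify. The parentheses matter,
    // because && binds more tightly than || in Java.
    public static boolean proposedCheck(boolean invalidToken, long maxDate, long now,
                                        Event evt, boolean isHdfsToken) {
        return invalidToken
            && maxDate < now
            && (evt instanceof AppRecoverEvent || evt instanceof AppSubmitEvent)
            && isHdfsToken;
    }

    public static void main(String[] args) {
        long maxDate = 1745256200720L; // maxDate from the token in the stack trace
        long now = maxDate + 1;        // any time after the token's max lifetime
        Event submit = new AppSubmitEvent();

        // A submit event with an expired HDFS token is skipped by the current
        // check and accepted by the proposed one.
        System.out.println("current:  " + currentCheck(true, maxDate, now, submit, true));
        System.out.println("proposed: " + proposedCheck(true, maxDate, now, submit, true));
    }
}
```

Without the parentheses, the submit-event branch would be evaluated on its own, separately from the InvalidToken and maxDate conditions, so a new token could be requested for a token that has not yet expired.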