kyungwan nam created YARN-10305:
-----------------------------------
Summary: Lost system-credentials when restarting RM
Key: YARN-10305
URL: https://issues.apache.org/jira/browse/YARN-10305
Project: Hadoop YARN
Issue Type: Bug
Reporter: kyungwan nam
Assignee: kyungwan nam
System-credentials were introduced in YARN-2704 so that long-running apps can
keep running.
I've run into a situation where the system-credentials were lost when the RM
was restarted.
After that, if an app's AM stops, restarting the AM fails because the NMs
do not have the HDFS delegation token that is needed for resource localization.
The app has several delegation tokens, including a timeline-server token and
an HDFS delegation token.
When the RM restarts, it requests a new HDFS delegation token for an app that
was submitted long ago. (This was fixed by YARN-5098.)
However, if an app has several delegation tokens and an exception occurs while
the first token is being processed, the remaining tokens are not processed.
I think that is why the system-credentials were lost.
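To illustrate the suspected failure mode, here is a minimal sketch. It is not
the actual DelegationTokenRenewer code: the class TokenRecoverySketch, the
FakeToken type, and the methods renew, processFailFast and processIsolated are
all hypothetical names made up for this example. It only shows how one
exception thrown for the first token can skip every later token, and how
catching the exception per token would let the HDFS token still be handled.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Hadoop.
class TokenRecoverySketch {

    /** Stand-in for a delegation token: just a kind and an "expired" flag. */
    static final class FakeToken {
        final String kind;
        final boolean expired;

        FakeToken(String kind, boolean expired) {
            this.kind = kind;
            this.expired = expired;
        }
    }

    /** Simulated renewal: an expired token cannot be renewed. */
    static void renew(FakeToken token) throws IOException {
        if (token.expired) {
            throw new IOException("Failed to renew token: Kind: " + token.kind);
        }
    }

    /**
     * Suspected current behavior: the first failure propagates out of the
     * loop, so tokens after the failing one are never processed.
     */
    static void processFailFast(List<FakeToken> tokens, List<String> processed)
            throws IOException {
        for (FakeToken token : tokens) {
            renew(token);
            processed.add(token.kind);
        }
    }

    /**
     * One possible alternative (an assumption, not a proposed patch): catch
     * the failure per token and continue with the remaining tokens.
     */
    static void processIsolated(List<FakeToken> tokens, List<String> processed,
                                List<String> failed) {
        for (FakeToken token : tokens) {
            try {
                renew(token);
                processed.add(token.kind);
            } catch (IOException e) {
                failed.add(token.kind);
            }
        }
    }

    public static void main(String[] args) {
        // The expired timeline token comes first, as in the situation above.
        List<FakeToken> tokens = Arrays.asList(
                new FakeToken("TIMELINE_DELEGATION_TOKEN", true),
                new FakeToken("HDFS_DELEGATION_TOKEN", false));

        List<String> failFast = new ArrayList<>();
        try {
            processFailFast(tokens, failFast);
        } catch (IOException e) {
            System.out.println("fail-fast aborted: " + e.getMessage());
        }
        System.out.println("fail-fast processed: " + failFast);

        List<String> processed = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        processIsolated(tokens, processed, failed);
        System.out.println("isolated processed: " + processed + ", failed: " + failed);
    }
}
```

In the fail-fast variant the HDFS token is never reached because the expired
timeline token is handled first; in the isolated variant the HDFS token is
still processed and only the timeline token is recorded as failed.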
Here are the RM's logs from the time of the restart.
{code}
2020-05-19 14:25:05,712 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.1.1.1:8190, Ident: (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:503)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: HTTP status [403], message [org.apache.hadoop.security.token.SecretManager$InvalidToken: yarn tried to renew an expired token (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340) max expiration date: 2020-04-16 10:26:03,258+0900 currentTime: 2020-05-19 14:25:05,700+0900]
    at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:166)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:319)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:235)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:437)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:247)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:227)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientRetryOpForOperateDelegationToken.run(TimelineConnector.java:431)
    at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:334)
    at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.operateDelegationToken(TimelineConnector.java:218)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:250)
    at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
    at org.apache.hadoop.security.token.Token.renew(Token.java:512)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:629)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:626)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:625)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:489)
    ... 6 more
{code}