[ 
https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989378#comment-16989378
 ] 

Eric Yang commented on YARN-9956:
---------------------------------

Thank you for the patch, [~prabhujoseph].  Overall, the patch looks good.

The failed unit test looks a bit concerning.  I am not sure how it is related 
to the changes.  Can you confirm this is not an issue?


> Improve connection error message for YARN ApiServerClient
> ---------------------------------------------------------
>
>                 Key: YARN-9956
>                 URL: https://issues.apache.org/jira/browse/YARN-9956
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Yang
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9956-001.patch, YARN-9956-002.patch
>
>
> In an HA environment, the yarn.resourcemanager.webapp.address configuration 
> is optional.  ApiServiceClient may produce a confusing error message like this:
> {code}
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: 
> host1.example.com:8090
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: 
> host2.example.com:8090
> 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
> GSSException: No valid credentials provided (Mechanism level: Server not 
> found in Kerberos database (7) - LOOKING_UP_SERVER)
>       at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>       at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>       at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>       at java.base/java.security.AccessController.doPrivileged(Native Method)
>       at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>       at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>       at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: KrbException: Server not found in Kerberos database (7) - 
> LOOKING_UP_SERVER
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>       at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>       at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>       at 
> java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>       at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>       ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>       at 
> java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>       at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>       at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
>       ... 21 more
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: 
> java.io.IOException: java.lang.reflect.UndeclaredThrowableException
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>       at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>       at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: java.lang.reflect.UndeclaredThrowableException
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>       ... 6 more
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Server not 
> found in Kerberos database (7) - LOOKING_UP_SERVER)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:135)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>       at java.base/java.security.AccessController.doPrivileged(Native Method)
>       at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>       ... 8 more
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
>       at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>       at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>       at 
> java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>       at 
> org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>       ... 12 more
> Caused by: KrbException: Server not found in Kerberos database (7) - 
> LOOKING_UP_SERVER
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>       at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>       at 
> java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>       at 
> java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>       at 
> java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>       ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>       at 
> java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>       at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>       at 
> java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
>       at 
> java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
>       ... 21 more
> {code}
> When getRMWebAddress fails to connect to either ResourceManager host, it 
> falls back to the yarn-default.xml value 0.0.0.0 and attempts to acquire a 
> service ticket for HTTP/0.0.0.0 from the TGS, which produces the error shown 
> here.  It would be better to avoid using yarn.resourcemanager.webapp.address 
> as a fallback for RM host lookup in an HA-enabled cluster.
> In this particular cluster, connecting to host1.example.com and 
> host2.example.com failed for the same reason: the self-signed server 
> certificate could not be verified because no valid self-signed CA certificate 
> was available.  This caused the failure in the first place.  It would be nice 
> if the error message were more verbose and identified the first error, rather 
> than reporting an error from the fallback logic, which makes no sense to the 
> user.
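
The behavior requested in the description can be sketched roughly as follows.  This is only an illustration under stated assumptions, not the code in the attached patches: the class RmWebAddressResolver, the Probe interface, and the resolve method are hypothetical names, and the real ApiServiceClient may be structured quite differently.  The idea is to remember the first connection failure and, when HA is enabled, report it instead of falling back to the default address.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual ApiServiceClient implementation. */
class RmWebAddressResolver {

  /** Hypothetical probe: throws IOException if the host cannot be reached. */
  interface Probe {
    void connect(String hostPort) throws IOException;
  }

  /**
   * Returns the first candidate that the probe can reach.
   * If none is reachable and HA is enabled, throws an IOException that lists
   * the hosts tried and carries the FIRST connection failure as its cause.
   * Only when HA is disabled does it fall back to the default address.
   */
  static String resolve(List<String> candidates, boolean haEnabled,
      String defaultAddress, Probe probe) throws IOException {
    IOException firstFailure = null;
    List<String> tried = new ArrayList<>();
    for (String hostPort : candidates) {
      try {
        probe.connect(hostPort);
        return hostPort;
      } catch (IOException e) {
        tried.add(hostPort);
        if (firstFailure == null) {
          firstFailure = e; // keep the original error for the user
        }
      }
    }
    if (!haEnabled) {
      return defaultAddress;
    }
    String reason = (firstFailure == null)
        ? "no ResourceManager addresses configured"
        : firstFailure.getMessage();
    throw new IOException(
        "Fail to connect to any ResourceManager " + tried + ": " + reason,
        firstFailure);
  }
}
```

With a check along these lines, the user in the scenario above would see the certificate verification failure for host1.example.com rather than the Kerberos error for HTTP/0.0.0.0.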


