[ https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989378#comment-16989378 ]
Eric Yang commented on YARN-9956:
---------------------------------

Thank you for the patch, [~prabhujoseph]. Overall, the patch looks good. The failed unit test looks a bit concerning; I am not sure how it is related to the changes. Can you confirm this is not an issue?

> Improve connection error message for YARN ApiServerClient
> ---------------------------------------------------------
>
>                 Key: YARN-9956
>                 URL: https://issues.apache.org/jira/browse/YARN-9956
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Yang
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9956-001.patch, YARN-9956-002.patch
>
>
> In an HA environment, the yarn.resourcemanager.webapp.address configuration is optional. ApiServiceClient may produce a confusing error message like this:
> {code}
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090
> 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
> GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
>     at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>     at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>     at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
>     at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
>     at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>     at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>     at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>     at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>     at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>     at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>     ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>     at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>     at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>     at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
>     at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
>     ... 21 more
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application:
> java.io.IOException: java.lang.reflect.UndeclaredThrowableException
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: java.lang.reflect.UndeclaredThrowableException
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>     ... 6 more
> Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:135)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>     ... 8 more
> Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
>     at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>     at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>     at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>     at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>     ... 12 more
> Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
>     at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
>     at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>     at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>     at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>     at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>     at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>     at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>     ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>     at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>     at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>     at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
>     at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
>     ... 21 more
> {code}
> When getRMWebAddress fails to connect to either resource manager host, it falls back to the yarn-default.xml value 0.0.0.0 and attempts to acquire a TGS ticket for HTTP/0.0.0.0, which produces the error shown here.
> It would be better to avoid using yarn.resourcemanager.webapp.address as a fallback for RM host lookup in an HA-enabled cluster.
> In this particular cluster, connecting to host1.example.com and host2.example.com failed for the same reason: the self-signed server certificate could not be verified because no valid self-signed CA certificate was available. This caused the failure in the first place. It would be nice if the error message were verbose enough to identify the first error, rather than reporting an error from the fallback logic, which makes no sense to the user.
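For illustration only, the behaviour requested in the description might look roughly like the sketch below. This is not the code from the attached patches and not the real ApiServiceClient API: the class RmAddressResolver, the Probe interface, and the resolve method are hypothetical names made up for this example. The idea is simply that, when HA is enabled and every resource manager host fails, the client reports the per-host failures with the first one as the cause, instead of falling back to a default address such as 0.0.0.0.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are assumptions, not the Hadoop API.
class RmAddressResolver {

    /** Hypothetical connection check: throws IOException if the host is unreachable. */
    interface Probe {
        void connect(String hostPort) throws IOException;
    }

    /**
     * Returns the first candidate that accepts a connection.
     * If none does and HA is enabled, throws an IOException that lists every
     * per-host failure and keeps the first failure as the cause.
     * Only when HA is disabled does it fall back to the default address.
     */
    static String resolve(List<String> candidates, boolean haEnabled,
                          String defaultAddress, Probe probe) throws IOException {
        List<String> failures = new ArrayList<>();
        IOException first = null;
        for (String candidate : candidates) {
            try {
                probe.connect(candidate);
                return candidate;
            } catch (IOException e) {
                if (first == null) {
                    first = e; // remember the original error for the user
                }
                failures.add(candidate + " (" + e.getMessage() + ")");
            }
        }
        if (haEnabled) {
            throw new IOException("Unable to connect to any resource manager: "
                + String.join(", ", failures), first);
        }
        return defaultAddress;
    }

    public static void main(String[] args) {
        // Simulate both hosts rejecting the connection for the same reason.
        Probe failing = host -> {
            throw new IOException("server certificate could not be verified");
        };
        try {
            resolve(List.of("host1.example.com:8090", "host2.example.com:8090"),
                true, "0.0.0.0:8090", failing);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, the certificate problem on host1.example.com and host2.example.com would be reported directly, and no Kerberos request for HTTP/0.0.0.0 would be attempted in the HA case.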