Eric Yang created YARN-9956:
-------------------------------

             Summary: Improve connection error message for YARN ApiServerClient
                 Key: YARN-9956
                 URL: https://issues.apache.org/jira/browse/YARN-9956
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Eric Yang


In an HA environment, the yarn.resourcemanager.webapp.address configuration is 
optional.  ApiServiceClient may produce a confusing error message like this:

{code}
19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090
19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090
19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
        at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
        at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
        at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
        at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
        at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
        at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
        at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
        at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
        at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
        at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
        at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
        ... 15 more
Caused by: KrbException: Identifier doesn't match expected value (906)
        at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
        at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
        at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
        at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
        ... 21 more
19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application:
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
        at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
        ... 6 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:135)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
        ... 8 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
        at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
        at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
        at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
        ... 12 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
        at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
        at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
        at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
        at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
        at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
        at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
        at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
        ... 15 more
Caused by: KrbException: Identifier doesn't match expected value (906)
        at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
        at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
        at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
        at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
        ... 21 more
{code}

When getRMWebAddress fails to connect to either ResourceManager host, it falls 
back to the yarn-default.xml value 0.0.0.0 and attempts to acquire a Kerberos 
service ticket for HTTP/0.0.0.0, which produces the error shown above.  In an 
HA-enabled cluster, it would be better not to use 
yarn.resourcemanager.webapp.address as a fallback for RM host lookup.
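
One way to express the proposed behaviour is sketched below.  This is only an 
illustration, not the actual ApiServiceClient code: the class 
RmWebAddressCandidates, its candidates method, the use of a plain Map in place 
of a Hadoop Configuration, and the DEFAULT_ADDRESS placeholder are all 
assumptions made for the example.  The property names are the standard YARN HA 
keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of ApiServiceClient. */
class RmWebAddressCandidates {
  static final String HA_ENABLED = "yarn.resourcemanager.ha.enabled";
  static final String RM_IDS = "yarn.resourcemanager.ha.rm-ids";
  static final String WEBAPP_ADDRESS = "yarn.resourcemanager.webapp.address";
  // Placeholder standing in for the yarn-default.xml value (assumption).
  static final String DEFAULT_ADDRESS = "0.0.0.0:8088";

  /** Returns the RM web addresses a client should try, in order. */
  static List<String> candidates(Map<String, String> conf) {
    List<String> result = new ArrayList<>();
    boolean ha = Boolean.parseBoolean(conf.getOrDefault(HA_ENABLED, "false"));
    if (ha) {
      // HA: use only the per-RM addresses, e.g. ...webapp.address.rm1.
      for (String id : conf.getOrDefault(RM_IDS, "").split(",")) {
        String rmId = id.trim();
        if (rmId.isEmpty()) {
          continue;
        }
        String addr = conf.get(WEBAPP_ADDRESS + "." + rmId);
        if (addr != null && !addr.trim().isEmpty()) {
          result.add(addr.trim());
        }
      }
      // Deliberately no fallback to the non-HA key or its 0.0.0.0 default.
      return result;
    }
    // Non-HA: the single configured address, or the default.
    result.add(conf.getOrDefault(WEBAPP_ADDRESS, DEFAULT_ADDRESS));
    return result;
  }
}
```

With this shape, an HA client that cannot reach any listed RM has nothing left 
to try, so it can fail with the connection errors it already collected instead 
of requesting a ticket for HTTP/0.0.0.0.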

In this particular cluster, connections to host1.example.com and 
host2.example.com both failed for the same reason: there was no valid 
self-signed CA certificate with which to verify the self-signed server 
certificate.  That was the root cause of the failure.  The error message 
should report this first error in detail, rather than the error from the 
fallback logic, which makes no sense to the user.
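
A rough sketch of how the per-host failures could be kept and reported is 
shown here.  It is a standalone illustration under stated assumptions: 
RmConnector, its Probe interface, and firstReachable are hypothetical names, 
and a real client would perform the HTTP(S) request inside the probe.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of ApiServiceClient. */
class RmConnector {
  /** Hypothetical probe; a real client would issue an HTTP(S) request. */
  interface Probe {
    void connect(String address) throws IOException;
  }

  /** Returns the first address that accepts a connection. */
  static String firstReachable(List<String> addresses, Probe probe)
      throws IOException {
    // Remember why each host failed, in the order the hosts were tried.
    Map<String, IOException> failures = new LinkedHashMap<>();
    for (String address : addresses) {
      try {
        probe.connect(address);
        return address;
      } catch (IOException e) {
        failures.put(address, e);
      }
    }
    // Every host failed: name each host and its cause in one message.
    StringBuilder msg =
        new StringBuilder("Unable to connect to any ResourceManager:");
    IOException first = null;
    for (Map.Entry<String, IOException> f : failures.entrySet()) {
      msg.append("\n  ").append(f.getKey()).append(": ").append(f.getValue());
      if (first == null) {
        first = f.getValue();
      }
    }
    // The first failure becomes the cause; later ones are suppressed.
    IOException summary = new IOException(msg.toString(), first);
    for (IOException e : failures.values()) {
      if (e != first) {
        summary.addSuppressed(e);
      }
    }
    throw summary;
  }
}
```

Raising the first failure as the cause keeps the original stack trace (here, 
the certificate verification error) at the top of what the user sees.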


