As someone advised, I ran kinit with the keytabs on every host where an HBase component is running, then restarted both HDFS and HBase, because the first time I restarted HBase I got an error related to the hdfs keytab.
After that, the errors disappear for a while, but I would bet my paycheck that they will be back tomorrow. So this is almost certainly related to Kerberos ticket expiration, but why doesn't HBase try to renew a ticket that appears to have expired?
As this is closely tied to Ambari's deployment of Kerberos, I have added the corresponding mailing list to the discussion, hoping that someone has a clear idea of how to solve this problem.

Thanks in advance for your help,

Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-09-01 10:15 GMT+02:00 Loïc Chanel <[email protected]>:

> But how could the credentials be invalid, as they were created and managed
> only by Ambari ?
> Also I tried to connect manually with the keytab, and it works :
>
> kinit -k -t /etc/security/keytabs/hbase.service.keytab hbase/[email protected]
> [root@vm-regionserver /]# klist
> Ticket cache: FILE:/tmp/krb5cc_0
> Default principal: hbase/[email protected]
>
> Valid starting     Expires            Service principal
> 09/01/15 10:02:18  09/02/15 10:02:18  krbtgt/[email protected]
>         renew until 09/01/15 10:02:18
>
> But I still have the errors in HBase RegionServer logs :
>
> 2015-09-01 10:04:41,616 DEBUG [regionserver60020]
> security.HBaseSaslRpcClient: Creating SASL GSSAPI client.
> Server's Kerberos principal name is hbase/[email protected]
> 2015-09-01 10:04:41,617 WARN [regionserver60020] ipc.RpcClient: Couldn't
> setup connection for hbase/[email protected] to hbase/[email protected]
> 2015-09-01 10:04:41,618 WARN [regionserver60020]
> regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.io.IOException: Couldn't setup
> connection for hbase/[email protected] to hbase/[email protected]
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1739)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1777)
>     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2114)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:877)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.IOException: Couldn't setup connection for
> hbase/[email protected] to hbase/[email protected]
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection$1.run(RpcClient.java:869)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Unknown Source)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:841)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:951)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1094)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1061)
>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1516)
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1724)
>     ... 5 more
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused
> by GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(Unknown Source)
>     at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:943)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:940)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Unknown Source)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:940)
>     ... 9 more
> Caused by: GSSException: No valid credentials provided (Mechanism level:
> Failed to find any Kerberos tgt)
>     at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Unknown Source)
>     at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Unknown Source)
>     at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Unknown Source)
>     at sun.security.jgss.GSSManagerImpl.getMechanismContext(Unknown Source)
>     at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
>     at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
>     ... 19 more
> 2015-09-01 10:04:41,619 WARN [regionserver60020]
> regionserver.HRegionServer: reportForDuty failed; sleeping and then
> retrying.
>
> So I don't see what I could check or change to make these errors disappear.
> Is there something I'm missing ?
>
> Thanks,
>
> Loïc
>
> Loïc CHANEL
> Engineering student at TELECOM Nancy
> Trainee at Worldline - Villeurbanne
>
> 2015-08-31 19:20 GMT+02:00 Ted Yu <[email protected]>:
>
>> Hi,
>> The keytab you used seems to be headless keytab.
>> Here is the sample output from klist when keytab for hbase services is
>> used:
>>
>> klist
>> Ticket cache: FILE:/tmp/krb5cc_1002
>> Default principal: hbase/[email protected]
>>
>> Valid starting     Expires            Service principal
>> 31/08/2015 17:19   01/09/2015 17:19   krbtgt/[email protected]
>>         renew until 31/08/2015 17:19
>>
>> FYI
>>
>> On Fri, Aug 21, 2015 at 12:44 AM, Loïc Chanel <[email protected]>
>> wrote:
>>
>> > Sorry if I didn't mention that, but yeah, I ran kinit before invoking hbase
>> > shell, and klists command says that my user has a ticket.
>> > [root@host /]# klist
>> > Ticket cache: FILE:/tmp/krb5cc_0
>> > Default principal: testuser@REALM
>> >
>> > Valid starting     Expires            Service principal
>> > 08/21/15 09:39:33  08/22/15 09:39:33  krbtgt/REALM@REALM
>> >         renew until 08/21/15 09:39:33
>> >
>> >
>> > Loïc CHANEL
>> > Engineering student at TELECOM Nancy
>> > Trainee at Worldline - Villeurbanne
>> >
>> > 2015-08-21 6:12 GMT+02:00 anil gupta <[email protected]>:
>> >
>> > > Did you run kinit command before invoking "hbase shell"? What does klist
>> > > command says?
>> > >
>> > > On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel <[email protected]>
>> > > wrote:
>> > >
>> > > > By the way, as this may help to find my issue, I just tested typing
>> > > > *whoami* in HBase shell : this returned me exactly what it should :
>> > > > testuser@REALM (auth:KERBEROS)
>> > > >     groups: nobody, toast
>> > > >
>> > > > Loïc CHANEL
>> > > > Engineering student at TELECOM Nancy
>> > > > Trainee at Worldline - Villeurbanne
>> > > >
>> > > > 2015-08-20 15:17 GMT+02:00 Loïc Chanel <[email protected]>:
>> > > >
>> > > > > Nothing more with your option :/
>> > > > >
>> > > > > Loïc CHANEL
>> > > > > Engineering student at TELECOM Nancy
>> > > > > Trainee at Worldline - Villeurbanne
>> > > > >
>> > > > > 2015-08-20 15:04 GMT+02:00 Loïc Chanel <[email protected]>:
>> > > > >
>> > > > >> I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2.
>> > > > >> I have unlimited strength JCE installed.
>> > > > >>
>> > > > >> I'll try to have more clues with this option.
>> > > > >>
>> > > > >> Loïc CHANEL
>> > > > >> Engineering student at TELECOM Nancy
>> > > > >> Trainee at Worldline - Villeurbanne
>> > > > >>
>> > > > >> 2015-08-20 14:58 GMT+02:00 Ted Yu <[email protected]>:
>> > > > >>
>> > > > >>> Which hbase / hadoop release are you using ?
>> > > > >>>
>> > > > >>> Running with -Dsun.security.krb5.debug=true will provide more clue.
>> > > > >>>
>> > > > >>> Do you have unlimited strength JCE installed ?
>> > > > >>>
>> > > > >>> Cheers
>> > > > >>>
>> > > > >>> On Thu, Aug 20, 2015 at 5:46 AM, Loïc Chanel <[email protected]>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>> > Hi all,
>> > > > >>> >
>> > > > >>> > Since I kerberized my cluster, it seems like I can't use HBase anymore ...
>> > > > >>> > For example, executing create 'toto','titi' on HBase shell results in the
>> > > > >>> > printing of this line endlessly :
>> > > > >>> > WARN [main] security.UserGroupInformation: Not attempting to re-login
>> > > > >>> > since the last re-login was attempted less than 600 seconds before.
>> > > > >>> >
>> > > > >>> > And nothing else happens.
>> > > > >>> > I tried to restart HDFS and HBase, and to re-generate credentials and
>> > > > >>> > keytabs, but nothing changed.
>> > > > >>> > As for the logs, they are not very explicits, as the only thing they say
>> > > > >>> > (and keep saying) is :
>> > > > >>> >
>> > > > >>> > 2015-08-20 13:50:12,697 DEBUG [RpcServer.reader=2,port=60000]
>> > > > >>> > ipc.RpcServer: Created SASL server with mechanism = GSSAPI
>> > > > >>> > 2015-08-20 13:50:12,698 DEBUG [RpcServer.reader=2,port=60000]
>> > > > >>> > ipc.RpcServer: Have read input token of size 650 for processing by
>> > > > >>> > saslServer.evaluateResponse()
>> > > > >>> > 2015-08-20 13:50:12,704 DEBUG [RpcServer.reader=2,port=60000]
>> > > > >>> > ipc.RpcServer: Will send token of size 108 from saslServer.
>> > > > >>> > 2015-08-20 13:50:12,706 DEBUG [RpcServer.reader=2,port=60000]
>> > > > >>> > ipc.RpcServer: Have read input token of size 0 for processing by
>> > > > >>> > saslServer.evaluateResponse()
>> > > > >>> > 2015-08-20 13:50:12,707 DEBUG [RpcServer.reader=2,port=60000]
>> > > > >>> > ipc.RpcServer: Will send token of size 32 from saslServer.
>> > > > >>> > 2015-08-20 13:50:12,708 DEBUG [RpcServer.reader=2,port=60000]
>> > > > >>> > ipc.RpcServer: RpcServer.listener,port=60000: DISCONNECTING client
>> > > > >>> > 192.168.6.148:43014 because read count=-1. Number of active connections: 3
>> > > > >>> >
>> > > > >>> > Do anyone has an idea about where this might come from, or how to
>> > > > >>> > solve it ? Because I couldn't find much documentation about this.
>> > > > >>> > Thanks in advance for your help !
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > Loïc
>> > > > >>> >
>> > > > >>> > Loïc CHANEL
>> > > > >>> > Engineering student at TELECOM Nancy
>> > > > >>> > Trainee at Worldline - Villeurbanne
>> > > > >>> >
>> > >
>> > > --
>> > > Thanks & Regards,
>> > > Anil Gupta
>> > >
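
A note on the klist outputs quoted in this thread: in every one of them the "renew until" time is identical to the "Valid starting" time, which may mean the ticket cannot be renewed past its initial 24-hour lifetime. This is only one thing worth checking, not a confirmed diagnosis. The following Python sketch parses klist-style text and reports the ticket lifetime and whether the renewal window is non-empty. The sample is the testuser@REALM output from the thread; the helper names parse_first_ticket and is_renewable are hypothetical, and the date format is an assumption based on that sample (Ted Yu's output uses a different format and would need a different pattern).

```python
from datetime import datetime

# Assumption: timestamps look like "08/21/15 09:39:33", as in the quoted klist output.
KLIST_TIME_FORMAT = "%m/%d/%y %H:%M:%S"

# Sample copied from the klist output quoted earlier in the thread.
SAMPLE_KLIST = """\
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: testuser@REALM

Valid starting     Expires            Service principal
08/21/15 09:39:33  08/22/15 09:39:33  krbtgt/REALM@REALM
        renew until 08/21/15 09:39:33
"""


def parse_first_ticket(text):
    """Return (start, expires, renew_until) for the first krbtgt entry.

    renew_until is None when no "renew until" line follows the entry.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if "krbtgt/" not in line:
            continue
        fields = line.split()
        start = datetime.strptime(" ".join(fields[0:2]), KLIST_TIME_FORMAT)
        expires = datetime.strptime(" ".join(fields[2:4]), KLIST_TIME_FORMAT)
        renew_until = None
        if index + 1 < len(lines) and lines[index + 1].startswith("renew until"):
            stamp = lines[index + 1][len("renew until"):].strip()
            renew_until = datetime.strptime(stamp, KLIST_TIME_FORMAT)
        return start, expires, renew_until
    raise ValueError("no krbtgt entry found in klist output")


def is_renewable(start, renew_until):
    """A ticket has a usable renewal window only if renew_until is after start."""
    return renew_until is not None and renew_until > start


if __name__ == "__main__":
    start, expires, renew_until = parse_first_ticket(SAMPLE_KLIST)
    print("ticket lifetime:", expires - start)
    print("renewable window:", is_renewable(start, renew_until))
```

If the check reports no renewal window on the real hosts, the renewable lifetime settings on the KDC side would be a reasonable next place to look, alongside how the HBase processes log in from their keytabs.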
