Hey Wei,

Can you try adding "-Dsun.security.krb5.debug=true" to the region server JVM opts and see if it prints anything before the crash?
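
In case it helps, here is roughly how I would set it. This is only a sketch: it assumes the region servers are started through the stock scripts and pick up conf/hbase-env.sh, so adjust it if your deployment sets JVM flags somewhere else.

```shell
# conf/hbase-env.sh on each region server host (path assumed, not confirmed for your setup).
# Append the Kerberos debug flag to whatever region server options are already set.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Dsun.security.krb5.debug=true"
```

The region servers need a restart for this to take effect. As far as I know the Kerberos debug output goes to stdout, so it should end up in the region server's .out file rather than the .log.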

- Bharath

On Tue, Mar 11, 2014 at 6:35 PM, Wei Tan <[email protected]> wrote:

> Thanks Ted. Yes our team looked at the doc you pointed out and:
>
> The key here is "every several hours" - so we can rule out
> 1) valid kerberos ticket ~ klist shows a valid ticket,
> 2) [0] does not have our error message ~ link password / keytab / clocks
> / realm is not incorrect ~ all these errors on this page seem to be for
> "does not work at all" conditions... not a "fails every randomly long
> amount of time"
> 3) we don't have this "problematic combination of components" listed...
> but again - this is a work / no work dichotomy...
>
> Thanks,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
>
> From: Ted Yu <[email protected]>
> To: "[email protected]" <[email protected]>,
> Date: 03/10/2014 05:31 PM
> Subject: Re: Occasional GSSException that brings down region server
>
> Have you looked at
> http://hbase.apache.org/book.html#trouble.client.security.rpc ?
>
> On Mon, Mar 10, 2014 at 2:26 PM, Wei Tan <[email protected]> wrote:
>
> > Hi,
> >
> > We are running a HBase cluster in these settings and with kerberos
> > enabled.
> > HBase: 0.96.1.1
> > Zookeeper: 3.4.5
> > Hadoop: 1.1.1
> >
> > We constantly put data into HBase and every several hours we get the
> > error below on a random region server; this error arises and the
> > region server kills itself.
> >
> > ERROR:
> > 2014-02-28 09:32:39,755 ERROR
> > [hconnection-0x116987ad-shared--pool1378-t9]
> > security.UserGroupInformation: PriviledgedActionException
> > as:XXXXXXXX@DOMAIN cause:javax.security.sasl.SaslException: GSS initiate
> > failed [Caused by GSSException: No valid credentials provided (Mechanism
> > level: The ticket isn't for us (35) - BAD TGS SERVER NAME)]
> >
> > We also tried with multiple version of kdc - all the way up to latest
> > 1.12.1 - still see this error. What is weird is that most put gets
> > processed successfully until this error occurs and kills the RS.
> >
> > Thanks,
> > Wei
> > ---------------------------------
> > Wei Tan, PhD
> > Research Staff Member
> > IBM T. J. Watson Research Center
> > http://researcher.ibm.com/person/us-wtan

--
Bharath Vissapragada
<http://www.cloudera.com>
