Hi,

We have a Kerberos-secured cluster and are currently facing issues with Ambari
Metrics. After starting Ambari Metrics everything is fine, but after a couple of
days we get alerts from Ambari like this:

NameNode Service RPC Processing Latency (Hourly)
Unable to retrieve metrics from the Ambari Metrics service.

When I check the logs of the Metrics Collector, I find entries like:

2018-03-28 11:19:47,013 WARN org.apache.hadoop.security.UserGroupInformation: Exception encountered while running the renewal command for amshbase/[email protected]. (TGT end time:1522228847000, renewalFailures: org.apache.hadoop.metrics2.lib.MutableGaugeInt@388f50cd,renewalFailuresTotal: org.apache.hadoop.metrics2.lib.MutableGaugeLong@7d8dc9b8)
ExitCodeException exitCode=1: kinit: KDC can't fulfill requested option while renewing credentials

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:954)
        at org.apache.hadoop.util.Shell.run(Shell.java:855)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1163)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:1257)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:1239)
        at org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:987)
        at java.lang.Thread.run(Thread.java:745)
2018-03-28 11:19:47,014 ERROR org.apache.hadoop.security.UserGroupInformation: TGT is expired. Aborting renew thread for amshbase/[email protected].

After that I see aggregation errors:

2018-03-28 11:27:08,188 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Wed Mar 28 11:27:08 CEST 2018
2018-03-28 11:27:08,189 INFO TimelineClusterAggregatorMinute: Skipping aggregation function not owned by this instance.
2018-03-28 11:27:08,205 ERROR TimelineMetricHostAggregatorHourly: Exception during aggregating metrics.
java.sql.SQLTimeoutException: Operation timed out.
        at org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:364)
        at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
        at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:831)

So this seems to be related to Kerberos. When I check the KDC log, there is
not much information:

Mar 28 11:19:47 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) 10.11.1.21: TICKET NOT RENEWABLE: authtime 0, amshbase/[email protected] for krbtgt/[email protected], KDC can't fulfill requested option
...
Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): AS_REQ (4 etypes {18 17 16 23}) 10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, amshbase/[email protected] for krbtgt/[email protected]
Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (4 etypes {18 17 16 23}) 10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, amshbase/[email protected] for nn/[email protected]
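
The two logs use different time units: the UserGroupInformation warning reports the TGT end time in epoch milliseconds, while the KDC log reports authtime in epoch seconds. A minimal sketch that converts both to local time so they can be lined up with the failed renewal at 11:19:47, assuming both hosts log in CEST (UTC+2); the helper names are just for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption: both hosts log in CEST (UTC+2); a fixed offset avoids needing a tz database.
CEST = timezone(timedelta(hours=2), "CEST")


def from_epoch_ms(ms: int) -> datetime:
    """Hadoop's UserGroupInformation logs the TGT end time in milliseconds."""
    return datetime.fromtimestamp(ms / 1000, tz=CEST)


def from_epoch_s(seconds: int) -> datetime:
    """krb5kdc logs authtime in seconds."""
    return datetime.fromtimestamp(seconds, tz=CEST)


renew_attempt = datetime(2018, 3, 28, 11, 19, 47, tzinfo=CEST)  # WARN line in the collector log
tgt_end = from_epoch_ms(1522228847000)  # "TGT end time" from the same WARN line
new_tgt = from_epoch_s(1522228848)      # authtime of the AS_REQ at 11:20:48 in the KDC log

print("renewal attempted:", renew_attempt.isoformat())
print("TGT end time:     ", tgt_end.isoformat())
print("new TGT issued:   ", new_tgt.isoformat())
print("time left when renewal failed:", tgt_end - renew_attempt)
```

Converted this way, the TGT end time is 11:20:47 CEST, one minute after the failed renewal, and the new TGT from the AS_REQ is issued one second after that.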

When I check the principal amshbase/[email protected] in the KDC, I get
the following:

Principal: amshbase/[email protected]
Expiration date: [never]
Last password change: Mo Mär 19 11:24:05 CET 2018
Password expiration date: [never]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Mo Mär 19 11:24:05 CET 2018 (admin/[email protected])
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 2
Key: vno 1, aes256-cts-hmac-sha1-96
Key: vno 1, aes128-cts-hmac-sha1-96
MKey: vno 1
Attributes:
Policy: [none]
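
To confirm that every principal really has a maximum renewable life of 0, the `getprinc` listings can be checked in bulk. A minimal sketch, assuming the output of `kadmin.local -q "getprinc <principal>"` has been captured as text; `parse_kadmin_duration` and `ticket_limits` are hypothetical helpers, not part of any Kerberos tool:

```python
import re
from datetime import timedelta

# kadmin prints durations as "<N> day(s) HH:MM:SS", e.g. "1 day 00:00:00".
DURATION = re.compile(r"(\d+) days? (\d+):(\d{2}):(\d{2})")
LIMIT_FIELDS = ("Maximum ticket life", "Maximum renewable life")


def parse_kadmin_duration(text: str) -> timedelta:
    """Convert a kadmin duration string into a timedelta."""
    match = DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def ticket_limits(getprinc_output: str) -> dict[str, timedelta]:
    """Pick the lifetime fields out of `getprinc` output."""
    limits = {}
    for line in getprinc_output.splitlines():
        # Split on the first colon only; the value itself contains colons.
        key, _, value = line.partition(":")
        if key.strip() in LIMIT_FIELDS:
            limits[key.strip()] = parse_kadmin_duration(value)
    return limits


# Excerpt of the amshbase listing above.
sample = """\
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
"""

limits = ticket_limits(sample)
for name, value in limits.items():
    print(f"{name}: {value}")
if limits["Maximum renewable life"] == timedelta(0):
    # With a 0 limit the KDC should not issue renewable tickets for this principal.
    print("Maximum renewable life is 0: tickets cannot be renewed")
```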

Is that normal? Maximum renewable life is set to 0, so ticket renewal is not
possible. But that is also true for all other principals in the KDC, and all
other services work normally.

This is the content of krb5.conf:

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = PSIORI.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  default_ccache_name = /tmp/krb5cc_%{uid}
  #default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
  #default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5

[domain_realm]
  .cl.psiori.com = PSIORI.COM
  cl.psiori.com = PSIORI.COM

[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log

[realms]
  PSIORI.COM = {
    admin_server = sql.cl.psiori.com
    kdc = sql.cl.psiori.com
  }

I have not made any changes to kdc.conf, so it has the default content:

[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88

[realms]
EXAMPLE.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
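
As far as I understand MIT Kerberos, the renewable lifetime the KDC grants is the minimum of what the client requests (`renew_lifetime` in krb5.conf), the client principal's maximum renewable life, the krbtgt principal's maximum renewable life, and the realm's `max_renewable_life` from kdc.conf. A minimal sketch of that rule; the krbtgt value is a hypothetical placeholder because it is not shown here, and the realm limit is left out:

```python
from datetime import timedelta


def effective_renewable_life(requested: timedelta, *caps: timedelta) -> timedelta:
    """Return the smallest of the requested lifetime and every applicable cap.

    A result of 0 means the ticket is issued without the RENEWABLE flag.
    """
    return min((requested, *caps))


requested = timedelta(days=7)   # renew_lifetime = 7d in krb5.conf [libdefaults]
client_cap = timedelta(0)       # amshbase principal: "Maximum renewable life: 0 days 00:00:00"
krbtgt_cap = timedelta(days=7)  # hypothetical: the krbtgt principal's limit is not shown above

effective = effective_renewable_life(requested, client_cap, krbtgt_cap)
print("effective renewable life:", effective)
print("renewable:", effective > timedelta(0))
```

Whatever the krbtgt value is, a client cap of 0 makes the minimum 0, which would match the "TICKET NOT RENEWABLE" entry in the KDC log.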

Is there a misconfiguration?
When I restart the service, everything works again (for a while).

Any suggestions or help are very welcome.

Best regards,
Alex
