Sumit and others,

Our level 1 server support team identified 107 servers that dropped out
of the domain in August.  Automatic machine account renewal is by far
their biggest burden with sssd.

Over the long weekend, our team ran a report that identified any pingable
candidates that (according to AD) had a passwordLastSet age between 31 and
40 days.  These would be our interesting candidates;  candidates > 40 days
would not be of interest to us because AD would have locked the account.
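
For anyone who wants to reproduce the report, the selection rule can be
sketched in Python roughly as follows. This is only an illustration: the
function names are made up, and it assumes the passwordLastSet value has
already been pulled from AD as a timezone-aware datetime and that the ping
result is known.

```python
from datetime import datetime, timedelta, timezone

# Window from the report above: password age of 31 to 40 days, inclusive.
MIN_AGE_DAYS = 31
MAX_AGE_DAYS = 40


def password_age_days(pwd_last_set: datetime, now: datetime) -> int:
    """Whole days elapsed since the machine account password was last set."""
    return (now - pwd_last_set).days


def is_candidate(pwd_last_set: datetime, now: datetime, pingable: bool) -> bool:
    """True for pingable hosts whose password age falls inside the window."""
    age = password_age_days(pwd_last_set, now)
    return pingable and MIN_AGE_DAYS <= age <= MAX_AGE_DAYS


if __name__ == "__main__":
    now = datetime(2021, 9, 7, tzinfo=timezone.utc)
    for days in (30, 31, 40, 41):
        print(days, is_candidate(now - timedelta(days=days), now, pingable=True))
```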

We identified 13 candidates today.  From our research so far, we have
identified 8 categories of such sssd "automatic machine account renewal"
failure.

1.       Some SE cloned a VM, renamed the hostname, changed the IP address,
and rejoined AD.  Old <HOSTNAME>$ entries remain early in the
/etc/krb5.keytab file, and adcli update grabs the first entry in
/etc/krb5.keytab with $ at the end of it.

2.       CPU spiked to 100% for 30 days.

3.       Polkit service not running.

4.       msDS-KeyVersionNumber in AD is one more than the KVNO in the local
/etc/krb5.keytab file, and passwordLastSet is 30 days past the last
timestamp in the local /etc/krb5.keytab file.  IOW, sssd called adcli
update after 30 days; adcli update updated AD but not the local
/etc/krb5.keytab file.

5.       DNS firewall problems.  Specifically, DNS TCP port 53 blocked, so
adcli update could not find Kerberos servers
(_kerberos._tcp.AMER.COMPANY.COM) or LDAP servers
(_ldap._tcp.AMER.COMPANY.COM).

6.       SELinux enabled;  adcli not allowed to update /etc/krb5.keytab
file (from Sumit).

7.       Time sync too far out for adcli update to successfully do an
update on machine account.

8.       /var filesystem:  Input/Output errors.
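
Categories #1 and #4 can both be spotted from the keytab listing. Below is
a rough Python sketch, not anything sssd or adcli ships. It assumes the
input is the text printed by `klist -k /etc/krb5.keytab` (a KVNO column
followed by the principal) and that the msDS-KeyVersionNumber value has
already been fetched from AD some other way. All function names are
hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, str]  # (kvno, principal)


def parse_klist(text: str) -> List[Entry]:
    """Extract (kvno, principal) pairs; header lines are skipped."""
    entries: List[Entry] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Assumption: the principal is the first field containing '@'.
        principal = next((f for f in fields[1:] if "@" in f), None)
        if principal is not None:
            entries.append((int(fields[0]), principal))
    return entries


def first_machine_principal(entries: List[Entry]) -> Optional[str]:
    """First principal whose name part ends in '$', in keytab order."""
    for _, principal in entries:
        if principal.split("@", 1)[0].endswith("$"):
            return principal
    return None


def stale_first_entry(entries: List[Entry], shortname: str) -> bool:
    """Category #1: the first <HOSTNAME>$ entry belongs to another host."""
    first = first_machine_principal(entries)
    if first is None:
        return False
    name = first.split("@", 1)[0].rstrip("$")
    return name.lower() != shortname.lower()


def ad_one_ahead(entries: List[Entry], shortname: str, ad_kvno: int) -> bool:
    """Category #4: AD's kvno is exactly one more than the local maximum."""
    wanted = shortname.lower() + "$"
    local = [k for k, p in entries if p.split("@", 1)[0].lower() == wanted]
    return bool(local) and ad_kvno == max(local) + 1
```

For a cloned host now named newhost, stale_first_entry(entries, "newhost")
would flag a leftover OLDHOST$ entry at the top of the keytab, and
ad_one_ahead(entries, "newhost", ad_kvno) would flag the case where AD was
updated but the local keytab was not.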

By far, today the most common category is #4.  It amounted to 9 of the 13
candidates today.  Category #7 was another 2 candidates today.

So category #4 is the one we want to drill down into -- if we can
eliminate that, we will have greatly reduced the sssd burden.  Also, we
think we can put proactive monitoring in place for categories #3 and #7.
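
As a starting point for that monitoring, here is a small Python sketch. It
is only an assumption of how such checks might look: the 5 minute limit is
the usual Kerberos default clock skew tolerance (your realm may differ),
the unit name "polkit" is a guess at what the service is called on these
hosts, and where the reference time comes from is left to the caller.

```python
import subprocess
from datetime import datetime, timedelta
from typing import Callable, Sequence

# Assumed limit: the common Kerberos default tolerance of 5 minutes.
MAX_SKEW = timedelta(minutes=5)


def skew_ok(local_time: datetime, reference_time: datetime,
            max_skew: timedelta = MAX_SKEW) -> bool:
    """Category #7: True if the two clocks differ by at most max_skew."""
    return abs(local_time - reference_time) <= max_skew


def run_status(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit status."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def polkit_running(run: Callable[[Sequence[str]], int] = run_status) -> bool:
    """Category #3: True if systemd reports the polkit unit as active."""
    return run(["systemctl", "is-active", "--quiet", "polkit"]) == 0
```

The run parameter exists only so the check can be exercised without
touching systemd, e.g. polkit_running(lambda cmd: 3) returns False.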

Our old commercial AD integration product didn't depend on polkit/dbus.  So
categories #3 and #4 are new for sssd.  If we can understand #4 and
proactively monitor for #3, we can reduce the sssd burden to that of the
former product.

Category #4 appears to occur randomly -- no rhyme or reason.  Also we have
not found any repeat offenders -- so it's very hard to track down.

We plan to turn on sssd debug_level 7 (in that one sssd [domain/xxx] stanza
only).  Debug level 7 is the minimum level that produces verbose output
from adcli update.  We know that turning on debug level 9 in all sssd
stanzas (nss, pam, ifp, [domain/xxx]) fills the /var/log filesystem to 100%
in a few days.
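
For reference, the change we have in mind in /etc/sssd/sssd.conf would look
roughly like this, with xxx standing in for the real domain name and all
other stanzas left at their current debug level:

```ini
[domain/xxx]
debug_level = 7
```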

Spike

On Tue, Sep 7, 2021 at 9:53 AM Patrick Goetz <[email protected]> wrote:

>
>
> On 9/6/21 4:49 AM, Sumit Bose wrote:
> > Am Thu, Sep 02, 2021 at 10:02:54AM -0500 schrieb Patrick Goetz:
> >>
> >> On 9/2/21 12:49 AM, Sumit Bose wrote:
> >>> The reason is that 'kinit -k' constructs the principal by calling
> >>> gethostname() or similar, adding the 'host/' prefix and the realm. But
> >>> by default this principal in AD is only a service principal and cannot
> >>> be used to request a TGT as kinit does. AD only allows user principals
> >>> to request a TGT and this is by default '[email protected]'. If the
> >>> userPrincipalName attribute is set, the principal given there is allowed
> >>> as well.
> >>>
> >>
> >> This raises a couple of questions. Because of AD's flat address space, we
> >> use a host naming convention in AD as a sort of low rent namespacing; so,
> >> for example, for this host the college is cns and the research group cryo,
> >> so the AD hostname is cns-cryo-ross1$
> >>
> >> However,
> >>
> >> # hostname
> >> rossmann.biosci.utexas.edu
> >>
> >>
> >> which is easier for the users to remember for ssh purposes.  We set
> >>
> >>    ad_hostname = cns-cryo-ross1.austin.utexas.edu
> >>
> >> in /etc/sssd/sssd.conf.
> >>
> >> But I just checked, and kinit does not use ad_hostname, so I have to run it
> >> as
> >>
> >>    kinit -k -R cns-cryo-ross1$
> >>
> >> The question is, then what does use the ad_hostname key/value pair?
> >>
> >> Next, the kinit example provided by Spike was `kinit -k` -- we always run
> >> `kinit -k -R`
> >>
> >> -R renews the TGT, which is what I thought is the thing set to expire in AD
> >> that needs to be periodically renewed.  What's the purpose of running
> >> `kinit -k` without the -R?
> >
> > Hi,
> >
> > there are two different things.
> >
> > First, there are the host keys in the keytab which are equivalent to a
> > user password. Those keys are renewed by 'adcli update' if they are
> > older than 30 days, similar to how you would renew your user password if
> > the AD DC tells you to do it.
> >
> > Second, with those keys you can request a Kerberos TGT
> >
> >      kinit -k 'shortname$'
> >
>
> I thought, based on the kinit man page, that the -k flag is just an
> ordinary ticket request and that you need to add the -R flag to request
> a TGT.  What you're saying is it also renews the TGT?
>
> OTOH I thought `kinit -k` was updating the computer account password on
> the domain controller, but that doesn't seem to be the case, in which
> case I'm not even sure what the purpose of an ordinary (non-TGT) ticket
> is if you're not requesting automatic login to some specifically
> requested service.
>
> Also, just to make sure I'm clear on this, the "renew until" doesn't
> change because this is based on the computer account password
> expiration, and further that sssd runs `adcli update` for you
> periodically?  How often, by the way?
>
>
> > as you can do with your user password:
> >
> >      kinit user@REALM
> >      Password for user@REALM
> >
> > This TGT has a lifetime and it might have a renewal time as well:
> >
> > # klist
> > Ticket cache: KCM:0:69840
> > Default principal: [email protected]
> >
> > Valid starting       Expires              Service principal
> > 09/06/2021 09:39:28  09/06/2021 19:39:28  krbtgt/[email protected]
> >          renew until 09/07/2021 09:39:24
> >
> >
> > In the example above the TGT will expire at '09/06/2021 19:39:28' but
> > can be renewed until '09/07/2021 09:39:24'. This means that if you call
> >
> >      kinit -R
> >
> > before '09/06/2021 19:39:28' you will get a fresh TGT without entering
> > your password. The new TGT will have a new lifetime but 'renew until'
> > will stay the same. After '09/07/2021 09:39:24' 'kinit -R' will not work
> > anymore and you have to enter your password again. It does not matter
> > here if the TGT was originally requested with a keytab with 'kinit -k'
> > or with plain 'kinit' and a password.
> >
> > However, since the keytab is present in the file system calling
> >
> >      kinit -k 'shortname$'
> >
> > will always get a fresh TGT without manual intervention. So in case you
> > have a valid keytab this is even more flexible than 'kinit -R'
> >
> > HTH
> >
> > bye,
> > Sumit
> >
> >>
> >> _______________________________________________
> >> sssd-users mailing list -- [email protected]
> >> To unsubscribe send an email to [email protected]
> >> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> >> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> >> List Archives:
> https://lists.fedorahosted.org/archives/list/[email protected]
> >> Do not reply to spam on the list, report it:
> https://pagure.io/fedora-infrastructure