On Tue, Sep 07, 2021 at 01:59:34PM -0500, Spike White wrote:
> Sumit and others,
> 
> Our level 1 server support team has identified 107 servers that dropped out
> of the domain in August. By far, their biggest burden with sssd is the
> automatic machine account renewal.
> 
> Over the long weekend, our team ran a report that identified any pingable
> candidates that (according to AD) had a passwordLastSet age between 31 and
> 40 days.  These would be our interesting candidates;  candidates > 40 days
> would not be of interest to us because AD would have locked the account.
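A minimal Python sketch of the age filter behind such a report, assuming passwordLastSet is read as the raw integer AD stores (a Windows FILETIME, i.e. 100-nanosecond intervals since 1601-01-01 UTC). The function names and the inclusive 31-40 day window are illustrative assumptions, not the team's actual report script.

```python
from datetime import datetime, timedelta, timezone

# passwordLastSet counts 100-nanosecond ticks since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a raw FILETIME integer to an aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def datetime_to_filetime(when: datetime) -> int:
    """Inverse of filetime_to_datetime (microsecond precision)."""
    delta = when - FILETIME_EPOCH
    return ((delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds) * 10


def password_age_days(filetime: int, now: datetime) -> float:
    """Age of the machine account password, in days, at time `now`."""
    return (now - filetime_to_datetime(filetime)).total_seconds() / 86400


def is_report_candidate(filetime: int, now: datetime,
                        low: float = 31, high: float = 40) -> bool:
    """True if the password age falls inside the report window (inclusive)."""
    return low <= password_age_days(filetime, now) <= high


now = datetime(2021, 9, 7, tzinfo=timezone.utc)
print(is_report_candidate(datetime_to_filetime(now - timedelta(days=35)), now))  # True
```

In practice the integer would come from an LDAP query per pingable host; the conversion is the only part that is easy to get wrong.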
> 
> We identified 13 candidates today. From our research so far, we have
> identified 8 categories of such sssd "automatic machine account renewal"
> failures.
> 
> 1.       An SE cloned a VM, changed the hostname and IP address, and rejoined
> AD. Old <HOSTNAME>$ entries remained early in the /etc/krb5.keytab file, and
> adcli update grabs the first entry in /etc/krb5.keytab with $ at the end.
> 
> 2.       CPU spiked to 100% for 30 days.
> 
> 3.       Polkit service not running.

Hi,

adcli does not use polkit or DBus. realmd uses polkit to make it possible
for non-root users to join a domain, but adcli does not. So I would expect
a different root cause on those systems.
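Back to category #1: a quick way to spot a keytab that still carries a stale <HOSTNAME>$ entry ahead of the current one is to list the distinct $ principals in keytab order. This Python sketch assumes the text layout of MIT `klist -kt /etc/krb5.keytab` (KVNO, a two-token timestamp, principal); the function name is hypothetical, and it only reports what is in the file rather than reproducing adcli's own selection logic.

```python
import re

# One data row of `klist -kt` output: KVNO, date, time, principal.
KEYTAB_ROW = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+(\S+)\s*$")


def machine_principals(klist_output: str) -> list[str]:
    """Distinct principals whose name part ends in '$', in keytab order."""
    found: list[str] = []
    for line in klist_output.splitlines():
        match = KEYTAB_ROW.match(line)
        if not match:
            continue  # header, separator, or unexpected layout
        principal = match.group(2)
        if principal.split("@", 1)[0].endswith("$") and principal not in found:
            found.append(principal)
    return found
```

If the list has more than one entry and the first is not the host's current account name, the host likely fits category #1.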

> 
> 4.       msDS-KeyVersionNumber in AD set to one more than the KVNO in the
> local /etc/krb5.keytab file. passwordLastSet set to 30 days after the last
> timestamp in the local /etc/krb5.keytab file. In other words, sssd called
> adcli update after 30 days, and adcli update updated AD but not the local
> /etc/krb5.keytab file.
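The category #4 signature can be expressed as a small check. This is a sketch only: it assumes you have already fetched msDS-KeyVersionNumber from AD (for example with ldapsearch) and the KVNO column for the machine principal from `klist -k`; the function and its labels are hypothetical.

```python
def classify_kvno(ad_kvno: int, keytab_kvnos: list[int]) -> str:
    """Compare AD's msDS-KeyVersionNumber with the newest KVNO in the keytab."""
    if not keytab_kvnos:
        return "no-local-keys"
    local = max(keytab_kvnos)
    if ad_kvno == local:
        return "in-sync"
    if ad_kvno == local + 1:
        # AD accepted a new password, but the new keys never reached the keytab
        return "category-4"
    return "other-mismatch"


print(classify_kvno(6, [5, 5, 5]))  # category-4
```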


> 
> 5.       DNS firewall problems. Specifically, DNS TCP port 53 was blocked,
> so adcli update could not find Kerberos servers
> (_kerberos._tcp.AMER.COMPANY.COM) or LDAP servers (_ldap._tcp.AMER.COMPANY.COM).

Are you using hard-coded server names in sssd.conf in this case? Otherwise
SSSD itself would have issues as well. Additionally, SSSD should call adcli
with the --domain-controller option set to the AD DC SSSD is currently
talking to.
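For category #5, the two SRV names quoted above and a plain TCP reachability test for port 53 can be checked from the client. This Python sketch only confirms that a TCP connection to the DNS server opens; it does not send an SRV query. The DNS server address (e.g. from /etc/resolv.conf) and the function names are assumptions.

```python
import socket


def srv_names(domain: str) -> list[str]:
    """The SRV records named in category #5 for a given AD domain."""
    return [f"_kerberos._tcp.{domain}", f"_ldap._tcp.{domain}"]


def tcp_reachable(host: str, port: int = 53, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout), or unreachable
        return False
```

A host where `tcp_reachable(dns_server)` is False while UDP lookups still work would match this category.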

> 
> 6.       SELinux enabled;  adcli not allowed to update /etc/krb5.keytab
> file (from Sumit).

This should be fixed by updating the selinux-policy.

> 
> 7.       Clock skew too large for adcli update to successfully update the
> machine account.

Would it be possible to run ntpd or chrony?
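A sketch of a proactive clock-skew check for category #7, assuming the 5-minute tolerance that is the default in both MIT krb5 (`clockskew`, 300 seconds) and AD's Kerberos policy. How you obtain the DC's time is left open; the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Default Kerberos tolerance (MIT krb5 `clockskew` and AD policy): 5 minutes.
MAX_SKEW = timedelta(minutes=5)


def skew_ok(local_time: datetime, dc_time: datetime,
            max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the two clocks differ by no more than `max_skew`."""
    return abs(local_time - dc_time) <= max_skew
```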

> 
> 8.       /var filesystem:  Input/Output errors.
> 
> By far the most common category today is #4: it accounted for 9 of the 13
> candidates. Category #7 accounted for another 2.
> 
> So it's category #4 we want to drill down into. If we can eliminate that,
> we will have greatly reduced the sssd burden. We also think we can put
> proactive monitoring in place for categories #3 and #7.
> 
> Our old commercial AD integration product didn't depend on polkit/dbus, so
> categories #3 and #4 are new for sssd. If we can understand #4 and
> proactively monitor for #3, we can reduce the sssd burden to that of the
> former product.
> 
> Category #4 appears to occur randomly -- no rhyme or reason.  Also we have
> not found any repeat offenders -- so it's very hard to track down.

So far I'm aware of two reasons for this. One is that AD returns a
temporary error which confuses libkrb5 on the client, so adcli thinks the
renewal failed even though it succeeded on AD. The result is a renewed
password in AD but not on the client. This issue led to the fix in libkrb5
to always use TCP for kpasswd operations.

The second is local permissions that did not allow adcli, called by SSSD,
to update the keytab file. This might be triggered by SELinux (#6).
Another reason might be that SSSD is not running as root.

But permissions would make the issue appear again on the same host, so I
guess this might not be the reason in your case.

So it might still be some unexpected reply from AD which should not be
treated as an error. Do you by chance have read-only domain controllers
(RODCs)?

bye,
Sumit

> 
> We plan to turn on sssd debug_level 7 (in that one sssd [domain/xxx] stanza
> only). Debug level 7 is the minimum level to get verbose output from adcli
> update. We know that turning on debug level 9 in all sssd stanzas (nss,
> pam, ifp, [domain/xxx]) fills the /var/log filesystem to 100% in a few days.
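One way to raise the log level in only the [domain/xxx] stanza is to edit sssd.conf as INI text. This Python sketch assumes sssd.conf parses cleanly with configparser and uses a hypothetical domain name; configparser drops comments on rewrite, so keep a backup, and sssd must be restarted to pick up the change.

```python
import configparser
import io


def set_domain_debug(conf_text: str, domain: str, level: int = 7) -> str:
    """Return sssd.conf text with debug_level set only in [domain/<domain>]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    section = f"domain/{domain}"
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] stanza in sssd.conf")
    parser[section]["debug_level"] = str(level)
    out = io.StringIO()
    parser.write(out)  # comments in the original text are not preserved
    return out.getvalue()
```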
> 
> Spike
> 
> On Tue, Sep 7, 2021 at 9:53 AM Patrick Goetz <pgo...@math.utexas.edu> wrote:
> 
> >
> >
> > On 9/6/21 4:49 AM, Sumit Bose wrote:
> > > On Thu, Sep 02, 2021 at 10:02:54AM -0500, Patrick Goetz wrote:
> > >>
> > >> On 9/2/21 12:49 AM, Sumit Bose wrote:
> > >>> The reason is that 'kinit -k' constructs the principal by calling
> > >>> gethostname() or similar, adding the 'host/' prefix and the realm. But
> > >>> by default this principal in AD is only a service principal and cannot
> > >>> be used to request a TGT as kinit does. AD only allows user principals
> > >>> to request a TGT, and this is by default 'SHORT$@AD.REALM'. If the
> > >>> userPrincipalName attribute is set, the principal given there is
> > >>> allowed as well.
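To make the difference concrete, this Python sketch builds both principals. It assumes the short name is the first DNS label of the name AD knows the host by (ad_hostname, when set), uppercased and truncated to the 15-character NetBIOS limit; the function names and the realm used below are hypothetical.

```python
from typing import Optional


def kinit_default_principal(hostname: str, realm: str) -> str:
    """What plain `kinit -k` requests: host/<hostname>@REALM."""
    return f"host/{hostname}@{realm}"


def ad_machine_principal(hostname: str, realm: str,
                         ad_hostname: Optional[str] = None) -> str:
    """AD's default user principal for the computer: SHORT$@REALM."""
    name = ad_hostname or hostname
    short = name.split(".", 1)[0].upper()[:15]  # assumed NetBIOS-style short name
    return f"{short}$@{realm}"


print(ad_machine_principal("rossmann.biosci.utexas.edu", "AUSTIN.UTEXAS.EDU",
                           "cns-cryo-ross1.austin.utexas.edu"))
```

With the values from this thread, the first function yields the service principal kinit guesses from gethostname(), while the second yields the SHORT$ form AD accepts for a TGT.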
> > >>>
> > >>
> > >> This raises a couple of questions. Because of AD's flat address space, we
> > >> use a host naming convention in AD as a sort of low-rent namespacing; so,
> > >> for example, for this host the college is cns and the research group cryo,
> > >> so the AD hostname is cns-cryo-ross1$
> > >>
> > >> However,
> > >>
> > >> # hostname
> > >> rossmann.biosci.utexas.edu
> > >>
> > >>
> > >> which is easier for the users to remember for ssh purposes.  We set
> > >>
> > >>    ad_hostname = cns-cryo-ross1.austin.utexas.edu
> > >>
> > >> in /etc/sssd/sssd.conf.
> > >>
> > >> But I just checked, and kinit does not use ad_hostname, so I have to run it
> > >> as
> > >>
> > >>    kinit -k -R cns-cryo-ross1$
> > >>
> > >> The question is, then what does use the ad_hostname key/value pair?
> > >>
> > >> Next, the kinit example provided by Spike was `kinit -k`; we always run
> > >> `kinit -k -R`
> > >>
> > >> -R renews the TGT, which is what I thought is the thing set to expire in AD
> > >> that needs to be periodically renewed. What's the purpose of running
> > >> `kinit -k` without the -R?
> > >
> > > Hi,
> > >
> > > there are two different things.
> > >
> > > First, there are the host keys in the keytab, which are equivalent to a
> > > user password. Those keys are renewed by 'adcli update' if they are
> > > older than 30 days, similar to how you would renew your user password
> > > when the AD DC tells you to.
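The 30-day rule reduces to a simple comparison. In this sketch the threshold is assumed to mirror the sssd-ad option ad_maximum_machine_account_password_age (30 days by default); how often SSSD runs the check is governed by a separate option, ad_machine_account_password_renewal_opts, which is not modeled here. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed to mirror sssd-ad's ad_maximum_machine_account_password_age (days).
MAX_PASSWORD_AGE_DAYS = 30


def needs_renewal(last_set: datetime, now: datetime,
                  max_age_days: int = MAX_PASSWORD_AGE_DAYS) -> bool:
    """True once the machine password is older than max_age_days."""
    return now - last_set > timedelta(days=max_age_days)
```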
> > >
> > > Second, with those keys you can request a Kerberos TGT
> > >
> > >      kinit -k 'shortname$'
> > >
> >
> > I thought, based on the kinit man page, that the -k flag is just an
> > ordinary ticket request and that you need to add the -R flag to request
> > a TGT.  What you're saying is it also renews the TGT?
> >
> > OTOH I thought `kinit -k` was updating the computer account password on
> > the domain controller, but that doesn't seem to be the case, in which
> > case I'm not even sure what the purpose of an ordinary (non-TGT) ticket
> > is if you're not requesting automatic login to some specifically
> > requested service.
> >
> > Also, just to make sure I'm clear on this, the "renew until" doesn't
> > change because this is based on the computer account password
> > expiration, and further that sssd runs `adcli update` for you
> > periodically?  How often, by the way?
> >
> >
> > > as you can do with your user password:
> > >
> > >      kinit user@REALM
> > >      Password for user@REALM
> > >
> > > This TGT has a lifetime and it might have a renewal time as well:
> > >
> > > # klist
> > > Ticket cache: KCM:0:69840
> > > Default principal: administra...@child.ad.vm
> > >
> > > Valid starting       Expires              Service principal
> > > 09/06/2021 09:39:28  09/06/2021 19:39:28  krbtgt/child.ad...@child.ad.vm
> > >          renew until 09/07/2021 09:39:24
> > >
> > >
> > > In the example above the TGT will expire at '09/06/2021 19:39:28' but
> > > can be renewed until '09/07/2021 09:39:24'. This means that if you call
> > >
> > >      kinit -R
> > >
> > > before '09/06/2021 19:39:28' you will get a fresh TGT without entering
> > > your password. The new TGT will have a new lifetime but 'renew until'
> > > will stay the same. After '09/07/2021 09:39:24' 'kinit -R' will not work
> > > anymore and you have to enter your password again. It does not matter
> > > here if the TGT was originally requested with a keytab with 'kinit -k'
> > > or with plain 'kinit' and a password.
> > >
> > > However, since the keytab is present in the file system calling
> > >
> > >      kinit -k 'shortname$'
> > >
> > > will always get a fresh TGT without manual intervention. So if you
> > > have a valid keytab, this is even more flexible than 'kinit -R'.
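The rules described above (renew with 'kinit -R' only while the TGT is still valid and before 'renew until'; otherwise fall back to the keytab) fit in one function. This Python sketch uses the timestamps from the klist output quoted above; the function name and return strings are illustrative, and it does not call kinit.

```python
from datetime import datetime


def refresh_action(now: datetime, expires: datetime, renew_until: datetime,
                   have_keytab: bool) -> str:
    """Pick a way to obtain a fresh TGT without typing a password, if possible."""
    if now < expires and now < renew_until:
        return "kinit -R"                # TGT still valid and renewable
    if have_keytab:
        return "kinit -k 'shortname$'"  # keys in the keytab act as the password
    return "kinit (password needed)"


expires = datetime(2021, 9, 6, 19, 39, 28)
renew_until = datetime(2021, 9, 7, 9, 39, 24)
print(refresh_action(datetime(2021, 9, 6, 12, 0), expires, renew_until, False))  # kinit -R
```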
> > >
> > > HTH
> > >
> > > bye,
> > > Sumit
> > >
> > >>
> > >> _______________________________________________
> > >> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
> > >> To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
> > >> Fedora Code of Conduct:
> > https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > >> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > >> List Archives:
> > https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
> > >> Do not reply to spam on the list, report it:
> > https://pagure.io/fedora-infrastructure