Hi Spike,

I once saw a similar issue on RHEL7. It was caused by a wrong SELinux context on the /etc/krb5.keytab file: SSSD updated the password in AD and then attempted to update /etc/krb5.keytab, but SELinux denied the write. If that is the case here, the audit log will contain a denied entry. Maybe this will help you.
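If it helps, the check could look roughly like the sketch below. It assumes the stock targeted policy, where /etc/krb5.keytab is normally labeled krb5_keytab_t. The helper names selinux_type and check_keytab_context are made up for illustration.

```shell
#!/bin/sh
# Assumption: on a stock targeted policy the keytab carries this type.
EXPECTED_TYPE="krb5_keytab_t"

# Extract the type field from a context of the form user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Compare the keytab's current SELinux type against the expected one.
check_keytab_context() {
    keytab="${1:-/etc/krb5.keytab}"
    ctx=$(stat -c %C "$keytab") || return 2
    actual=$(selinux_type "$ctx")
    if [ "$actual" = "$EXPECTED_TYPE" ]; then
        echo "OK: $keytab is labeled $actual"
    else
        echo "MISMATCH: $keytab is labeled $actual, expected $EXPECTED_TYPE"
        return 1
    fi
}
```

If check_keytab_context reports a mismatch, running `ausearch -m avc -ts this-week` as root should list the matching denials, and `restorecon -v /etc/krb5.keytab` should restore the default label.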
Kind regards,
Grigory Trenin

On Thu, Oct 7, 2021 at 20:02, Spike White <spikewhit...@gmail.com> wrote:

> FYI -- update on this situation.
>
> AD DC logs are no help. They show the exact same response sent back for a
> good machine account password renewal as for a failed renewal.
>
> One of the AD administrators has identified a particular AD DC NIC
> teaming configuration that they state has caused problems with Kerberos
> in the past. It's on a small percentage of their AD DCs and they will
> work to correct it. They will keep us apprised of updates.
>
> I'm skeptical that's the underlying root cause -- for two reasons:
> 1. If Kerberos were sensitive to this, it should affect all Kerberos
>    operations (Kerberos auth, etc.) and not just the kpasswd operations.
> 2. This is not occurring on our older RHEL6 and RHEL7 builds that are AD
>    integrated via our older commercial AD integration product. It's
>    occurring only on our sssd-integrated builds.
>
> At this point, we've turned off debug level 7 (it was filling up our
> /var/log filesystems and we have the verbose adcli update output from at
> least two failed clients). We're going to take the alternate suggestion
> of setting ad_maximum_machine_account_password_age to 0 (disabling sssd
> from updating the password) and run a cron job to do 'adcli update'.
>
> We're wrapping this adcli_update with tcpdump to get the exact kpasswd
> request/response packets, as well as wrapping it with KRB5_TRACE.
>
> We want to call adcli update exactly as sssd calls it.
> From SOURCES/sssd-2.4.0/src/providers/ad/ad_machine_pw_renewal.c, this
> appears to be how sssd calls the external program /usr/sbin/adcli to do
> its adcli update:
>
> /usr/sbin/adcli update --verbose --domain=$AD_DOMAIN
> --host-keytab=/etc/krb5.keytab --host-fqdn=$FQDN
> --computer-password-lifetime=30
>
> because we aren't doing any Samba stuff. Is that the correct invocation?
> We'll set computer-password-lifetime lower, say to 7.
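For what it's worth, a wrapper along the lines described above might look like this rough sketch. The function name run_traced, the output directory, the file names, and the TCPDUMP and SETTLE variables are all hypothetical. It also assumes that kpasswd traffic uses port 464 and ordinary Kerberos traffic port 88, and that the job runs as root so tcpdump can capture.

```shell
#!/bin/sh
# Run a command with KRB5_TRACE set while tcpdump captures Kerberos and
# kpasswd traffic. Usage: run_traced OUTDIR COMMAND [ARGS...]
run_traced() {
    outdir="$1"; shift
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1

    # Start the capture in the background (ports 88 and 464 are assumptions).
    "${TCPDUMP:-tcpdump}" -i any -w "$outdir/kpasswd-$stamp.pcap" \
        port 88 or port 464 2>/dev/null &
    capture_pid=$!
    sleep "${SETTLE:-2}"    # give the capture a moment to start

    # KRB5_TRACE makes the MIT Kerberos library write a trace to this file.
    KRB5_TRACE="$outdir/krb5-$stamp.trace" "$@" \
        >"$outdir/adcli-$stamp.log" 2>&1
    rc=$?

    sleep "${SETTLE:-2}"    # let trailing packets arrive
    kill "$capture_pid" 2>/dev/null || true
    wait "$capture_pid" 2>/dev/null || true
    return "$rc"
}

# Example call from a root cron job, using the invocation quoted above
# with the lifetime lowered to 7:
#   run_traced /var/tmp/adcli-trace /usr/sbin/adcli update --verbose \
#       --domain="$AD_DOMAIN" --host-keytab=/etc/krb5.keytab \
#       --host-fqdn="$FQDN" --computer-password-lifetime=7
```

Because the function returns adcli's exit status, the cron job can keep the capture files only when the update fails and delete them otherwise.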
> We want to see examples more frequently, to find failed updates.
>
> BTW, the packet capture on a successful machine account password renewal
> is only 8K, so that very targeted debug will not swamp our /var/log or
> /tmp filesystems.
>
> Spike
>
> On Wed, Aug 25, 2021 at 10:32 AM Spike White <spikewhit...@gmail.com>
> wrote:
>
>> Sssd experts,
>>
>> *Short summary:* How can we troubleshoot sssd’s ‘Automatic Kerberos
>> Host Keytab Renewal’ process? We have ~0.4% of our Linux servers
>> dropping off the AD domain monthly.
>>
>> *Longer explanation:*
>>
>> Over the past two years, we have on-boarded sssd as our Linux AD
>> integration component, largely displacing a former commercial product
>> that did the same.
>>
>> We have about 20K Linux servers that are sssd-enabled: a mix of RHEL6,
>> RHEL7, RHEL8, and Oracle Linux 6, 7 and 8. We have ~7K Linux servers
>> still on the old commercial product. (For certain edge-case scenarios,
>> such as DMZs, the commercial product works better.)
>>
>> Our AD forest is a single AD forest, with 4 regional child domains, all
>> with transitive trust. Sssd auto-discovers the parent domain and all 4
>> child domains, no problem – whenever it’s adcli joined to its regional
>> local domain.
>>
>> Why am I writing this?
>>
>> Because we are researching an ongoing problem reported by L1 server ops.
>> About 70 – 80 sssd-enabled Linux servers / month drop off the domain. Out
>> of our current sssd-enabled population of ~20K servers, that’s not
>> horrible. But still it should be better. (Our former commercial product
>> did better.)
>>
>> It’s not limited to one particular OS, OS version, build location or
>> region. We have surveyed; it seems to occur randomly among all OS
>> versions, regions and locations.
>>
>> To be clear, it’s extremely likely that this behavior arises from some
>> subtle misconfiguration on our part – not from any sssd or adcli or
>> Kerberos bug.
>> We have a couple of configuration improvements we’re pursuing (a
>> Kerberos max ticket lifetime mismatch between AD and the
>> /etc/krb5.conf file, for instance).
>>
>> We are taking sssd’s default settings for
>> ad_maximum_machine_account_password_age and
>> ad_machine_account_password_renewal_opts. So after 30 days, sssd will
>> attempt daily to renew the host Kerberos keytab file. It should
>> re-attempt daily if not renewed. By company policy, our AD disables any
>> machine accounts that have not renewed their credentials in 40 days. So
>> when we find servers that have dropped off the domain, it’s because they
>> have not renewed their AD machine accounts in 40 days.
>>
>> We have had SRs open with our OS vendors (Red Hat and Oracle
>> respectively) for months now, to no great help. (They gave a few
>> suggestions, but none panned out.)
>>
>> We thought we were hitting this bug:
>>
>> https://github.com/SSSD/sssd/issues/4762
>>
>> But packet captures proved that adcli update is using TCP on RHEL7/8.
>> Thus, this might be a potential problem, but only on RHEL6. (BTW,
>> ‘udp_preference_limit = 0’ doesn’t force use of TCP for the kpasswd
>> invocation in RHEL6 – it still uses UDP. Thus, the recommended
>> work-around for this bug doesn’t work.)
>>
>> So that isn’t our underlying problem.
>>
>> We’re at a loss now – as you can see, we’re grasping at straws.
>>
>> How can we troubleshoot sssd’s ‘automatic Kerberos Host keytab renewal’
>> process? Whenever we inspect a particular server it works. We can’t run
>> all sssd clients at debug level 9; it fills up the /var/log filesystem
>> after a few days of that. We’re interested in troubleshooting that one
>> particular sssd process on all clients; not all parts of sssd.
>>
>> Other than a steep learning curve (on our part), obscure situations (like
>> DMZ auto-discovery of AD controllers) and exotic scenarios (like above),
>> we’re quite happy with our 2 yr journey of direct AD integration with
>> sssd.
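One low-cost way to watch for the 30-day renewal and 40-day disable windows described above, without raising the sssd debug level, might be to flag hosts whose keytab has not changed recently. The sketch below rests on an assumption: that the modification time of /etc/krb5.keytab is a usable proxy for the last successful renewal. The function names are hypothetical.

```shell
#!/bin/sh
# Thresholds taken from the message above: renewal starts at 30 days,
# AD disables the machine account at 40 days.
RENEW_AFTER_DAYS=30
DISABLE_AFTER_DAYS=40

# Map a keytab age in days to a coarse status.
classify_keytab_age() {
    age_days="$1"
    if [ "$age_days" -ge "$DISABLE_AFTER_DAYS" ]; then
        echo "expired"
    elif [ "$age_days" -ge "$RENEW_AFTER_DAYS" ]; then
        echo "overdue"
    else
        echo "ok"
    fi
}

# Age of the keytab in whole days, based on its mtime (an assumed proxy
# for the last successful machine password renewal).
keytab_age_days() {
    keytab="${1:-/etc/krb5.keytab}"
    mtime=$(stat -c %Y "$keytab") || return 1
    now=$(date +%s)
    echo $(( (now - mtime) / 86400 ))
}
```

A daily cron job could run `classify_keytab_age "$(keytab_age_days)"` and report any host in the "overdue" state, which would leave roughly ten days to investigate before AD disables the account.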
>> Obviously, the troubleshooting tools on RHEL6 are very minimal.
>> But certainly, overall the quality of sssd on RHEL7/8 is excellent. AD
>> integration has innumerable devils in the details; I’m amazed that sssd
>> performs as well as it does against our multi-domain forest.
>>
>> Spike
>>
>> PS the problem with sssd auto-discovery of AD controllers in DMZs has
>> been fixed in a recent sssd release. A better discovery algorithm was
>> implemented – the same one used by Windows clients and commercial
>> products. It’s just that this recent sssd version is not on RHEL7 or 8.
>>
> _______________________________________________
> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
> To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
> Do not reply to spam on the list, report it:
> https://pagure.io/fedora-infrastructure