Grigory,

It's quite likely that it's something client-related like that. But I know it's not exactly that; we turn off SELinux.

In the verbose log of adcli update on a failed renewal, it says:

! Cannot change computer password: Authentication error
adcli: updating membership with domain amer.dell.com failed: Cannot change computer password: Authentication error

While on a good renewal, the verbose adcli output says:

* Changed computer password
* kvno incremented to 110

Sumit informs me that this output:

! Cannot change computer password: Authentication error

means that adcli update received a response back from the AD DC that it's interpreting as a failed attempt to change the computer password.

But I don't know what local components get traversed between the network and adcli. (Is it routed over dbus or polkit, for instance? If /tmp or /var is 100% full, is that a problem?)

Spike

On Thu, Oct 7, 2021 at 12:15 PM Grigory Trenin <gtre...@gmail.com> wrote:

> Hi Spike,
>
> I have once seen such an issue on RHEL7. It was caused by a wrong SELinux
> context on the /etc/krb5.keytab file. That is, SSSD updated the password in
> AD, attempted to update /etc/krb5.keytab, and SELinux denied this attempt.
> The audit log will contain a denied entry if that is the case. Maybe it
> will help you.
>
>
> Kind regards,
> Grigory Trenin
>
> On Thu, Oct 7, 2021 at 20:02, Spike White <spikewhit...@gmail.com> wrote:
>
>> FYI -- update on this situation.
>>
>> The AD DC logs are no help. They show the exact same response sent back
>> to a good machine account password renewal as for a failed renewal.
>>
>> One of the AD administrators has identified a particular AD DC NIC
>> teaming configuration that they state has caused problems with Kerberos in
>> the past. It's on a small percentage of their AD DCs and they will work to
>> correct it. They will keep us apprised of any updates.
>>
>> I'm skeptical that's the underlying root cause -- for two reasons:
>> 1. If Kerberos were sensitive to this, it should affect all Kerberos
>> operations (Kerberos auth, etc.) and not just the kpasswd operations.
>> 2. This is not occurring on our older RHEL6 and RHEL7 builds that are
>> AD-integrated via our older commercial AD integration product. It's
>> occurring only on our sssd-integrated builds.
>>
>> At this point, we've turned off debug level 7 (it was filling up our
>> /var/log filesystems and we have the verbose adcli update output from at
>> least two failed clients). We're going to take the alternate suggestion
>> of setting ad_maximum_machine_account_password_age to 0 (disabling sssd
>> from updating the password) and run a cron job to do 'adcli update'.
>>
>> We're wrapping this adcli update with tcpdump to get the exact kpasswd
>> request/response packets, as well as wrapping it with KRB5_TRACE.
>>
>> We want to call adcli update exactly as sssd calls it.
>> From SOURCES/sssd-2.4.0/src/providers/ad/ad_machine_pw_renewal.c, this
>> appears to be how sssd calls the external program /usr/sbin/adcli to do
>> its adcli update:
>>
>> /usr/sbin/adcli update --verbose --domain=$AD_DOMAIN
>> --host-keytab=/etc/krb5.keytab --host-fqdn=$FQDN
>> --computer-password-lifetime=30
>>
>> because we aren't doing any Samba stuff. Is that the correct
>> invocation? We'll set computer-password-lifetime lower, say to 7,
>> because we want to see examples more frequently, to find failed updates.
>>
>> BTW, the packet capture on a successful machine account password renewal
>> is only 8K, so that very targeted debug will not swamp our /var/log or /tmp
>> filesystems.
>>
>> Spike
>>
>> On Wed, Aug 25, 2021 at 10:32 AM Spike White <spikewhit...@gmail.com>
>> wrote:
>>
>>> Sssd experts,
>>>
>>> *Short summary:* How can we troubleshoot sssd’s ‘Automatic Kerberos
>>> Host Keytab Renewal’ process? We have ~0.4% of our Linux servers
>>> dropping off the AD domain monthly.
>>>
>>> *Longer explanation:*
>>>
>>> Over the past two years, we have on-boarded sssd as our Linux AD
>>> integration component, largely displacing a former commercial product
>>> that did the same.
>>>
>>> We have about 20K Linux servers that are sssd-enabled: a mix of RHEL6,
>>> RHEL7, RHEL8, and Oracle Linux 6, 7 and 8. We have ~7K Linux servers
>>> still on the old commercial product. (For certain edge-case scenarios,
>>> such as DMZs, the commercial product works better.)
>>>
>>> Our AD forest is a single AD forest, with 4 regional child domains, all
>>> with transitive trust. Sssd auto-discovers the parent domain and all 4
>>> child domains, no problem, whenever it’s adcli joined to its regional
>>> local domain.
>>>
>>> Why am I writing this?
>>>
>>> Because we are researching an ongoing problem reported by L1 server
>>> ops. About 70-80 sssd-enabled Linux servers / month drop off the
>>> domain. Out of our current sssd-enabled population of ~20K servers,
>>> that’s not horrible. But still it should be better. (Our former
>>> commercial product did better.)
>>>
>>> It’s not limited to one particular OS, OS version, build location or
>>> region. We have surveyed; it seems to occur randomly among all OS
>>> versions, regions and locations.
>>>
>>> To be clear, it’s extremely likely that this behavior is arising from
>>> some subtle misconfiguration on our part -- not from any sssd or adcli or
>>> Kerberos bug. We have a couple of configuration improvements we’re
>>> pursuing. (A Kerberos max ticket lifetime mismatch between AD and the
>>> /etc/krb5.conf file, for instance.)
>>>
>>> We are taking sssd’s default settings for
>>> ad_maximum_machine_account_password_age and
>>> ad_machine_account_password_renewal_opts. So after 30 days, sssd will
>>> attempt daily to renew the host Kerberos keytab file. It should
>>> re-attempt daily if not renewed. By company policy, our AD disables any
>>> machine accounts that have not renewed their credentials in 40 days. So
>>> when we find servers that have dropped off the domain, it’s because they
>>> have not renewed their AD machine accounts in 40 days.
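
As a rough aid for the 30-day / 40-day window described above, here is a minimal sketch of a keytab age check. It assumes the mtime of the keytab file tracks the last successful machine password renewal, which is only an approximation (other tools may rewrite the file). The function names and status labels are made up for illustration and are not part of sssd or adcli; the 30 and 40 day values are the ones quoted in this thread. It relies on GNU stat (-c %Y).

```shell
#!/bin/sh
# Sketch only: flag hosts whose keytab looks older than the renewal window.
set -eu

RENEW_AFTER_DAYS=30    # sssd default renewal age, as described in the thread
DISABLE_AFTER_DAYS=40  # AD policy described in the thread

# Whole days between two epoch timestamps ($1 = earlier, $2 = later).
days_between() {
    echo $(( ($2 - $1) / 86400 ))
}

# Map an age in days to a coarse, hypothetical status label.
classify_age() {
    if [ "$1" -ge "$DISABLE_AFTER_DAYS" ]; then
        echo "likely-disabled"
    elif [ "$1" -ge "$RENEW_AFTER_DAYS" ]; then
        echo "renewal-due"
    else
        echo "ok"
    fi
}

# Report on one keytab file.
# Assumption: the file's mtime approximates the last successful renewal.
check_keytab() {
    keytab=$1
    mtime=$(stat -c %Y "$keytab")
    age=$(days_between "$mtime" "$(date +%s)")
    echo "$keytab: ${age} days since last change: $(classify_age "$age")"
}

# Only act when a keytab path is given on the command line.
if [ "$#" -ge 1 ]; then
    check_keytab "$1"
fi
```

Saved under a hypothetical name such as keytab_age.sh, it would be run as `sh keytab_age.sh /etc/krb5.keytab`. Anything reported as renewal-due has missed at least one daily attempt and is worth a closer look before the 40-day cutoff.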
>>>
>>> We have had SRs open with our OS vendors (Redhat and Oracle
>>> respectively) for months now, to no great help. (They gave a few
>>> suggestions, but none panned out.)
>>>
>>> We thought we were hitting this bug:
>>>
>>> https://github.com/SSSD/sssd/issues/4762
>>>
>>> But packet captures proved that adcli update is using TCP on RHEL7/8.
>>> Thus, this might be a potential problem, but only on RHEL6. (BTW
>>> ‘udp_preference_limit = 0’ doesn’t force use of TCP for the kpasswd
>>> invocation in RHEL6 -- it still uses UDP. Thus, the recommended
>>> work-around for this bug doesn’t work.)
>>>
>>> So that isn’t our underlying problem.
>>>
>>> We’re at a loss now -- as you can see, we’re grasping at straws.
>>>
>>> How can we troubleshoot sssd’s ‘automatic Kerberos Host keytab renewal’
>>> process? Whenever we inspect a particular server it works. We can’t run
>>> all sssd clients at debug level 9; it fills up the /var/log filesystem
>>> after a few days of that. We’re interested in troubleshooting that one
>>> particular sssd process on all clients; not all parts of sssd.
>>>
>>> Other than a steep learning curve (on our part), obscure situations
>>> (like DMZ auto-discovery of AD controllers) and exotic scenarios (like
>>> the above), we’re quite happy with our 2-year journey of direct AD
>>> integration with sssd. Obviously, the troubleshooting tools on RHEL6 are
>>> very minimal. But certainly, overall the quality of sssd on RHEL7/8 is
>>> excellent. AD integration has innumerable devils in the details; I’m
>>> amazed that sssd performs as well as it does against our multi-domain
>>> forest.
>>>
>>> Spike
>>>
>>> PS the problem with sssd auto-discovery of AD controllers in DMZs has
>>> been fixed in a recent sssd release. A better discovery algorithm was
>>> implemented, the same one used by Windows clients and commercial
>>> products. It’s just that this recent sssd version is not on RHEL7 or 8.
>>>
>> _______________________________________________
>> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
>> To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
>> Fedora Code of Conduct:
>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives:
>> https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
>> Do not reply to spam on the list, report it:
>> https://pagure.io/fedora-infrastructure
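
For reference, the cron-driven adcli update from the Oct 7 message, wrapped with tcpdump and KRB5_TRACE, might be sketched roughly as below. The adcli options are the ones quoted in the thread. Everything else is an assumption for illustration: the output directory and file names, the 2-second settle delays, the DRY_RUN switch, and the capture filter (ports 88 and 464, i.e. Kerberos and kpasswd). tcpdump needs root, which a root cron job would have.

```shell
#!/bin/sh
# Sketch only: one traced 'adcli update' run, intended to be called from cron.
set -eu

ADCLI=/usr/sbin/adcli                     # path quoted in the thread
KEYTAB=/etc/krb5.keytab
OUTDIR=${OUTDIR:-/var/tmp/adcli-debug}    # hypothetical output directory

# Run the adcli update command quoted in the thread, or just print it
# when DRY_RUN=1 (a hypothetical switch, useful for checking the arguments).
# $1 = AD domain, $2 = host FQDN, $3 = computer password lifetime in days
run_adcli() {
    set -- "$ADCLI" update --verbose "--domain=$1" "--host-keytab=$KEYTAB" \
        "--host-fqdn=$2" "--computer-password-lifetime=$3"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a packet capture, run adcli with Kerberos tracing, stop the capture.
traced_update() {
    stamp=$(date +%Y%m%dT%H%M%S)
    mkdir -p "$OUTDIR"

    # Capture only Kerberos (88) and kpasswd (464) traffic to keep files small.
    tcpdump -i any -U -w "$OUTDIR/kpasswd-$stamp.pcap" 'port 88 or port 464' \
        2>"$OUTDIR/tcpdump-$stamp.err" &
    tcpdump_pid=$!
    sleep 2    # assumed settle time so tcpdump is capturing before adcli starts

    # MIT Kerberos writes its trace to the file named by KRB5_TRACE.
    KRB5_TRACE="$OUTDIR/krb5-trace-$stamp.log"
    export KRB5_TRACE

    rc=0
    run_adcli "$@" >"$OUTDIR/adcli-$stamp.log" 2>&1 || rc=$?

    sleep 2    # let trailing packets arrive before stopping the capture
    kill "$tcpdump_pid" 2>/dev/null || true
    wait "$tcpdump_pid" 2>/dev/null || true
    return "$rc"
}

# Only act when called with: <AD domain> <host FQDN> <lifetime in days>
if [ "$#" -eq 3 ]; then
    traced_update "$@"
fi
```

A cron entry would then call it with the domain, the host's FQDN and the lowered lifetime, for example `adcli_traced.sh amer.dell.com "$(hostname -f)" 7` (the script name is made up). Each run leaves one pcap, one KRB5_TRACE log and one adcli verbose log with a shared timestamp, so a failed renewal can be lined up against its packets.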