Hi Spike,

I once saw a similar issue on RHEL7. It was caused by a wrong SELinux
context on the /etc/krb5.keytab file: SSSD updated the password in AD,
then attempted to update /etc/krb5.keytab, and SELinux denied the write.
If that is the case here, the audit log will contain a denied entry.
Maybe this will help you.
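
To check for this, here is a minimal sketch that filters an audit log for
AVC denials mentioning the keytab. The function name keytab_denials is
made up for illustration, and the default log path assumes a stock auditd
setup (reading it needs root).

```shell
#!/bin/sh
# Hypothetical helper: print AVC denial records that mention the keytab.
# $1: path to an audit log (default assumes a standard auditd layout).
keytab_denials() {
    log="${1:-/var/log/audit/audit.log}"
    # AVC records look like "avc:  denied  { write } for ... name=...".
    # Matching "krb5.keytab" also catches temporary names derived from it.
    grep -E 'avc: +denied' "$log" | grep 'krb5\.keytab'
}
```

If this prints anything, compare the label shown by "ls -Z /etc/krb5.keytab"
with what the policy expects (krb5_keytab_t on the systems I have seen);
"restorecon -v /etc/krb5.keytab" resets it to the policy default.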


Kind regards,
Grigory Trenin

On Thu, Oct 7, 2021 at 20:02, Spike White <spikewhit...@gmail.com> wrote:

> FYI -- update on this situation.
>
> AD DC logs no help.  They show the exact same response sent back to a good
> machine account password renewal as for a failed renewal.
>
> One of the AD administrators has identified a particular AD DC NIC
> teaming configuration that they state has caused problems with Kerberos in
> the past.  It's on a small percentage of their AD DCs and they will work to
> correct it.  They will keep us apprised of progress.
>
> I'm skeptical that's the underlying root cause -- for two reasons:
> 1.  If Kerberos were sensitive to this, it should affect all Kerberos
> operations (Kerberos auth, etc.) and not just the kpasswd operations.
> 2.  This is not occurring on our older RHEL6 and RHEL7 builds, which are
> AD-integrated via our older commercial AD integration product.  It's
> occurring only on our sssd-integrated builds.
>
> At this point, we've turned off debug level 7 (it was filling up our
> /var/log filesystems, and we have the verbose adcli update output from at
> least two failed clients).  We're going to take the alternate suggestion
> of setting ad_maximum_machine_account_password_age to 0 (which stops sssd
> from renewing the password) and run a cron job to do 'adcli update'.
>
> We're wrapping this adcli_update with tcpdump to get the exact kpasswd
> request/response packets, as well as wrapping with KRB5_TRACE.
>
> We want to call adcli update exactly as sssd calls it.
> From SOURCES/sssd-2.4.0/src/providers/ad/ad_machine_pw_renewal.c, this
> appears to be how sssd calls external program /usr/sbin/adcli to do its
> adcli update:
>
>       /usr/sbin/adcli update --verbose --domain=$AD_DOMAIN
> --host-keytab=/etc/krb5.keytab --host-fqdn=$FQDN
> --computer-password-lifetime=30
>
> We omit the Samba-related options because we aren't doing any Samba
> stuff.  Is that the correct invocation?  We'll set
> computer-password-lifetime lower, say to 7, so that renewals happen more
> often and we catch failed updates sooner.
>
> BTW, the packet capture on a successful machine account password renewal
> is only 8K, so that very targeted debug will not swamp our /var/log or /tmp
> filesystems.
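
The traced invocation described above might be wrapped roughly like this.
It is only a sketch: the function name, the ADCLI override and the output
directory are my own illustrative choices, while the adcli options are
exactly the ones quoted above. KRB5_TRACE is the MIT Kerberos environment
variable that sends a library trace to the named file.

```shell
#!/bin/sh
# Hypothetical wrapper: run one 'adcli update' with a Kerberos trace.
# Usage: traced_adcli_update DOMAIN FQDN [LIFETIME_DAYS] [OUTDIR]
# ADCLI may be overridden (e.g. ADCLI=echo) for a dry run.
traced_adcli_update() {
    domain="$1"
    fqdn="$2"
    lifetime="${3:-7}"
    outdir="${4:-/var/tmp/adcli-trace}"   # illustrative location
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1
    # The trace goes to its own file; adcli's verbose output to another.
    KRB5_TRACE="$outdir/krb5-$stamp.trace" \
    "${ADCLI:-/usr/sbin/adcli}" update --verbose \
        --domain="$domain" \
        --host-keytab=/etc/krb5.keytab \
        --host-fqdn="$fqdn" \
        --computer-password-lifetime="$lifetime" \
        > "$outdir/adcli-$stamp.log" 2>&1
}
```

A packet capture can run alongside it, for example "tcpdump -i any -w FILE
port 464" started before the call and stopped afterwards (464 is the
kpasswd port). Keeping one timestamped file set per run makes it easy to
discard the successful runs and keep only the failures.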
>
> Spike
>
> On Wed, Aug 25, 2021 at 10:32 AM Spike White <spikewhit...@gmail.com>
> wrote:
>
>> Sssd experts,
>>
>> *Short summary: * How can we troubleshoot sssd’s ‘Automatic Kerberos
>> Host Keytab Renewal’ process?    We have ~0.4%  of our Linux servers
>> dropping off the AD domain monthly.
>>
>> *Longer explanation:*
>>
>> Over the past two years, we have on-boarded sssd as our Linux AD
>> integration component, largely displacing a former commercial product that
>> did the same.
>>
>> We have about 20K Linux servers that are sssd-enabled: a mix of RHEL6,
>> RHEL7, RHEL8, Oracle Linux 6, 7 and 8.   We have ~7K Linux servers still on
>> the old commercial product.  (For certain edge-case scenarios, such as
>> DMZs, the commercial product works better.)
>>
>> Our AD forest is a single AD forest, with 4 regional child domains.  All
>> with transitive trust.  Sssd auto-discovers parent domain and all 4 child
>> domains, no problem – whenever it’s adcli joined to its regional local
>> domain.
>>
>> Why am I writing this?
>>
>> Because we are researching an ongoing problem reported by L1 server ops.
>> About 70 – 80 sssd-enabled Linux servers / month drop off the domain.  Out
>> of our current sssd-enabled population of ~20K servers, that’s not
>> horrible.  But still it should be better.  (Our former commercial product
>> did better.)
>>
>> It’s not limited to one particular OS, OS version, build location or
>> region.  We have surveyed; it seems to occur randomly among all OS
>> versions, regions and locations.
>>
>> To be clear, it’s extremely likely that this behavior arises from some
>> subtle misconfiguration on our part – not from any sssd or adcli or
>> Kerberos bug.  We have a couple of configuration improvements we’re
>> pursuing.  (Kerberos max ticket lifetime mismatch between AD and
>> /etc/krb5.conf file for instance.)
>>
>> We are taking sssd’s default settings for
>> ad_maximum_machine_account_password_age and
>> ad_machine_account_password_renewal_opts.   So after 30 days, sssd will
>> attempt daily to renew the host Kerberos keytab file.  It should re-attempt
>> daily if not renewed.  By company policy, our AD disables any machine
>> accounts that have not renewed their credentials in 40 days.   So when we
>> find servers that have dropped off the domain, it’s because they have not
>> renewed their AD machine accounts in 40 days.
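
With those numbers there is a window of about 10 days between the first
renewal attempt (day 30) and the account being disabled (day 40). One
cheap way to spot hosts inside that window is to look at the keytab's age.
This is a sketch under the assumption that the file's mtime is a usable
proxy for the last successful renewal; the function names are made up and
"stat -c" is the GNU coreutils form.

```shell
#!/bin/sh
# Hypothetical helpers: flag a keytab that looks overdue for renewal.

# Print the keytab's age in whole days, based on its mtime.
keytab_age_days() {
    keytab="${1:-/etc/krb5.keytab}"
    now=$(date +%s)
    mtime=$(stat -c %Y "$keytab") || return 1
    echo $(( (now - mtime) / 86400 ))
}

# Succeed (exit 0) if the keytab is older than $2 days (default 30).
keytab_overdue() {
    age=$(keytab_age_days "$1") || return 2
    [ "$age" -gt "${2:-30}" ]
}
```

Run daily from cron, something like "keytab_overdue /etc/krb5.keytab 32 &&
logger keytab renewal overdue" would flag a host several days before AD
disables it. "klist -kt /etc/krb5.keytab" shows the per-key timestamps if
you want a second opinion on the mtime.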
>>
>> We have had SRs open with our OS vendors (Red Hat and Oracle respectively)
>> for months now, to no great effect.  (They gave a few suggestions, but none
>> panned out.)
>>
>> We thought we were hitting this bug:
>>
>> https://github.com/SSSD/sssd/issues/4762
>>
>> But packet captures proved that adcli update is using TCP on RHEL7/8.
>> Thus, this might be a potential problem, but only on RHEL6.  (BTW
>> ‘udp_preference_limit = 0’ doesn’t force use of TCP for the kpasswd
>> invocation in RHEL6 – it still uses UDP.  Thus, the recommended work-around
>> for this bug doesn’t work.)
>>
>> So that isn’t our underlying problem.
>>
>> We’re at a loss now – as you can see, we’re grasping at straws.
>>
>> How can we troubleshoot sssd’s ‘automatic Kerberos Host keytab renewal’
>> process?  Whenever we inspect a particular server it works.  We can’t run
>> all sssd clients at debug level 9;  it fills up /var/log filesystem after a
>> few days of that.  We’re interested in troubleshooting that one particular
>> sssd process on all clients;  not all parts of sssd.
>>
>> Other than a steep learning curve (on our part), obscure situations (like
>> DMZ auto-discovery of AD controllers) and exotic scenarios (like above),
>> we’re quite happy with our 2 yr journey of direct AD integration with
>> sssd.    Obviously, the troubleshooting tools on RHEL6 are very minimal.
>> But certainly, overall the quality of sssd on RHEL7/8 is excellent.  AD
>> integration has innumerable devils in the details; I’m amazed that sssd
>> performs as well as it does against our multi-domain forest.
>>
>> Spike
>>
>> PS the problem with sssd auto-discovery of AD controllers in DMZs has
>> been fixed in a recent sssd release.  The better discovery algorithm was
>> implemented – same one used by Windows clients and commercial products.
>> It’s just that recent sssd version is not on RHEL7 or 8.
>>
>>
>>
>>
>> _______________________________________________
> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
> To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
> Do not reply to spam on the list, report it:
> https://pagure.io/fedora-infrastructure
>