Grigory,

It's quite likely that it's something client-related like that. But I know
it's not exactly that; we run with SELinux disabled. In the verbose log of
adcli update on a failed renewal, it says:

! Cannot change computer password: Authentication error
adcli: updating membership with domain amer.dell.com failed: Cannot change computer password: Authentication error

While on a good renewal, the verbose adcli output says:

* Changed computer password
* kvno incremented to 110
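To sort many saved runs into renewed and failed ones, a small helper can
match on those two messages. A minimal sketch, assuming the adcli output has
been saved to a file and uses exactly the message strings quoted above (other
adcli versions may word them differently); the helper names are hypothetical,
and the patterns tolerate leading whitespace in case the lines are indented.

```shell
# Hypothetical helpers for classifying saved `adcli update --verbose` output.
# The message strings are the ones quoted in this thread; other adcli
# versions may word them differently.

# Print "failed", "renewed" or "unknown" for the output file in $1.
adcli_result() {
    if grep -q '^[[:space:]]*! Cannot change computer password' "$1"; then
        echo "failed"
    elif grep -q '^[[:space:]]*\* Changed computer password' "$1"; then
        echo "renewed"
    else
        echo "unknown"
    fi
}

# Print the new kvno reported by a successful run (empty if none).
adcli_kvno() {
    sed -n 's/^[[:space:]]*\* kvno incremented to \([0-9][0-9]*\).*/\1/p' "$1"
}
```

For example, `adcli_result /var/tmp/adcli-update.log` prints `failed` for
output like the first excerpt above.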

Sumit informs me that this output:

            ! Cannot change computer password: Authentication error

means that adcli update received a response from the AD DC that it
interprets as a failed attempt to change the computer password. But I
don't know which local components are traversed between the network and
adcli. Is it routed over dbus or polkit, for instance? If so, is a 100% full
/tmp or /var a problem?
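Whatever the answer about dbus or polkit, the full-filesystem theory can be
ruled in or out on each run with a quick pre-flight check. This is a sketch,
not a statement about which paths adcli actually writes to; the function name
and the percentage threshold are hypothetical, and it relies on POSIX `df -P`
so that the capacity is always column 5.

```shell
# Hypothetical pre-flight check: print each given path whose filesystem is
# at or above a usage threshold (in percent). `df -P` guarantees the POSIX
# layout, where column 5 is "Capacity" (e.g. "97%") and column 6 the mount.
full_filesystems() {
    local threshold=$1
    shift
    df -P "$@" | awk -v t="$threshold" '
        NR > 1 {
            sub(/%/, "", $5)
            if ($5 + 0 >= t + 0) print $6 " " $5 "%"
        }'
}
```

For example, `full_filesystems 95 /tmp /var /var/log` prints nothing unless
one of those filesystems is at least 95% used.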

Spike

On Thu, Oct 7, 2021 at 12:15 PM Grigory Trenin <gtre...@gmail.com> wrote:

> Hi Spike,
>
> I once saw such an issue on RHEL7. It was caused by a wrong SELinux
> context on the /etc/krb5.keytab file. That is, SSSD updated the password in
> AD, attempted to update /etc/krb5.keytab, and SELinux denied this attempt.
> The audit log will contain a denial entry if that is the case. Maybe this
> will help you.
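Grigory's check can be scripted roughly as follows. A sketch assuming RHEL
tooling (getenforce, ls -Z, restorecon from policycoreutils, ausearch from
the audit package) and root privileges; the function name is hypothetical. It
returns early when SELinux is disabled, as on our hosts.

```shell
# Hypothetical SELinux check for the host keytab (RHEL tooling assumed).
keytab_selinux_check() {
    local keytab=${1:-/etc/krb5.keytab}
    if ! command -v getenforce >/dev/null 2>&1 ||
       [ "$(getenforce)" = "Disabled" ]; then
        echo "SELinux is disabled or not installed; nothing to check"
        return 0
    fi
    # Current label; on RHEL the expected type is krb5_keytab_t.
    ls -Z "$keytab"
    # Dry run (-n): report (-v) whether the label differs from policy.
    restorecon -n -v "$keytab"
    # AVC denials since yesterday mentioning the keytab
    # (grep exits 1 when there are none).
    ausearch -m AVC -ts yesterday 2>/dev/null | grep krb5.keytab
}
```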
>
>
> Kind regards,
> Grigory Trenin
>
> On Thu, Oct 7, 2021 at 20:02, Spike White <spikewhit...@gmail.com> wrote:
>
>> FYI -- update on this situation.
>>
>> The AD DC logs are no help. They show exactly the same response sent
>> back for a good machine account password renewal as for a failed one.
>>
>> One of the AD administrators has identified a particular AD DC NIC
>> teaming configuration that they say has caused problems with Kerberos in
>> the past. It's present on a small percentage of their AD DCs, and they
>> will work to correct it. They will keep us apprised of progress.
>>
>> I'm skeptical that's the underlying root cause, for two reasons:
>> 1. If Kerberos were sensitive to this, it should affect all Kerberos
>> operations (Kerberos auth, etc.), not just the kpasswd operations.
>> 2. This is not occurring on our older RHEL6 and RHEL7 builds that are AD
>> integrated via our older commercial AD integration product. It's occurring
>> only on our sssd-integrated builds.
>>
>> At this point, we've turned off debug level 7 (it was filling up our
>> /var/log filesystems, and we now have the verbose adcli update output from
>> at least two failed clients). We're going to take the alternate suggestion
>> of setting ad_maximum_machine_account_password_age to 0 (which stops sssd
>> from updating the password) and run a cron job that does 'adcli update'.
>>
>> We're wrapping this adcli update with tcpdump to capture the exact kpasswd
>> request/response packets, and also running it with KRB5_TRACE set.
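A minimal sketch of such a wrapper, assuming it runs as root with tcpdump
installed. It captures ports 88 (Kerberos) and 464 (kpasswd) and points
KRB5_TRACE at a file; adcli uses the MIT Kerberos libraries, which honour
that variable. The function name, the output file names, and the TCPDUMP
override (used only for testing) are hypothetical.

```shell
# Hypothetical capture wrapper: records Kerberos (port 88) and kpasswd
# (port 464, TCP and UDP) traffic plus an MIT krb5 trace around one command.
# Assumes root and tcpdump; TCPDUMP may be overridden, e.g. for testing.
capture_run() {
    local outdir=$1
    shift
    local pcap_pid rc
    mkdir -p "$outdir" || return 1
    "${TCPDUMP:-tcpdump}" -i any -s 0 -w "$outdir/krb.pcap" \
        'port 88 or port 464' >/dev/null 2>&1 &
    pcap_pid=$!
    sleep 1    # give tcpdump time to start capturing before traffic begins
    # KRB5_TRACE makes the MIT krb5 library log each step to this file.
    if KRB5_TRACE="$outdir/krb5_trace.log" "$@" >"$outdir/cmd.log" 2>&1; then
        rc=0
    else
        rc=$?
    fi
    kill "$pcap_pid" 2>/dev/null || true   # SIGTERM: tcpdump flushes and exits
    wait "$pcap_pid" 2>/dev/null || true
    return "$rc"
}
```

For example: `capture_run /var/tmp/adcli-debug /usr/sbin/adcli update
--verbose --domain="$AD_DOMAIN" --host-keytab=/etc/krb5.keytab
--host-fqdn="$FQDN" --computer-password-lifetime=7`. The command's exit
status is passed through, so cron can still tell success from failure.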
>>
>> We want to call adcli update exactly as sssd calls it.
>> From SOURCES/sssd-2.4.0/src/providers/ad/ad_machine_pw_renewal.c, this
>> appears to be how sssd calls the external program /usr/sbin/adcli to do
>> its adcli update:
>>
>>       /usr/sbin/adcli update --verbose --domain=$AD_DOMAIN \
>>           --host-keytab=/etc/krb5.keytab --host-fqdn=$FQDN \
>>           --computer-password-lifetime=30
>>
>> That is without any Samba-related options, because we aren't doing any
>> Samba stuff. Is that the correct invocation? We'll set
>> computer-password-lifetime lower, say to 7, because we want to see
>> examples more frequently, to find failed updates.
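To keep the cron invocation identical to the one above except for the
lifetime, a small helper can build the argument list. This is a sketch: the
function name and the DRY_RUN switch are hypothetical, and $AD_DOMAIN and
$FQDN are the same placeholders as in the command above (FQDN falls back to
`hostname -f`). adcli only changes the password when it is older than
--computer-password-lifetime days, so 7 should yield a renewal roughly weekly
when the job runs daily.

```shell
# Hypothetical helper that rebuilds the sssd-style invocation quoted above,
# with the password lifetime (in days) as its only parameter. AD_DOMAIN and
# FQDN are the placeholders from that command; FQDN defaults to hostname -f.
# Set DRY_RUN=1 to print the command instead of running it.
adcli_update_like_sssd() {
    local lifetime=${1:-30}
    local fqdn=${FQDN:-$(hostname -f)}
    set -- /usr/sbin/adcli update --verbose \
        --domain="$AD_DOMAIN" \
        --host-keytab=/etc/krb5.keytab \
        --host-fqdn="$fqdn" \
        --computer-password-lifetime="$lifetime"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```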
>>
>> BTW, the packet capture of a successful machine account password renewal
>> is only about 8 KB, so this very targeted debugging will not swamp our
>> /var/log or /tmp filesystems.
>>
>> Spike
>>
>> On Wed, Aug 25, 2021 at 10:32 AM Spike White <spikewhit...@gmail.com> wrote:
>>
>>> Sssd experts,
>>>
>>> *Short summary:* How can we troubleshoot sssd’s ‘Automatic Kerberos
>>> Host Keytab Renewal’ process? About 0.4% of our Linux servers drop
>>> off the AD domain monthly.
>>>
>>> *Longer explanation:*
>>>
>>> Over the past two years, we have on-boarded sssd as our Linux AD
>>> integration component, largely displacing a former commercial product
>>> that did the same.
>>>
>>> We have about 20K Linux servers that are sssd-enabled: a mix of RHEL6,
>>> RHEL7, RHEL8, and Oracle Linux 6, 7 and 8. We have ~7K Linux servers
>>> still on the old commercial product. (For certain edge-case scenarios,
>>> such as DMZs, the commercial product works better.)
>>>
>>> Our AD forest is a single forest with 4 regional child domains, all with
>>> transitive trust. Sssd auto-discovers the parent domain and all 4 child
>>> domains without problems whenever it’s adcli-joined to its local regional
>>> domain.
>>>
>>> Why am I writing this?
>>>
>>> Because we are researching an ongoing problem reported by L1 server
>>> ops: about 70-80 sssd-enabled Linux servers per month drop off the
>>> domain. Out of our current sssd-enabled population of ~20K servers,
>>> that’s not horrible, but it should still be better. (Our former
>>> commercial product did better.)
>>>
>>> It’s not limited to one particular OS, OS version, build location or
>>> region.  We have surveyed; it seems to occur randomly among all OS
>>> versions, regions and locations.
>>>
>>> To be clear, it’s extremely likely that this behavior arises from some
>>> subtle misconfiguration on our part, not from any sssd, adcli or
>>> Kerberos bug. We have a couple of configuration improvements we’re
>>> pursuing (for instance, a Kerberos max ticket lifetime mismatch between
>>> AD and the /etc/krb5.conf file).
>>>
>>> We use sssd’s default settings for
>>> ad_maximum_machine_account_password_age and
>>> ad_machine_account_password_renewal_opts. So once the machine account
>>> password is 30 days old, sssd attempts to renew the host Kerberos keytab,
>>> and it should re-attempt daily until the renewal succeeds. By company
>>> policy, our AD disables any machine accounts that have not renewed their
>>> credentials in 40 days. So the servers we find dropped off the domain are
>>> ones that have not renewed their AD machine accounts in 40 days.
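The timing above can be turned into a simple fleet check. A minimal sketch,
assuming the keytab's modification time is a usable proxy for the last
successful renewal (that is, nothing else rewrites /etc/krb5.keytab), and
using the 30-day sssd default and 40-day AD policy described above. With
those numbers a host gets roughly ten daily attempts (days 30 through 39)
before AD disables the account. Function names and status labels are
hypothetical; `stat -c %Y` is GNU coreutils.

```shell
# Hypothetical fleet check. Assumption: the keytab's mtime marks the last
# successful renewal, i.e. nothing else rewrites /etc/krb5.keytab.

# Print the keytab age in whole days. $1: keytab path, $2: "now" in epoch
# seconds (defaults to the current time; overridable for testing).
keytab_age_days() {
    local keytab=${1:-/etc/krb5.keytab}
    local now=${2:-$(date +%s)}
    local mtime
    mtime=$(stat -c %Y "$keytab") || return 1   # GNU stat: mtime, epoch s
    echo $(( (now - mtime) / 86400 ))
}

# Classify an age in days against the 30-day sssd default renewal age and
# the 40-day AD disable policy described above.
renewal_status() {
    if [ "$1" -lt 30 ]; then
        echo "ok"
    elif [ "$1" -lt 40 ]; then
        echo "at-risk"          # daily renewal attempts should be happening
    else
        echo "likely-disabled"
    fi
}
```

For example, `renewal_status "$(keytab_age_days)"` could be reported by
existing monitoring to flag hosts before they reach day 40.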
>>>
>>> We have had SRs open with our OS vendors (Red Hat and Oracle,
>>> respectively) for months now, without much help. (They gave a few
>>> suggestions, but none panned out.)
>>>
>>> We thought we were hitting this bug:
>>>
>>> https://github.com/SSSD/sssd/issues/4762
>>>
>>> But packet captures proved that adcli update uses TCP on RHEL7/8.
>>> Thus, this might be a problem, but only on RHEL6. (BTW,
>>> ‘udp_preference_limit = 0’ doesn’t force use of TCP for the kpasswd
>>> invocation on RHEL6; it still uses UDP. Thus, the recommended
>>> work-around for this bug doesn’t work.)
>>>
>>> So that isn’t our underlying problem.
>>>
>>> We’re at a loss now; as you can see, we’re grasping at straws.
>>>
>>> How can we troubleshoot sssd’s ‘automatic Kerberos host keytab renewal’
>>> process? Whenever we inspect a particular server, it works. We can’t run
>>> all sssd clients at debug level 9; it fills up the /var/log filesystem
>>> after a few days. We’re interested in troubleshooting that one particular
>>> sssd process on all clients, not all parts of sssd.
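One way to get targeted data from every client without debug level 9 is to
keep artifacts only from failed runs. A minimal sketch, assuming the wrapped
command (for example adcli update) exits non-zero when the password change
fails; the function name and directory layout are hypothetical. Successful
runs leave nothing on disk.

```shell
# Hypothetical retention wrapper: run a command with its output captured in
# a fresh directory under $1, and keep that directory only if the command
# fails. Assumes the wrapped command exits non-zero on failure.
run_keep_on_failure() {
    local base=$1
    shift
    local rundir rc
    rundir=$(mktemp -d "$base/run.XXXXXX") || return 1
    if "$@" >"$rundir/output.log" 2>&1; then
        rc=0
        rm -rf "$rundir"            # success: leave nothing behind
    else
        rc=$?
        echo "$rc" >"$rundir/exit_code"
        echo "$rundir"              # report where the failure artifacts are
    fi
    return "$rc"
}
```

The design choice is to pay the disk cost only on the ~0.4% of hosts that
actually fail, rather than on all ~20K.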
>>>
>>> Other than a steep learning curve (on our part), obscure situations
>>> (like DMZ auto-discovery of AD controllers) and exotic scenarios (like
>>> the one above), we’re quite happy with our two-year journey of direct AD
>>> integration with sssd. Admittedly, the troubleshooting tools on RHEL6 are
>>> very minimal. But overall, the quality of sssd on RHEL7/8 is certainly
>>> excellent. AD integration has innumerable devils in the details; I’m
>>> amazed that sssd performs as well as it does against our multi-domain
>>> forest.
>>>
>>> Spike
>>>
>>> PS: The problem with sssd auto-discovery of AD controllers in DMZs has
>>> been fixed in a recent sssd release, which implements a better discovery
>>> algorithm, the same one used by Windows clients and commercial products.
>>> It’s just that this recent sssd version is not available on RHEL7 or 8.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
>> To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
>> Fedora Code of Conduct:
>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives:
>> https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
>> Do not reply to spam on the list, report it:
>> https://pagure.io/fedora-infrastructure
>>