On Tue, Aug 31, 2021 at 6:47 PM Spike White <spikewhit...@gmail.com> wrote:
> All,
>
> OK we have a query we run in AD for machine account passwords for a
> certain age. In today's run, 31 - 32 days. Then we verify it's pingable.
>
> We have found one such suspicious candidate today (two actually, but
> the other Linux server is quite sick). So one good research candidate.
> According to both AD and /etc/krb5.keytab file, the machine account
> password was last set on 7/29. Today is 8/31, so that would be 32 days.
> This 'automatic machine account keytab renewal' background task should
> trigger again today.
>
> sssd service was last started 2 weeks ago and, by all appearances, appears
> healthy. sssctl domain-status <domain> shows online, connected to AD
> servers (both domain and GC servers). All logins and group enumerations
> working as expected.
>
> Just now, we dynamically set the debug level to 9 with 'sssctl debug-level
> 9'. This particular server is Oracle Linux 8.4,
> running sssd-*-2.4.0-9.0.1.el8_4.1.x86_64. Installed July 13th, 2021. So
> -- very recent sssd version. (This problem occurs with both RHEL & OL
> 6/7/8, it's just today's candidate happens to be OL8.)
>
> We can't keep debug level 9 up for a great many days; it swamps the
> /var/log filesystem. But we can leave up for a few days. We purposely did
> not restart sssd server as we know that would trigger a machine account
> renewal.
>
> Speaking of that -- from Sumit's sssd source code in
> ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a
> back-end task to call external program /usr/sbin/adcli with certain args.
> What string can I look for in which sssd log file (now that I have debug
> level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic
> machine account keytab renewal') is triggered?
>

It seems SSSD itself only logs in case of errors. I didn't find any explicit
log messages around `ad_machine_account_password_renewal_send()`.

But perhaps there will be something like "[be_ptask_execute] (0x0400): Task
[AD machine account password renewal]: executing task" from the generic
be_ptask_* helpers in sssd_$domain.log (I'm not sure).

Also, at this verbosity level `--verbose` should be supplied to adcli itself,
and I guess its output should be captured in sssd_$domain.log as well. I'm not
familiar with `adcli` internals; you can take a look at
https://gitlab.freedesktop.org/realmd/adcli to find its log messages.

>
> I'm less certain now that we've surveyed our env that this background
> 'adcli update' task is the reason behind 70 - 80 servers / month dropping
> off the domain. It might be a slight contributor, but I find only a very
> few pingable servers with machine account last renewal date between 30 and
> 40 days.
>
> Yes, I can disable this default 30 day automatic update and roll my own
> 'adcli update' cron. But that's a mass deployment, to fix what might not
> be the problem. I want to verify this is the actual culprit before I take
> those drastic steps.
>
> Spike
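Regarding the log search: since debug level 9 produces a lot of output, a small filter may be easier than reading sssd_$domain.log by hand. Below is a minimal Python sketch, not a tested tool. It assumes the task name really is "AD machine account password renewal" and that the be_ptask_* helpers log it, which is only my guess from above. The function names `find_task_lines` and `scan_log` are made up for this example, and the log path in the usage note is the usual default location, so adjust it for your hosts.

```python
# Assumed task name: this is the guess from the message above, not something
# confirmed against the SSSD sources for your version.
TASK_NAME = "AD machine account password renewal"


def find_task_lines(lines, task_name=TASK_NAME):
    """Return (line_number, text) pairs for be_ptask lines naming the task."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Require both the generic helper prefix and the task name so that
        # unrelated periodic tasks are not reported.
        if "be_ptask" in line and task_name in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path, task_name=TASK_NAME):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as log:
        return find_task_lines(log, task_name)
```

Calling something like `scan_log("/var/log/sssd/sssd_<domain>.log")` with your real domain name would then return the matching lines with their line numbers. If it returns nothing after the expected renewal time, that does not prove the task did not run, only that this particular string was not logged.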
_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org