Justin, if it's https://krbdev.mit.edu/rt/Ticket/Display.html?id=9037 , then proving it conclusively is even nastier than just dialing up the sssd debug level. The minimum debug level that yields verbose adcli update output is 7, and running at that level for even a few days swamps /var/log, or whichever filesystem houses /var/log/sssd/*.
You can fine-tune this in sssd.conf with debug_level = 0x0100 , which gives just the desired 'adcli update' verbosity and not much else. You can also tune the default logrotate.d setting to rotate the logs more frequently. However, this bug is quite infrequent, and the adcli update verbosity is insufficient to determine exactly what's going on.

Ultimately, we had to disable the 30-day 'adcli update' in sssd.conf and write our own crontab file that fires every 3-4 days. In that cron job, we wrapped the manual adcli update with tcpdump to get the raw packet capture. That way, we were finally able to get a full packet capture and see this race condition. We also call adcli update with KRB5_TRACE enabled, so that we get the full verbose krb5 output. Attached is the simple shell script wrapping adcli update that this cron job calls.

We had to push this cron job out to thousands of servers and update the machine account passwords every 3-4 days to obtain 2-3 failed client packet captures. The race condition is that infrequent: it occurs on 0.3-0.4% of all adcli update invocations.

We got almost all of these ideas from this sssd mailing list (such as disabling automatic password renewal and running adcli update as a cron job).

I'm not convinced that Sebastian's situation is this bug, so Sebastian might be able to get away with debug_level = 0x0100 to see what his bug is.

Spike

On Wed, Jan 19, 2022 at 9:15 AM Justin Stephenson <[email protected]> wrote:

> Hi,
>
> It sounds like a problem occurs when SSSD executes 'adcli update' to
> renew the machine account password. If successful, the AD DC computer
> object password is updated and the new keys are written to the keytab.
> If a failure occurs, however, it may cause these two things to go
> out of sync.
>
> You may need to set a high enough 'debug_level' in the
> [domain/$domain] section of sssd.conf, then check the adcli output
> written into the domain logs when the issue happens.
>
> -Justin
>
> On Wed, Jan 19, 2022 at 5:40 AM Sebastian Grebe <[email protected]> wrote:
> >
> > Hello,
> >
> > we are getting reports from users who suddenly can't authenticate to their Linux computers anymore. These computers are joined to our MS domain using adcli and sssd. Checking the log reveals that the kerberos tickets stored in /etc/krb5.keytab do not have the expected KVNO. At the moment we can't tell what's causing the issue. It happens only sporadically. I'm under the impression that only computers without a permanent network connection (laptops) are affected.
> >
> > The log shows:
> >
> > Jan 11 09:30:52 lc015564 systemd[1]: Starting System Security Services Daemon...
> > Jan 11 09:30:52 lc015564 sssd[1376]: Starting up
> > Jan 11 09:30:52 lc015564 sssd_be[1609]: Starting up
> > Jan 11 09:30:52 lc015564 sssd_ifp[1633]: Starting up
> > Jan 11 09:30:52 lc015564 systemd[1]: Started System Security Services Daemon.
> > Jan 11 09:30:55 lc015564 sssd_be[1609]: Backend is offline
> > Jan 11 09:49:32 lc015564 sssd_be[1609]: Backend is online
> > Jan 11 09:49:41 lc015564 krb5_child[6111]: Cannot find key for [email protected] kvno 11 in keytab
> > Jan 11 09:49:41 lc015564 krb5_child[6111]: Cannot find key for [email protected] kvno 11 in keytab
> > Jan 11 09:49:49 lc015564 adcli[6102]: GSSAPI client step 1
> > Jan 11 09:49:49 lc015564 adcli[6102]: GSSAPI client step 1
> > Jan 11 09:49:50 lc015564 adcli[6102]: GSSAPI client step 1
> > Jan 11 10:00:57 lc015564 krb5_child[6838]: Cannot find key for [email protected] kvno 11 in keytab
> > Jan 11 10:00:57 lc015564 krb5_child[6838]: Cannot find key for [email protected] kvno 11 in keytab
> >
> > And klist -k shows:
> >
> > Keytab name: FILE:/etc/krb5.keytab
> > KVNO Principal
> > ---- --------------------------------------------------------------------------
> >   10 [email protected]
> >   10 [email protected]
> >   10 [email protected]
> >   10 host/[email protected]
> >   10 host/[email protected]
> >   10 host/[email protected]
> >   10 host/[email protected]
> >   10 host/[email protected]
> >   10 host/[email protected]
> >   10 RestrictedKrbHost/[email protected]
> >   10 RestrictedKrbHost/[email protected]
> >   10 RestrictedKrbHost/[email protected]
> >   10 RestrictedKrbHost/[email protected]
> >   10 RestrictedKrbHost/[email protected]
> >   10 RestrictedKrbHost/[email protected]
> >    9 [email protected]
> >    9 [email protected]
> >    9 [email protected]
> >    9 host/[email protected]
> >    9 host/[email protected]
> >    9 host/[email protected]
> >    9 host/[email protected]
> >    9 host/[email protected]
> >    9 host/[email protected]
> >    9 RestrictedKrbHost/[email protected]
> >    9 RestrictedKrbHost/[email protected]
> >    9 RestrictedKrbHost/[email protected]
> >    9 RestrictedKrbHost/[email protected]
> >    9 RestrictedKrbHost/[email protected]
> >    9 RestrictedKrbHost/[email protected]
> >
> > This is our sssd.conf (it's from a different computer):
> >
> > [sssd]
> > domains = wago.local
> > config_file_version = 2
> > services = ifp
> >
> > [domain/wago.local]
> > default_shell = /bin/bash
> > fallback_homedir = /home/%d/%u
> > cache_credentials = true
> > krb5_store_password_if_offline = true
> > krb5_realm = WAGO.LOCAL
> > krb5_ccname_template = /tmp/krb5cc_%U
> > realmd_tags = manages-system joined-with-adcli
> > id_provider = ad
> > access_provider = ad
> > ad_domain = wago.local
> > ad_enabled_domains = wago.local
> > ad_hostname = lc017547.wago.local
> > use_fully_qualified_names = false
> > ldap_id_mapping = true
> > ldap_user_gecos = displayName
> > ldap_use_tokengroups = false
> > ldap_search_base = dc=wago,dc=local?subtree?
> > ldap_user_search_base = ou=User,ou=Minden,ou=Germany,dc=wago,dc=local?subtree??ou=User,ou=Administration,dc=wago,dc=local?onelevel?(&(objectClass=user)(cn=a2*))?ou=Service,dc=wago,dc=local?subtree?
> > ldap_group_search_base = cn=Users,dc=wago,dc=local?onelevel?(&(objectClass=group)(cn=Domain Users))?ou=Groups,ou=Minden,ou=Germany,dc=wago,dc=local?onelevel?(&(objectClass=group)(cn=&01-PC-Support))
> > ldap_netgroup_search_base = cn=Users,dc=wago,dc=local?onelevel?
> > ignore_group_members = true
> > enumerate = false
> > dyndns_update = true
> > dyndns_refresh_interval = 7200
> > dyndns_update_ptr = true
> > dyndns_server = 10.1.100.2
> > case_sensitive = Preserving
> >
> > [nss]
> > filter_users = root
> > filter_groups = root
> >
> > [pam]
> > offline_credentials_expiration = 0
> > offline_failed_login_attempts = 3
> > offline_failed_login_delay = 5
> >
> > And the krb5.conf:
> >
> > [libdefaults]
> > ticket_lifetime = 240:00:00
> > renew_lifetime = 240:00:00
> > clock_skew = 300
> > renewable = true
> > default_ccache_name = FILE:/tmp/krb5cc_%{uid}
> > default_realm = WAGO.LOCAL
> > kdc_timesync = 1
> > ccache_type = 4
> > forwardable = true
> > proxiable = true
> > udp_preference_limit = 1
> > noaddresses = true
> > fcc-mit-ticketflags = true
> > [realms]
> > WAGO.LOCAL = {
> >     admin_server = 10.1.101.200
> >     admin_server = 10.1.100.1
> >     admin_server = 10.1.100.253
> >     admin_server = 10.1.100.2
> > }
> > [domain_realm]
> > .wago.local = WAGO.LOCAL
> > wago.local = WAGO.LOCAL
> > [login]
> > krb4_convert = true
> > krb4_get_tickets = false
> >
> > To solve the issue we delete the computer from the domain, delete the krb5.keytab and rejoin it.
> > _______________________________________________
> > sssd-users mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
> > Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives: https://lists.fedorahosted.org/archives/list/[email protected]
> > Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
#!/bin/bash
#
# PROGRAM: wrap_adcli_update.sh
# DESCRIPTION: To wrap a daily cron that attempts an 'adcli update' with tcpdump so that we can capture the specific kpasswd packets
#              to try to catch a problematic client (drops off AD domain in 30 days).

FQDN=$(hostname --fqdn)

# How to get my AD domain (which I'll need later)? On some builds, the
# AD domain is stored, in others it's not.
#
# This works in all cases; parse the /etc/sssd/sssd.conf file to see which AD domain we were initially configured for.
AD_DOMAIN=$(grep -E '^\[domain/' /etc/sssd/sssd.conf | head -n 1 | sed -e 's|^\[domain/||' -e 's/\]//')

SUFFIX=$(date +"%Y-%m-%d_%H-%M-%S")

# Just in case of any lingering creds. We want to call 'adcli update' the same way that sssd calls 'adcli update'.
kdestroy -A > /dev/null 2>&1

export KRB5_TRACE="/tmp/krb5_trace.out.$SUFFIX"

rpm -q tcpdump > /dev/null 2>&1 || yum -y install tcpdump

# Capture Kerberos (88) and kpasswd (464) traffic in the background.
tcpdump -i any -w "/tmp/adcli_update.pcap.$SUFFIX" port 88 or port 464 &
MY_CHILD_PID=$!

# Give tcpdump a moment to start capturing.
sleep 1

# From SOURCES/sssd-2.4.0/src/providers/ad/ad_machine_pw_renewal.c, this appears to be how sssd calls external program /usr/sbin/adcli to do its adcli update.
/usr/sbin/adcli update --verbose --domain="$AD_DOMAIN" --host-keytab=/etc/krb5.keytab --host-fqdn="$FQDN" --computer-password-lifetime=7 > "/tmp/adcli_update.out.$SUFFIX" 2>&1

unset KRB5_TRACE

# Apparently, it takes tcpdump a small period of time to get and write
# the final packets. So we have to put a sleep in here to give tcpdump
# sufficient time to write all packets, else you'll get a partial
# capture. 1 second is more than enough.
sleep 1
kill -TERM "$MY_CHILD_PID"

# Keep only the newest $COPIES files of each kind.
# 'xargs -r' (GNU) skips rm entirely when there is nothing to delete.
COPIES=10
ls -1tr /tmp/krb5_trace* | head -n -$COPIES | xargs -r rm --
ls -1tr /tmp/adcli_update.pcap* | head -n -$COPIES | xargs -r rm --
ls -1tr /tmp/adcli_update.out* | head -n -$COPIES | xargs -r rm --
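For anyone wanting to reproduce the setup described above, the two pieces around the wrapper script (turning off sssd's own renewal and scheduling the wrapper from cron) might look roughly like the sketch below. It is an illustration, not the deployed configuration: the function name write_renewal_setup, the fragment file names, the script path /usr/local/sbin/wrap_adcli_update.sh and the 03:30 every-third-day schedule are all assumptions. The one real option it relies on is ad_maximum_machine_account_password_age from sssd-ad(5), where a value of 0 disables the renewal attempt.

```shell
#!/bin/bash
# Sketch only: file names, the script path and the schedule are assumptions.

# Write two config fragments into the directory given as $1 so they can be
# reviewed before being merged into sssd.conf and /etc/cron.d by hand.
write_renewal_setup() {
    local dest=$1
    local domain=$2   # the name inside [domain/...] in sssd.conf
    local script=${3:-/usr/local/sbin/wrap_adcli_update.sh}   # hypothetical path

    mkdir -p "$dest" || return 1

    # 0 disables sssd's own periodic machine password renewal (sssd-ad(5)).
    cat > "$dest/sssd-domain.fragment" <<EOF
[domain/$domain]
ad_maximum_machine_account_password_age = 0
EOF

    # /etc/cron.d format: five time fields, user, command.
    # "*/3" in day-of-month is roughly every third day and restarts each month.
    cat > "$dest/adcli-update.cron" <<EOF
30 3 */3 * * root $script
EOF
}
```

Calling `write_renewal_setup /tmp/renewal-setup wago.local` would leave both fragments in /tmp/renewal-setup for inspection. Writing to a scratch directory rather than straight into /etc is deliberate, since sssd.conf usually already has a [domain/...] section that the option needs to be merged into.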
