I have an inherited IPA domain that is a subdomain of an Active Directory
domain, e.g. ipa.ad1.com as a child of ad1.com. The IPA domain has AD Trust
enabled and a one-way domain trust to another AD subdomain, e.g. we want to
use user logins from the AD domain users.ad2.com, which is a child domain of
ad2.com. We are also using AD security groups from the users.ad2.com domain to
apply group-based access control, e.g. we are using simple authentication in
SSSD to limit who can log in and AD groups to define sudo access. The
users.ad2.com domain and its AD servers are managed by another team.
Everything was working for some time, and then we started seeing intermittent
problems with authentication. A quick restart of the IPA server would resolve
the problem temporarily, but then authentication would stop working again.
Even when we could log in using SSH keys, sudo access would not work; it
appeared to lose group membership details.
I have recently updated all of the IPA nodes to RHEL 9 and made sure that DNS
is updated correctly.
The sssd.conf configuration on the IPA server looks as follows:
[domain/ipa.ad1.com]
debug_level = 6
id_provider = ipa
ipa_server = ipa-3.ipa.ad1.com
ipa_domain = ipa.ad1.com
ipa_hostname = ipa-3.ipa.ad1.com
auth_provider = ipa
chpass_provider = ipa
access_provider = ipa
cache_credentials = True
ldap_tls_cacert = /etc/ipa/ca.crt
krb5_store_password_if_offline = True
sudo_provider = ipa
autofs_provider = ipa
subdomains_provider = ipa
session_provider = ipa
hostid_provider = ipa
ipa_server_mode = True
subdomain_homedir = /home/%u
default_shell = /bin/bash
override_shell = /bin/bash
[sssd]
services = nss, pam, sudo, ifp
domains = ipa.ad1.com
domain_resolution_order = users.ad2.com
[nss]
homedir_substring = /home
[pam]
[sudo]
[autofs]
[ssh]
[pac]
[ifp]
allowed_uids = ipaapi, root
[session_recording]
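For comparing this file across several IPA nodes, the settings that matter for the trusted domains can be pulled out with a short script. This is only a sketch: the function name summarize_sssd_conf and the trimmed SAMPLE_CONF are made up for illustration, and interpolation is switched off because values such as /home/%u would otherwise trip Python's configparser.

```python
import configparser

def summarize_sssd_conf(conf_text):
    """Collect the domain-related settings from sssd.conf text."""
    # interpolation=None keeps '%u' in subdomain_homedir from being
    # treated as a configparser interpolation reference.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    domains_raw = parser.get("sssd", "domains", fallback="")
    domains = [d.strip() for d in domains_raw.split(",") if d.strip()]

    # Record ipa_server_mode for every [domain/NAME] section.
    server_mode = {}
    for section in parser.sections():
        if section.startswith("domain/"):
            name = section[len("domain/"):]
            server_mode[name] = parser.getboolean(
                section, "ipa_server_mode", fallback=False)

    return {
        "domains": domains,
        # Kept as the raw string; no assumption is made about its separator.
        "domain_resolution_order": parser.get(
            "sssd", "domain_resolution_order", fallback=None),
        "server_mode": server_mode,
    }

# Trimmed copy of the configuration above, for illustration only.
SAMPLE_CONF = """\
[domain/ipa.ad1.com]
id_provider = ipa
ipa_server_mode = True
subdomain_homedir = /home/%u
[sssd]
services = nss, pam, sudo, ifp
domains = ipa.ad1.com
domain_resolution_order = users.ad2.com
[pam]
"""

print(summarize_sssd_conf(SAMPLE_CONF))
```

Running the same function over the real /etc/sssd/sssd.conf from each node (read as root) would show whether any node differs in its domain list or resolution order.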
I have debug level 6 enabled on SSSD, and when I check the domain status I see
the following more often than not: the ad2.com forest domains are offline.
They come back online, but as soon as someone tries to log in again, either
both domains or just users.ad2.com go offline, which causes the login to fail.
ipa.ad1.com Online status: Online
ad1.com Online status: Online
ad2.com Online status: Offline
users.ad2.com Online status: Offline
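Since the problem is intermittent, it can help to record which domains are offline at regular intervals. The sketch below assumes the one-line-per-domain summary format shown above (not necessarily the raw output of any particular tool); the function name offline_domains is hypothetical.

```python
import re

# Matches the summarized lines shown above,
# e.g. "ad2.com Online status: Offline".
STATUS_RE = re.compile(
    r"^(?P<domain>\S+)\s+Online status:\s+(?P<state>Online|Offline)\s*$")

def offline_domains(status_text):
    """Return the domains reported as Offline, in the order they appear."""
    offline = []
    for line in status_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match and match.group("state") == "Offline":
            offline.append(match.group("domain"))
    return offline

# The status listing from above, used as sample input.
SAMPLE_STATUS = """\
ipa.ad1.com Online status: Online
ad1.com Online status: Online
ad2.com Online status: Offline
users.ad2.com Online status: Offline
"""

print(offline_domains(SAMPLE_STATUS))
```

Logging the result with a timestamp every minute or so would make it possible to line up the offline transitions with login attempts in the SSSD domain log.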
When I look at the SSSD domain logs I see the following (I have replaced
internal domain names and hostnames):
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sbus_senders_lookup] (0x2000):
Looking for identity of sender [sssd.ifp]
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sss_domain_get_state] (0x1000):
Domain ipa.ad1.com is Active
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sss_domain_get_state] (0x1000):
Domain ad1.com is Active
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sss_domain_get_state] (0x1000):
Domain AD2.COM is Active
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sbus_issue_request_done]
(0x0400): sssd.DataProvider.Backend.IsOnline: Success
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sbus_dispatch] (0x4000):
Dispatching.
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [_read_pipe_handler] (0x0400):
[RID#1162] EOF received, client finished
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sdap_get_tgt_recv] (0x0400):
[RID#1162] Child responded: 0 [FILE:/var/lib/sss/db/ccache_AD2.COM], expired on
[1686187555]
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sdap_cli_auth_step] (0x0100):
[RID#1162] expire timeout is 900
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sdap_cli_auth_step] (0x1000):
[RID#1162] the connection will expire at 1686152455
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sasl_bind_send] (0x0100):
[RID#1162] Executing sasl bind mech: GSSAPI, user: AUTH$
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sasl_bind_send] (0x0020):
[RID#1162] ldap_sasl_interactive_bind_s failed (-2)[Local error]
** BACKTRACE DUMP ENDS HERE
*
(2023-06-07 16:25:55): [be[ipa.ad1.com]] [sasl_bind_send] (0x0080): [RID#1162]
Extended failure message: [SASL(-1): generic failure: GSSAPI Error: Unspecified
GSS failure. Minor code may provide more information (Server not found in
Kerberos database)]
(2023-06-07 16:25:55): [be[ipa.ad1.com]] [child_sig_handler] (0x0100):
[RID#1162] child [9519] finished successfully.
(2023-06-07 16:25:55): [be[ipa.ad1.com]] [sdap_cli_connect_recv] (0x0040):
[RID#1162] Unable to establish connection [1432158227]: Authentication Failed
** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING
BACKTRACE:
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [sasl_bind_send] (0x0080):
[RID#1162] Extended failure message: [SASL(-1): generic failure: GSSAPI Error:
Unspecified GSS failure. Minor code may provide more information (Server not
found in Kerberos database)]
* (2023-06-07 16:25:55): [be[ipa.ad1.com]] [child_sig_handler] (0x1000):
[RID#1162] Waiting for child [9519].
* (2023-06-07 16:25:55): [be[ipa.ad1.com]]
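To see how often the GSSAPI bind fails and for which requests, the "Extended failure message" entries can be pulled out of the domain log. This is a rough sketch that assumes the message layout shown in the excerpt above; because the lines are wrapped, whitespace is collapsed before matching. The function name gssapi_failures is made up for illustration.

```python
import re

# Captures the request ID and the Kerberos minor-code text, i.e. the last
# parenthesised part of the "Extended failure message" entry.
FAILURE_RE = re.compile(
    r"\[RID#(?P<rid>\d+)\]\s+Extended failure message:\s+"
    r"\[.*?\((?P<minor>[^()]*)\)\]")

def gssapi_failures(log_text):
    """Return (request_id, minor_message) pairs found in the log text."""
    # Collapse line wrapping so a message split across lines still matches.
    flat = " ".join(log_text.split())
    return [(m.group("rid"), m.group("minor"))
            for m in FAILURE_RE.finditer(flat)]

# One wrapped entry copied from the log excerpt above.
SAMPLE_LOG = """\
(2023-06-07 16:25:55): [be[ipa.ad1.com]] [sasl_bind_send] (0x0080): [RID#1162]
Extended failure message: [SASL(-1): generic failure: GSSAPI Error: Unspecified
GSS failure. Minor code may provide more information (Server not found in
Kerberos database)]
"""

print(gssapi_failures(SAMPLE_LOG))
```

Note that SSSD repeats a message inside its backtrace dump, so the same request ID can appear more than once in a real log; deduplicating on the pair is a reasonable next step.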