Hi all,
Let me introduce the symptoms that triggered this investigation.
First, versions:
- OS: Ubuntu 20.04
- SSSD: 2.2.3-3ubuntu0.7
- Platform: x86_64

Some weeks after starting using a new region/provider we noticed some 
intermittent issue impacting SSH logins (ie: clients got  `Connection closed by 
UNKNOWN port 65535` on `ssh ...`). When looking into it, I was able to 
reproduce it somewhat reliably. It happens after some time of inactivity for 
that user, but if I constantly issue commands (ie: `watch sssctl user-checks 
<user>` or `watch ssh <host> whoami` from a client) the issue wont trigger. If 
I stopped trying out those commands and tried again ~30 minutes later, it would 
fail.

With these symptoms I had these possibilities in mind: some caching issue, some 
authentication expiring, connections pools issues...

I enabled detailed logging and compared logs from a successful request and a 
failed one and I found what seems to be the source of the error from sssd logs 
perspective, but still can't make much sense of it.
Log extracted from sssd_<domain>.log:
```
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [ipa_hbac_rule_info_next] (0x0400): 
Sending request for next search base: [cn=hbac,dc=ipa,dc=corp,<dc 
<domain>>][2][(&(objectclass=ipaHBACRule)(ipaenabledflag=TRUE)(accessRuleType=allow)(|(hostCategory=all)(...)))]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_print_server] (0x2000): 
Searching <ip10.1.10.102:389
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x0400): 
calling ldap_search_ext with 
[(&(objectclass=ipaHBACRule)(ipaenabledflag=TRUE)(accessRuleType=allow)(|(hostCategory=all)(...)][cn=hbac,dc=ipa,dc=corp,<dc
 <domain>>].
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [objectclass]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [cn]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [ipauniqueid]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [ipaenabledflag]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [accessRuleType]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [memberUser]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [userCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [memberService]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [serviceCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [sourceHost]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [sourceHostCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [externalHost]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [memberHost]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x1000): 
Requesting attrs: [hostCategory]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_get_generic_ext_step] (0x2000): 
ldap_search_ext called, msgid = 18
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_op_add] (0x2000): New operation 
18 timeout 60                                               
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_process_result] (0x2000): 
Trace: sh[0x55d349f96d00], connected[1], ops[0x55d349ff16e0], 
ldap[0x55d34a10cba0]
(Wed Oct 13 19:40:52 2021) [be[<domain>]] [sdap_process_result] (0x2000): 
Trace: end of ldap_result list
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [sdap_op_timeout] (0x1000): Issuing 
timeout for 18
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [sdap_op_destructor] (0x1000): 
Abandoning operation 18
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [generic_ext_search_handler] 
(0x0040): sdap_get_generic_ext_recv failed [110]: Connection timed out
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [ipa_hbac_rule_info_done] (0x0080): 
Could not retrieve HBAC rules
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [ipa_pam_access_handler_done] 
(0x0020): Unable to fetch HBAC rules [110]: Connection timed out
(Wed Oct 13 19:41:52 2021) [be[<domain>]] [dp_req_done] (0x0400): DP Request 
[PAM Account #21]: Request handler finished [0]: Success
```

I tried using `ldap_enumeration_search_timeout = 180` which updated the timeout 
in (`[sdap_op_add] (0x2000): New operation 18 timeout 60 `) but same result 
(`enumerate` and `subdomain_enumerate` config options are left to their 
defaults, `false` and `none` respectively).

I looked at the IPA servers LDAP logs and saw now related requests regarding 
`ipaHBACRules` so I assume the request didn't even left the server.
I tried tweaking some other caching options like `cached_auth_timeout` and 
`pam_id_timeout` without any impact. Also tried clearing up the caches with 
`sss_cache -E` or `systemctl stop sssd && rm /var/lib/sss/db/* && systemctl 
start sssd` and nothing changed.

This is the error returned by `sssctl user-checks` when it fails:
```
$ sudo sssctl user-checks <user>
user: <user>
action: acct
service: system-auth

SSSD nss user lookup result:
 - user name: <user>
 - user id: <uid>
 - group id: <gid>
 - gecos: <name>
 - home directory: /home/<user>
 - shell: /bin/bash

InfoPipe operation failed. Check that SSSD is running and the InfoPipe 
responder is enabled. Make sure 'ifp' is listed in the 'services' option in 
sssd.conf.InfoPipe User lookup with [<user>] failed.
testing pam_acct_mgmt

pam_acct_mgmt: System error

PAM Environment:
 - no env -
 ```

Any pointer here would be very useful.

Thank you!
  Aitor
_______________________________________________
sssd-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/[email protected]
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure

Reply via email to