Hi Jakub,

Apologies for the long delay in response as I was dragged away for other 
projects! So my previous (false)theory was a result of sssd_nss not being able 
to see entries that the sssd_be places into the memcache. It would not be able 
to find it's group so it would query any other domains it could see. I've now 
been able to isolate the issue down to an even smaller plausible cause. 

After some digging through logs of an affected machine I've discovered a very 
interesting set of logs. This is quite verbose so bear with me. The command I 
ran was:
$ getent group $GID

Here we see NSS requesting a GID number to be resolved/fetched:
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input ID: 
1000001111
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #187: New 
request 'Group by ID'
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_select_domains] (0x0400): CR 
#187: Performing a multi-domain search
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_domains] (0x0400): CR 
#187: Search will check the cache and check the data provider

First it checks the NSS cache to see if an entry is present in it's cache:
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_set_domain] (0x0400): CR 
#187: Using domain [mydomain.com]
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_send] (0x0400): CR 
#187: Looking up GID:[email protected]
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_ncache] (0x0400): CR 
#187: Checking negative cache for [GID:[email protected]]
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_ncache] (0x0400): CR 
#187: [GID:[email protected]] is not present in negative cache
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] (0x0400): CR 
#187: Looking up [GID:[email protected]] in cache
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] (0x0400): CR 
#187: Object [GID:[email protected]] was not found in cache

It's not present in the cache according to NSS so it requests the backend to 
search the domain provider for this entry:
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_dp] (0x0400): CR #187: 
Looking up [GID:[email protected]] in data provider
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing 
request for [0x55ee5f5d1b10:2:[email protected]]
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): 
Creating request for [mydomain.com][0x2][BE_REQ_GROUP][idnumber=1000001111:-]
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): 
Entering request [0x55ee5f5d1b10:2:[email protected]]

The domain provider finds the entry on the domain and fills the cache:
(Thu Oct 17 17:01:42 2019) [sssd[be[mydomain.com]]] [sdap_save_group] (0x0400): 
Storing info for group [email protected]
(Thu Oct 17 17:01:42 2019) [sssd[be[mydomain.com]]] [sysdb_store_group] 
(0x1000): The group record of [email protected] did not change, only 
updated the timestamp cache

What's interesting is that the sssd_be tells us that the entry was already 
present, yet nss was unaware of any entries in the cache, nss didn't even say 
(not exact quote) "entry found, needs updating", which after some testing on a 
working machine, is what occurs when nss encounters an entry that is out of 
date. 

Here we see sssd_nss responding to the backend's return of the group entry.
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply 
from Data Provider - DP error code: 0 errno: 0 error message: Success
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] (0x0400): CR 
#187: Looking up [GID:[email protected]] in cache
(Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] (0x0400): CR 
#187: Object [GID:[email protected]] was not found in cache

Either the backend is entering data wrongly into the cache or nss is unable to 
read certain entries that are placed in the cache, I would like to think it is 
the former because the issue is so intermittent and for the most part, it works 
correctly. The only workaround we have found so far is stopping sssd.service, 
removing the ldbs from /var/lib/sss/db/ and restarting the service. This allows 
sssd to work correctly and return the groups correctly.

I hope this makes sense,
Thanks,
Jamal
_______________________________________________
sssd-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/[email protected]

Reply via email to