[SSSD-users] Re: Issues with SSSD cache on version 1.13.4

Simo Sorce Fri, 21 Sep 2018 11:08:14 -0700

I am probably guilty of introducing this behavior in the original
implementation, and although I believe it is the correct behavior for
UIDs, it is probably suboptimal for GIDs.
I think we should open an issue to handle this better, if one is not
already open.


Simo.

On Fri, 2018-09-21 at 17:53 +0000, Beale (US), Gareth wrote:
> We are running SUSE 12 SP3, which ships SSSD 1.13.4; I believe that is an
> LTM version.
> 
> Due to the large number of users and groups in our LDAP directory, and the 
> limitations of some legacy Unix systems, we have some large groups that have 
> been broken into "sub-groups" with the same GID but an incremental suffix. I 
> don't believe this is an uncommon solution, and it has worked fine for many 
> years. There are efforts underway to patch the older systems so that they
> can handle very large groups, which would let us collapse these sub-groups,
> but it is a slow process and there are a lot of servers.
> 
> Recently we upgraded some Linux systems to SUSE 12 SP3, which required us to
> move to SSSD instead of configuring LDAP in /etc/ldap/conf. In the last few
> weeks we have encountered an issue related to these groups that share a GID.
> Most of the time everything works as before; for instance, "getent group"
> lookups by either GID or sub-group name return results. However, at times
> those commands return an empty list and the following error appears in the
> system log:
> 
> sssd[nss]: More groups have the same GID [nnnn] in directory server. SSSD 
> will not work correctly.
> (group ID elided in this email per company policy)
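
Which group names collide on a single GID can be checked from the group(5)-format lines that "getent group" prints (name:password:GID:members). A minimal sketch in Python, assuming those lines are collected by looking up each sub-group by name; the function name find_shared_gids, the script name, and the group names in the usage example are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_shared_gids(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map each GID used by more than one group name to those names.

    Each line is expected in group(5) format: name:password:gid:members.
    """
    names_by_gid: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # not a usable group(5) entry
        name, gid = fields[0], int(fields[2])
        if name not in names_by_gid[gid]:
            names_by_gid[gid].append(name)  # record each name once per GID
    # Keep only GIDs shared by two or more distinct group names.
    return {gid: names for gid, names in names_by_gid.items() if len(names) > 1}


if __name__ == "__main__":
    import sys

    for gid, names in sorted(find_shared_gids(sys.stdin).items()):
        print(f"GID {gid}: {', '.join(names)}")
```

For example, `getent group eng1 eng2 | python3 shared_gids.py` would print each GID that more than one of the listed names resolves to.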
> 
> Using sss_cache to expire the entire cache, the group cache, or a specific
> group has no effect. I understand that this expires the entries rather than
> removing them, but subsequent getent calls do not overwrite what was there,
> and the error persists. Stopping SSSD, removing the cache DB and restarting
> was effective, but this is not a viable solution in production. Since the
> problem eventually clears itself (only to come back later), I tried various
> strategies; one was to run "getent group" on every sub-group, which does
> clear the problem until it returns.
> 
> Since I discovered this issue on SUSE, others in the company have verified
> that it also appears on RH 6 and 7. RH 7 runs SSSD 1.16.0, so the problem is
> still present at least up to that release, although the error message above
> does not appear in the messages log. Instead, this error appears in
> sssd_nss.log:
> 
> [sssd[nss]] [cache_req_search_cache] (0x0020): CR #1122: Multiple objects 
> were found when only one was expected!
> 
> Gareth
> 
> Gareth Beale (bemsid: 45600)
> Enterprise High Performance Computing Service
> Application Infrastructure Services
> Global Information Technology Infrastructure Services
> Need help? http://iticket.web.boeing.com/secure/create.aspx?id=serverhpc / 
> 425-234-0911
> 
> _______________________________________________
> sssd-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedorahosted.org/archives/list/[email protected]

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
