I am probably guilty of introducing this behavior in the original implementation, and although I believe it is the correct behavior for UIDs, it is probably suboptimal for GIDs. I think we should open an issue to deal with this in a better way if one is not open yet.
Simo.

On Fri, 2018-09-21 at 17:53 +0000, Beale (US), Gareth wrote:
> We are running SUSE 12 SP3 which uses SSSD 1.13.4 which I believe is a LTM
> version.
>
> Due to the large number of users and groups in our LDAP directory, and the
> limitations of some legacy Unix systems, we have some large groups that have
> been broken into "sub-groups" with the same GID but an incremental suffix. I
> don't believe this is an uncommon solution, and it has worked fine for many
> years. There are efforts underway to patch some older systems such that they
> can handle very large groups so that we can collapse these sub-groups, but it
> is a slow process and there are a lot of servers.
>
> Recently we upgraded some Linux systems to SUSE 12 SP3 and this has made us
> transition to using SSSD instead of configuring LDAP in /etc/ldap/conf. In
> the last few weeks we have encountered an issue related to these groups with
> the same GID. Most of the time, everything works as before, and for instance
> "getent group" commands using either GID or (sub-group) name return results.
> However at times those commands return an empty list and the following error
> appears in the system log:
>
> sssd[nss]: More groups have the same GID [nnnn] in directory server. SSSD
> will not work correctly.
> (group ID elided in this email per company policy)
>
> Using sss_cache to expire the entire cache, group cache or specific group
> from cache has no effect. I understand that this expires the entries, not
> removes them, but subsequent getent calls do not overwrite what was there,
> the error persists. Stopping SSSD, removing the cache DB and restarting was
> effective, but this is not a viable solution in production. Since the problem
> clears itself eventually (only to come back later) I tried various
> strategies, one of which was to do a "getent group" on every sub-group, and
> this does clear the problem (until it returns).
>
> Since I discovered this issue on SUSE, others in the company have verified
> that it also appears in RH 6 and 7. RH 7 is running 1.16.0, so the problem is
> still present up to that release, though the above error message does not
> appear in the messages log. Instead there is an error in the sssd_nss.log:
>
> [sssd[nss]] [cache_req_search_cache] (0x0020): CR #1122: Multiple objects
> were found when only one was expected!
>
> Gareth
>
> Gareth Beale (bemsid: 45600)
> Enterprise High Performance Computing Service
> Application Infrastructure Services
> Global Information Technology Infrastructure Services
> Need help? http://iticket.web.boeing.com/secure/create.aspx?id=serverhpc /
> 425-234-0911
>
> _______________________________________________
> sssd-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedorahosted.org/archives/list/[email protected]

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
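
The directory layout Gareth describes (several "sub-groups" sharing one GID) can be spotted from the client side by scanning group entries in the usual `name:passwd:gid:members` format. The following is only a minimal sketch for illustration, not anything taken from SSSD itself: the function name `find_shared_gids` and the sample entries are hypothetical, and the group names and GIDs are made up.

```python
from collections import defaultdict


def find_shared_gids(group_lines):
    """Return {gid: [names]} for every GID used by more than one group name.

    Each input line is expected in group(5) format: name:passwd:gid:members.
    Lines that are blank, commented out, or too short are skipped.
    """
    names_by_gid = defaultdict(list)
    for line in group_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue
        name, gid = fields[0], fields[2]
        # Record each group name once per GID, preserving first-seen order.
        if name not in names_by_gid[gid]:
            names_by_gid[gid].append(name)
    # Keep only GIDs that map to two or more distinct names.
    return {gid: names for gid, names in names_by_gid.items() if len(names) > 1}


# Hypothetical sample entries; real names and GIDs will differ.
SAMPLE = [
    "bigteam1:*:5000:alice,bob",
    "bigteam2:*:5000:carol",
    "ops:*:6000:dave",
]

if __name__ == "__main__":
    for gid, names in sorted(find_shared_gids(SAMPLE).items()):
        print(f"GID {gid}: {', '.join(names)}")
```

To run it against a live system, the lines could come from the output of `getent group` instead of `SAMPLE`. Note that a bare `getent group` only lists SSSD-provided groups when enumeration is enabled for the domain, so an LDAP export may be the more practical input.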
