On (23/03/16 15:49), Patrick Coleman wrote:
>Hi,
>
>We run sssd to bind a number of machines to LDAP for auth. On a subset
>of these machines, we have software that makes several thousand IPv6
>route changes per second.
>
>Recently, we found that on these hosts the sssd_nss responder process
>fails several times a day[1], and will not recover until sssd is
>restarted. strace[2] of the main sssd process indicates that sssd is
>receiving many, many netlink messages - so many, in fact, that sssd
>cannot process them fast enough and is receiving ENOBUFS from
>recvmsg(2).
>
>The messages that are received seem to get forwarded[3] to the sssd
>responders over the unix socket and flood them until they fail.
>
>From what I can see, the netlink code in
>src/monitor/monitor_netlink.c:setup_netlink() subscribes to netlink
>notifications with the aim of detecting things like wifi network
>changes. This isn't something we'd find useful on our servers and
>seems to have performance implications - is there any easy way of
>turning off this functionality in sssd that I've missed?
>
>We see this issue running sssd 1.11.7.
>
>Cheers,
>
>Patrick
>
>
>1. The failures look something like this. I have replaced our sss
>domain with "ourdomain"
>/var/log/sssd/sssd_nss.log
>
>(Tue Mar 22 02:58:01 2016) [sssd[nss]] [accept_fd_handler] (0x0100):
>Client connected!
>(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups] (0x0100):
>Requesting info for [systemuser] from [<ALL>]
>(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups_search]
>(0x0100): Requesting info for [systemuser@ourdomain]
>(Tue Mar 22 02:59:04 2016) [sssd[nss]]
>[nss_cmd_initgroups_dp_callback] (0x0040): Unable to get information
>from Data Provider
>Error: 3, 5, (null)

The real error is in sssd_$domain.log;
neither sssd.log nor sssd_nss.log will help you.

@see https://fedorahosted.org/sssd/wiki/Troubleshooting

LS
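
To make the pointer to sssd_$domain.log a bit more concrete, here is a rough Python sketch that pulls only the failure-level lines out of a log in the format quoted above. It is an illustration, not an sssd tool: the helper name serious_messages, the 0x0080 cutoff, and the path in the usage comment (which assumes the domain log sits next to sssd_nss.log and is named after the "ourdomain" placeholder) are all assumptions. It also assumes that a lower hex value in parentheses means a more severe message, as the 0x0040 "Unable to get information" line suggests.

```python
import re

# Matches one log line in the format quoted in the original mail, e.g.
# (Tue Mar 22 02:59:04 2016) [sssd[nss]] [some_function] (0x0040): message
LINE_RE = re.compile(
    r"^\((?P<time>[^)]+)\) \[(?P<proc>.+?)\] \[(?P<func>[^\]]+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): (?P<msg>.*)$"
)


def serious_messages(lines, max_level=0x0080):
    """Yield (time, function, message) for lines at or below max_level.

    Assumption: lower level values are more severe, so the default
    cutoff keeps failures and drops informational 0x0100+ messages.
    Lines that do not match the expected format are skipped.
    """
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and int(match.group("level"), 16) <= max_level:
            yield match.group("time"), match.group("func"), match.group("msg")


# Hypothetical usage; the path is an assumption, adjust to your domain name:
#
#   with open("/var/log/sssd/sssd_ourdomain.log") as log:
#       for when, func, msg in serious_messages(log):
#           print(when, func, msg)
```

If the domain log turns out to be nearly empty, the Troubleshooting page linked above is the place to check how to get more detail out of the domain section.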
