[email protected] wrote: > Full_Name: Helge Milde > Version: slapd 2.4.11 > OS: Debian 5 > URL: > Submission from: (NULL) (81.175.2.164) > > > Hello, > > In the last two days, slapd has randomly stopped working on our server, about > one time every two or three hours. Before this, slapd had been running without > any problems for about two months. slapd.conf has not been touched since then. > > What happens is that slapd stops taking queries (postfix, dovecot, NSS etc. > starts getting timeouts), and it starts using exactly 100% of one core until > we > restart it. > After restart, everything works again (until this happens again). > > It seems to only happen when slapd is under "stress" (we have an 8-core system > with 8G RAM, and so slapd is really never under much "stress"; the biggest CPU > load I've seen is about 40% on one core). The last ldap search before slapd > stops working is always the same: > Oct 2 10:57:38 oi-mail slapd[24550]: conn=1543 op=1 SRCH > base="ou=groups,dc=web,dc=xxxx,dc=net" scope=1 deref=0 > filter="(&(objectClass=posixGroup))" > .. which should return about 1800 entries. Different connections might query > this simultaneously, and it's a query that happens every ~10 seconds on > average. > > I've tried to reproduce this by doing .. > # while [ 1 ]; do ldapsearch -x -H ldaps://.. -b > 'ou=groups,dc=web,dc=xxxx,dc=net' '(&(objectClass=posixAccount))'; done > .. for two minutes without any "luck" :). slapd just uses a bit more CPU, but > never stops working. > > With "loglevel stats stats2", I get nothing out of the logs: > Oct 2 08:42:15 oi-mail slapd[15836]: conn=34798 op=1 ENTRY > dn="cn=xxxx,ou=groups,dc=web,dc=xxxx,dc=net" > ... time of "crash" (we're mid-search at this point); nothing gets logged > until > we do a restart: ... > Oct 2 08:42:30 oi-mail slapd[15836]: daemon: shutdown requested and > initiated. > Oct 2 08:42:30 oi-mail slapd[15836]: conn=34391 fd=31 closed (slapd shutdown) > ... many entires like the one above, and then I seem to be getting the next > entry from the same connection, right before it closes: ... > Oct 2 08:42:30 oi-mail slapd[15836]: conn=34798 op=1 ENTRY > dn="cn=xxxx,ou=groups,dc=web,dc=xxxx,dc=net" > Oct 2 08:42:30 oi-mail slapd[15836]: conn=34798 fd=81 closed (slapd shutdown) > > I've tried logging more than just stats, but the interesting loglevels also > causes a big performance hit (trace and args), and I'm not really sure which > of > the other loglevels could be helpful. > This is a live ldap server, on a server that also hosts other services, so I > simply can't take the performance hit. > > I also did a strace -p<slapd pid>, which only gives "futex(0x41c499e0, > FUTEX_WAIT," until I restart slapd, but the "restarting routine" could be > useful, so here it is: http://www.pastie.org/639170
strace is useless. Use gdb and get a trace of all running threads when this occurs. http://www.openldap.org/faq/data/cache/59.html -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
