[email protected] wrote:
> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)
>
>
> We are experiencing frequent hangs in slapd. Once hung we can continue to
> connect, but all searches will just hang indefinitely until we kill -9 the
> slapd process and restart it. The directory is used for mail routing and we
> have been migrating to it from an existing directory server over the last 3
> weeks - we have noted the busier the directory becomes the more often it
> hangs (now once every 2 days).
>
> We have one master and 10 syncrepl read only replicas - the master is used
> mainly for writes and has not hung yet, but most of the replicas have hung at
> least once. The replicas receive anywhere between 50 to 300 searches/sec,
> while the master would only get 1/sec. There are 45k entries in the
> directory.
>
> We are running:
>
> FreeBSD 8.3/9.0 x64
> OpenLDAP 2.4.31
> Berkeley DB 4.6.21
>
> The old directory we are migrating from has the same load and is also running
> OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB
> 4.3.29 and OpenLDAP 2.3.27.
>
> We have managed to collect db_stat lock information, which indicates the same
> issue each time - a write lock on dn2id.bdb.

It's more than that. Your db_stat shows that a single thread has 3 active
transactions. This should never happen:

    8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
    8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
    8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
    8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
    8000a85f READ 1 WAIT dn2id.bdb page 559
    8000a85f READ 1 HELD dn2id.bdb page 768
    8000a85f WRITE 2 HELD dn2id.bdb page 1362
    8000a85f READ 2 HELD dn2id.bdb page 1362
    8000a85f WRITE 2 HELD dn2id.bdb page 1353
    8000a85f READ 2 HELD dn2id.bdb page 1353
    8000a85f WRITE 2 HELD dn2id.bdb page 933
    8000a85f READ 1 HELD dn2id.bdb page 933
    8000a85f WRITE 4 HELD dn2id.bdb page 219
    80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
    80001047 WRITE 1 HELD dn2id.bdb page 559

I would first recommend changing from BDB 4.6.21 to some other version. There
are no code paths in back-bdb where we would ever return without either
committing or aborting the current transactions, so this appears to be a BDB
bug, not an OpenLDAP bug.

> We have also collected the backtrace for all the threads which I have uploaded
> to:
>
> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
>
> The full db_stat output is located at:
>
> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
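
The check applied to the db_stat excerpt above (one pid/thread owning more
than one locker at the same time) can be scripted when the full lock dump is
long. What follows is a minimal Python sketch under stated assumptions: it
assumes the locker header lines look like the ones quoted above
("<id> dd=<n> locks held <n> write locks <n> pid/thread <pid>/<tid>"), and
the function names and the SAMPLE text are illustrative only. Spacing and
layout of real db_stat output may differ between BDB versions, so the pattern
may need adjusting.

```python
import re
from collections import defaultdict

# Matches a locker header line as it appears in the excerpt above, e.g.
#   8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
# Per-lock lines (READ/WRITE ... HELD/WAIT) do not match and are skipped.
LOCKER_RE = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+dd=\s*\d+\s+locks held\s+\d+\s+"
    r"write locks\s+\d+\s+pid/thread\s+(\d+)/(\d+)"
)


def lockers_by_thread(text):
    """Group locker IDs by the (pid, thread) pair that owns them."""
    threads = defaultdict(list)
    for line in text.splitlines():
        m = LOCKER_RE.match(line)
        if m:
            locker, pid, tid = m.groups()
            threads[(pid, tid)].append(locker)
    return dict(threads)


def threads_with_multiple_lockers(text):
    """Return only the threads that own more than one locker."""
    return {k: v for k, v in lockers_by_thread(text).items() if len(v) > 1}


# Shortened copy of the excerpt above, used only to exercise the sketch.
SAMPLE = """\
8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
8000a85f READ 1 WAIT dn2id.bdb page 559
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
80001047 WRITE 1 HELD dn2id.bdb page 559
"""

if __name__ == "__main__":
    for (pid, tid), lockers in threads_with_multiple_lockers(SAMPLE).items():
        print(f"pid {pid} thread {tid}: {len(lockers)} lockers: "
              f"{', '.join(lockers)}")
```

To run it against a saved dump, read the file contents into a string and pass
that to threads_with_multiple_lockers() in place of SAMPLE. A thread that
shows up in the result is a candidate for the condition described above; it
is not by itself proof of where the fault lies.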
