Nikolai Schupbach wrote:
> Hi Howard,
>
> Thank you very much for the explanation. What BDB version would you recommend?
> Obviously I have quite a few options and would like to use a version that is
> known to be very solid.
I believe 4.7.25 plus all 4 of its official patches was pretty stable:

http://www.oracle.com/technetwork/products/berkeleydb/patch-088170.html

I've done limited testing with 4.8.30, 5.1.19, and 5.3.21. At this point I'm no
longer tracking BDB revisions, since MDB has superior performance while using
1/4 as much RAM and requiring no tuning.

> Sincerely,
> Nikolai Schupbach
>
> On 3/09/2012, at 9:45 PM, Howard Chu wrote:
>
>> [email protected] wrote:
>>> Full_Name: Nikolai Schupbach
>>> Version: 2.4.31
>>> OS: FreeBSD
>>> URL: ftp://ftp.openldap.org/incoming/
>>> Submission from: (NULL) (202.78.158.60)
>>>
>>>
>>> We are experiencing frequent hangs in slapd. Once hung, we can continue to
>>> connect, but all searches just hang indefinitely until we kill -9 the slapd
>>> process and restart it. The directory is used for mail routing, and we have
>>> been migrating to it from an existing directory server over the last 3
>>> weeks. We have noted that the busier the directory becomes, the more often
>>> it hangs (now once every 2 days).
>>>
>>> We have one master and 10 syncrepl read-only replicas. The master is used
>>> mainly for writes and has not hung yet, but most of the replicas have hung
>>> at least once. The replicas receive anywhere between 50 and 300
>>> searches/sec, while the master would only get 1/sec. There are 45k entries
>>> in the directory.
>>>
>>> We are running:
>>>
>>> FreeBSD 8.3/9.0 x64
>>> OpenLDAP 2.4.31
>>> Berkeley DB 4.6.21
>>>
>>> The old directory we are migrating from has the same load and is also
>>> running OpenLDAP, but has been rock solid for 5 years. It is running
>>> Berkeley DB 4.3.29 and OpenLDAP 2.3.27.
>>>
>>> We have managed to collect db_stat lock information, which indicates the
>>> same issue each time: a write lock on dn2id.bdb.
>>
>> It's more than that. Your db_stat shows that a single thread has 3 active
>> transactions.
>> This should never happen:
>>
>> 8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
>> 8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
>> 8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
>> 8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
>> 8000a85f READ 1 WAIT dn2id.bdb page 559
>> 8000a85f READ 1 HELD dn2id.bdb page 768
>> 8000a85f WRITE 2 HELD dn2id.bdb page 1362
>> 8000a85f READ 2 HELD dn2id.bdb page 1362
>> 8000a85f WRITE 2 HELD dn2id.bdb page 1353
>> 8000a85f READ 2 HELD dn2id.bdb page 1353
>> 8000a85f WRITE 2 HELD dn2id.bdb page 933
>> 8000a85f READ 1 HELD dn2id.bdb page 933
>> 8000a85f WRITE 4 HELD dn2id.bdb page 219
>> 80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
>> 80001047 WRITE 1 HELD dn2id.bdb page 559
>>
>> I would first recommend changing from BDB 4.6.21 to some other version.
>> There are no code paths in back-bdb where we would ever return without
>> either committing or aborting the current transactions, so this appears to
>> be a BDB bug, not an OpenLDAP bug.
>>
>>> We have also collected backtraces for all the threads, which I have
>>> uploaded to:
>>>
>>> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
>>>
>>> The full db_stat output is located at:
>>>
>>> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt
>>
>> --
>> -- Howard Chu
>> CTO, Symas Corp. http://www.symas.com
>> Director, Highland Sun http://highlandsun.com/hyc/
>> Chief Architect, OpenLDAP http://www.openldap.org/project/
>
>

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
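The diagnosis in this thread rests on two observations from the quoted lock table. First, all three lockers (8000a85e, 8000a85f, 80001047) report the same pid/thread 88000/34386526336. Second, locker 8000a85f is WAITing on dn2id.bdb page 559 while locker 80001047, owned by that same thread, HOLDs a WRITE lock on it, so the thread is effectively blocked on itself. The minimal Python sketch below makes those same two checks over pasted lock-table lines. It is a hypothetical helper, not part of OpenLDAP or Berkeley DB. Its regexes assume the line layout of the excerpt quoted above (other db_stat versions may format the lock region differently), and "more than one locker per thread is suspicious" reflects Howard's reasoning about back-bdb specifically, not a general BDB rule.

```python
import re
from collections import defaultdict

# Hypothetical helper: the regexes assume the column layout of the db_stat
# excerpt quoted in this thread, not any documented output format.
LOCKER_RE = re.compile(
    r"^(?P<locker>[0-9a-f]+)\s+dd=\s*\d+\s+locks\s+held\s+\d+\s+"
    r"write\s+locks\s+\d+\s+pid/thread\s+(?P<thread>\S+)"
)
LOCK_RE = re.compile(
    r"^(?P<locker>[0-9a-f]+)\s+(?P<mode>READ|WRITE)\s+\d+\s+"
    r"(?P<status>HELD|WAIT)\s+(?P<obj>.+?)\s*$"
)

# Abridged copy of the lock table quoted in the thread.
SAMPLE = """\
8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f WRITE 4 HELD dn2id.bdb page 219
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
80001047 WRITE 1 HELD dn2id.bdb page 559
"""


def analyze(text):
    """Return (threads owning >1 locker, waits blocked by the same thread)."""
    thread_of = {}             # locker id -> "pid/thread"
    held = defaultdict(list)   # locked object -> [(locker, mode)]
    waits = []                 # (locker, mode, object) still waiting
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate e-mail quote markers
        m = LOCKER_RE.match(line)
        if m:
            thread_of[m["locker"]] = m["thread"]
            continue
        m = LOCK_RE.match(line)
        if m and m["status"] == "HELD":
            held[m["obj"]].append((m["locker"], m["mode"]))
        elif m:
            waits.append((m["locker"], m["mode"], m["obj"]))

    by_thread = defaultdict(list)
    for locker, thread in thread_of.items():
        by_thread[thread].append(locker)
    multi = {t: ls for t, ls in by_thread.items() if len(ls) > 1}

    self_blocked = []
    for waiter, wmode, obj in waits:
        thread = thread_of.get(waiter)
        for holder, hmode in held.get(obj, []):
            conflicts = "WRITE" in (wmode, hmode)  # only READ/READ is compatible
            if (holder != waiter and conflicts and thread is not None
                    and thread_of.get(holder) == thread):
                self_blocked.append((waiter, holder, obj))
    return multi, self_blocked


if __name__ == "__main__":
    multi, blocked = analyze(SAMPLE)
    for thread, lockers in multi.items():
        print(f"thread {thread} owns lockers {', '.join(lockers)}")
    for waiter, holder, obj in blocked:
        print(f"{waiter} waits on {obj}, held by {holder} in the same thread")
```

On the abridged sample it reports one thread owning all three lockers and one same-thread wait on dn2id.bdb page 559, which is the situation Howard describes. Pasting the full lock-region output instead of the sample should work as long as its lines follow the same layout.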
