Hi Dusan, On Fri, 28 Jun 2013, [email protected] wrote:
> Hi Chu, > > thanks for your quick response. We will try to contact Oracle to advice here. > > However, what is your suggestion here ' try different settings for the > deadlock detector' exactly? > > BTW - certainly, we are patched DB 4.7.25 (all 4 patches)... In case you can update to 2.4.35 you might want to test the mdb backend with your workload. Especially if you can easily reproduce the problem. Greetings Christian > > Regards, > Dusan > > -----Original Message----- > From: Howard Chu [mailto:[email protected]] > Sent: 26. j??na 2013 13:32 > To: Fric, Dusan; [email protected] > Subject: Re: (ITS#7633) Slapd hangs on hdb write lock > > [email protected] wrote: >> Full_Name: Dusan Fric >> Version: 2.4.32 >> OS: RHEL 5.7 x64 >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (88.212.40.139) >> >> >> We are experiencing frequent hangs in slapd. Once hung we usualy >> cannot continue to connect until we kill -9 the slapd process and >> restart it. The directory is used for 2 applications as user eDir and >> we are using it in production over 1 month - we have noted the busier >> the directory becomes the more often it hangs (now twice per week). >> >> We have installed identical configurations on 3 environments, each has >> only one single server (no replication). There are 30k entries in the >> directory (production). >> >> We are running: >> >> RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB >> 4.8.30 (4.7.25) >> >> We were starting with DB 4.8.30, after downgrade to version 4.7.25 >> (according to >> ITS#7378 - >> https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378) >> we are facing the same issue. >> >> The problem occurs when two requests simultaneously try to update >> attributes of a record and in most cases of the same DN value. >> We can easily reproduce it with a java test program running 2 threads >> each connecting to the ldap server and updating the record for a >> particular DN value. >> It need not be the same DN value but DN values which reside on the >> same BDB page. > > Thanks for the traces, but the information you've provided shows that this is > a BDB problem, not an OpenLDAP bug. You'll have to contact Oracle for help. > The BDB deadlock detector is supposed to handle these situations. > > For example in your 4.8.30 lock status there are only 2 conflicting > transactions 80000028 and 80000029 and they are only contending on a single > page. This is one of the most elementary deadlock situations. > > You might try different settings for the deadlock detector, but ultimately > this appears to be a BDB bug and the choice of detector shouldn't matter. > >> Configuration >> >> olc cn=config (part): >> dn: cn=config >> objectClass: olcGlobal >> cn: config >> olcConcurrency: 0 >> olcConnMaxPending: 100 >> olcConnMaxPendingAuth: 1000 >> olcSockbufMaxIncoming: 262143 >> olcSockbufMaxIncomingAuth: 16777215 >> olcThreads: 32 >> >> olc DB hdb config (part): >> dn: olcDatabase={1}hdb,cn=config >> objectClass: olcHdbConfig >> objectClass: olcDatabaseConfig >> olcDatabase: {1}hdb >> olcDbCacheSize: 80000 >> olcDbCheckpoint: 128 5 >> olcDbConfig: {0}set_cachesize 0 268435456 1 >> olcDbConfig: {1}set_lg_max 10485760 >> olcDbConfig: {2}set_lg_bsize 2097152 >> olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog >> olcDbConfig: {4}set_lg_regionmax 262144 >> olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE >> olcDbConfig: {6}set_flags DB_TXN_NOSYNC >> olcDbDirtyRead: FALSE >> olcDbDNcacheSize: 0 >> olcDbIDLcacheSize: 240000 >> olcDbIndex: default eq,sub >> olcDbIndex: objectClass eq >> olcDbIndex: cn eq,sub >> olcDbIndex: uid eq,sub >> olcDbIndex: mail pres,eq >> olcDbIndex: sn eq,sub >> olcDbIndex: member eq >> olcDbLinearIndex: FALSE >> olcDbMode: 0600 >> olcDbNoSync: TRUE >> olcDbSearchStack: 20 >> olcDbShmKey: 0 >> olcLastMod: TRUE >> olcMaxDerefDepth: 15 >> olcMonitoring: TRUE >> olcReadOnly: FALSE >> >> We have managed to collect db_stat lock information, which indicates >> the same issue with DB write locks on both DB versions. >> >> db_stat -C ol (4.8.30) >> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= >> Lock REGINFO information: >> Lock Region type >> 5 Region ID >> /apps/DECCLASA-1/data/openldap/__db.005 Region name >> 0x2ab41cdc3000 Region address >> 0x2ab41cdc3138 Region primary address >> 0 Region maximum allocation >> 0 Region allocated >> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest >> REGION_JOIN_OK Region flags >> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= >> Locks grouped by lockers: >> Locker Mode Count Status ----------------- Object --------------- >> 1 dd= 0 locks held 1 write locks 0 pid/thread >> 30034/47555767609232 >> 1 READ 1 HELD id2entry.bdb handle 0 >> 2 dd= 0 locks held 0 write locks 0 pid/thread >> 30034/47555767609232 >> 3 dd= 0 locks held 1 write locks 0 pid/thread >> 30034/47555767609232 >> 3 READ 1 HELD dn2id.bdb handle 0 >> 4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 5 dd= 0 locks held 0 write locks 0 pid/thread >> 30034/47555767609232 >> 6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200 >> 7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 >> 9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 >> 9 READ 1 HELD objectClass.bdb handle 0 >> a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 >> c READ 1 HELD uid.bdb handle 0 >> d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> 10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 >> 10 READ 1 HELD member.bdb handle 0 >> 11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 >> 13 READ 1 HELD mail.bdb handle 0 >> 14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 >> 18 READ 1 HELD sn.bdb handle 0 >> 19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904 >> 1b READ 1 HELD cn.bdb handle 0 >> 1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> 1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 >> 80000003 dd= 0 locks held 0 write locks 0 pid/thread >> 30034/47555767609232 >> 80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200 >> 80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904 >> 80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496 >> 80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792 >> 80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 >> 80000026 READ 1 HELD 0x5e948 len: 9 data: >> 0x560x0500000000000000 >> 80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496 >> 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000 >> 80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496 >> 80000028 WRITE 1 WAIT mail.bdb page 3 >> 80000028 READ 1 HELD mail.bdb page 3 >> 80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792 >> 80000029 WRITE 1 WAIT mail.bdb page 3 >> 80000029 READ 1 HELD mail.bdb page 3 >> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= >> Locks grouped by object: >> Locker Mode Count Status ----------------- Object --------------- >> 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000 >> >> 1b READ 1 HELD cn.bdb handle 0 >> >> 10 READ 1 HELD member.bdb handle 0 >> >> 9 READ 1 HELD objectClass.bdb handle 0 >> >> c READ 1 HELD uid.bdb handle 0 >> >> 80000026 READ 1 HELD 0x5e948 len: 9 data: >> 0x560x0500000000000000 >> >> 1 READ 1 HELD id2entry.bdb handle 0 >> >> 18 READ 1 HELD sn.bdb handle 0 >> >> 3 READ 1 HELD dn2id.bdb handle 0 >> >> 80000029 READ 1 HELD mail.bdb page 3 >> 80000028 READ 1 HELD mail.bdb page 3 >> 80000029 WRITE 1 WAIT mail.bdb page 3 >> 80000028 WRITE 1 WAIT mail.bdb page 3 >> >> 13 READ 1 HELD mail.bdb handle 0 >> >> db_stat -C ol (4.7.25) >> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= >> Lock REGINFO information: >> Lock Region type >> 5 Region ID >> /apps/DECCLASA-1/data/openldap/__db.005 Region name >> 0x2b4a8a577000 Original region address >> 0x2b4a8a577000 Region address >> 0x2b4a8a577138 Region primary address >> 0 Region maximum allocation >> 0 Region allocated >> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest >> REGION_JOIN_OK Region flags >> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= >> Locks grouped by lockers: >> Locker Mode Count Status ----------------- Object --------------- >> 1 dd= 0 locks held 1 write locks 0 pid/thread >> 32447/47463753631632 >> 1 READ 1 HELD id2entry.bdb handle 0 >> 2 dd= 0 locks held 0 write locks 0 pid/thread >> 32447/47463753631632 >> 3 dd= 0 locks held 1 write locks 0 pid/thread >> 32447/47463753631632 >> 3 READ 1 HELD dn2id.bdb handle 0 >> 4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656 >> 5 dd= 0 locks held 0 write locks 0 pid/thread >> 32447/47463753631632 >> 6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 >> 7 READ 1 HELD objectClass.bdb handle 0 >> 8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 >> c READ 1 HELD cn.bdb handle 0 >> d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 >> f READ 1 HELD uid.bdb handle 0 >> 10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 >> 12 READ 1 HELD member.bdb handle 0 >> 13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 >> 15 READ 1 HELD mail.bdb handle 0 >> 16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 >> 1a READ 1 HELD sn.bdb handle 0 >> 1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 80000003 dd= 0 locks held 0 write locks 0 pid/thread >> 32447/47463753631632 >> 80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248 >> 80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952 >> 80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656 >> 800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248 >> 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000 >> 800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248 >> 800000aa WRITE 1 WAIT mail.bdb page 3 >> 800000aa READ 1 HELD mail.bdb page 1 >> 800000aa READ 1 HELD mail.bdb page 3 >> 800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952 >> 800000ab READ 1 HELD 0x43d58 len: 9 data: >> 0x540x0500000000000000 >> 800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952 >> 800000ac WRITE 1 WAIT mail.bdb page 3 >> 800000ac READ 1 HELD mail.bdb page 1 >> 800000ac READ 1 HELD mail.bdb page 3 >> 800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360 >> 800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064 >> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= >> Locks grouped by object: >> Locker Mode Count Status ----------------- Object --------------- >> 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000 >> >> 1 READ 1 HELD id2entry.bdb handle 0 >> >> 3 READ 1 HELD dn2id.bdb handle 0 >> >> 7 READ 1 HELD objectClass.bdb handle 0 >> >> f READ 1 HELD uid.bdb handle 0 >> >> 1a READ 1 HELD sn.bdb handle 0 >> >> 800000aa READ 1 HELD mail.bdb page 3 >> 800000ac READ 1 HELD mail.bdb page 3 >> 800000aa WRITE 1 WAIT mail.bdb page 3 >> 800000ac WRITE 1 WAIT mail.bdb page 3 >> >> c READ 1 HELD cn.bdb handle 0 >> >> 15 READ 1 HELD mail.bdb handle 0 >> >> 800000aa READ 1 HELD mail.bdb page 1 >> 800000ac READ 1 HELD mail.bdb page 1 >> >> 12 READ 1 HELD member.bdb handle 0 >> >> 800000ab READ 1 HELD 0x43d58 len: 9 data: >> 0x540x0500000000000000 >> >> >> We have also collected the backtrace of threads for both DB versions >> which I have uploaded to: >> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_ >> 20130625.txt >> >> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_ >> 20130625.txt >> >> > > > -- > -- Howard Chu > CTO, Symas Corp. http://www.symas.com > Director, Highland Sun http://highlandsun.com/hyc/ > Chief Architect, OpenLDAP http://www.openldap.org/project/ > > > -- Christian Kratzer CK Software GmbH Email: [email protected] Wildberger Weg 24/2 Phone: +49 7032 893 997 - 0 D-71126 Gaeufelden Fax: +49 7032 893 997 - 9 HRB 245288, Amtsgericht Stuttgart Web: http://www.cksoft.de/ Geschaeftsfuehrer: Christian Kratzer
