Rein Tollevik wrote: > On Sun, 30 Mar 2008, [EMAIL PROTECTED] wrote: > >> [EMAIL PROTECTED] wrote: >>> On Sat, 29 Mar 2008, [EMAIL PROTECTED] wrote: >>>> [EMAIL PROTECTED] wrote: > >>>>> I was seeing random failures of the test050-syncrepl-multimaster test. >>>>> One of >>>>> the failures was that it went into a tight loop traversing a circular >>>>> runqueue >>>>> it had managed to create in slapd_rq.task_list. It seems as this was >>>>> caused by >>>>> missing mutex locks around accesses to slapd_rq, which the patch uploaded >>>>> to >>>>> ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes. >>>>> >>>>> Before I applied this patch the test failed after being run a few times, >>>>> with it >>>>> it has now passed 100 times and is still counting. >>>> locks in back-bdb/config.c should be pointless, as modifications to the >>>> configuration should only occur while all threads are paused. The rest >>>> makes sort of sense, but I'd leave it to Howard. >> Ignoring the ITS#5403 changes, I don't see anything here that isn't >> config-related, therefore it's all running single-threaded. > > Now that the configuration can be changed dynamically (as this test does) > I find it a bit odd that the config stuff should always be running > single-threaded. But there is obviously much I don't know about the > internals of slapd.
This was discussed at the 2003 OpenLDAP Developers' Day. http://www.openldap.org/conf/odd-wien-2003/proceedings.html Page 7 of my slides touched on it. It's also mentioned again here http://www.openldap.org/lists/openldap-devel/200505/msg00062.html The main point is that the original config stuff assumed that it was only executing during slapd startup, and single-threaded. Putting locks around all of the potential configurable value accesses would have been too much work, so the decision was made to force slapd to be single-threaded whenever writing to the config. I'm guessing that what's happening here is just a broken assumption - suspending the thread pool doesn't in fact freeze everything in slapd. In particular, the slapd listener thread may still wake up for a select() timeout to handle the slapd runqueue. Even though no tasks submitted as a result of this will execute (because they'll just get queued into the suspended thread pool) the runqueue itself is still being manipulated. Given that explanation, your patches make sense. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
