On 11/04/2013 03:11 PM, [email protected] wrote:
> [email protected] wrote:
>> On 10/15/2013 01:10 PM, [email protected] wrote:
>>> It is not the client loop that is multithreading but the ldap server.
>>>
>>> And it is not a misuse of the API but a problem that may be raised by
>>> day to day network problems.
>>>
>>> I've boiled down the problem to a few simple configurations that work
>>> (or better, fail ;-) with both 2.4.23 and 2.4.36. A tgz file containing
>>> a setup with start script and testclient is attached. It should be
>>> sufficient to reproduce the fault.
>>>
>>> The problem occurs only if we use session variable substitution in the
>>> rwm overlay, and only if a search is *immediately* (e.g. caused by
>>> network loss and client timeout) followed by an unbind.
>>>
>>
>> I modified the reproducer a bit (the start script) and find out a few
>> things. You can find the reproducer I'm using at [1].
>>
>> Valgrind's helgrind shows some lock problems in the rwm overlay and also
>> in back-ldap and connection.c. After correcting those the issue seems to
>> be gone.
>>
>> You can find helgrind logs at [2] (before the fix) and [3] (after).
>>
>> Also, ElectricFence reveals some problems [4], which I didn't fix yet.
>>
>> A fix attempt can be found at [5]. I'm not sure if that is a correct fix,
>> or it just masked the real issue. But I didn't to manage to reproduce the
>> problem after applying it.
>
> I already explained the problem. The other issues you identified are not
> relevant, and your patch is not correct. Reread Followup #4 of this ITS.
>

Another take on the fix:
http://jsynacek.fedorapeople.org/openldap/its7723/0001-ITS-7723-fix-reference-counting.patch

-- 
Jan Synacek
Software Engineer, Red Hat
