On 10/15/2013 01:10 PM, [email protected] wrote:
> It is not the client loop that is multithreading but the ldap server.
>
> And it is not a misuse of the API but a problem that may be raised by day to
> day network problems.
>
> I've boiled down the problem to a few simple configurations that work (or
> better, fail ;-) with both 2.4.23 and 2.4.36. A tgz file containing a setup
> with start script and testclient is attached. It should be sufficient to
> reproduce the fault.
>
> The problem occurs only if we use session variable substitution in the rwm
> overlay, and only if a search is *immediately* (e.g. caused by network loss
> and client timeout) followed by an unbind.
>

I modified the reproducer a bit (the start script) and found out a few things.
You can find the reproducer I'm using at [1].

Valgrind's helgrind shows some lock problems in the rwm overlay and also in
back-ldap and connection.c. After correcting those, the issue seems to be gone.
You can find the helgrind logs at [2] (before the fix) and [3] (after the fix).
Also, ElectricFence reveals some problems [4], which I haven't fixed yet.

A fix attempt can be found at [5]. I'm not sure whether it is a correct fix or
whether it just masks the real issue, but I didn't manage to reproduce the
problem after applying it.

[1] http://jsynacek.fedorapeople.org/openldap/its7723/reproducer/
[2] http://jsynacek.fedorapeople.org/openldap/its7723/results/slapd1-helgrind-broken.log
[3] http://jsynacek.fedorapeople.org/openldap/its7723/results/slapd1-helgrind-fixed.log
[4] http://jsynacek.fedorapeople.org/openldap/its7723/results/slapd1-broken-efence-gdb.txt
[5] http://jsynacek.fedorapeople.org/openldap/its7723/0001-fix-possible-race-conditions.patch

Cheers,
-- 
Jan Synacek
Software Engineer, Red Hat
