[EMAIL PROTECTED] wrote:
> Hm, I hope I have found the race condition that causes this :-) I'm now
> running with the patch at the end to see if that solves it, only time will
> tell..
>
> The race is that between the time selecting on the syncrepl socket is
> enabled by the call to connection_client_enable() and the release of the
> si_mutex a new message may arrive. If so, the next call to do_syncrepl
> may fail in its attempt to trylock the mutex and no-one will re-enable
> selecting on it again. My patch delays enabling of the socket until the
> mutex has been released, which looks safe to me. Or can the access to
> si->si_conn without a lock be a problem?

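For anyone following along, the ordering problem described above can be sketched as a toy, single-threaded model. This is not slapd code: `struct model`, `try_lock()`, `message_arrives()` and `simulate()` are hypothetical names made up for illustration, the mutex is a plain flag standing in for si_mutex, and the model assumes the listener stops selecting on the socket when it dispatches the handler, which is what "no-one will re-enable selecting" implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the syncrepl state; not the real OpenLDAP struct. */
struct model {
    bool mutex_held;      /* models si_mutex being locked */
    bool socket_enabled;  /* models "listener is selecting on the socket" */
    int  handled;         /* number of messages actually processed */
};

/* Models a trylock: fails immediately if the mutex is already held. */
static bool try_lock(struct model *m)
{
    if (m->mutex_held)
        return false;
    m->mutex_held = true;
    return true;
}

/* Models a new message showing up on the syncrepl socket. */
static void message_arrives(struct model *m)
{
    if (!m->socket_enabled)
        return;                 /* nobody is selecting; data just stays queued */
    m->socket_enabled = false;  /* assumption: listener stops selecting on dispatch */
    if (!try_lock(m))
        return;                 /* trylock failed: nothing re-enables the socket */
    m->handled++;               /* process the message */
    m->mutex_held = false;      /* release first ... */
    m->socket_enabled = true;   /* ... then re-enable selecting */
}

/*
 * Runs the tail end of one handler invocation with a message arriving at
 * the earliest moment the listener could notice it. Returns whether the
 * socket is still being selected on afterwards.
 */
bool simulate(bool enable_after_unlock)
{
    struct model m = { .mutex_held = true, .socket_enabled = false, .handled = 0 };

    if (!enable_after_unlock) {
        m.socket_enabled = true;   /* enable while still holding the mutex */
        message_arrives(&m);       /* race window: trylock will fail */
        m.mutex_held = false;
    } else {
        m.mutex_held = false;      /* release first */
        m.socket_enabled = true;   /* then enable */
        message_arrives(&m);       /* trylock succeeds, socket gets re-enabled */
    }
    assert(!m.mutex_held);         /* the mutex is always released at the end */
    return m.socket_enabled;
}
```

In this model `simulate(false)` leaves the socket disabled (the stall), while `simulate(true)` leaves it enabled. It says nothing about whether touching si->si_conn without the lock is safe; that question is separate.
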
How about just moving the enable to after the runqueue manipulation is done?
Just need to be sure that do_syncrepl() isn't entered again before
si->si_conn gets initialized.

It also occurs to me that we probably don't even need to manipulate the
slapd runqueue in persist mode, when si->si_conn is already set. I.e., in
that case we can only have gotten here because of a listener event, and not
because of a runqueue schedule.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
