--On Thursday, September 09, 2010 10:43:06 PM -0700 Howard Chu <[email protected]> wrote:
> [email protected] wrote:
>> --On Friday, September 03, 2010 01:23:17 AM -0700 Bill
>> MacAllister<[email protected]> wrote:
>>
>> The problem with the database was only coincidental. Restoring the database
>> got the failing replica past the problem replication event.
>>
>> In the replica pool of 6 servers we have seen the problem on three of the
>> servers. In thinking about this more it is unlikely that it is a slave
>> problem since the slaves have been in use for about 6 weeks and we did
>> not see the problem. Only when we changed the master to 2.4.23 did we
>> see the problem. I have captured a master debug log of the problem
>> event. It is at http://www.stanford.edu/~whm/files/master-debug.txt.
>>
>> Bill
>>
> Please try with this patch:
>
> Index: sasl.c
> ===================================================================
> RCS file: /repo/OpenLDAP/pkg/ldap/libraries/libldap/sasl.c,v
> retrieving revision 1.79
> diff -u -r1.79 sasl.c
> --- sasl.c 13 Apr 2010 20:17:56 -0000 1.79
> +++ sasl.c 10 Sep 2010 05:42:22 -0000
> @@ -733,8 +733,9 @@
>          return ret;
>      } else if ( p->buf_out.buf_ptr != p->buf_out.buf_end ) {
>          /* partial write? pretend nothing got written */
> -        len2 = 0;
>          p->flags |= LDAP_PVT_SASL_PARTIAL_WRITE;
> +        sock_errset(EAGAIN);
> +        len2 = -1;
>      }
>
>      /* return number of bytes encoded, not written, to ensure

Howard,

The patched packages were installed last night on the production
OpenLDAP master with two of the replicas in the failing state. Once
the patched slapd was started, the two problem replicas quickly caught
up and everything looks good now. Thanks again for your help.

Bill

--
Bill MacAllister
Infrastructure Delivery Group, Stanford University
