Pierangelo Masarati wrote:
Pierangelo Masarati wrote:
No more failures of this kind; however, now I intermittently get
replication failures:
The problem persists (only once in a while). It might still be
connection-related, since the logs of server #3, the proxy that pushes
replication to the consumer, are stuffed with tons of
"connection_read(...): no connection!"
What kind of system are you running on? Linux / multiprocessor?
One of the problems with epoll() on Linux is that it wakes up for hangup
(EPOLLHUP) events all the time: they are not selectable in the input event
mask, and they're delivered regardless of whether you choose to wait for them
or not. This also means we can't shut the notifications off once we
acknowledge or act on them, so you'll get lots of repeated wakeups for the
same hangup event. The new
connection_hangup() function processes these inline for normal connections,
but it still falls into the connection_read thread handling for client
connections, so their normal cleanup handlers can be invoked. If your server
is too busy, it will take a while for the submitted thread to execute, and
then you'll get a lot of these spurious messages.
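
The repeated-wakeup behaviour described above can be reproduced outside slapd. The following is only a minimal sketch, assuming Linux and Python's `select.epoll` wrapper around the same system calls; it is not slapd code, and the function name `count_hangup_wakeups` is made up for illustration. It registers the read end of a pipe with an empty event mask, closes the writer, and records what each successive poll reports.

```python
import os
import select


def count_hangup_wakeups(rounds=3):
    """Return the event masks reported for a hung-up pipe over `rounds` polls."""
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        # Register with an empty mask: we do not ask for any event at all.
        ep.register(read_fd, 0)
        # Closing the only writer hangs up the read end.
        os.close(write_fd)
        masks = []
        for _ in range(rounds):
            # Level-triggered by default, so the same condition is
            # reported again on every poll until the fd is removed.
            for fd, mask in ep.poll(0.1):
                if fd == read_fd:
                    masks.append(mask)
        return masks
    finally:
        ep.close()
        os.close(read_fd)


if __name__ == "__main__":
    masks = count_hangup_wakeups()
    print(len(masks), all(m & select.EPOLLHUP for m in masks))
```

Each poll should report EPOLLHUP for the descriptor even though it was never requested, which is the pattern that produces one log message per wakeup when the cleanup is deferred to a busy thread pool.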
I've been experimenting with epoll's edge-triggered and oneshot modes, which
would prevent multiple wakeups for the same event. Unfortunately, when I set
those flags it seems that the events can't be *re-enabled* when we want them,
and so slapd hangs. Still looking at this.
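
For reference, the epoll_ctl(2) man page says that a descriptor registered with EPOLLONESHOT is disabled after one event is delivered and has to be re-armed with EPOLL_CTL_MOD. The sketch below shows only that documented sequence in isolation, again assuming Linux and Python's `select.epoll` (whose `modify()` issues EPOLL_CTL_MOD). The function name `oneshot_rearm_demo` is hypothetical, and nothing here is meant to explain or fix the hang seen in slapd.

```python
import os
import select


def oneshot_rearm_demo():
    """Return how many events each of three polls reports for a oneshot fd."""
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        # Leave one unread byte in the pipe so the fd stays readable.
        os.write(write_fd, b"x")
        mask = select.EPOLLIN | select.EPOLLONESHOT
        ep.register(read_fd, mask)

        first = ep.poll(0.1)   # the event is delivered once
        second = ep.poll(0.1)  # fd is now disabled, so nothing is reported

        # Re-arm the descriptor (EPOLL_CTL_MOD under the hood).
        ep.modify(read_fd, mask)
        third = ep.poll(0.1)   # still readable, so it is reported again

        return len(first), len(second), len(third)
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(oneshot_rearm_demo())
```

In a multithreaded server the re-arm call would have to be issued by whichever thread finishes handling the event, and that placement is the part this sketch leaves out.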
But that's beside the point - you shouldn't be seeing any replication failures
at all, regardless of connection close handling. What else are you seeing now?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/