Re: RE24 connection code reworking

Howard Chu Mon, 26 Jan 2009 13:59:09 -0800

Pierangelo Masarati wrote:

Pierangelo Masarati wrote:

No more failures of this kind; however, now I intermittently get
replication failures:


The problem persists (only once in a while).  It might still be
connection-related, since the logs of server #3, the proxy that pushes
replication to the consumer, are stuffed with tons of
"connection_read(...): no connection!"


What kind of system are you running on? Linux / multiprocessor?

One of the problems with epoll() on Linux is that it wakes up for HANGUP events all the time (they are not selectable in the input options; they're delivered regardless of whether you choose to wait for them or not). This also means we can't shut the notifications off when we acknowledge/act on them. So you'll get lots of repeated wakeups for the same hangup event. The new connection_hangup() function processes these inline for normal connections, but it still falls into the connection_read thread handling for client connections, so their normal cleanup handlers can be invoked. If your server is too busy, it will take a while for the submitted thread to execute, and then you'll get a lot of these spurious messages.
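
The repeated-wakeup behaviour described above can be reproduced outside slapd. What follows is only a minimal sketch, not code from slapd: it assumes Linux, uses a pipe as a stand-in for a client connection, and the function name count_hangup_wakeups() is made up for illustration. The read end is registered with an empty event mask, the write end is closed to simulate the peer hanging up, and epoll_wait() is then polled several times with a zero timeout.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns how many of `tries` consecutive epoll_wait() calls reported
 * EPOLLHUP for a pipe whose writer has gone away, or -1 on setup error.
 */
int count_hangup_wakeups(int tries)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = 0;              /* deliberately ask for no events at all */
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0) {
        close(epfd);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);              /* the peer "hangs up" */

    int seen = 0;
    for (int i = 0; i < tries; i++) {
        struct epoll_event out;
        /* timeout 0: return immediately with whatever is pending */
        int n = epoll_wait(epfd, &out, 1, 0);
        if (n == 1 && (out.events & EPOLLHUP))
            seen++;             /* same hangup, reported yet again */
    }

    close(epfd);
    close(fds[0]);
    return seen;
}
```

Calling count_hangup_wakeups(3) should report the hangup on every one of the three waits, even though the registration asked for nothing. Until the descriptor is removed from the epoll set or closed, each further wait sees it again.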

I've been experimenting with epoll's edge-triggered and oneshot modes, which would prevent multiple wakeups occurring for the same event. But unfortunately, when I set that it seems that the events can't be *re-enabled* when we want them, and so slapd hangs. Still looking at this.
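
For reference, the epoll man pages describe re-arming a oneshot descriptor with epoll_ctl() and EPOLL_CTL_MOD. The sketch below shows only that documented mechanism on a pipe. It is not slapd code and says nothing about why slapd hangs in the experiment above. The name oneshot_rearm_demo() is hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Records the return value of three zero-timeout epoll_wait() calls:
 *   result[0]: first wait after registering with EPOLLONESHOT
 *   result[1]: second wait, without re-arming
 *   result[2]: third wait, after re-arming with EPOLL_CTL_MOD
 * Returns 0 on success, -1 on setup error.
 */
int oneshot_rearm_demo(int result[3])
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int rc = -1;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    /* Leave one unread byte in the pipe so the read end stays readable. */
    if (write(fds[1], "x", 1) != 1)
        goto out_epoll;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0)
        goto out_epoll;

    struct epoll_event out;
    result[0] = epoll_wait(epfd, &out, 1, 0);  /* event delivered once */
    result[1] = epoll_wait(epfd, &out, 1, 0);  /* disabled by oneshot */

    /* Re-arm: the same mask must be supplied again via EPOLL_CTL_MOD. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fds[0], &ev) < 0)
        goto out_epoll;
    result[2] = epoll_wait(epfd, &out, 1, 0);  /* delivered again */

    rc = 0;

out_epoll:
    close(epfd);
out_pipe:
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

On a pipe the three waits should return 1, 0 and 1 in that order: the event fires once, stays quiet while disabled, and fires again after the EPOLL_CTL_MOD call because the byte is still unread.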

But that's beside the point - you shouldn't be seeing any replication failures at all, regardless of connection close handling. What else are you seeing now?


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
