Re: [zeromq-dev] Assertion failure with socket monitor when clients disconnect

Auer, Jens Tue, 11 Oct 2016 01:38:35 -0700

Hi Doron,


> I'm not a big fan of monitoring and prefer other solutions. I am actually
> thinking of deprecating it: the implementation is buggy and actually violates
> the rule not to share a socket between threads, which is probably what is
> causing your issue.

That’s quite a shock given that it is part of the stable API. I think in this 
case it should not only be deprecated but removed from the API completely. On 
the other hand, some way to monitor the connections is probably needed. Our 
clients are very concerned with TCP connections and always insist on being 
able to log TCP connection state changes. It would also be nice to get some 
statistics from the connections, e.g. the number of received bytes or 
messages.

> Anyway, you have two options. I think you can manage without the monitoring,
> and I can try to help you find other solutions. Another option is to try not
> listening to all events; maybe this will avoid the sharing violation.

I would like to replace the monitor with something else, but I am not sure how 
to do this given our requirements. Currently, we use the monitor to

- Log TCP connection events for connect, disconnect, accept, listen and
  reconnection retry events

- Limit the number of reconnection attempts

Unfortunately, this includes disconnect events, which are exactly the events 
causing the crashes right now. The other events are probably rare enough that 
they do not trigger the problem.
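To make the second requirement concrete, here is a minimal sketch of the retry limiting logic, independent of how the events are delivered. The class name retry_limiter and its interface are hypothetical and only for illustration; the two event numbers are the values of ZMQ_EVENT_CONNECTED and ZMQ_EVENT_CONNECT_RETRIED from zmq.h. It assumes the application feeds it the event type and endpoint of each monitor event and closes the socket itself when the limit is exceeded.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Values of ZMQ_EVENT_CONNECTED and ZMQ_EVENT_CONNECT_RETRIED in zmq.h.
constexpr std::uint16_t event_connected = 0x0001;
constexpr std::uint16_t event_connect_retried = 0x0004;

// Hypothetical helper: counts reconnection retries per endpoint.
class retry_limiter
{
  public:
    explicit retry_limiter (unsigned max_retries) : max_retries_ (max_retries)
    {
        assert (max_retries > 0);
    }

    // Feed one monitor event. Returns true once the endpoint has retried
    // more often than allowed, i.e. when the application should give up.
    bool on_event (std::uint16_t type, const std::string &endpoint)
    {
        if (type == event_connected) {
            // A successful connect resets the counter for this endpoint.
            retries_.erase (endpoint);
            return false;
        }
        if (type == event_connect_retried)
            return ++retries_[endpoint] > max_retries_;
        return false; // all other events are ignored here
    }

  private:
    unsigned max_retries_;
    std::map<std::string, unsigned> retries_;
};
```

Keeping this logic separate from the event transport means it would not have to change if the monitor were replaced by something else.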

Is there another way to implement this without using the socket monitor? We use 
Router, Dealer, Sub, XPub and Stream sockets. I have full control over the 
protocol between the Router/Dealer, but I cannot change the protocol between 
Pub/Sub and external clients over Stream sockets, so I cannot add control 
messages here.

I think an easy fix for my issues would be to add a mutex to protect the 
monitor socket in socket_base_t. I guess this was not done because it would 
block the calling thread and probably impact performance, but at least it 
would work correctly and not crash. It should be good enough for our use case.
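A rough sketch of what I mean by the mutex, not actual libzmq code: guarded_monitor and its sink callback are hypothetical stand-ins for the monitor socket held by socket_base_t. The only point is that every thread that wants to emit an event goes through one lock, so the underlying socket is never used by two threads at once.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the monitor socket owned by socket_base_t.
class guarded_monitor
{
  public:
    // The sink represents "send this event on the monitor socket".
    using sink_t = std::function<void (
      std::uint16_t, std::uint32_t, const std::string &)>;

    explicit guarded_monitor (sink_t sink) : sink_ (std::move (sink))
    {
        assert (sink_ != nullptr);
    }

    // May be called from the application thread or any I/O thread.
    // The lock serializes all access to the sink, at the cost of
    // blocking the caller while another thread is sending.
    void publish (std::uint16_t type,
                  std::uint32_t value,
                  const std::string &address)
    {
        std::lock_guard<std::mutex> lock (mutex_);
        sink_ (type, value, address);
    }

  private:
    std::mutex mutex_;
    sink_t sink_;
};
```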

An idea for a non-blocking solution would be to have an independent monitor 
class, similar to the existing listener and accepter classes, which owns an 
inproc SUB socket and the PAIR socket. Each ZeroMQ socket would then create a 
monitor when it is created, and each session object would have a PUB socket to 
broadcast events to the monitor. The monitor then forwards events received 
from individual clients/sessions on different IO threads to the PAIR socket, 
where the application code can connect.



I have given this some more thought and I think there is an easy solution, but 
it will break the socket monitor API. Instead of using a ZMQ_PAIR where 
applications can connect, it could use a ZMQ_PUB on an inproc socket to 
publish events. The application would create a ZMQ_SUB socket and pass its 
address to zmq_socket_monitor, just as it is done currently with the PAIR 
socket. Each engine then creates its own PUB socket and connects to the 
provided SUB (this can also happen after engines have been created?). This 
will create one socket per engine, and thus there is no multi-threaded access 
to sockets anymore.

This is an API change for zmq_socket_monitor, but it could also replace the 
event filtering by using the event type as the subscription. I would also 
change the event message itself. Right now, the first frame is a 48-bit value 
consisting of a 16-bit event type and a 32-bit event value, followed by a 
second frame with the socket address as a string. I don’t like the packing of 
two values into one frame and would propose to send a three-frame message 
where each part is in its own frame. The event type would be the first frame 
and thus can be used to filter event types. I think this fits better with the 
rest of the ZeroMQ API. The signature of zmq_socket_monitor would change 
because the events parameter is removed, so old code would not compile 
anymore. I think this is good because it shows that something changed and is 
not compatible anymore. If the signature were identical, old code would fail 
with runtime errors because the socket type changed.
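To illustrate the difference between the two layouts, here is a small sketch that does not use libzmq at all and works on raw frame contents. decode_packed reads the current 6-byte first frame (16-bit event type followed by a 32-bit value, in native byte order) plus the address frame. encode_three_frames and matches_subscription are hypothetical helpers for the proposed layout; the encoding of the first two frames as raw native-endian integers is only an assumption, since I have not fixed that detail.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct monitor_event
{
    std::uint16_t type;   // e.g. 0x0200 for ZMQ_EVENT_DISCONNECTED
    std::uint32_t value;  // event-specific value
    std::string address;  // affected endpoint
};

// Current layout: frame 1 packs type and value, frame 2 is the address.
bool decode_packed (const void *frame1,
                    std::size_t size1,
                    const std::string &frame2,
                    monitor_event &out)
{
    assert (frame1 != nullptr);
    if (size1 != sizeof (std::uint16_t) + sizeof (std::uint32_t))
        return false;
    const unsigned char *data = static_cast<const unsigned char *> (frame1);
    std::memcpy (&out.type, data, sizeof out.type);
    std::memcpy (&out.value, data + sizeof out.type, sizeof out.value);
    out.address = frame2;
    return true;
}

// Proposed layout (assumed encoding): one frame per part, type first.
std::vector<std::string> encode_three_frames (const monitor_event &ev)
{
    std::vector<std::string> frames (3);
    frames[0].assign (reinterpret_cast<const char *> (&ev.type),
                      sizeof ev.type);
    frames[1].assign (reinterpret_cast<const char *> (&ev.value),
                      sizeof ev.value);
    frames[2] = ev.address;
    return frames;
}

// SUB sockets filter by prefix match on the first frame, so a
// subscription equal to the encoded event type selects that event.
bool matches_subscription (const std::string &first_frame,
                           const std::string &subscription)
{
    return first_frame.compare (0, subscription.size (), subscription) == 0;
}
```

With the type in its own first frame, an application interested only in disconnects would subscribe to the two bytes of that event type, and an empty subscription would still receive everything.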

Best wishes,

  Jens

