Hi,

Background: I have a message broker, written with cppzmq, that implements the Majordomo protocol. It works very well except in one scenario: when a worker crashes during processing. The protocol handles this in the sense that no new task is assigned to the dead worker, but the broker never realizes that it has lost a worker. In my environment workers die quite often, and this is visible at the transport level: the TCP connection goes down. My problem is that the broker itself is not notified of these events, so it effectively leaks worker-related objects and reports false statistics on available resources (a restarted worker reconnects as a new worker).
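To make the leak concrete, here is a minimal sketch of a broker's worker bookkeeping, assuming it is keyed by the ROUTER routing identity. `WorkerRegistry`, `touch`, `remove` and `expire` are hypothetical names, and the deadline sweep is the generic heartbeat-liveness idea, not code taken from the broker described here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical model of a Majordomo broker's per-worker bookkeeping, keyed by
// the ROUTER routing identity. Illustrative only.
class WorkerRegistry {
public:
    using Clock = std::chrono::steady_clock;

    // `liveness` is how long a worker may stay silent before it is presumed
    // dead (an assumed policy value chosen by the caller).
    explicit WorkerRegistry(std::chrono::milliseconds liveness) : liveness_(liveness)
    {
        assert(liveness.count() > 0);
    }

    // Called for READY or any later command from the worker: (re)arms its deadline.
    void touch(const std::string &identity, Clock::time_point now)
    {
        deadlines_[identity] = now + liveness_;
    }

    // Called on an explicit DISCONNECT command. A crashed worker never sends
    // one, so without expire() its entry stays here forever.
    void remove(const std::string &identity) { deadlines_.erase(identity); }

    // Drops every worker whose deadline has passed and returns how many were
    // dropped. Caveat: a busy worker that sends nothing while processing is
    // dropped as well.
    std::size_t expire(Clock::time_point now)
    {
        std::size_t dropped = 0;
        for (auto it = deadlines_.begin(); it != deadlines_.end();) {
            if (it->second <= now) {
                it = deadlines_.erase(it);
                ++dropped;
            } else {
                ++it;
            }
        }
        return dropped;
    }

    std::size_t size() const { return deadlines_.size(); }

private:
    Clock::duration liveness_;
    std::unordered_map<std::string, Clock::time_point> deadlines_;
};
```

If `worker-1` crashes and comes back as `worker-2`, `size()` counts both until a sweep runs, which is the inflated resource count described above. The sweep only helps if a live worker keeps sending something while it is busy; otherwise a long task and a crash look the same to the broker.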
Question: Is it possible to get the identity of disconnected peers on a ROUTER socket without actually sending a message? The broker has a dedicated socket for workers, with a monitor attached to it that reports connection-closed events, but I found no way to associate these events with a ROUTER identity. Is this intentional?

I also tried setting the ZMQ_ROUTER_MANDATORY flag and sending a single-frame message consisting of the identity only, but it gets discarded without ever throwing an EHOSTUNREACH error. The only way I could come up with is to send a real (heartbeat) message to a worker, which triggers EHOSTUNREACH for disconnected workers, but these messages will queue up in busy workers. I wouldn't even consider this a workaround...

Any ideas on how to solve this correctly?

Regards,
Gyorgy Szekely
_______________________________________________
zeromq-dev mailing list
https://lists.zeromq.org/mailman/listinfo/zeromq-dev
