Hi,

The implemented protocol (ZMQ-RFC 7/MDP) has application-level mutual heartbeating between the broker and the worker, and this works fine: each party detects that the other side has died when its heartbeats stop arriving. The problem appears when the worker is assigned a long-running job: heartbeating is _disabled_ while the job is being processed (as 7/MDP specifies). This lets the worker be single-threaded and avoids typical multithreading issues (e.g. the processing thread hangs while the heartbeating thread keeps running, leaving the worker in an inconsistent state).

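To make the blind spot concrete, here is a minimal sketch of broker-side liveness tracking with heartbeats suspended during a job. It is plain Python with no ZeroMQ calls; `WorkerLiveness` and its method names are hypothetical and not taken from 7/MDP or any library.

```python
from dataclasses import dataclass


@dataclass
class WorkerLiveness:
    """Broker-side view of one worker (hypothetical helper, not part of 7/MDP)."""
    expiry_interval: float   # seconds of silence tolerated while the worker is idle
    last_seen: float = 0.0   # timestamp of the last message from the worker
    busy: bool = False       # True while a job is outstanding

    def on_message(self, now: float) -> None:
        # Any message from the worker (heartbeat, ready, ...) counts as a sign of life.
        self.last_seen = now

    def on_job_dispatched(self) -> None:
        # From here on no heartbeats are expected until the result arrives.
        self.busy = True

    def on_job_result(self, now: float) -> None:
        self.busy = False
        self.last_seen = now

    def is_expired(self, now: float) -> bool:
        if self.busy:
            # Silence is expected during a job, so a crash cannot be detected here.
            return False
        return now - self.last_seen > self.expiry_interval
```

With `expiry_interval=3.0`, an idle worker that stays silent for 4 seconds is reported as expired, but once `on_job_dispatched()` has been called `is_expired()` returns `False` no matter how much time passes.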
When a worker crashes during job processing, my application doesn't notice, since no messages are flowing (the broker is waiting for the job result). libzmq does detect it, though, because the underlying socket is always closed when that happens. My goal is to keep the number of underlying sockets in libzmq and the Worker-related objects in my application in sync at all times.

I've googled around and found a few libzmq features that might suit my needs:

- ZMQ_IDENTITY_FD - this was introduced and shortly afterwards removed from the lib
- ZMQ_SRCFD - deprecated, but it's exactly what I need!
- "Peer-Address" metadata, the recommended replacement for ZMQ_SRCFD, but not suitable for my needs

I know fds should be handled with care (monitor events are asynchronous, fds get reused), but ZMQ_SRCFD solves my problem with the following ruleset:

1. When a Worker registers (first message over a connection), save the underlying fd, and
2. check whether this fd is already in use by another Worker. If it is, that Worker is dead, since libzmq reused its file descriptor.
3. If a Worker's fd has been in the closed state for longer than the heartbeat expiry time, then the Worker crashed and the fd was not reused (this information comes from the monitor).

I don't know if this is considered an ugly hack by hardcore zeromq users, but it looks like a legitimate ZMQ_SRCFD use case to me. It would be nice if it weren't removed in upcoming versions.

Any feedback welcome!

Regards,
Gyorgy

On Mon, Feb 13, 2017 at 10:21 PM, Greg Young <[email protected]> wrote:
> I believe the term here is application level heartbeats.
>
> It should also be supported that clients can heartbeat to server. It
> is not always that all clients want similar heartbeat timeouts.
>
> On Mon, Feb 13, 2017 at 4:07 PM, Michal Vyskocil
> <[email protected]> wrote:
> > Hi,
> >
> > You can take inspiration from malamute broker
> > https://github.com/zeromq/malamute
> >
> > There clients pings server regularly. The same does MQTT (just it's a
> > server, who pings clients).
> >
> > Sadly malamute is vulnerable to the same problem, that received service
> > request may get lost. Solution would be to let client to send a request
> > again after timeout, however wasn't yet implemented.
> >
> > _______________________________________________
> > zeromq-dev mailing list
> > [email protected]
> > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
> --
> Studying for the Turing test
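
A minimal sketch of the three-rule fd ruleset described above, written in Python for brevity and without any ZeroMQ calls. The names `FdRegistry`, `Worker`, `register`, `on_fd_closed` and `reap` are hypothetical. It assumes the fd is obtained from ZMQ_SRCFD on the first message of a connection and that close notifications come from a socket monitor; neither step is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Worker:
    identity: str
    fd: int
    closed_at: Optional[float] = None  # when the monitor reported the fd closed


class FdRegistry:
    """Tracks which Worker owns which fd (illustrative only)."""

    def __init__(self, expiry: float) -> None:
        self.expiry = expiry                 # heartbeat expiry time in seconds
        self.by_fd: Dict[int, Worker] = {}

    def register(self, identity: str, fd: int) -> Optional[Worker]:
        # Rule 2: an existing owner of this fd must be dead, because the fd was reused.
        evicted = self.by_fd.pop(fd, None)
        # Rule 1: remember the fd of the newly registered Worker.
        self.by_fd[fd] = Worker(identity, fd)
        return evicted

    def on_fd_closed(self, fd: int, now: float) -> None:
        # Called when the monitor reports that the fd was closed.
        worker = self.by_fd.get(fd)
        if worker is not None and worker.closed_at is None:
            worker.closed_at = now

    def reap(self, now: float) -> List[Worker]:
        # Rule 3: the fd stayed closed for the whole expiry time and was never reused.
        dead = [w for w in self.by_fd.values()
                if w.closed_at is not None and now - w.closed_at >= self.expiry]
        for w in dead:
            del self.by_fd[w.fd]
        return dead
```

The broker would call `register()` on the first message of each connection, `on_fd_closed()` from its monitor handler, and `reap()` on its regular timer. The sketch does not handle a close event for an old connection arriving after a new Worker has already registered on the reused fd, which is exactly the asynchronous-monitor caveat mentioned above.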
