Hi,

I assume the following: when a dealer socket (worker) reconnects to a router socket (broker) after a transient network issue (the reconnection happens at the libzmq level), the new connection _always_ gets a new identity in the router socket, and _may_ get a different file descriptor (the fd might get reused). Workers don't specify their identity. Is this correct?
If it is, then I can deal with identity->fd associations fine.
And yes, you're right about the protocol improvements, I'll consider this option too.

Regards,
Gyorgy

On Tue, Feb 14, 2017 at 9:18 PM, Doron Somech <[email protected]> wrote:

> Using srcfd is problematic: zeromq handles reconnection, and the srcfd
> might change.
>
> To solve your problem I would change the design: continue sending
> heartbeats during a long job and change the worker to a two-thread model.
>
> Alternatively, you can set a maximum time for a job, after which you
> consider the worker dead. If it is not dead, you can handle the
> reconnection then.
>
> On Feb 14, 2017 17:33, "Gyorgy Szekely" <[email protected]> wrote:
>
>> Hi,
>> The implemented protocol (ZMQ-RFC 7/MDP) has application-level mutual
>> heartbeating between the broker and the worker. And this works fine: both
>> parties detect if the other side dies via missing heartbeats. The problem
>> appears when the worker is assigned a long-running job: heartbeating is
>> _disabled_ while the job is being processed (as 7/MDP specifies). This
>> enables the worker to be single threaded, and avoids typical multithreaded
>> issues (e.g. processing thread hangs, heartbeating thread runs; worker in
>> inconsistent state).
>>
>> When a worker crashes during job processing, my application doesn't
>> realize this since no messages are flowing (the broker is waiting for the
>> job result), but libzmq detects this, as the socket is always closed.
>> My goal is to always keep the number of underlying sockets in libzmq and
>> the Worker-related objects in my application in sync.
>>
>> I've googled around and found a few libzmq features that would suit my
>> needs:
>> - ZMQ_IDENTITY_FD - this was introduced and shortly afterwards removed
>>   from the lib
>> - ZMQ_SRCFD - deprecated, but it's exactly what I need!
>> - "Peer-Address" metadata, the recommended replacement for ZMQ_SRCFD, but
>>   not suitable for my needs
>>
>> I know fd's should be handled with care (monitor events are asynchronous,
>> fd's get reused), but ZMQ_SRCFD solves my problem with the following
>> ruleset:
>> 1. When a Worker registers (first message over a connection), save the
>>    underlying fd
>>    - and -
>> 2. Check whether this fd is in use by another Worker; if it is, that
>>    Worker is dead, since libzmq reused its file descriptor
>>
>> 3. If a Worker's fd is in closed state for a longer period (heartbeat
>>    expiry time), then it crashed and the fd was not reused (get this info
>>    from the monitor)
>>
>> I don't know if this is considered an ugly hack by hardcore zeromq
>> users, but it looks like a legitimate ZMQ_SRCFD use case to me. It would
>> be nice if it wasn't removed in the upcoming versions.
>> Any feedback welcome!
>>
>> Regards,
>> Gyorgy
>>
>> On Mon, Feb 13, 2017 at 10:21 PM, Greg Young <[email protected]>
>> wrote:
>>
>>> I believe the term here is application-level heartbeats.
>>>
>>> It should also be supported that clients can heartbeat to the server.
>>> Not all clients necessarily want similar heartbeat timeouts.
>>>
>>> On Mon, Feb 13, 2017 at 4:07 PM, Michal Vyskocil
>>> <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > You can take inspiration from the malamute broker
>>> > https://github.com/zeromq/malamute
>>> >
>>> > There, clients ping the server regularly. MQTT does the same (except
>>> > that there it's the server that pings the clients).
>>> >
>>> > Sadly, malamute is vulnerable to the same problem: a received service
>>> > request may get lost. A solution would be to let the client send the
>>> > request again after a timeout; however, this hasn't been implemented
>>> > yet.
>>>
>>> --
>>> Studying for the Turing test
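For illustration, the three-rule fd bookkeeping quoted above could be sketched roughly as follows. This is only a minimal sketch of the application-side logic: it makes no libzmq calls, and `Worker`, `WorkerRegistry`, `register`, `fd_closed` and `reap` are hypothetical names, not part of any ZeroMQ API. It assumes the fd of a registering connection is obtained elsewhere (e.g. via ZMQ_SRCFD, as described in the thread) and that "closed" notifications come from a socket monitor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Worker:
    identity: bytes
    fd: int
    closed_at: Optional[float] = None  # time the monitor reported the fd closed


class WorkerRegistry:
    """Hypothetical broker-side bookkeeping for the fd ruleset (not a libzmq API)."""

    def __init__(self, expiry: float) -> None:
        self.expiry = expiry  # assumed to equal the heartbeat expiry time
        self.by_fd: Dict[int, Worker] = {}

    def register(self, identity: bytes, fd: int) -> Optional[Worker]:
        """Rules 1 and 2: save the fd; if another Worker held it, return that Worker as dead."""
        previous = self.by_fd.get(fd)
        dead = previous if previous is not None and previous.identity != identity else None
        self.by_fd[fd] = Worker(identity, fd)
        return dead

    def fd_closed(self, fd: int, now: float) -> None:
        """Record a 'closed' notification for an fd (assumed to come from the monitor)."""
        worker = self.by_fd.get(fd)
        if worker is not None and worker.closed_at is None:
            worker.closed_at = now

    def reap(self, now: float) -> List[Worker]:
        """Rule 3: Workers whose fd has stayed closed for the expiry period are dead."""
        dead = [w for w in self.by_fd.values()
                if w.closed_at is not None and now - w.closed_at >= self.expiry]
        for w in dead:
            del self.by_fd[w.fd]
        return dead


registry = WorkerRegistry(expiry=5.0)
registry.register(b"worker-1", 7)
reused = registry.register(b"worker-2", 7)  # fd 7 reused: worker-1 is reported dead
registry.fd_closed(7, now=100.0)
expired = registry.reap(now=105.0)          # worker-2's fd stayed closed for 5 seconds
```

Note that this sketch does nothing about the caveat raised in the reply: the fd of a live connection may change across a reconnection, so the registry would still need to be reconciled with the identity seen on the router socket.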
_______________________________________________ zeromq-dev mailing list [email protected] https://lists.zeromq.org/mailman/listinfo/zeromq-dev
