On Sun, 2017-09-17 at 12:29 -0400, Bill Torpey wrote:
> Luca:
>
> I hear what you’re saying but … I think I’m talking about a different
> situation.
>
> If I understand your explanation correctly, you’re saying that
> setting ZMQ_RECONNECT_IVL to -1 should prevent a disconnected
> endpoint from *ever* reconnecting, under any set of circumstances.
>
> I would read the doc (4.2.2) more like the following (with addition
> in *bold*):
>
> > The ZMQ_RECONNECT_IVL option shall set the initial reconnection
> > interval for the specified socket. The reconnection interval is the
> > period ØMQ shall wait between attempts to *automatically* reconnect
> > disconnected peers when using connection-oriented transports. The
> > value -1 means no reconnection.
>
> What I’m questioning is the interaction between ZMQ_RECONNECT_IVL ==
> -1 and the behavior enforced by
> https://github.com/zeromq/libzmq/issues/788. (Also see here:
> https://www.mail-archive.com/zeromq-d...@lists.zeromq.org/msg21484.html).
> That commit is intended to prevent *duplicate* connections from the
> same endpoint, for certain socket types (e.g., pub/sub), where
> multiple connections (and their associated duplicate messages) don’t
> make sense.
>
> One scenario I’m concerned about is the one where:
>
> 1. Endpoint connects to us
> 2. Endpoint is disconnected for some reason
> 3. Setting ZMQ_RECONNECT_IVL=-1 disables *automatic* reconnect, so as
>    far as we’re concerned the endpoint is dead
> 4. Subsequently the endpoint connects to us again (e.g., following a
>    restart)
> 5. Because we still have a record of the endpoint, we will refuse the
>    connection - even though the endpoint is dead from our point of
>    view. In this scenario that endpoint can NEVER reconnect.
>
> So I get that setting ZMQ_RECONNECT_IVL should prevent us from
> reconnecting (automatically) to the disconnected endpoint, but I
> don’t see the benefit of preventing that endpoint from actively
> reconnecting at a later time. In this case, we’ve essentially
> blacklisted that endpoint (forever), and I’m having trouble coming up
> with a scenario where that would be intended behavior.
>
> Does this make sense? Am I missing something here?
>
> Also, to your point about adding a protocol layer on top of 0MQ - I
> would MUCH prefer to let 0MQ handle as much of the underlying
> connect/disconnect logic as possible. I’m concerned about the
> potential for the protocol’s view of the connection state getting out
> of sync with 0MQ’s view (not to mention a bunch of additional work on
> the protocol layer, but more about synchronization).
>
> Thanks for listening ...
>
> Bill

I see. I guess there's a terminology confusion issue here - when I wrote
about connections and disconnections, I meant the automated ones that
happen in the background in the I/O thread. But I guess it makes sense
that a manual call to zmq_connect should work as expected.

A workaround for this behaviour would be for the application to manually
call zmq_disconnect before doing a connect to the same endpoint.

But it turns out that making the library do this automatically is not
too hard (unless I've made some silly mistake):

https://github.com/zeromq/libzmq/pull/2756

> > On Sep 17, 2017, at 6:39 AM, Luca Boccassi <luca.bocca...@gmail.com> wrote:
> >
> > On Sat, 2017-09-16 at 14:34 -0400, Bill Torpey wrote:
> > > Hi Luca:
> > >
> > > Just a gentle reminder to add an issue so this can be tracked (or
> > > let me know if you’d prefer that I do that).
> > >
> > > Thanks!
> > >
> > > Bill
> >
> > Thinking about this a bit more, I think it's expected behaviour after
> > all. From the doc:
> >
> > "The 'ZMQ_RECONNECT_IVL' option shall set the initial reconnection
> > interval for the specified 'socket'. The reconnection interval is the
> > period 0MQ shall wait between attempts to reconnect disconnected peers
> > when using connection-oriented transports. The value -1 means no
> > reconnection."
> >
> > So it is working as intended - if a peer goes away, it will never be
> > reconnected if that option is set.
> >
> > And it makes sense - in the context of a TCP connection, a dead peer
> > is a dead peer. If for an application a dead peer might be resurrected
> > after X amount of time, there's no way to know that. It needs to be
> > handled by the application.
> >
> > There are various tools you can use:
> >
> > 1) ZMTP heartbeats - see ZMQ_HEARTBEAT* socket options
> > 2) socket monitoring events (including connects and disconnects) -
> >    see zmq_socket_monitor documentation
> > 3) Enhance your protocol - call zmq_disconnect(endpoint) on your
> >    sockets when a particular message is received, or heartbeats are
> >    missed, or a disconnect event happens. This way when you later call
> >    zmq_connect(endpoint) and it happens to match a previous, dead
> >    peer, it will work as expected
> >
> > > > On Sep 2, 2017, at 1:21 PM, Luca Boccassi <luca.boccassi@gmail.com> wrote:
> > > >
> > > > On Sat, 2017-09-02 at 12:02 -0400, Bill Torpey wrote:
> > > > > Thanks again, Luca!
> > > > >
> > > > > For now, I’m going to go with disabling reconnect on the “data”
> > > > > sockets - that seems to be the best solution for my use case
> > > > > (connecting to endpoints that were returned by the peer binding
> > > > > to an unspecified (“wildcard”) port - e.g.,
> > > > > "tcp://<interface>:*" in ZMQ).
> > > > >
> > > > > This assumes that ZMQ will completely forget about the endpoint
> > > > > if/when it is disconnected, if it is set not to reconnect.
> > > > > Otherwise I might run afoul of ZMQ’s silently ignoring
> > > > > connections to endpoints that it already knows about:
> > > > > https://github.com/zeromq/libzmq/issues/788 (e.g., in the case
> > > > > where another process later happens to be assigned the same
> > > > > ephemeral port).
> > > > >
> > > > > I’ve done a quick scan of the libzmq code (v4.2.2) and it
> > > > > doesn’t appear that the endpoint is removed in the case of a
> > > > > (terminal) disconnect. If you can confirm/deny this behavior,
> > > > > that would be helpful.
> > > > > Failing that, I guess I’ll need to test this in the debugger -
> > > > > any hints on how best to do this would also be much appreciated.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Bill
> > > >
> > > > Yes it doesn't look like it removes the endpoint - I guess it's a
> > > > corner case that's missed. I'll open an issue.
> > > >
> > > > BTW all these things are very quick and easy to try with Python on
> > > > Linux. Just install pyzmq, open a python3 terminal and:
> > > >
> > > > import zmq
> > > > ctx = zmq.Context.instance()
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req = ctx.socket(zmq.REQ)
> > > > req.connect("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > rep.send_string("hallo")
> > > > req.recv()
> > > > rep.unbind("tcp://127.0.0.1:12345")
> > > > rep.close()
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > rep.send_string("hallo")
> > > > req.recv()
> > > > rep.unbind("tcp://127.0.0.1:12345")
> > > > rep.close()
> > > > req.close()
> > > >
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req = ctx.socket(zmq.REQ)
> > > > req.setsockopt(zmq.RECONNECT_IVL, -1)
> > > > req.connect("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > > rep.send_string("hallo")
> > > > req.recv()
> > > > rep.unbind("tcp://127.0.0.1:12345")
> > > > rep.close()
> > > > rep = ctx.socket(zmq.REP)
> > > > rep.bind("tcp://127.0.0.1:12345")
> > > > req.send_string("hello")
> > > > rep.recv()
> > > >
> > > > This last one won't receive the message
> > > >
> > > > > > On Sep 1, 2017, at 6:19 PM, Luca Boccassi <luca.boccassi@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, 2017-09-01 at 18:03 -0400, Bill Torpey wrote:
> > > > > > > Thanks Luca! That was very helpful.
> > > > > > >
> > > > > > > Although it leads to a couple of other questions:
> > > > > > >
> > > > > > > - Can I assume that a ZMQ disconnect of a tcp endpoint would
> > > > > > > only occur if the underlying TCP socket is closed by the OS?
> > > > > > > Or are there conditions in which ZMQ will proactively
> > > > > > > disconnect the TCP socket and try to reconnect?
> > > > > >
> > > > > > Normally that's the case - you can set up heartbeating with the
> > > > > > appropriate options and that will kill a connection if there's
> > > > > > no answer
> > > > > >
> > > > > > > - I see that there is a sockopt (ZMQ_RECONNECT_IVL) that can
> > > > > > > be set to -1 to disable reconnection entirely. In my case, the
> > > > > > > “data” socket pair will *always* connect to an ephemeral port,
> > > > > > > so I *never* want to reconnect. Would this be a reasonable
> > > > > > > option in my case, do you think?
> > > > > >
> > > > > > If that makes sense for your application, go for it - in these
> > > > > > cases the only way to be sure is to test it and see how it works
> > > > > >
> > > > > > > - Would there be any interest in a patch that would disable
> > > > > > > reconnects (controlled by sockopt) for ephemeral ports only?
> > > > > > > I’m guessing that reconnecting mostly makes sense with
> > > > > > > well-known ports, so something like this may be of general
> > > > > > > interest?
> > > > > >
> > > > > > If by ephemeral port you mean anything over 1024, then actually
> > > > > > in most applications I've seen it's always useful to reconnect,
> > > > > > and the existing option should be enough for those cases where
> > > > > > it's not desired - we don't want to duplicate functionality
> > > > > >
> > > > > > > Thanks again!
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > > On Sep 1, 2017, at 5:30 PM, Luca Boccassi <luca.boccassi@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, 2017-09-01 at 16:59 -0400, Bill Torpey wrote:
> > > > > > > > > I'm curious about how ZMQ handles re-connection. I
> > > > > > > > > understand that re-connection is supposed to happen
> > > > > > > > > "automagically" under the covers, but that poses an
> > > > > > > > > interesting question.
> > > > > > > > >
> > > > > > > > > To make a long story short, the application I'm working on
> > > > > > > > > uses pub/sub sockets over TCP, and works as follows:
> > > > > > > > >
> > > > > > > > > At startup:
> > > > > > > > > 1. connects to a proxy/broker at a well-known address,
> > > > > > > > >    using a pub/sub socket pair ("discovery");
> > > > > > > > > 2. subscribes to a well-known topic using the "discovery"
> > > > > > > > >    sub socket;
> > > > > > > > > 3. binds a different pub/sub socket pair ("data") and
> > > > > > > > >    retrieves the actual endpoints assigned;
> > > > > > > > > 4. publishes the "data" endpoints from step 3 on the
> > > > > > > > >    "discovery" pub socket;
> > > > > > > > >
> > > > > > > > > When the application receives a message on the "discovery"
> > > > > > > > > sub socket, it connects the "data" socket pair to the
> > > > > > > > > endpoints specified in the "discovery" message.
> > > > > > > > >
> > > > > > > > > So far, this seems to be working relatively well, and
> > > > > > > > > allows the high-volume, low-latency "data" messages to be
> > > > > > > > > sent/received directly between peers, avoiding the extra
> > > > > > > > > hop caused by a proxy/broker connection. The discovery
> > > > > > > > > messages use the proxy/broker, but since these are (very)
> > > > > > > > > low-volume the extra hop doesn't matter. The use of the
> > > > > > > > > proxy also eliminates the "slow joiner" problem that can
> > > > > > > > > happen with other configurations.
> > > > > > > > >
> > > > > > > > > My question is what happens when one of the "data" peer
> > > > > > > > > sockets disconnects. Since ZMQ (apparently) keeps trying
> > > > > > > > > to reconnect, what would prevent another process from
> > > > > > > > > binding to the same ephemeral port?
> > > > > > > > >
> > > > > > > > > - Can I assume that if the new application at that port is
> > > > > > > > > not a ZMQ application, that the reconnect will (silently)
> > > > > > > > > fail, and continue to be retried?
> > > > > > > >
> > > > > > > > The ZMTP handshake would fail, so yes.
> > > > > > > >
> > > > > > > > > - What if the new application at that port *IS* a ZMQ
> > > > > > > > > application? Would the reconnect succeed? And if so, what
> > > > > > > > > would happen if it's a *DIFFERENT* ZMQ application, and
> > > > > > > > > the messages that it's sending/receiving don't match what
> > > > > > > > > the original application expects?
> > > > > > > >
> > > > > > > > Depends on how you handle it in your application. If you
> > > > > > > > have security concerns, then use CURVE with authentication
> > > > > > > > so that only authorised peers can connect.
> > > > > > > >
> > > > > > > > > It's reasonable for the application to publish a
> > > > > > > > > disconnect message when it terminates normally, and the
> > > > > > > > > connected peers can disconnect that endpoint. But,
> > > > > > > > > applications don't always terminate normally ;-)
> > > > > > > >
> > > > > > > > That's a common pattern. But the application needs to handle
> > > > > > > > unexpected data somewhat gracefully. What that means is
> > > > > > > > entirely up to the application - as far as the library is
> > > > > > > > concerned, if the handshake succeeds then it's all good
> > > > > > > > (hence the use case for CURVE).
> > > > > > > >
> > > > > > > > > Any guidance, hints or tips would be much appreciated --
> > > > > > > > > thanks in advance!
> > > > > > > >
> > > > > > > > --
> > > > > > > > Kind regards,
> > > > > > > > Luca Boccassi
> > > > > > > > _______________________________________________
> > > > > > > > zeromq-dev mailing list
> > > > > > > > zeromq-dev@lists.zeromq.org
> > > > > > > > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > > > > >
> > > > > > --
> > > > > > Kind regards,
> > > > > > Luca Boccassi
> > > >
> > > > --
> > > > Kind regards,
> > > > Luca Boccassi

--
Kind regards,
Luca Boccassi