Hi Pieter,

On Mon, 12 Nov 2012, Pieter Hintjens wrote:
> You are probably right, and it could well be that a reboot simply
> isn't reported by TCP as a network failure, except after a session
> timeout (30 minutes by default).

Even after a day, some of the remote subscribers didn't come back.
Obviously I cannot control their software stack, but it seems that
"30 minutes afterwards" is also not a guarantee for subscription
reconnection.

> If this is what's happening then we do need to add heartbeating
> somewhere. One could argue that 0MQ is just mirroring TCP behaviour
> but pragmatically I think heartbeating could be useful at a lower
> level. However it's non-trivial to make this work generally, and at
> least needs application support to define the intervals.
>
> Filing a bug report isn't going to help much, then, unless you're also
> willing to help make the solution, or convince someone to help make
> it.

The first step to a solution is probably documenting that there is a
use case where reconnection doesn't work. For this we need better
experimentation, which I am committing to.

> What I would do if I was you is (a) experiment with application
> heartbeats, which are quite simple to add,

Sadly 'simple' is also not so simple: we basically offer a remote
pubsub that anyone can connect to, so we cannot control the
application that does so.

> (b) establish that indeed
> it's the TCP session timeout that's causing this, and (c) see if you
> can convince anyone on this list that adding (I assume) optional and
> configurable heartbeats in libzmq for at least pub/sub sockets would
> be a good investment.

I'm going forward with step (b).

Stefan
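
For reference, the application-level heartbeat from point (a) could be
sketched roughly as below on the subscriber side. This is only an
illustration under assumptions: the publisher is assumed to send some
message (data or a dedicated heartbeat) at least once per interval, and
the class name, method names, and default values are hypothetical, not
part of libzmq or of any existing subscriber.

```python
import time


class HeartbeatMonitor:
    """Tracks whether a publisher has been heard from recently.

    Hypothetical helper: 'interval' is the assumed publisher heartbeat
    period in seconds, 'liveness' is how many intervals may pass in
    silence before the connection is considered dead.
    """

    def __init__(self, interval=1.0, liveness=3, clock=time.monotonic):
        self.interval = interval
        self.liveness = liveness
        self.clock = clock
        # Start the timer at construction, i.e. when we connect.
        self.last_seen = clock()

    def message_received(self):
        # Any traffic from the publisher counts as a sign of life.
        self.last_seen = self.clock()

    def peer_is_dead(self):
        # Dead once the silence exceeds interval * liveness seconds.
        silence = self.clock() - self.last_seen
        return silence > self.interval * self.liveness
```

In a receive loop, the subscriber would poll with a timeout of about one
interval, call message_received() for every message that arrives, and
close and re-create its socket once peer_is_dead() returns True. This
only helps for subscribers whose code can be changed, which is exactly
the limitation described above.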
