[zeromq-dev] Sub problem
Hi,

I am still observing a case where a connected ZeroMQ SUB socket stops receiving data after a certain period of idle time. We are willing to test this with exact numbers (our estimate is that less than 4 hours without data results in a dead client), but in the meantime I was reasoning about the problem.

My hypothesis for this 'dead client' is that the socket is, for some reason, in a disconnected state, and that after it reconnects it does not apply the 'original' socket options. This cannot be reproduced using normal code: a socket in ZeroMQ can't be closed and then reconnected or rebound.

Obviously we are going to see if we can reproduce the problem, and we will tcpdump the connection to find out whether the client is really receiving new data or is truly 'disconnected'.

Would anyone be willing to take a look and tell me if there is any basis for this reasoning?

Stefan
___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Sub problem
Hi Stefan,

Are you connecting SUB to PUB via some firewall or proxy, or is this a direct connection? Also, is the publisher silent for long periods of time?

The reason I ask is that we've seen connections going 'stale' sometimes: TCP does not report an error at the client side, but nothing is transmitted either. Presumably this is due to some proxy in the middle getting confused. If this is happening, consider sending keep-alive messages from PUB to SUB, which the SUB can discard (but must subscribe to, in 3.2).

-Pieter

On Thu, Dec 13, 2012 at 11:19 PM, Stefan de Konink ste...@konink.de wrote:
> I am still observing a case where a connected ZeroMQ Sub stops receiving data after a certain period of idle time. [...]
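The keep-alive suggestion above could look roughly like this on the publisher side. This is only a sketch under assumptions, not code from the thread: it uses pyzmq, and the topic name `HB`, the one-second idle interval, the `outbox` queue, and the helper names are all hypothetical. The publisher sends a heartbeat only when it has had nothing real to send for the idle interval, so the connection never stays silent for long.

```python
import queue

HEARTBEAT_TOPIC = b"HB"   # hypothetical topic name, not from the thread
IDLE_INTERVAL_S = 1.0     # assumed idle period before a keep-alive is sent


def next_frames(outbox, idle_s=IDLE_INTERVAL_S):
    """Return the next multipart message to publish.

    Waits up to idle_s seconds for a (topic, payload) pair on the queue.
    If nothing arrives in that time, returns a heartbeat with an empty body.
    """
    try:
        topic, payload = outbox.get(timeout=idle_s)
    except queue.Empty:
        return [HEARTBEAT_TOPIC, b""]
    return [topic, payload]


def run_publisher(endpoint, outbox):
    """Publish queued messages forever, filling silences with heartbeats."""
    import zmq  # pyzmq, assumed installed; imported here so next_frames needs only the stdlib

    ctx = zmq.Context.instance()
    pub = ctx.socket(zmq.PUB)
    pub.bind(endpoint)
    try:
        while True:
            pub.send_multipart(next_frames(outbox))
    finally:
        pub.close(linger=0)
```

The application would put `(topic, payload)` byte pairs on `outbox` from its own thread and call something like `run_publisher("tcp://*:5556", outbox)`, where the endpoint is a placeholder.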
Re: [zeromq-dev] Sub problem
Hi Pieter,

On 12/14/12 00:20, Pieter Hintjens wrote:
> Are you connecting SUB to PUB via some firewall or proxy, or is this a direct connection?

The same (local) iptables as before, just accept/drop, without any connection tracking whatsoever.

> Also, is the publisher silent for long periods of time?

Typically not (messages every second), but in this case the 'original' publisher had an issue, which led to a longer silence.

> The reason I ask is that we've seen connections going 'stale' sometimes; TCP not reporting an error at the client side but not transmitting anything. It's due to some proxy in the middle getting confused, presumably. If this is happening, consider sending keep-alive messages from PUB to SUB, which the SUB can discard (but must subscribe to, in 3.2).

I was considering this, and using zmq_poll to actually time out and reconnect.

Would you be interested in a tcpdump of a (failed) test?

Stefan
Re: [zeromq-dev] Sub problem
On Fri, Dec 14, 2012 at 12:23 AM, Stefan de Konink ste...@konink.de wrote:
> The same (local) iptables as before, just accept/drop, without any connection tracking what so ever.

I recall that last time you found the problem didn't happen when you connected directly, only when you went via iptables? So it's definitely doing *something*.

>> If this is happening, consider sending keep-alive messages from PUB to SUB, which the SUB can discard (but must subscribe to, in 3.2).
>
> I was considering this, and using zmq_poll to actually timeout and reconnect. Would you be interested in a tcpdump of a (failed) test?

No, but thanks for the offer.

Yes, definitely use zmq_poll to time out and reconnect. However, if TCP reports an error, the SUB socket will reconnect automatically.

What you have to do (and this is pretty standard across any protocol) is do your own keep-alive to keep the "I'm really not messing with your connections, promised!"* proxy in the middle out of trouble. The PUB socket sends a null message once a second; the SUB gets that (it must subscribe to it!) and discards it. If the SUB doesn't get a null message after X seconds, it decides that the PUB is dead and looks for a backup.

-Pieter

* It really is messing with your connections.
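The subscriber side of this scheme might be sketched as follows. Again, this is an illustration under assumptions rather than anything posted in the thread: it uses pyzmq, and the `Liveness` class, the `HB` topic, the five-second default for "X", and the list of backup endpoints are all hypothetical. On a timeout the sketch closes the socket and creates a fresh one, so every socket option is set again explicitly instead of relying on what survives an automatic reconnect.

```python
import math
import time

HEARTBEAT_TOPIC = b"HB"  # hypothetical topic; must match what the publisher sends


class Liveness:
    """Tracks how long it has been since the peer was last heard from."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def touch(self):
        """Record that a message (data or heartbeat) just arrived."""
        self.last_seen = self.clock()

    def expired(self):
        return self.clock() - self.last_seen >= self.timeout_s

    def remaining_ms(self):
        """Milliseconds left before the peer counts as dead, never negative."""
        left = self.timeout_s - (self.clock() - self.last_seen)
        return max(0, math.ceil(left * 1000))


def run_subscriber(endpoints, topics, handle, timeout_s=5.0):
    """Receive from the first endpoint; rotate to the next one after a silence."""
    import zmq  # pyzmq, assumed installed

    ctx = zmq.Context.instance()
    attempt = 0
    while True:
        endpoint = endpoints[attempt % len(endpoints)]
        attempt += 1
        sub = ctx.socket(zmq.SUB)
        # A new socket each time, so subscriptions are always applied explicitly.
        # Subscriptions are prefix matches: data topics should not start with b"HB".
        for topic in topics:
            sub.setsockopt(zmq.SUBSCRIBE, topic)
        sub.setsockopt(zmq.SUBSCRIBE, HEARTBEAT_TOPIC)
        sub.connect(endpoint)
        alive = Liveness(timeout_s)
        while not alive.expired():
            # poll() takes milliseconds and returns 0 if nothing arrived in time
            if sub.poll(alive.remaining_ms()):
                frames = sub.recv_multipart()
                alive.touch()
                if frames[0] != HEARTBEAT_TOPIC:
                    handle(frames)  # heartbeats are discarded, data is passed on
        sub.close(linger=0)
```

With a single entry in `endpoints`, the same loop simply reconnects to the same publisher after each timeout.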