[zeromq-dev] Sub problem

2012-12-13 Thread Stefan de Konink
Hi,


I am still observing a case where a connected ZeroMQ SUB stops receiving
any data after a certain period of idle time without any data. We are
willing to test this with exact numbers (our estimate is that less than
4 hours without data results in a dead client), but in the meantime I
have been reasoning about the problem.

My hypothesis for this 'dead client' is that the socket for some reason
ends up in a disconnected state, and that after it reconnects it does
not reapply the 'original' socket options. This cannot be reproduced
using normal code, because a ZeroMQ socket can't be closed and then
reconnected or rebound.

Obviously we are going to see if we can reproduce the problem, and run
tcpdump on the connection to find out whether the client is really
receiving new data or is truly 'disconnected'.

Would anyone be willing to take a peek at whether there is any basis
for this reasoning?


Stefan
___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev


Re: [zeromq-dev] Sub problem

2012-12-13 Thread Pieter Hintjens
Hi Stefan,

Are you connecting SUB to PUB via some firewall or proxy, or is this a
direct connection?

Also, is the publisher silent for long periods of time?

The reason I ask is that we've sometimes seen connections going
'stale': TCP not reporting an error at the client side, but not
transmitting anything either. Presumably it's due to some proxy in the
middle getting confused.

If this is happening, consider sending keep-alive messages from PUB to
SUB, which the SUB can discard (but must subscribe to, in 3.2).
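A minimal sketch, in plain C with no libzmq dependency, of the prefix rule a SUB socket applies to incoming messages: a subscription set with ZMQ_SUBSCRIBE matches any message whose first frame starts with it, and the empty subscription "" matches everything. It shows why the keep-alive has to be subscribed to: a zero-length keep-alive only matches "", so a SUB subscribed only to a data topic (the topic "quotes." in the example is hypothetical) never sees it. As we understand it, 3.x also filters on the publisher side for tcp://, so such a keep-alive is not even sent over the wire. Subscriptions are modelled as C strings here for brevity; real ones are arbitrary bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Prefix match between one subscription and the first frame of a message. */
static bool subscription_matches(const void *sub, size_t sub_len,
                                 const void *msg, size_t msg_len)
{
    if (sub_len == 0)
        return true;               /* "" is a prefix of every message */
    return sub_len <= msg_len && memcmp(sub, msg, sub_len) == 0;
}

/* True if any of the (string) subscriptions lets the message through. */
static bool any_subscription_matches(const char *const *subs, size_t n_subs,
                                     const void *msg, size_t msg_len)
{
    assert(subs != NULL || n_subs == 0);
    for (size_t i = 0; i < n_subs; i++) {
        if (subscription_matches(subs[i], strlen(subs[i]), msg, msg_len))
            return true;
    }
    return false;
}
```

The alternative to subscribing to "" is to give keep-alives a short topic of their own and subscribe to that explicitly alongside the data topics.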

-Pieter



Re: [zeromq-dev] Sub problem

2012-12-13 Thread Stefan de Konink
Hi Pieter,


On 12/14/12 00:20, Pieter Hintjens wrote:
 Are you connecting SUB to PUB via some firewall or proxy, or is this a
 direct connection?

The same (local) iptables setup as before: just accept/drop rules,
without any connection tracking whatsoever.


 Also, is the publisher silent for long periods of time?

Typically not (messages every second), but in this case the 'original'
publisher had an issue, which led to a longer silence.


 The reason I ask is that we've seen connections going 'stale'
 sometimes; TCP not reporting an error at the client side but not
 transmitting anything. It's due to some proxy in the middle getting
 confused, presumably.
 
 If this is happening, consider sending keep-alive messages from PUB to
 SUB, which the SUB can discard (but must subscribe to, in 3.2).

I was considering this, and also using zmq_poll to actually time out
and reconnect. Would you be interested in a tcpdump of a (failed) test?


Stefan




Re: [zeromq-dev] Sub problem

2012-12-13 Thread Pieter Hintjens
On Fri, Dec 14, 2012 at 12:23 AM, Stefan de Konink ste...@konink.de wrote:

 The same (local) iptables setup as before: just accept/drop rules,
 without any connection tracking whatsoever.

I recall last time you found the problem didn't happen when you
connected directly, only when you went via iptables? So it's
definitely doing *something*.

 If this is happening, consider sending keep-alive messages from PUB to
 SUB, which the SUB can discard (but must subscribe to, in 3.2).

 I was considering this, and also using zmq_poll to actually time out
 and reconnect. Would you be interested in a tcpdump of a (failed) test?

No, but thanks for the offer. Yes, definitely use zmq_poll to time out
and reconnect. However, if TCP reports an error, the SUB socket will
reconnect automatically.
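A minimal sketch of the SUB-side bookkeeping for "use zmq_poll to time out, and reconnect", written in plain C with no libzmq dependency so the logic can be checked on its own. The names (sub_liveness, liveness_*) and the silence limit are hypothetical. The idea: pass the time left before the limit as the zmq_poll timeout (milliseconds in libzmq 3.x), refresh the timestamp on every message, keep-alives included, and treat a limit that expires with nothing received as a dead or stale connection. TCP errors are still covered by ZeroMQ's automatic reconnect; this only handles the silent case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a SUB that expects traffic (data or keep-alives)
 * at least every silence_limit_ms milliseconds. */
typedef struct {
    int64_t last_rx_ms;       /* when the last message of any kind arrived */
    int64_t silence_limit_ms; /* the "X seconds" limit, in milliseconds */
} sub_liveness;

static void liveness_init(sub_liveness *l, int64_t now_ms, int64_t limit_ms)
{
    assert(limit_ms > 0);
    l->last_rx_ms = now_ms;
    l->silence_limit_ms = limit_ms;
}

/* Call after zmq_poll reports ZMQ_POLLIN and a message has been read. */
static void liveness_on_message(sub_liveness *l, int64_t now_ms)
{
    l->last_rx_ms = now_ms;
}

/* Timeout for the next zmq_poll call: time left before the limit.
 * Clamped at 0 (return immediately), because -1 would block forever. */
static long liveness_poll_timeout_ms(const sub_liveness *l, int64_t now_ms)
{
    int64_t remaining = l->last_rx_ms + l->silence_limit_ms - now_ms;
    return remaining > 0 ? (long)remaining : 0;
}

/* True once the publisher has been silent for the whole limit. */
static bool liveness_peer_is_dead(const sub_liveness *l, int64_t now_ms)
{
    return now_ms - l->last_rx_ms >= l->silence_limit_ms;
}
```

In a real loop, a dead result would typically mean closing the SUB socket, creating a new one, applying ZMQ_SUBSCRIBE again and reconnecting, or switching to a backup publisher.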

What you have to do (and this is pretty standard across any protocol)
is do your own keep-alive, to keep the "I'm really not messing with
your connections, promised!"* proxy in the middle out of trouble.

The PUB socket sends a null message once a second; the SUB gets it (it
must subscribe to it!) and discards it. If the SUB doesn't get a null
message within X seconds, it decides that the PUB is dead and looks for
a backup.
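The keep-alive scheme just described can be sketched as two small pieces of plain C: a timer the PUB checks to decide when a null (zero-length) message is due, and a test the SUB applies to drop such messages once they have served as proof of life. All names and the 1000 ms interval used in the example are illustrative assumptions, not part of any ZeroMQ API; the actual sending and receiving would go through the usual libzmq calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PUB side: hypothetical timer for the once-a-second null message. */
typedef struct {
    int64_t last_tx_ms;   /* when the last keep-alive was sent */
    int64_t interval_ms;  /* e.g. 1000 for "once a second" */
} pub_heartbeat;

/* True when the next null message should be published. */
static bool heartbeat_due(const pub_heartbeat *h, int64_t now_ms)
{
    assert(h->interval_ms > 0);
    return now_ms - h->last_tx_ms >= h->interval_ms;
}

/* Record that a keep-alive has just been sent. */
static void heartbeat_on_send(pub_heartbeat *h, int64_t now_ms)
{
    h->last_tx_ms = now_ms;
}

/* SUB side: a zero-length message is a keep-alive with no payload, so the
 * application refreshes its liveness timer and then discards it. */
static bool is_keepalive(size_t msg_len)
{
    return msg_len == 0;
}
```

Calling heartbeat_on_send only after a keep-alive gives the strict once-a-second behaviour described above; also calling it after real data suppresses keep-alives while the publisher is busy, which is a common variant.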

-Pieter

* It really is messing with your connections.