On Dec 12, 2013, at 17:40, Randall Nortman <[email protected]> wrote:

> On Thu, Dec 12, 2013 at 06:46:12PM +0200, artemv zmq wrote:
> [...]
>>   Now my question is going more to networking field. I want that kind of
>>   situations when I can lose connection __but__ nor FIN, neither RST will be
>>   generated. In other words I want to lose connection very silently (from
>>   client perspective). Will be much appreciated for all that possible
>>   scenarios. Thanks in advance.
> 
> The situations are numerous -- anything happening to any piece of the
> chain in between one application-level socket and the other can cause
> this.  You have seen that at the host level a software firewall like
> iptables can do it.  Here's a short, incomplete list of other
> possibilities:
> 
> - Network cable unplugged on either end, or at any switch/router in
>  between
(…)

Let me take this to a concrete ZMQ case:

- node B binds a socket
- node A connects to B
- between A and B there is an unreliable network with misbehaving NAT devices.

What happens:

1. The network slows down or stops and the HWM fills up

1.1 A with a blocking socket (PUSH, DEALER, etc.):
solution: send will block once the HWM is hit, or fail with EAGAIN if a send 
timeout (ZMQ_SNDTIMEO) or ZMQ_DONTWAIT is set. On failure, call 
socket.disconnect and reconnect. The messages in the local buffer are lost.

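1.2 A with a nonblocking socket (PUB, ROUTER), i.e. one that drops messages 
instead of blocking when the HWM is hit:
solution: configure the socket to use timeouts so it returns an error on 
failure where the socket type supports that, or use pollers, etc. Otherwise 
same as 1.1.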

1.3 A socket with multiple connects, blocking (e.g. PUSH):
send will only fail when all connections are stale. If n-1 are stale but one is 
still working, there is no easy way to know about it.

1.4 A socket with multiple connects, nonblocking (e.g. PUB):
some subscribers will receive nothing, and neither A nor B will know about it.
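
For 1.3 and 1.4, one way to find out about stale peers is to track liveness at the application level. The sketch below is only an illustration and assumes something the setup above does not have: each peer sends a periodic heartbeat over some return path (PUSH and PUB are one-way, so that path would have to be added). The class name `PeerLiveness` and the peer labels are hypothetical.

```python
import time


class PeerLiveness:
    """Remember when each peer was last heard from and report the silent ones."""

    def __init__(self, timeout_s):
        # A peer is considered stale after this many seconds of silence.
        self.timeout_s = timeout_s
        self.last_seen = {}

    def seen(self, peer, now=None):
        # Call this whenever a heartbeat (or any message) arrives from `peer`.
        self.last_seen[peer] = time.monotonic() if now is None else now

    def stale(self, now=None):
        # Return the peers that have been silent for longer than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            peer
            for peer, last in self.last_seen.items()
            if now - last > self.timeout_s
        )
```

The `now` argument exists only so the logic can be exercised without sleeping; in normal use it would be left out and the monotonic clock is used.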

Additional trick: set linger to 0, or else the client’s disconnect may keep 
trying to send pending bytes and never close the connection.
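
A minimal sketch of the 1.1 approach together with the linger trick, assuming pyzmq. The helper names `make_push` and `send_or_reconnect` and the default timeout are made up for illustration; the socket options (`LINGER`, `SNDTIMEO`) and the `disconnect`/`connect` calls are standard pyzmq.

```python
import zmq


def make_push(ctx, endpoint, send_timeout_ms=1000):
    """Create a PUSH socket that gives up on send instead of blocking forever."""
    sock = ctx.socket(zmq.PUSH)
    # Linger 0: do not keep trying to flush pending messages on disconnect/close.
    sock.setsockopt(zmq.LINGER, 0)
    # With a send timeout, a blocked send raises zmq.Again (EAGAIN).
    sock.setsockopt(zmq.SNDTIMEO, send_timeout_ms)
    sock.connect(endpoint)
    return sock


def send_or_reconnect(sock, endpoint, payload):
    """Try to send; if the send times out, drop the connection and reconnect.

    Returns True if the message was queued, False if the send timed out.
    """
    try:
        sock.send(payload)
        return True
    except zmq.Again:
        # The HWM was hit (or no connection is ready) for the whole timeout.
        sock.disconnect(endpoint)
        sock.connect(endpoint)
        return False
```

A False return only says that the send timed out; the caller still has to decide whether to retry the message or drop it.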


2. The network is broken in such a way that A’s side of the NAT is closed but B’s 
side is still ESTABLISHED
solution: no idea. I can’t unbind the socket, and I can’t tell that the TCP 
connection is dead. Even if ZMQ keep-alive packets were used, I don’t know how 
to kill that connection.
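
One option that may help in case 2 is to let the kernel detect the dead peer through TCP keepalive, which libzmq exposes as socket options. This is a sketch under assumptions, not a tested fix for the NAT problem described here: the helper name `bind_with_keepalive` and the timing values are invented for illustration, and whether the kernel actually reaps the half-open connections depends on the OS and on how the NAT treats the probes.

```python
import zmq


def bind_with_keepalive(ctx, endpoint, idle_s=60, interval_s=10, probes=3):
    """Bind a PULL socket whose accepted TCP connections use keepalive probes."""
    sock = ctx.socket(zmq.PULL)
    sock.setsockopt(zmq.LINGER, 0)
    # Enable SO_KEEPALIVE on the underlying TCP connections.
    sock.setsockopt(zmq.TCP_KEEPALIVE, 1)
    # Seconds of idle time before the first probe is sent.
    sock.setsockopt(zmq.TCP_KEEPALIVE_IDLE, idle_s)
    # Seconds between probes.
    sock.setsockopt(zmq.TCP_KEEPALIVE_INTVL, interval_s)
    # Unanswered probes before the kernel declares the connection dead.
    sock.setsockopt(zmq.TCP_KEEPALIVE_CNT, probes)
    # Options are set before bind so they apply to connections accepted later.
    sock.bind(endpoint)
    return sock
```

With the illustrative values above, a silent peer would be dropped after roughly idle_s + interval_s * probes seconds, at which point the ESTABLISHED entry on the bind side should go away.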


This is what’s happening to me now. Using PUSH-PULL, linger(0) and send with 
timeout, I can force the clients to try to reconnect. Sometimes connections will 
still hang on the client side, mostly in the FIN_WAIT1 state, although I’ve seen 
a couple in ESTABLISHED that I couldn’t explain. On the bind side it’s typical to 
have dozens of ESTABLISHED connections and not be able to clean them up. They 
don’t seem to affect performance, but if instead of dozens there were 
hundreds of thousands, it could become a problem.

In conclusion, it’s great that ZMQ abstracts the sockets for us, but when sxxx 
hits the fan, it would be nice to be able to press the panic button. In this 
case the only panic button available is closing the zmq socket and opening a 
new one, killing everything.

Please note I’m not complaining about anything here. I’m just quite confused by 
the current state of my LAN and struggling to find solutions to my problems and, 
hopefully, to help others and the project with that knowledge. 


_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
