Hi Marcin,

I ran your test case with the new heartbeats turned on and saw what I believe is the correct behavior. I set the heartbeat interval, timeout, and TTL to 500 ms. Less than a second after setting iptables to DROP, all the sockets on the peer side went from ESTABLISHED to SYN_SENT, indicating that they were trying to reconnect, and all the ESTABLISHED sockets on the router side were closed. After I flushed the INPUT iptables chain, the peers eventually recovered.

I put my updated copy of your test script here: https://gist.github.com/jbreams/7f507beff87987afad98

I haven't tried this with 4.2.0 talking to 4.1.2, though. In your configuration I think it would do almost the right thing: I'd expect the router side to work fine and the peers to never close their sockets.
Jonathan

On Fri, Jun 26, 2015 at 4:58 PM, Marcin Romaszewicz <[email protected]> wrote:
> Hi All,
> I've got a trivial bit of code to reproduce this issue on a single host
> using iptables to simulate network partition.
> https://s3-us-west-2.amazonaws.com/marcin-zmq-example/zmq_test.cpp
> The file has comments on how to run the executable, but the short version
> is that you start a ZMQ_ROUTER listener which accepts connections from
> other peers, and remembers their identities and pings them every 5 seconds.
> Then, you start a number of peers which connect to this router and start
> pinging it every few seconds.
> Once you use the iptables command (also in the comments in the file), the
> router can't ping the peers, and the peers can't ping the router. The file
> descriptors and connections remain open forever on both sides.
> Furthermore, when you undo the iptables block, the connections never come
> back.
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
