OK, I did a simple test to try to reproduce this at the dealer-router level and it doesn't happen. So it's not a libzmq issue. I'll dig deeper, it has to be something in the way Zyre is managing its sockets...
On Fri, Jun 6, 2014 at 11:25 PM, Steven Rasmussen <steve.rasmus...@rassimtech.com> wrote:
> A little more information:
>
> One of the first things I tried, when the Wi-Fi connection was
> re-established, was delaying sending the START message, until after the old
> messages had been received. I couldn't figure out a good time to delay, but
> if I delayed it long enough, the HELLO would get through and kick off the
> handshake. This made it seem to me that messages were being buffered
> somewhere.
>
> If I just started periodically sending HELLO messages, after receiving
> beacons, without removing the peer, the HELLO messages would not ever get
> through.
>
> -Steve
>
> -----Original Message-----
> From: zeromq-dev-boun...@lists.zeromq.org
> [mailto:zeromq-dev-boun...@lists.zeromq.org] On Behalf Of Pieter Hintjens
> Sent: Friday, June 6, 2014 1:18 PM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>
> OK, I've pushed a patch that fixes it, using your workaround more or less.
>
> I want to test this at the libzmq level, it's weird that old messages are
> getting through and the new ones aren't.
>
> -Pieter
>
> On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <p...@imatix.com> wrote:
>> OK, I've reproduced the problem quite easily. Something strange with
>> messages being delivered even though the socket they're sent on is
>> torn down entirely. I'm investigating...
>>
>> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <p...@imatix.com> wrote:
>>> OK, I'll simulate this in the code. The peers should automatically
>>> resend HELLO if they lost contact.
>>>
>>> No thanks needed, we enjoy making this software and use it in
>>> everything we make. :-)
>>>
>>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen
>>> <steve.rasmus...@rassimtech.com> wrote:
>>>>> In principle if the connection is re-established there should be no
>>>>> new HELLO message sent.
>>>>
>>>> This problem occurs after the Wi-Fi connection has been down long
>>>> enough for the peers to remove each other. When the connection comes
>>>> back up, as I understand it, the HELLO message is necessary to kick-off
>>>> handshaking.
>>>>
>>>>> Can you find a way to reproduce the problem easily?
>>>> The easiest method that I've found is using a modified version of
>>>> the zpinger tool on two laptops. The modified zpinger tool is set up
>>>> to send a whisper, after a time delay, anytime it receives a whisper
>>>> from a peer. I either turn the Wi-Fi adapter off/on or move the
>>>> laptop out of range to perform the test.
>>>>
>>>> It seems like this may have something to do with the sockets
>>>> maintaining the TCP/IP connection during the break and then being in
>>>> a bad state when the Wi-Fi connection comes back up. Is this
>>>> possible? If so is there some way to reset the TCP/IP connection?
>>>>
>>>>> Thanks for taking the time to analyse the problem.
>>>>
>>>> I need this capability for the system I'm developing. Thank you and
>>>> your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>>
>>>> Regards,
>>>>
>>>> Steve
>>>>
>>>> -----Original Message-----
>>>> From: zeromq-dev-boun...@lists.zeromq.org
>>>> [mailto:zeromq-dev-boun...@lists.zeromq.org] On Behalf Of Pieter
>>>> Hintjens
>>>> Sent: Thursday, June 5, 2014 5:22 PM
>>>> To: ZeroMQ development list
>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>
>>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen
>>>> <steve.rasmus...@rassimtech.com> wrote:
>>>>
>>>>> The problem seems to be with the TCP/IP connection not the beacon.
>>>>> After a network break, the beacon reestablishes the connection, but
>>>>> no data is getting through the tcp/ip connection.
>>>>> It looks as if there are messages that are being buffered before
>>>>> the break and then delivered after. This prevents the "HELLO"
>>>>> message from getting through.
>>>>> I've tried various things, but the closest I've come, so far, is to
>>>>> keep removing the peer until it is reported as being ready. I'm doing
>>>>> this in the "zyre_node_require_peer" function. If a peer exists I
>>>>> check to see if it is ready, "zyre_peer_ready" and if not, I remove
>>>>> the peer, "zyre_node_remove_peer". This seems to fix the problem that
>>>>> I'm having, but it seems a little kludgie.
>>>>
>>>> Thanks for taking the time to analyse the problem.
>>>>
>>>> In principle if the connection is re-established there should be no
>>>> new HELLO message sent. Can you find a way to reproduce the problem
>>>> easily?
>>>>
>>>> Feel free to make a pull request with your change anyhow. I'm
>>>> reworking a lot of this code atm so will try to include your change
>>>> if I can reproduce the error.
>>>>
>>>> -Pieter
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> zeromq-dev@lists.zeromq.org
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
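
For readers following the thread, the workaround Steve describes (drop a known peer that is not yet ready whenever it is looked up again, so that a fresh connection and HELLO are made) can be modelled roughly as below. This is only a toy sketch in Python, not Zyre's C code: the Peer and Node classes, the hellos_sent list, and the endpoint string are all made up for illustration, and "ready" is assumed to mean that the handshake with that peer has completed.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical stand-in for a Zyre peer record."""
    identity: str
    endpoint: str
    ready: bool = False  # assumed: set once the handshake has completed


@dataclass
class Node:
    """Hypothetical stand-in for the node's peer table."""
    peers: dict = field(default_factory=dict)
    hellos_sent: list = field(default_factory=list)  # records each HELLO we would send

    def remove_peer(self, identity):
        # Forget the peer entirely; in the real library this would also
        # tear down the socket used to talk to it.
        self.peers.pop(identity, None)

    def require_peer(self, identity, endpoint):
        peer = self.peers.get(identity)
        if peer is not None and not peer.ready:
            # The workaround: a peer we know about but that never became
            # ready is treated as stale and removed, forcing a reconnect.
            self.remove_peer(identity)
            peer = None
        if peer is None:
            # New (or re-created) peer: connect and send HELLO.
            peer = Peer(identity, endpoint)
            self.peers[identity] = peer
            self.hellos_sent.append(identity)
        return peer


if __name__ == "__main__":
    node = Node()
    first = node.require_peer("peer-a", "tcp://192.168.0.2:5670")
    # A second beacon arrives before the handshake completed:
    second = node.require_peer("peer-a", "tcp://192.168.0.2:5670")
    print(second is first, node.hellos_sent)
```

In this model every beacon from a not-yet-ready peer triggers a new HELLO, which matches the "keep removing the peer until it is reported as being ready" behaviour described above; once the peer is marked ready, later lookups return the same record and send nothing further.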