Hello,

Context:
I have 2 peers, let’s call them:
  - PeerA under Linux using cppzmq version 4.1.3
  - PeerB under Windows using NetMQ version 4.0.0
I need a mailbox between the 2 of them with 3 reliability guarantees:

  *   Every message is delivered once and only once.
  *   Messages sent by one peer are received by the other in the order in
      which they were sent.
  *   The content of each message arrives unaltered (integrity).
Given those constraints, I chose to develop a “mailbox” using 1 REP and 1
REQ socket on each peer (2 sockets per peer). Each socket is used in a separate
thread.
Each time a message is sent, the receiver replies with an ackMessage to confirm
delivery. Combined with the REQ/REP state machine pattern, this also guarantees
order. Finally, integrity should be the only guarantee left to ZMQ itself.
So as not to block forever, each socket has send and receive timeouts.
When those timeouts are reached, the sockets are destroyed and recreated.
To be clear, I should mention that this “mailbox” runs in an environment with
many threads on both machines (if that makes any difference).
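
A minimal sketch of the send path described above (send, wait for the ack,
and on timeout destroy and recreate the socket). It is library-free C++ and
not the actual mailbox code: `ReqChannel`, `ChannelFactory`, `SendResult` and
`send_with_ack` are hypothetical names standing in for the REQ socket and its
timeouts, and the sketch assumes the same message is resent once the socket
has been recreated.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a REQ socket with send/receive timeouts.
// request() sends the payload and waits for the reply; it returns false
// if a timeout was reached, otherwise it fills `reply` and returns true.
struct ReqChannel {
    std::function<bool(const std::string& payload, std::string& reply)> request;
};

// Hypothetical factory: builds a fresh channel (new socket, new connect).
using ChannelFactory = std::function<ReqChannel()>;

struct SendResult {
    bool delivered;  // true once the expected ack was received
    int attempts;    // number of send attempts made
};

// Sends `payload` and waits for `expected_ack`. On a timeout or an
// unexpected reply the channel is discarded and recreated, then the same
// payload is sent again, up to `max_attempts` times in total.
SendResult send_with_ack(const ChannelFactory& make_channel,
                         ReqChannel& channel,
                         const std::string& payload,
                         const std::string& expected_ack,
                         int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        std::string reply;
        if (channel.request(payload, reply) && reply == expected_ack) {
            return {true, attempt};
        }
        // A REQ socket that never got its reply cannot send again,
        // so it is replaced rather than reused.
        channel = make_channel();
    }
    return {false, max_attempts};
}
```

Note that in this sketch a resend after a lost ack would hand the same
payload to the receiver twice, so the "once and only once" guarantee would
additionally need something like a sequence number checked on the receiving
side.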
Problem:
It only works part of the time. Roughly every 30 seconds, PeerA is sending
and PeerB is waiting, yet nothing gets transmitted, so both sockets time out
and are destroyed and recreated. I have been trying to debug it using
Wireshark, but apart from the application messages themselves, I do not quite
understand the inner workings of ZMQ or the packets it exchanges in between.
Here is a Wireshark dump summarizing the main packets:
https://pastebin.com/xRgWRWQ3
You can clearly see it disconnecting at 10:03:20, which would be expected if no
packet was received (the timeout in this example is 8 seconds), but the attempts
to reconnect fail several times until 10:03:35, when it finally reconnects and
sends a few more messages before failing again.
My questions would be:

  *   Can a version difference between cppzmq and NetMQ create this kind of
      problem?
  *   Why does this problem occur even though one socket is listening and the
      other is sending?
  *   Why is the problem present only when sending from PeerA to PeerB (Linux
      -> Windows) and not the other way around? (I have disabled the firewall
      on both Windows and Linux.)
--
Victor Dumas




