I've read everything I can find including the Printed book, but I am at a loss as to the definitive definition as to how PUB/SUB should behave in zmq.
A production system I'm using is experiencing message loss between several nodes using PUB/SUB. >From what i've read, PUB SUB should be reliable when the _HWM are set to zero (don't drop). By reliable I mean no messages should fail to be delivered to an already connected consumer. I implemented some utilities to reproduce the message loss in my system : zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289 zmq_pub: https://gist.github.com/easytiger/e382502badab49856357 zmq_pub takes a number of events to send and the logging frequency and zmq_sub only takes the logging frequency. zmq prints out the number of msgs received vs the packet contents containing the integer packet count from the publisher. It can be seen when sending events in a tight loop that messages simply go missing mid way through (loss is not at beginning or end ruling out slow connectors etc) In a small loop it usually works ok: $ ./zmq_pub 2000 1000 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58 $ ./zmq_sub 1 RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1 RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2 RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3 RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4 [...] RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 You can see every message was sent as the counts align. However increase the message counts and messages start going missing $ ./zmq_pub 200000 100000 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #100000 with rc=60 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #200000 with rc=60 ./zmq_sub 10000 RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000 RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000 RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610 RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000 RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524 RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654 RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298 RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117 RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864 RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846 RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135 RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606 RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179 RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627 RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166 RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247 Is this expected behaviour? With PUSH/PULL I get no loss at all with similar utilities. If I put more work between sends (e.g. cout each time) and the full message the results are better. zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54 zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93 Is there an issue/bug in my implementation that would cause this? Using zeromq 4.0.3 Many Thanks Gerry -- Gerry Steele
_______________________________________________ zeromq-dev mailing list [email protected] http://lists.zeromq.org/mailman/listinfo/zeromq-dev
