Thanks Charles, that's pretty much my understanding too. Meaning this is a bug in my implementation or in zeromq.
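To make that shared understanding concrete, here is a toy model of the semantics described in the man pages. It is only a sketch: `ToyPub`, `run` and the numbers used are hypothetical and written for illustration, and it does not call libzmq or say anything about how zeromq implements its queues. It assumes one outbound queue per connected subscriber, a drop when that queue is at the high water mark, and "0 means no limit".

```python
from collections import deque


class ToyPub:
    """Toy stand-in for a PUB socket (hypothetical, not the libzmq API)."""

    def __init__(self, sndhwm):
        self.sndhwm = sndhwm   # 0 is modelled as "no limit"
        self.queues = {}       # subscriber name -> pending messages
        self.dropped = 0       # messages discarded instead of queued

    def connect(self, name):
        # A queue only exists once a subscriber is connected.
        self.queues[name] = deque()

    def send(self, msg):
        if not self.queues:
            # No subscribers connected: nothing is queued at all.
            self.dropped += 1
            return
        for queue in self.queues.values():
            if self.sndhwm and len(queue) >= self.sndhwm:
                # Queue for this peer is at the high water mark: drop.
                self.dropped += 1
            else:
                queue.append(msg)

    def drain(self, name, n):
        # The subscriber takes up to n messages off its queue.
        queue = self.queues[name]
        return [queue.popleft() for _ in range(min(n, len(queue)))]


def run(sndhwm, total=1000, drain_every=10):
    """Publish `total` messages to one slow subscriber that reads one
    message for every `drain_every` messages sent."""
    pub = ToyPub(sndhwm)
    pub.connect("slow_sub")
    received = 0
    for i in range(1, total + 1):
        pub.send(i)
        if i % drain_every == 0:
            received += len(pub.drain("slow_sub", 1))
    return received, pub.dropped, len(pub.queues["slow_sub"])


if __name__ == "__main__":
    for hwm in (0, 100):
        received, dropped, queued = run(hwm)
        print(f"sndhwm={hwm}: received={received} dropped={dropped} queued={queued}")
```

In this model a high water mark of 0 never drops anything for a connected subscriber, but the queue grows without bound while the subscriber lags. A non-zero high water mark keeps the queue bounded and drops instead.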
I understand the implications of the slow consumer problem but the fundamental issue here is to establish trust in PUB/SUB.

On 14 June 2014 21:09, Charles Remes wrote:

> Let’s back up for a second.
>
> Take a look at the man page for zmq_setsockopt and read the section on
> ZMQ_SNDHWM. It clearly states that zero means “no limit.” Second, it also
> states that when the socket reaches its exceptional state then it will
> either block or drop messages depending on socket type.
>
> Next, look at the man page for zmq_socket and check the ZMQ_PUB section.
> The socket will reach its mute state (its exceptional state) when it
> reaches it high water mark. When it’s mute, it will drop messages.
>
> So, taking the two together then a socket with a ZMQ_SNDHWM of 0 should
> never drop messages because it will never reach its mute state.
>
> The one exception to this is when there are no SUB sockets connected to
> the PUB socket. When there are no connections, all messages are dropped
> (because no one is listening and there are no queues created).
>
> However, I highly recommend *against* setting HWM to 0 for a PUB socket.
> Here’s why:
>
> 1. It gives you a false sense of security that all messages will be
> delivered.
> If the publishing process dies, any messages in queue go with it so
> they’ll never get delivered.
>
> 2. Your subscribers might be too slow.
> If your subscribers can’t keep up with the message flow and the publisher
> starts queueing, it *will* run out of memory. You’ll either exhaust the
> amount of memory allowed by your process, or your OS will start paging &
> swapping and you’ll wish the process had just died.
>
> cr
>
>
> On Jun 13, 2014, at 5:34 PM, Gerry Steele wrote:
>
> Hi Brian
>
> I noticed your comment on another thread about this and I think you got it
> a bit wrong:
>
> > The high water mark is a hard limit on the maximum number of
> > outstanding messages ØMQ shall queue in memory for any single peer that the
> > specified *socket* is communicating with. *A value of zero means no limit.*
>
> and from your link:
>
> > Since v3.x, ØMQ forces default limits on its internal buffers (the
> > so-called high-water mark or HWM), so publisher crashes are rarer *unless
> > you deliberately set the HWM to infinite.*
>
> Nothing I read indicates anything other than the fact that no messages
> post connections being made should be dropped.
>
> Thanks
> G
>
> On 13 June 2014 23:17, Brian Knox wrote:
>
>> "From what i've read, PUB SUB should be reliable when the _HWM are set to
>> zero (don't drop). By reliable I mean no messages should fail to be
>> delivered to an already connected consumer."
>>
>> Your understanding of pub-sub behavior and how it interacts with the HWM
>> is incorrect. Please see: http://zguide.zeromq.org/php:chapter5
>>
>> Brian
>>
>> On Fri, Jun 13, 2014 at 2:33 PM, Gerry Steele wrote:
>>
>>> I've read everything I can find including the Printed book, but I am at
>>> a loss as to the definitive definition as to how PUB/SUB should behave in
>>> zmq.
>>>
>>> A production system I'm using is experiencing message loss between
>>> several nodes using PUB/SUB.
>>>
>>> From what i've read, PUB SUB should be reliable when the _HWM are set to
>>> zero (don't drop). By reliable I mean no messages should fail to be
>>> delivered to an already connected consumer.
>>>
>>> I implemented some utilities to reproduce the message loss in my system :
>>>
>>> zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289
>>> zmq_pub: https://gist.github.com/easytiger/e382502badab49856357
>>>
>>> zmq_pub takes a number of events to send and the logging frequency and
>>> zmq_sub only takes the logging frequency. zmq prints out the number of msgs
>>> received vs the packet contents containing the integer packet count from
>>> the publisher.
>>>
>>> It can be seen when sending events in a tight loop that messages simply
>>> go missing mid way through (loss is not at beginning or end ruling out slow
>>> connectors etc)
>>>
>>> In a small loop it usually works ok:
>>>
>>> $ ./zmq_pub 2000 1000
>>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58
>>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58
>>>
>>> $ ./zmq_sub 1
>>>
>>> RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1
>>> RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2
>>> RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3
>>> RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4
>>> [...]
>>> RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000
>>>
>>> You can see every message was sent as the counts align.
>>>
>>> However increase the message counts and messages start going missing
>>>
>>> $ ./zmq_pub 200000 100000
>>>
>>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #100000 with rc=60
>>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #200000 with rc=60
>>>
>>> ./zmq_sub 10000
>>> RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000
>>> RECV:20000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000
>>> RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610
>>> RECV:40000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000
>>> RECV:50000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524
>>> RECV:60000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654
>>> RECV:70000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298
>>> RECV:80000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117
>>> RECV:90000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864
>>> RECV:100000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846
>>> RECV:110000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135
>>> RECV:120000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606
>>> RECV:130000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179
>>> RECV:140000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627
>>> RECV:150000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166
>>> RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247
>>>
>>> Is this expected behaviour? With PUSH/PULL I get no loss at all with
>>> similar utilities.
>>>
>>> If I put more work between sends (e.g. cout each time) and the full
>>> message the results are better.
>>>
>>> zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54
>>> zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93
>>>
>>> Is there an issue/bug in my implementation that would cause this?
>>>
>>> Using zeromq 4.0.3
>>>
>>> Many Thanks
>>> Gerry
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev

--
Gerry Steele
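
For reference, the gap in the zmq_sub output quoted above can be quantified with a few lines of Python. This is only a sketch: `missing_so_far` is a hypothetical helper, and it assumes the publisher numbers messages from 1 and the subscriber was connected before the first send, so that the sequence number minus the receive count is the number of messages that never arrived.

```python
import re

# Matches lines such as "RECV:10000|MESSAGE ... #11000".
# search() is used so a leading quote prefix like ">>> " is tolerated.
LINE = re.compile(r"RECV:(\d+)\|.*#(\d+)\s*$")


def missing_so_far(line):
    """Return publisher sequence number minus subscriber receive count."""
    match = LINE.search(line)
    if match is None:
        raise ValueError(f"not a zmq_sub log line: {line!r}")
    received, sequence = int(match.group(1)), int(match.group(2))
    return sequence - received


if __name__ == "__main__":
    sample = [
        "RECV:10000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000",
        "RECV:30000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610",
        "RECV:160000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247",
    ]
    for line in sample:
        print(missing_so_far(line))  # prints 1000, 1610, 32247
```

On these three lines the deficit grows from 1000 to 32247, so under the stated assumptions the loss keeps accumulating through the run. In the 2000-message run the last line gives 0.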
