Hey Pieter, I put in the delay in both pub.py and req.py on purpose (in pub.py to make sure rep_sub.py has time to subscribe before pub.py starts publishing, and in req.py just to make the timing of starting pub.py and req.py easier).
The 1,000 was a typo; I did tests with 10,000 messages. I started the test cases by hand. I always started req_sub.py first. What I observed was that by playing around with the order of launching pub.py and req.py I sometimes lost messages, but since req_sub.py always starts first and pub.py has a delay, I figured it could not be due to "slow subscriber connect". However, my results were not always reproducible, and right now I cannot reproduce them at all with zeromq 2.0.8 (I upgraded in the meantime).

I now suspect messages got lost because I repeatedly started the scripts and somehow ended up with multiple publishers binding to the same endpoint one after another (with a subscriber connected continuously), in which case some get ignored. Could that make sense?

Sorry for the confusion. Consider it a user error until I get a better handle on what went wrong (and can reproduce it).

Best,
Koert

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Pieter Hintjens
Sent: September 29 2010 06:41
To: ZeroMQ development list
Subject: Re: [zeromq-dev] losing messages

Koert,

And to be precise, on my notebook, the sub socket misses 12,478 messages while connecting. If I raise the pub output to 100k messages then the first message the sub socket receives is:

pub-sub msg 12479

Cheers
Pieter

On Wed, Sep 29, 2010 at 12:33 PM, Pieter Hintjens <[email protected]> wrote:
> Koert,
>
> I've been trying your test cases. There are delays in the req and pub
> programs. Could you explain that? I need to know whether to test
> with or without those delays.
>
> What I'm seeing is:
>
> * The delay in the req.py program has no effect, which is expected.
> * If I leave the delay in the publisher, the subscriber gets all
>   messages, no matter what order I start the programs.
> * If I remove the delay in the publisher, the subscriber gets no
>   messages, no matter what order I start the programs.
>
> Also, you mentioned 1,000 messages in your email but your test cases
> sent 10,000 messages. Again, I need to know whether you changed this
> and why.
>
> Finally, how do you start the test cases, is it by hand or from a
> script? This is relevant because doing it by hand introduces
> additional delays.
>
> What I think you are seeing (and what I'm certainly reproducing using
> your test cases) is the "slow subscriber connect" symptom, which
> means:
>
> * Connecting takes a certain time, say 10 msecs
> * During that time a publisher can send say 10,000 messages
> * If the publisher does bind/send(10000) and the client does
>   connect/recv, it will get nothing
>
> There are three trivial ways to verify that this is what's happening.
>
> 1. Send more messages, e.g. 100K instead of 1K or 10K
> 2. Send very large messages, which will take longer to send
> 3. Send periodic messages, i.e. 1 per second
>
> If you do send periodic messages and you number them, you will see
> that the first 1 or 2 messages a publisher sends are *always* lost
> unless you explicitly add a delay, or a synchronization of some kind.
>
> Hope this helps.
>
> -Pieter
>
>
> On Wed, Sep 22, 2010 at 9:16 PM, Pieter Hintjens <[email protected]> wrote:
>> Koert,
>>
>> So you're saying, if you start the subscriber after the publisher, you
>> don't get messages?
>>
>> If that's what you're seeing, it's normal. Pubsub does not wait for
>> subscribers to connect, and if they arrive after the publisher has
>> sent its data, they will receive nothing.
>>
>> -Pieter
>>
>> On Tue, Sep 21, 2010 at 1:17 PM, Koert Kuipers
>> <[email protected]> wrote:
>>> Hello all,
>>>
>>> I ran into a problem while developing a server in python. When a program is
>>> listening to both a REP socket and a SUB socket, using multiplexing (poll),
>>> messages from the publisher (which should arrive at the SUB socket) get
>>> lost. This seems to only happen if there are also messages arriving at the
>>> REP socket, and typically all the messages from the publisher get lost.
>>>
>>> My setup:
>>>
>>> Windows XP (I also observed the problem on Ubuntu 10.04)
>>> zeromq 2.0.7
>>> pyzmq
>>>
>>> The problem doesn't always occur, and is somewhat hard to replicate.
>>>
>>> I ended up convincing myself that there is indeed a problem by writing 3
>>> little programs. Program 1 listens to REP and SUB socket, program 2 only has
>>> a PUB socket and sends 1000 messages, and program 3 only has REQ socket and
>>> does 1000 RPC requests in a row.
>>>
>>> When I start the programs in this order everything works as expected:
>>>
>>> Start program 1, then program 2 and then program 3 (program 3 starts while
>>> program 2 is still working). Program 1 will report it received 1000 messages
>>> on the SUB socket and 1000 messages on the REP socket.
>>>
>>> But when I change the order I get into trouble. I start program 1, then
>>> program 3 and then program 2 (program 2 starts while program 3 is still
>>> working). Program 1 will report it received 1000 messages on the REP socket
>>> but none on the SUB socket.
>>>
>>> Best,
>>>
>>> Koert
>>>
>>> PS I attached the 3 programs. Hope that works.
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>> --
>> Pieter Hintjens
>> iMatix - www.imatix.com
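
Pieter's suggestion in this thread to number the messages makes the loss easy to quantify on the subscriber side. What follows is only a minimal sketch, assuming the payload format "pub-sub msg N" quoted above and assuming the numbering starts at 1 (which is consistent with 12,478 messages missed before "pub-sub msg 12479"). The function names are hypothetical and are not taken from the attached test programs.

```python
import re

# Payload format taken from the thread ("pub-sub msg 12479"); numbering
# from 1 is an assumption that matches the counts reported there.
_MSG_RE = re.compile(r"pub-sub msg (\d+)$")


def sequence_number(payload):
    """Return the integer sequence number carried by a payload string."""
    match = _MSG_RE.match(payload)
    if match is None:
        raise ValueError("unexpected payload: %r" % (payload,))
    return int(match.group(1))


def missed_before_first(first_payload, first_expected=1):
    """Count messages published before the first one the subscriber saw."""
    return sequence_number(first_payload) - first_expected


def gaps(payloads):
    """Return (last_seen, next_seen) pairs wherever the numbering skips."""
    numbers = [sequence_number(p) for p in payloads]
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


if __name__ == "__main__":
    # First message received in Pieter's 100k run.
    print(missed_before_first("pub-sub msg 12479"))
```

Feeding the subscriber's received payloads to gaps() would help separate the two cases discussed here: a single loss at the start points to slow subscriber connect, while holes in the middle of the stream would point to something else.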
