Hey Pieter,

I put in the delay in both pub.py and req.py on purpose (in pub.py to make sure 
rep_sub.py has time to subscribe before pub.py starts publishing, and in req.py 
just to make the timing of starting pub.py and req.py easier).
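
As a rough illustration of why such a delay helps, here is a toy model in plain Python. It is not ZeroMQ or pyzmq code: the function name, the tick-based timing and all parameters are made up for illustration. It assumes that anything published before the subscriber has finished connecting is simply dropped.

```python
def first_received(total_msgs, connect_ticks, publish_delay_ticks):
    """Toy model of PUB/SUB start-up timing (not ZeroMQ itself).

    Assumptions: the subscriber starts connecting at tick 0 and is ready
    at tick `connect_ticks`; the publisher sends message n (numbered from
    1) at tick `publish_delay_ticks + n - 1`; anything sent before the
    subscriber is ready is dropped.

    Returns the number of the first message the subscriber receives, or
    None if every message was dropped.
    """
    for n in range(1, total_msgs + 1):
        sent_at = publish_delay_ticks + n - 1
        if sent_at >= connect_ticks:
            return n
    return None


print(first_received(10, 5, 0))  # 6: messages 1 to 5 are lost
print(first_received(10, 5, 5))  # 1: the delay covers the connect time
print(first_received(3, 5, 0))   # None: everything is sent too early
```

In this model a publisher delay at least as long as the connect time means nothing is lost, and a short burst with no delay can vanish entirely.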

The 1,000 was a typo; I ran the tests with 10,000 messages.

I started the test cases by hand. I always started req_sub.py first.

What I observed was that by playing around with the order of launching pub.py 
and req.py I sometimes lost messages, but since req_sub.py always starts first 
and pub.py has a delay, I figured it could not be due to "slow subscriber 
connect". However, my results were not always reproducible, and right now I 
cannot reproduce them at all with zeromq 2.0.8 (I upgraded in the meantime).

I now suspect messages got lost because I repeatedly started the scripts and 
somehow ended up with multiple publishers binding to the same endpoint one 
after another (with a subscriber connected continuously), in which case some 
of them get ignored. Could that make sense?

Sorry for the confusion. Consider it a user error until I get a better handle 
on what went wrong (and can reproduce it).

Best,
Koert

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Pieter Hintjens
Sent: September 29 2010 06:41
To: ZeroMQ development list
Subject: Re: [zeromq-dev] losing messages

Koert,

And to be precise, on my notebook, the sub socket misses 12,478
messages while connecting.  If I raise the pub output to 100k messages
then the first message the sub socket receives is:

    pub-sub msg 12479
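
Assuming the publisher numbers its messages consecutively from 1 and uses the "pub-sub msg N" text shown above, the count of missed messages follows directly from the first line received. A minimal sketch (the helper name is hypothetical, not part of the test programs):

```python
def missed_before(first_line, prefix="pub-sub msg "):
    """Return how many messages were sent before the first one received.

    Assumes messages are numbered consecutively starting at 1 and look
    like "<prefix><number>"; both are assumptions about the test programs.
    """
    if not first_line.startswith(prefix):
        raise ValueError("unexpected message format: %r" % first_line)
    return int(first_line[len(prefix):]) - 1


print(missed_before("pub-sub msg 12479"))  # 12478
```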

Cheers
Pieter


On Wed, Sep 29, 2010 at 12:33 PM, Pieter Hintjens <[email protected]> wrote:
> Koert,
>
> I've been trying your test cases.  There are delays in the req and pub
> programs.  Could you explain that?  I need to know whether to test
> with or without those delays.
>
> What I'm seeing is:
>
> * The delay in the req.py program has no effect, which is expected.
> * If I leave the delay in the publisher, the subscriber gets all
> messages, no matter what order I start the programs.
> * If I remove the delay in the publisher, the subscriber gets no
> messages, no matter what order I start the programs.
>
> Also, you mentioned 1,000 messages in your email but your test cases
> sent 10,000 messages.  Again, I need to know whether you changed this
> and why.
>
> Finally, how do you start the test cases, is it by hand or from a
> script?  This is relevant because doing it by hand introduces
> additional delays.
>
> What I think you are seeing (and what I'm certainly reproducing using
> your test cases) is the "slow subscriber connect" symptom, which
> means:
>
> * Connecting takes a certain time, say 10msecs
> * During that time a publisher can send say 10,000 messages
> * If the publisher does bind/send(10000) and the client does
> connect/recv, it will get nothing
>
> There are three trivial ways to verify that this is what's happening.
>
> 1. Send more messages, e.g. 100K instead of 1K or 10K
> 2. Send very large messages, which will take longer to send
> 3. Send periodic messages, i.e. 1 per second
>
> If you do send periodic messages and you number them, you will see
> that the first 1 or 2 messages a publisher sends are *always* lost
> unless you explicitly add a delay, or a synchronization of some kind.
>
> Hope this helps.
>
> -Pieter
>
>
> On Wed, Sep 22, 2010 at 9:16 PM, Pieter Hintjens <[email protected]> wrote:
>> Koert,
>>
>> So you're saying, if you start the subscriber after the publisher, you
>> don't get messages?
>>
>> If that's what you're seeing, it's normal.  Pubsub does not wait for
>> subscribers to connect, and if they arrive after the publisher has
>> sent its data, they will receive nothing.
>>
>> -Pieter
>>
>> On Tue, Sep 21, 2010 at 1:17 PM, Koert Kuipers
>> <[email protected]> wrote:
>>> Hello all,
>>>
>>> I ran into a problem while developing a server in python. When a program is
>>> listening to both a REP socket and a SUB socket, using multiplexing (poll),
>>> messages from the publisher (which should arrive at the SUB socket) get
>>> lost. This seems to only happen if there are also messages arriving at the
>>> REP socket, and typically all the messages from the publisher get lost.
>>>
>>> My setup:
>>>
>>> * Windows XP (I also observed the problem on Ubuntu 10.04)
>>> * zeromq 2.0.7
>>> * pyzmq
>>>
>>> The problem doesn't always occur, and is somewhat hard to replicate.
>>>
>>> I ended up convincing myself that there is indeed a problem by writing 3
>>> little programs. Program 1 listens to REP and SUB socket, program 2 only has
>>> a PUB socket and sends 1000 messages, and program 3 only has REQ socket and
>>> does 1000 RPC requests in a row.
>>>
>>> When I start the programs in this order everything works as expected:
>>>
>>> Start program 1, then program 2 and then program 3 (program 3 starts while
>>> program 2 is still working). Program 1 will report it received 1000 messages
>>> on the SUB socket and 1000 messages on the REP socket.
>>>
>>> But when I change the order I get into trouble. I start program 1, then
>>> program 3 and then program 2 (program 2 starts while program 3 is still
>>> working). Program 1 will report it received 1000 messages on the REP socket
>>> but none on the SUB socket.
>>>
>>> Best,
>>>
>>> Koert
>>>
>>> PS I attached the 3 programs. Hope that works.
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>
>>
>>
>> --
>> -
>> Pieter Hintjens
>> iMatix - www.imatix.com
>>
>
>
>
> --
> -
> Pieter Hintjens
> iMatix - www.imatix.com
>



-- 
-
Pieter Hintjens
iMatix - www.imatix.com
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
