Koert,

I've been trying your test cases. There are delays in the req and pub programs; could you explain why they are there? I need to know whether to test with or without those delays.

What I'm seeing is:

* The delay in the req.py program has no effect, which is expected.
* If I leave the delay in the publisher, the subscriber gets all messages, no matter what order I start the programs in.
* If I remove the delay in the publisher, the subscriber gets no messages, no matter what order I start the programs in.

Also, you mentioned 1,000 messages in your email, but your test cases send 10,000 messages. Again, I need to know whether you changed this and why.

Finally, how do you start the test cases: by hand or from a script? This matters because starting them by hand introduces additional delays.

What I think you are seeing (and what I'm certainly reproducing with your test cases) is the "slow subscriber connect" symptom, which means:

* Connecting takes a certain time, say 10 msec.
* During that time, a publisher can send, say, 10,000 messages.
* If the publisher does bind/send(10000) and the subscriber does connect/recv, the subscriber will get nothing.

There are three trivial ways to verify that this is what's happening:

1. Send more messages, e.g. 100K instead of 1K or 10K.
2. Send very large messages, which take longer to send.
3. Send periodic messages, e.g. one per second.

If you do send periodic messages and number them, you will see that the first 1 or 2 messages a publisher sends are *always* lost unless you explicitly add a delay or some kind of synchronization.

Hope this helps.

-Pieter

On Wed, Sep 22, 2010 at 9:16 PM, Pieter Hintjens <[email protected]> wrote:
> Koert,
>
> So you're saying that if you start the subscriber after the publisher, you
> don't get messages?
>
> If that's what you're seeing, it's normal. Pubsub does not wait for
> subscribers to connect, and if they arrive after the publisher has
> sent its data, they will receive nothing.
>
> -Pieter
>
> On Tue, Sep 21, 2010 at 1:17 PM, Koert Kuipers
> <[email protected]> wrote:
>> Hello all,
>>
>> I ran into a problem while developing a server in Python.
>> When a program is listening to both a REP socket and a SUB socket using
>> multiplexing (poll), messages from the publisher (which should arrive at
>> the SUB socket) get lost. This seems to happen only if there are also
>> messages arriving at the REP socket, and typically all the messages from
>> the publisher get lost.
>>
>> My setup:
>>
>> Windows XP (I also observed the problem on Ubuntu 10.04)
>> zeromq 2.0.7
>> pyzmq
>>
>> The problem doesn’t always occur, and is somewhat hard to replicate.
>>
>> I ended up convincing myself that there is indeed a problem by writing 3
>> little programs. Program 1 listens on a REP and a SUB socket, program 2 only
>> has a PUB socket and sends 1000 messages, and program 3 only has a REQ socket
>> and does 1000 RPC requests in a row.
>>
>> When I start the programs in this order, everything works as expected:
>> start program 1, then program 2, and then program 3 (program 3 starts while
>> program 2 is still working). Program 1 reports that it received 1000 messages
>> on the SUB socket and 1000 messages on the REP socket.
>>
>> But when I change the order, I get into trouble. I start program 1, then
>> program 3, and then program 2 (program 2 starts while program 3 is still
>> working). Program 1 reports that it received 1000 messages on the REP socket
>> but none on the SUB socket.
>>
>> Best,
>>
>> Koert
>>
>> PS I attached the 3 programs. Hope that works.
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
> --
> -
> Pieter Hintjens
> iMatix - www.imatix.com
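Koert's program 1, a single process polling a REP socket and a SUB socket, might look roughly like the minimal sketch below. It is not the program Koert attached: it assumes a current pyzmq with `zmq.Poller`, uses hypothetical inproc endpoint names, and runs single-threaded, with the same process also acting as the REQ client and the publisher so the example is self-contained. `serve_once` (a hypothetical helper) handles every socket reported ready by one poll, not just the first.

```python
import zmq


def serve_once(poller, rep, sub, counts, timeout_ms=100):
    """Poll once and handle every readable socket, not just the first."""
    events = dict(poller.poll(timeout_ms))
    if events.get(rep, 0) & zmq.POLLIN:
        request = rep.recv()
        rep.send(b"ACK " + request)    # REP must reply before its next recv
        counts["rep"] += 1
    if events.get(sub, 0) & zmq.POLLIN:
        sub.recv()
        counts["sub"] += 1


def demo(rounds=1000):
    """Single-threaded demo: this process is also the REQ client and the PUB."""
    ctx = zmq.Context()
    rep = ctx.socket(zmq.REP)
    rep.bind("inproc://rpc")           # hypothetical endpoint names
    req = ctx.socket(zmq.REQ)
    req.connect("inproc://rpc")
    pub = ctx.socket(zmq.PUB)
    pub.bind("inproc://feed")
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect("inproc://feed")

    poller = zmq.Poller()
    poller.register(rep, zmq.POLLIN)
    poller.register(sub, zmq.POLLIN)
    counts = {"rep": 0, "sub": 0}

    for n in range(rounds):
        pub.send(b"%d" % n)            # dropped if the SUB has not joined yet
        req.send(b"%d" % n)            # one outstanding request at a time
        while counts["rep"] == n:      # serve until this request is answered
            serve_once(poller, rep, sub, counts)
        req.recv()
    # Drain any published messages still queued for the SUB socket.
    while sub.poll(timeout=10):
        sub.recv()
        counts["sub"] += 1
    ctx.destroy(linger=0)              # close all sockets, then terminate
    return counts
```

Every request is answered, but the SUB count may come out lower than the number published, because the publisher starts sending before the subscription has necessarily reached it. That is the slow subscriber connect effect Pieter describes, not a fault in the poll loop.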
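Pieter's point that the first messages are lost unless you add a delay or some kind of synchronization can be illustrated with a handshake. This is a minimal sketch, not code from the thread. It assumes a current pyzmq/libzmq (0MQ 2.x used a single HWM option rather than SNDHWM/RCVHWM), runs both ends as threads in one process over hypothetical inproc endpoints, and has the publisher repeat a HELLO message until the subscriber confirms over a REQ/REP side channel that it is receiving. Only then does the publisher send 10,000 numbered data messages.

```python
import threading

import zmq

# Hypothetical endpoint names for this sketch.
DATA_ENDPOINT = "inproc://data"
SYNC_ENDPOINT = "inproc://sync"
COUNT = 10_000


def publisher(ctx, bound):
    pub = ctx.socket(zmq.PUB)
    pub.setsockopt(zmq.SNDHWM, 0)      # 0 = no limit, so no data is dropped
    pub.bind(DATA_ENDPOINT)
    sync = ctx.socket(zmq.REP)
    sync.bind(SYNC_ENDPOINT)
    bound.set()                        # subscriber may connect after bind
    # Repeat HELLO until a subscriber reports that it has seen one.
    while True:
        pub.send(b"HELLO")
        if sync.poll(timeout=10):      # wait up to 10 ms for a sync request
            sync.recv()
            sync.send(b"GO")
            break
    for n in range(COUNT):
        pub.send(b"DATA %d" % n)
    pub.send(b"END")
    pub.close()
    sync.close()


def subscriber(ctx, bound, received):
    bound.wait()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.RCVHWM, 0)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect(DATA_ENDPOINT)
    sync = ctx.socket(zmq.REQ)
    sync.connect(SYNC_ENDPOINT)
    sub.recv()                         # first HELLO: subscription is active
    sync.send(b"READY")
    sync.recv()                        # publisher answers GO, then sends data
    while True:
        msg = sub.recv()
        if msg == b"HELLO":            # extra HELLOs sent before GO
            continue
        if msg == b"END":
            break
        received.append(int(msg.split()[1]))
    sub.close()
    sync.close()


def run():
    ctx = zmq.Context()
    bound = threading.Event()
    received = []
    threads = [
        threading.Thread(target=publisher, args=(ctx, bound)),
        threading.Thread(target=subscriber, args=(ctx, bound, received)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ctx.term()
    return received


if __name__ == "__main__":
    got = run()
    print("received %d of %d messages" % (len(got), COUNT))
```

Calling `run()` should return the numbers 0 to 9,999 in order. The HELLO is repeated because receiving one is the only reliable sign that the subscription has reached the publisher. A fixed delay, like the one in Koert's publisher, only makes that likely.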
