Hey, I had to wait for it to stop dropping pings again, which is why I did not get back. :)
After restarting the process again, the throughput became normal.

Ian:

> So is this throughput dropping (msgs/second) or dropped data (messages not arriving)?

So it looks like at some point the messages started getting delayed by 290 - 400 seconds. So, it's quite likely that this crossed the HWM that we have on the forwarder.

Pieter: I should have been more specific in my initial question.

> - what version of 0MQ you are using

We are using zeromq-2.2.0.

> - the operating system and hardware configurations

Ubuntu 12.04 on AWS EC2 instances. 2 virtual cores + 7 GB RAM on the forwarder, and 4 virtual cores + 15 GB RAM on the subscriber (albeit that runs other processes that don't have as much network / CPU requirements). Amazon claims that both of them have high I/O throughput.

> - the message rate (messages per second) and typical message size

8000 messages per second at peak, 1.5KB typical size.

> - whether consumers may be fighting for CPU cores with other processes

Yes they could be. Would that explain getting delayed over a long period of time though?

> - precisely the types of sockets you are using.

We have 4 PUB-SUB forwarders. The producers each "publish" to one forwarder. We have many processes that subscribe from all four forwarders. The forwarders have HWM of 100000.

> - whether you're losing one in every two messages, or bursts of messages.

When the throughput decreased, I noticed that the messages were getting delayed almost consistently between 300 - 400 seconds. Pretty certain that this triggered the HWM.

On Mon, Jan 28, 2013 at 4:53 PM, Varun Vijayaraghavan <[email protected]> wrote:

> Pieter, Ian,
>
> Thanks for your replies, and you have raised good points.
>
> I am not certain about some of the questions you have asked. It's a good
> place for me to start exploring.
>
> I'll reply back to this thread once I find something.
>
> Thanks again!
>
> On Mon, Jan 28, 2013 at 4:17 PM, Ian Barber <[email protected]> wrote:
>
>> On Mon, Jan 28, 2013 at 9:32 AM, Varun Vijayaraghavan <
>> [email protected]> wrote:
>>>
>>> On some of our processes, which incidentally run on smaller instances,
>>> we see that the message count in the consumer suddenly drop to about 50%.
>>> This happens once a week, and does not get fixed by itself till we restart
>>> the consumer process.
>>>
>>> Is this expected behavior related to smaller machines, or .. something
>>> else? Also, could someone explain the mechanism that would cause the ping
>>> count to drop like that?
>>
>> So is this throughput dropping (msgs/second) or dropped data (messages
>> not arriving)?
>>
>> Ian
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
> --
> - varun :)

--
- varun :)
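
As a rough check on the numbers in the reply above, the sketch below works out how many messages a 300 - 400 second delay represents at the stated peak rate, and how long a completely stalled subscriber would take to fill a forwarder queue of 100000 messages. The function names are made up for illustration, and treating 1.5KB as 1536 bytes is an assumption.

```python
# Back-of-the-envelope figures taken from the message above.
PEAK_RATE_MSGS_PER_SEC = 8000     # peak message rate
TYPICAL_MSG_BYTES = 1.5 * 1024    # "1.5KB typical size" (assumes 1 KB = 1024 bytes)
FORWARDER_HWM = 100000            # HWM configured on each forwarder


def backlog_messages(rate_per_sec, delay_sec):
    """Messages that must be queued somewhere to sustain a steady delay."""
    return rate_per_sec * delay_sec


def seconds_to_fill(hwm, rate_per_sec):
    """Time for a fully stalled consumer to fill a queue of `hwm` messages."""
    return hwm / rate_per_sec


def queue_bytes(messages, msg_bytes):
    """Approximate payload memory held by a queue of `messages` messages."""
    return messages * msg_bytes


if __name__ == "__main__":
    print("seconds to reach HWM at peak rate:",
          seconds_to_fill(FORWARDER_HWM, PEAK_RATE_MSGS_PER_SEC))
    print("bytes held by a full queue:",
          queue_bytes(FORWARDER_HWM, TYPICAL_MSG_BYTES))
    for delay in (300, 400):
        print("backlog for a", delay, "second delay:",
              backlog_messages(PEAK_RATE_MSGS_PER_SEC, delay))
```

At the peak rate, an HWM of 100000 corresponds to 12.5 seconds of traffic and about 154 MB of payload per stalled subscriber, while a 300 second delay corresponds to 2.4 million messages. If those figures hold, the delay is larger than one forwarder queue could account for at peak rate, so the messages may have been arriving at less than peak rate or queuing somewhere else as well. That is an inference from the arithmetic, not something measured.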
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
