Hello, everyone!

I have been observing some abnormally large latencies in an application 
currently under development that uses the 0MQ library for messaging between 
components distributed within and across multiple processes.  I am using 0MQ 
version 2.1.11 (though I also observed the issue with 2.1.10) and the Java 
bindings.  I see the abnormal latencies on my Windows 7 machine, but not my 
Linux machine.

What I have been seeing is that occasionally a message sent from one component 
to another will take a relatively large amount of time to arrive at the 
destination component.  Most messages have a latency of well under a 
millisecond, but occasionally a message will take almost exactly 10 
milliseconds (within a couple hundred microseconds or so).  Here is a link to a 
minimal-ish test program that demonstrates these abnormal latencies on my 
Windows 7 machine:

https://gist.github.com/1725091

The test program currently starts one thread to send ping messages and one 
thread to echo them back.  The pinging thread records the time at which the 
ping messages are sent and the time at which the echoes are received.  If the 
elapsed time is greater than 5 milliseconds it is considered to be an abnormal 
latency.  For each echo received, the latency in milliseconds is printed out, 
along with an overall percentage of all echoes that exceeded the 5 millisecond 
threshold.
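For anyone who cannot open the gist, here is a rough sketch of the measurement loop described above. It is not the actual test program. Two java.util.concurrent queues stand in for the 0MQ sockets, an assumption made so the sketch runs without the 0MQ Java bindings. The class and method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the ping/echo loop. ASSUMPTION: in-memory queues stand
// in for the 0MQ sockets, so this exercises the harness only, not 0MQ itself.
public class PingEchoSketch {
    static final long THRESHOLD_NANOS = 5_000_000L; // 5 ms "abnormal" cutoff

    // True if a round trip took longer than the 5 ms threshold.
    static boolean isAbnormal(long elapsedNanos) {
        return elapsedNanos > THRESHOLD_NANOS;
    }

    // Sends `count` pings one at a time and returns how many were abnormal.
    static int run(int count) {
        BlockingQueue<Long> toEchoer = new LinkedBlockingQueue<>();
        BlockingQueue<Long> toPinger = new LinkedBlockingQueue<>();

        // Echoer thread: send each received message straight back.
        Thread echoer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    toPinger.put(toEchoer.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echoer.start();

        int abnormal = 0;
        try {
            for (int i = 1; i <= count; i++) {
                long sent = System.nanoTime();   // time the ping is sent
                toEchoer.put(sent);
                toPinger.take();                 // block until the echo arrives
                long elapsed = System.nanoTime() - sent;
                if (isAbnormal(elapsed)) {
                    abnormal++;
                }
                // Per-echo latency plus running percentage above the threshold.
                System.out.printf("%.3f ms (%.2f%% abnormal so far)%n",
                        elapsed / 1e6, 100.0 * abnormal / i);
            }
            echoer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return abnormal;
    }

    public static void main(String[] args) {
        run(1000);
    }
}
```

Both timestamps are taken on the pinging side with System.nanoTime(). As a result, no clock synchronization is needed when the pinger and echoer run on different machines.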

The test program as-is will demonstrate the large latencies roughly 2% of the 
time.  If more pingers are added, either in the same process or in a separate 
process, this percentage will increase rather quickly.  For example, with 5 
pinger threads, the large latencies will be observed for about half of all 
echoed messages.  Using an inproc:// transport decreases the rate of abnormal 
latencies, but does not eliminate them.  The issue does not occur when both the 
echoer and the pinger run on Linux, but it is still observed when the echoer 
runs on Linux and the pinger on Windows 7, or vice versa.  This seems to 
indicate that the large latencies can occur in either direction (from pinger 
to echoer or from echoer to pinger).

In the application currently under development (not the supplied test program), 
there are many dozens of components communicating with one another, though they 
do not send messages as rapidly as the pingers in the test program.  Still, in 
that application about 80% of all messages sent from one component to another 
suffer from these large latencies.  Because a message often has to 
make several hops to get to its final destination, these 10-millisecond 
latencies add up quickly.
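As a rough illustration of how quickly this compounds, assume (hypothetically) that each hop independently suffers the ~10 ms delay at the ~80% rate seen in the application. Taking n = 3 hops as an example gives:

```latex
\mathbb{E}[\text{added delay}] \approx n \cdot p \cdot 10\,\text{ms}
  = 3 \times 0.8 \times 10\,\text{ms} = 24\,\text{ms}
```

By contrast, the total would be well under 3 ms if every hop stayed sub-millisecond.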

The thing I find most interesting is that a message will either have a 
sub-millisecond latency or it will have a 10-millisecond latency, but nothing 
in between.  This looks to me like an artifact of something internal, rather 
than the degradation in performance of an overloaded thread.
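One way to check the "either sub-millisecond or about 10 ms" observation more systematically is to bin the recorded latencies and look for empty bins between the two clusters. Below is a minimal sketch, assuming latencies were collected in nanoseconds (for example with System.nanoTime()). The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: bucket round-trip latencies into 1 ms bins so a bimodal
// pattern (sub-millisecond vs. ~10 ms, with empty bins between) is easy to see.
public class LatencyHistogram {
    // Maps floor(latency in ms) to the number of samples in that bin.
    static Map<Long, Integer> bin(long[] latenciesNanos) {
        Map<Long, Integer> bins = new TreeMap<>(); // TreeMap keeps bins sorted
        for (long nanos : latenciesNanos) {
            long ms = nanos / 1_000_000L;           // whole milliseconds, rounded down
            bins.merge(ms, 1, Integer::sum);
        }
        return bins;
    }

    public static void main(String[] args) {
        // Hypothetical input values for illustration only, not measurements.
        long[] sample = {300_000L, 450_000L, 200_000L, 9_900_000L, 10_100_000L};
        bin(sample).forEach((ms, n) ->
                System.out.printf("[%d, %d) ms: %d%n", ms, ms + 1, n));
    }
}
```

A latency just under 10 ms (9.9 ms, say) lands in the 9 ms bin, so the ~10 ms cluster may straddle two adjacent bins. What matters is whether the bins between roughly 1 ms and 9 ms stay empty.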

So, does anyone have any thoughts, suggestions, or requests for clarification?

Sincerely,

Aja Walker
_______________________________________________
zeromq-dev mailing list
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
