I’ve been testing a lot of combinations of ZeroMQ over Java, comparing the pure-Java 
jeromq with the jzmq JNI binding over the libzmq C code. My impression so far is that 
jeromq is considerably faster than the binding - not that the binding’s code isn’t 
good, but my feeling is that the JNI hop slows everything down. At a certain point I 
needed a simple zmq_proxy network node, and I was fairly sure the C code would be 
faster than jeromq. I have some ideas for improving the jeromq proxy code, but it 
felt easier to just compile the zmq_proxy example from the book.

Unfortunately, something went completely wrong on my side, so I need your help 
understanding what is happening here.

Context:
Mac OS X Mavericks fully updated, MacBook Pro i7 4x2 CPU @ 2.2 GHz, 16 GB RAM
libzmq from git HEAD (same for jeromq, though I’m using my own forks so I can send 
pull requests back)
my data is JSON lines ranging from about 100 bytes up to some multi-MB exceptions, 
but the average over those million messages is about 500 bytes.

Test 1: pure local_thr and remote_thr:

iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8881 500 1000000 &
iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
real    0m0.732s
user    0m0.516s
sys     0m0.394s
message size: 500 [B]
message count: 1000000
mean throughput: 1418029 [msg/s]
mean throughput: 5672.116 [Mb/s]

Test 2: change local_thr to connect instead of bind, and put a proxy in the middle.
The proxy is the first C code example from the book, available here:
https://gist.github.com/davipt/7361477
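
For reference, this is roughly what that example does - a minimal sketch, not the 
exact gist (the real one also takes the type argument you can see on the command 
line below, and does proper error checking):

/* Simplified sketch of the proxy: bind a PULL frontend and a PUSH backend,
   then let zmq_proxy shuttle messages between them. */
#include <zmq.h>
#include <stdio.h>

int main (int argc, char *argv [])
{
    if (argc < 3) {
        printf ("usage: proxy <frontend-endpoint> <backend-endpoint>\n");
        return 1;
    }

    void *ctx = zmq_ctx_new ();

    void *frontend = zmq_socket (ctx, ZMQ_PULL);   /* producers connect here */
    zmq_bind (frontend, argv [1]);

    void *backend = zmq_socket (ctx, ZMQ_PUSH);    /* consumers connect here */
    zmq_bind (backend, argv [2]);

    printf ("Proxy type=PULL/PUSH in=%s out=%s\n", argv [1], argv [2]);

    zmq_proxy (frontend, backend, NULL);           /* blocks forever */

    zmq_close (frontend);
    zmq_close (backend);
    zmq_ctx_destroy (ctx);
    return 0;
}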
iDavi:c bruno$ gcc -o proxy proxy.c -I /usr/local/include/ -L /usr/local/lib/ 
-lzmq
iDavi:c bruno$ ./proxy tcp://*:8881 tcp://*:8882 1
Proxy type=PULL/PUSH in=tcp://*:8881 out=tcp://*:8882

iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 1000000 &
iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
iDavi:perf bruno$ message size: 500 [B]
message count: 1000000
mean throughput: 74764 [msg/s]
mean throughput: 299.056 [Mb/s]

real    0m10.358s
user    0m0.668s
sys     0m0.508s


Test 3: use the jeromq equivalent of the proxy: 
https://gist.github.com/davipt/7361623

iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 1000000 &
[1] 15816
iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
[2] 15830
iDavi:perf bruno$ 
real    0m3.429s
user    0m0.654s
sys     0m0.509s
message size: 500 [B]
message count: 1000000
mean throughput: 293532 [msg/s]
mean throughput: 1174.128 [Mb/s]

This performance coming out of Java is OK-ish; it’s here just for comparison, and 
I’ll spend some time looking at it.

The core question is the C proxy - why is it 10 times slower than the no-proxy 
version?

One thing I noticed, by coincidence, is that on the upstream side of the proxy, with 
both the C “producer” and the Java one, tcpdump consistently shows me packets of 
16332 bytes (or roughly the MTU size when going over Ethernet, 1438 I think). This 
value is consistent across the four combinations of producers and proxies 
(jeromq vs. C).

But on the other side of the proxy the result is completely different. With the 
jeromq proxy I see packets of 8192 bytes, but with the C proxy I see packets of 
either 509 or 1010 bytes. It feels like the proxy is sending the messages one by 
one. Again, this is consistent regardless of whether the PULL consumer after the 
proxy is C or Java.

So this looks like something on the “backend” socket side of the zmq_proxy.

Also, I see quite similar behavior with a PUB - [XSUB+Proxy+XPUB] - SUB version.
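
For completeness, that variant is the same idea with the socket types swapped - 
roughly like this (again just a sketch with the endpoints hard-coded, not the exact 
code I ran):

/* Sketch of the XSUB/XPUB proxy used for the PUB/SUB test: publishers connect
   to the frontend, subscribers to the backend; zmq_proxy forwards messages
   downstream and subscriptions upstream. */
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();

    void *frontend = zmq_socket (ctx, ZMQ_XSUB);   /* PUB producers connect here */
    zmq_bind (frontend, "tcp://*:8881");

    void *backend = zmq_socket (ctx, ZMQ_XPUB);    /* SUB consumers connect here */
    zmq_bind (backend, "tcp://*:8882");

    zmq_proxy (frontend, backend, NULL);           /* never returns */

    zmq_close (frontend);
    zmq_close (backend);
    zmq_ctx_destroy (ctx);
    return 0;
}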

What do I need to tweak in proxy.c?

Thanks in advance
