As requested, I’ve created a ticket and updated the branch with the latest code 
and a perf/README.txt explaining how to run it (basically the instructions 
below):

https://github.com/zeromq/libzmq/issues/757


On Nov 10, 2013, at 13:08, Bruno D. Rodrigues <[email protected]> wrote:

> I’ve branched the code to add the proxy code for testing:
> https://github.com/davipt/libzmq/tree/fix-002-proxy_lat_thr
> 
> This now allows me:
> 
> 1. current PUSH/PULL end-to-end test:
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:5555 500 10000000 &
> local_thr bind-to=tcp://127.0.0.1:5555 message-size=500 
> message-count=10000000 type=0 check=0 connect=0
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:5555 500 10000000 &
> remote_thr connect-to=tcp://127.0.0.1:5555 message-size=500 
> message-count=10000000 type=0 check=0
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 1380100 [msg/s]
> mean throughput: 5520.400 [Mb/s]
> 
> 2. PUB/SUB end-to-end test:
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:5555 500 10000000 1 &
> local_thr bind-to=tcp://127.0.0.1:5555 message-size=500 
> message-count=10000000 type=1 check=0 connect=0
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:5555 500 10000000 1 &
> remote_thr connect-to=tcp://127.0.0.1:5555 message-size=500 
> message-count=10000000 type=1 check=0
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 971666 [msg/s]
> mean throughput: 3886.664 [Mb/s]
> 
> 3. same test via zmq_proxy, by switching local_thr from bind to connect:
> 
> idavi:perf bruno$ ./proxy tcp://*:8881 tcp://*:8882 &
> Proxy type=PULL|PUSH in=tcp://*:8881 out=tcp://*:8882
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 10000000 32 &
> local_thr bind-to=tcp://127.0.0.1:8882 message-size=500 message-count=100000 
> type=32 check=0 connect=32
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:8881 500 10000000 &
> remote_thr connect-to=tcp://127.0.0.1:8881 message-size=500 
> message-count=10000000 type=0 check=0
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 92974 [msg/s]
> mean throughput: 371.896 [Mb/s]
> 
> 4. same test via proxy and PUB/SUB, including checking if every message 
> arrives (*)
> 
> idavi:perf bruno$ ./proxy tcp://*:8881 tcp://*:8882 1 &
> Proxy type=XSUB|XPUB in=tcp://*:8881 out=tcp://*:8882
> 
> idavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 10000000 49 &
> local_thr bind-to=tcp://127.0.0.1:8882 message-size=500 
> message-count=10000000 type=49 check=16 connect=32
> 
> idavi:perf bruno$ ./remote_thr tcp://127.0.0.1:8881 500 10000000 17 &
> remote_thr connect-to=tcp://127.0.0.1:8881 message-size=500 
> message-count=10000000 type=17 check=16
> 
> message size: 500 [B]
> message count: 10000000
> mean throughput: 88721 [msg/s]
> mean throughput: 354.884 [Mb/s]
> 
> (*) If check is enabled on remote_thr and the message size is >16, each 
> message will contain a counter. local_thr then verifies that the counters 
> arrive in the expected order and that no message is lost. This is why 
> remote_thr needs to increase the HWM and sleep for one second in the 
> PUB/SUB case.
> 
> 
> So, then again, what is happening with the zmq_proxy?
>  
> 
> 
> 
> On Nov 7, 2013, at 22:15, Bruno D. Rodrigues <[email protected]> 
> wrote:
> 
>> I’ve been testing a lot of combinations of ZeroMQ over Java, between the 
>> pure-Java jeromq and jzmq, the JNI binding to the libzmq C code. My 
>> impression so far is that jeromq is way faster than the binding. Not that 
>> the binding code isn’t great, but my feeling is that the JNI jump slows 
>> everything down. At a certain point I felt the need for a simple zmq_proxy 
>> network node, and I was pretty sure that the C code must be faster than 
>> jeromq. I have some ideas that could improve the jeromq proxy code, but it 
>> felt easier to just compile the zmq_proxy code from the book.
>> 
>> Unfortunately something went completely wrong on my side so I need your help 
>> to understand what is happening here.
>> 
>> Context:
>> MacOSX Mavericks fully updated, MBPro i7 4x2 CPU 2.2 GHz 16 GB
>> libzmq from git head
>> (same for jeromq and libzmq, although I’m using my own fork so I can send 
>> pull requests back)
>> My data are JSON lines that range from about 100 bytes to a few multi-MB 
>> exceptions, but the average over those million messages is about 500 bytes.
>> 
>> Test 1: pure local_thr and remote_thr:
>> 
>> iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8881 500 1000000 &
>> iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
>> real 0m0.732s
>> user 0m0.516s
>> sys  0m0.394s
>> message size: 500 [B]
>> message count: 1000000
>> mean throughput: 1418029 [msg/s]
>> mean throughput: 5672.116 [Mb/s]
>> 
>> Test 2: change local_thr to perform connect instead of bind, and put a proxy 
>> in the middle.
>> The proxy is the first C code example from the book, available here 
>> https://gist.github.com/davipt/7361477
>> iDavi:c bruno$ gcc -o proxy proxy.c -I /usr/local/include/ -L /usr/local/lib/ -lzmq
>> iDavi:c bruno$ ./proxy tcp://*:8881 tcp://*:8882 1
>> Proxy type=PULL/PUSH in=tcp://*:8881 out=tcp://*:8882
>> 
>> iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 1000000 &
>> iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
>> iDavi:perf bruno$ message size: 500 [B]
>> message count: 1000000
>> mean throughput: 74764 [msg/s]
>> mean throughput: 299.056 [Mb/s]
>> 
>> real 0m10.358s
>> user 0m0.668s
>> sys  0m0.508s
>> 
>> 
>> Test 3: use the jeromq equivalent of the proxy: 
>> https://gist.github.com/davipt/7361623
>> 
>> iDavi:perf bruno$ ./local_thr tcp://127.0.0.1:8882 500 1000000 &
>> [1] 15816
>> iDavi:perf bruno$ time ./remote_thr tcp://127.0.0.1:8881 500 1000000 &
>> [2] 15830
>> iDavi:perf bruno$ 
>> real 0m3.429s
>> user 0m0.654s
>> sys  0m0.509s
>> message size: 500 [B]
>> message count: 1000000
>> mean throughput: 293532 [msg/s]
>> mean throughput: 1174.128 [Mb/s]
>> 
>> The performance coming out of Java is OK-ish; it’s here just for 
>> comparison, and I’ll spend some time looking at it.
>> 
>> The core question is the C proxy: why is it more than 10 times slower than 
>> the no-proxy version?
>> 
>> One thing I noticed, by coincidence, is that on the producer side of the 
>> proxy, with both the C “producer” and the Java one, tcpdump consistently 
>> shows packets of 16332 bytes (or the MTU size if using Ethernet, 1438 I 
>> think). This value is the same for all 4 combinations of producers and 
>> proxies (jeromq vs C).
>> 
>> But on the other side of the proxy, the result is completely different. With 
>> the jeromq proxy, I see packets of 8192 bytes, but with the C code I see 
>> packets of either 509 or 1010 bytes. It feels like the proxy is sending the 
>> messages one by one. Again, these values are the same whether the PULL 
>> consumer after the proxy is C or Java.
>> 
>> So this is something on the proxy “backend” socket side of the zmq_proxy.
>> 
>> Also, I see quite similar behavior with a PUB - [XSUB+Proxy+XPUB] - SUB 
>> version.
>> 
>> What do I need to tweak in proxy.c?
>> 
>> Thanks in advance
>> 
> 

_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
