Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
2012/8/29 Julie Anderson julie.anderson...@gmail.com:
> I understand your frustration. I don't put the code here, not because I don't want to, but because I am legally unable to. If you have a boss or employer you can understand that. :) I will try to come up with a simple version that does the same thing. It should not be hard to do a ping-pong in Java.

I really don't believe you're not legally able to paste some code, frankly. If there is some sensitive data in the tests you're doing, delete it and use dummy data. Do you think they are going to fire you because you wrote some code which has absolutely no relevance to your company, on a mailing list? If not, rewrite it when you're at home; I'm sure it's not that hard. Complaining about something while not being able to prove anything is not that good, imho.

___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
As I said in the text you quoted: I will try to come up with a simple version that does the same thing. But Stuart did that for me in C. My thanks to him.

I am not complaining about anything... Just trying to understand why the extra latency is necessary. There are already some very good answers here about that. This extra latency by itself does not make ZeroMQ bad or slow. I think Robert was the one who addressed that very well. Only a minority of financial systems (hedge funds and exchanges) will care about 10 microseconds.

On Thu, Aug 30, 2012 at 10:10 AM, andrea crotti andrea.crott...@gmail.com wrote:
> [snip]
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Thu, Aug 30, 2012 at 11:56 PM, Julie Anderson julie.anderson...@gmail.com wrote:
> Only a minority of financial systems (hedge funds and exchanges) will care about 10 microseconds.

Exchanges will; hedge funds won't (since they are dealing with at least 100 microseconds from the exchange, plus the links and their business logic). Anyway, except for the exchange itself (which doesn't deal with links in its quotes), I haven't seen a system that beats 1 ms consistently in a real-life environment over real WAN links (though there are probably a handful). 10 us is 1% of that. Unless you host at the exchange or have some specially traffic-shaped connection, you're also lucky to get 1 ms through their routers and firewall. So get to 1 ms, if you can; then worry about the microseconds.

Ben
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Thu, Aug 30, 2012 at 12:13 AM, Julie Anderson julie.anderson...@gmail.com wrote:
> Just tested ZeroMQ and Java NIO in the same machine.

You're comparing apples to a factory that can process apples into juice at the rate of millions a second. For that extra latency in 0MQ you get things like message batching, async I/O, and routing patterns. The cost could be brought down (see Martin Sustrik's nano project, which brings it way down) by redesigning the 0MQ internals. Having said this, it's probably worth taking a profiler to 0MQ and seeing if the critical path can't be improved somewhat.

-Pieter
[zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Just tested ZeroMQ and Java NIO in the same machine. The results:

- ZeroMQ:
message size: 13 [B]
roundtrip count: 10
average latency: 19.620 [us]  == ONE-WAY LATENCY

- Java NIO Selector: (EPoll)
Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us]
Min Time: 11.664 [us]
99.999% percentile: 15.340 [us]  == RTT LATENCY

Conclusion: That's 39.240 versus 15.340, so ZeroMQ overhead on top of TCP is 156%, or 23,900 nanoseconds!!! That's excessive. I would expect 1 or 2 microseconds there.

So my questions are:

1) What does ZeroMQ do under the hood that justifies so many extra clock cycles? (I am really curious to know)
2) Do people agree that 23 microseconds are just too much?

-Julie
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
As far as I can see, you haven't included your test methodology or your test code. Without any information about your test, I can't have any opinion on your results. Maybe I missed an earlier email where you included information about your test environment and methodology?

Brian

On Wed, Aug 29, 2012 at 11:13 AM, Julie Anderson julie.anderson...@gmail.com wrote:
> [snip]
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Aug 29, 2012, at 10:13 AM, Julie Anderson wrote:
> [snip]

As a favor to me, please rerun the tests so that at least 1 million (10 million is better) messages are sent. This shouldn't take more than a few minutes to run. Thanks.

Secondly, are you using the local_lat and remote_lat programs that are included with zeromq, or did you write your own? If you wrote your own, please share the code.

Thirdly, a pastie containing the code for both tests, so others could independently reproduce your results, would be very handy.

cr
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
New numbers (fun!).

Firstly, to make sure I was comparing apples with apples, I modified my tests to compute the one-way trip instead of the round-trip. I can't paste code, but I am simply using Java NIO (non-blocking I/O) optimized with busy spinning to send and receive TCP data. This is standard Java NIO code, nothing too fancy. You can google around for Java NIO. I found this link that shows the basics: http://www.cordinc.com/blog/2010/08/java-nio-server-example.html. You can also do the same thing in C, as you can see here: http://stackoverflow.com/questions/27247/could-you-recommend-some-guides-about-epoll-on-linux/6150841#6150841

My test now consists of:

- JVM A sends a message which consists of the ascii representation of a timestamp in nanos.
- JVM B receives this message, parses the long, computes the one-way latency, and echoes back the message to JVM A.
- JVM A receives the echo, parses the ascii long, and makes sure that it matches the one it sent out.
- Loop back and send the next message.

So now I have both times: one-way and round-trip. I ran my test for 1 million messages over loopback. For ZeroMQ I am using the local_lat and remote_lat programs included with the latest zeromq from here: git://github.com/zeromq/libzmq.git

The results:

- ZeroMQ:
./local_lat tcp://lo: 13 100
./remote_lat tcp://127.0.0.1: 13 100
message size: 13 [B]
roundtrip count: 100
average latency: 19.674 [us]  == this is one-way

- Java NIO: (EPoll with busy spinning)
Round-trip: Iterations: 1,000,000 | Avg Time: 16552.15 nanos | Min Time: 12515 nanos | Max Time: 129816 nanos | 75%: 16290 nanos | 90%: 16369 nanos | 99%: 16489 nanos | 99.999%: 16551 nanos
One-way trip: Iterations: 1,110,000 | Avg Time: 8100.12 nanos | Min Time: 6150 nanos | Max Time: 118035 nanos | 75%: 7966 nanos | 90%: 8010 nanos | 99%: 8060 nanos | 99.999%: 8099 nanos

Conclusions: That's 19.674 versus 8.100, so ZeroMQ overhead on top of TCP is 142%, or 11,574 nanoseconds!!! That's excessive. I would expect 1 microsecond of overhead there.

So questions remain:

1) What does ZeroMQ do under the hood that justifies so many extra clock cycles? (I am really curious to know)
2) Do people agree that 11 microseconds are just too much?

My rough guess: ZeroMQ uses threads? (The beauty of NIO is that it is single-threaded, so there is always only one thread reading and writing to the network.)

-Julie

On Wed, Aug 29, 2012 at 10:24 AM, Chuck Remes li...@chuckremes.com wrote:
> [snip]
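[Editor's note: since the original test code could not be shared, here is a hedged stand-in for the kind of busy-spinning ping-pong described above, using only standard Java NIO. It is a sketch of the technique, not the code that produced the numbers in this thread; the port number, payload padding, and iteration counts are arbitrary choices.]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NioPingPong {
    // Echo server: busy-spins on a non-blocking channel, writes every byte back.
    static void startEchoServer(int port) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port));
        Thread t = new Thread(() -> {
            try (SocketChannel ch = server.accept()) {
                ch.configureBlocking(false);
                ch.socket().setTcpNoDelay(true);
                ByteBuffer buf = ByteBuffer.allocateDirect(64);
                while (true) {
                    buf.clear();
                    int n;
                    while ((n = ch.read(buf)) == 0) { /* busy spin */ }
                    if (n < 0) return;                  // peer closed
                    buf.flip();
                    while (buf.hasRemaining()) ch.write(buf);
                }
            } catch (IOException e) { /* connection torn down */ }
        });
        t.setDaemon(true);
        t.start();
    }

    // Client: sends a 13-byte message, spins until the echo arrives, returns avg RTT in ns.
    static long measureRttNs(int port, int iters) throws IOException {
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
            ch.configureBlocking(false);
            ch.socket().setTcpNoDelay(true);
            ByteBuffer out = ByteBuffer.allocateDirect(13);
            ByteBuffer in = ByteBuffer.allocateDirect(13);
            long start = System.nanoTime();
            for (int i = 0; i < iters; i++) {
                out.clear(); out.put(new byte[13]); out.flip();
                while (out.hasRemaining()) ch.write(out);
                in.clear();
                while (in.position() < 13) {            // spin until full echo is back
                    if (ch.read(in) < 0) throw new IOException("peer closed");
                }
            }
            return (System.nanoTime() - start) / iters;
        }
    }

    public static void main(String[] args) throws Exception {
        startEchoServer(15432);           // 15432 is an arbitrary free port
        measureRttNs(15432, 10_000);      // warm-up pass for the JIT
        System.out.println("avg RTT: " + measureRttNs(15432, 100_000) + " ns");
    }
}
```

A real benchmark would also pin threads to cores and report percentiles rather than a single average, as the numbers above do.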
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Julie, it is a little exasperating that you keep posting these numbers (and related questions) but, to date, have not shown the CODE used to get them. It is not possible to give a meaningful answer to your questions without looking at the EXACT code you are using. Furthermore, it would be very useful to be able to RUN the same code on one's own machine, to ascertain whether the behavior is the same as you are reporting, and maybe fix something in 0MQ.

Best regards,

--
Gonzalo Diethelm
DCV Chile

From: Julie Anderson
Sent: Wednesday, August 29, 2012 1:19 PM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
> [snip]
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Here are the UDP numbers, for whom it may concern. As one would expect, much better than TCP.

RTT: (round-trip time) Iterations: 1,000,000 | Avg Time: 10373.9 nanos | Min Time: 8626 nanos | Max Time: 136269 nanos | 75%: 10186 nanos | 90%: 10253 nanos | 99%: 10327 nanos | 99.999%: 10372 nanos

OWT: (one-way time) Iterations: 2,221,118 | Avg Time: 5095.66 nanos | Min Time: 4220 nanos | Max Time: 135584 nanos | 75%: 5001 nanos | 90%: 5037 nanos | 99%: 5071 nanos | 99.999%: 5094 nanos

-Julie

On Wed, Aug 29, 2012 at 12:18 PM, Julie Anderson julie.anderson...@gmail.com wrote:
> [snip]
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Julie Anderson wrote:
> So questions remain: 1) What does ZeroMQ do under the hood that justifies so many extra clock cycles? (I am really curious to know)

ZeroMQ is using background IO threads to do the sending/receiving, so the extra latency is due to passing the messages between the application thread and the IO thread.

> 2) Do people agree that 11 microseconds are just too much?

No. A simple IO event loop using epoll is fine for an IO (network) bound application, but if you need to do complex work (CPU bound) mixed with non-blocking IO, then ZeroMQ can make it easy to scale.

Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport with two threads in the same JVM instance. With ZeroMQ it is easy to do thread-to-thread, process-to-process, and/or server-to-server communication all at the same time using the same interface. Basically ZeroMQ has a different use case than a simple IO event loop.

--
Robert G. Jakabosky
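[Editor's note: the hand-off cost described above, application thread to IO thread and back, exists even with no networking involved. The sketch below is not 0MQ's actual internals (which use lock-free pipes plus a wake-up signal); it just bounces a value off a second thread through blocking queues to put a rough number on one thread hand-off.]

```java
import java.util.concurrent.ArrayBlockingQueue;

public class HandoffCost {
    // Round-trip a value through an "IO thread" stand-in and return the
    // average cost per single thread hand-off, in nanoseconds.
    static long measureHandoffNs(int iters) throws InterruptedException {
        ArrayBlockingQueue<Long> toIo = new ArrayBlockingQueue<>(1024);
        ArrayBlockingQueue<Long> fromIo = new ArrayBlockingQueue<>(1024);
        Thread io = new Thread(() -> {
            try {
                while (true) {
                    Long v = toIo.take();
                    if (v < 0) return;        // poison pill stops the thread
                    fromIo.put(v);            // echo back to the app thread
                }
            } catch (InterruptedException ignored) { }
        });
        io.start();
        for (int i = 0; i < 10_000; i++) { toIo.put(1L); fromIo.take(); } // warm-up
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) { toIo.put(1L); fromIo.take(); }
        long perHop = (System.nanoTime() - start) / iters / 2; // two hand-offs per loop
        toIo.put(-1L);
        io.join();
        return perHop;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("approx. " + measureHandoffNs(100_000) + " ns per thread hand-off");
    }
}
```

On typical hardware the hand-off alone costs on the order of a microsecond or more, which is why an architecture with background IO threads cannot match a single-threaded busy-spinning loop on raw latency.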
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
See my comments below:

On Wed, Aug 29, 2012 at 4:06 PM, Robert G. Jakabosky bo...@sharedrealm.com wrote:
> ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is due to passing the messages between the application thread and the IO thread.

This kind of thread architecture sucks for latency-sensitive applications. That's why non-blocking I/O exists. That's my humble opinion, and the numbers support it.

> No. A simple IO event loop using epoll is fine for an IO (network) bound application, but if you need to do complex work (CPU bound) mixed with non-blocking IO, then ZeroMQ can make it easy to scale.

Totally agree, but that has nothing to do with a financial application. Financial applications do not need to do complex CPU-bound analysis like an image-processing application would. A financial application only cares about LATENCY and network I/O.

> Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance.

What is the problem with inproc? Just use a method call in the same JVM, or shared memory for different JVMs. If you want inter-thread communication, there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication.

> With ZeroMQ it is easy to do thread to thread, process to process, and/or server to server communication all at the same time using the same interface.

This generic API is cool, but it is solving a problem financial systems do not have, and creating a bigger problem by adding latency.

> Basically ZeroMQ has a different use case than a simple IO event loop.

I thought ZeroMQ's flagship customers were financial institutions. Then maybe I was wrong.
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Aug 29, 2012, at 4:46 PM, Julie Anderson wrote:
> See my comments below:

And mine too.

> This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists. That's my humble opinion and the numbers support it.

What numbers? Only you have produced them so far. We have been quite patient with you. It appears you have some experience, so I'm confused as to why you refuse to provide any code for the rest of us to run to duplicate your results. If the roles were reversed, I am certain you would want to run it yourself. If you want our help, don't tell us to google some code to run. If it's really that easy, then provide a link and make sure that your numbers are coming from the exact same code. Until someone else can independently verify your numbers, everything you have written is just smoke.

> For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication.

I would be surprised too. Zeromq doesn't solve the same problem as Disruptor.

> I thought ZeroMQ's flagship customers were financial institutions. Then maybe I was wrong.

Don't be insulting. It doesn't help you or inspire anyone here to look into your claims.

cr
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
See my comments below:

On Wed, Aug 29, 2012 at 5:28 PM, Chuck Remes li...@chuckremes.com wrote:
> Until someone else can independently verify your numbers, everything you have written is just smoke.

I understand your frustration. I don't put the code here, not because I don't want to, but because I am legally unable to. If you have a boss or employer you can understand that. :) I will try to come up with a simple version that does the same thing. It should not be hard to do a ping-pong in Java.

> I would be surprised too. Zeromq doesn't solve the same problem as Disruptor.

Disruptor solves inter-thread communication without synchronization latency (light blocking using memory barriers). So if you have two threads and need them to talk to each other as fast as possible, you would use Disruptor. That's what I thought the colleague was addressing here: "Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance."

> Don't be insulting. It doesn't help you or inspire anyone here to look into your claims.

Insulting? I really think I was not insulting anyone or anything, but if you got that impression, please accept my sincere apologies. Nothing is perfect. I am just trying to understand the ZeroMQ approach and its overhead on top of the raw network latency. Maybe a single-threaded ZeroMQ implementation for the future, using non-blocking I/O?
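[Editor's note: the "memory barriers instead of locks" idea mentioned above can be illustrated with plain JDK primitives. The sketch below is a one-slot single-producer/single-consumer exchange where both sides busy-spin on a sequence counter; it shows the style Disruptor popularized, not Disruptor's actual ring-buffer implementation.]

```java
import java.util.concurrent.atomic.AtomicLong;

public class SpinHandoff {
    // One-slot SPSC exchange: even sequence = slot empty, odd = slot full.
    // Ordering comes from the volatile/atomic accesses alone: no locks,
    // no park/unpark, both threads busy-spin.
    static final AtomicLong seq = new AtomicLong(0);
    static volatile long slot;

    static void produce(long v) {
        while ((seq.get() & 1) != 0) { }  // spin until the slot is empty
        slot = v;
        seq.incrementAndGet();            // publish: sequence becomes odd
    }

    static long consume() {
        while ((seq.get() & 1) == 0) { }  // spin until the slot is full
        long v = slot;
        seq.incrementAndGet();            // release: sequence becomes even
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        final int iters = 1_000_000;
        final long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < iters; i++) sum[0] += consume();
        });
        consumer.start();
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) produce(i);
        consumer.join();
        System.out.println("checksum " + sum[0] + ", avg "
                + (System.nanoTime() - start) / iters + " ns/message");
    }
}
```

With both threads spinning on separate cores, per-message cost drops well below a blocking-queue hand-off, which is the trade-off (CPU burn for latency) that this whole thread is arguing about.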
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Julie Anderson wrote: See my comments below: On Wed, Aug 29, 2012 at 4:06 PM, Robert G. Jakabosky bo...@sharedrealm.comwrote: On Wednesday 29, Julie Anderson wrote: So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is do to passing the messages between the application thread and the IO thread. This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists. That's my humble opinion and the numbers support it. If low-latency is the most important thing for your application, then use a custom protocol highly tuned network code. ZeroMQ is not a low-level networking library it provide some high-level features that are not available with raw sockets. If you are planing on doing high-frequency trading, then you will need to write your own networking code (or FPGA logic) to squeeze out every last micro/nanosecond. ZeroMQ is not going to be the right solution to every use- case. 2) Do people agree that 11 microseconds are just too much? No. A simple IO event loop using epoll is fine for a IO (network) bound application, but if you need to do complex work (cpu bound) mixed with non- blocking IO, then ZeroMQ can make it easy to scale. Totally agree, but that has nothing to do with a financial application. Financial applications do not need to do complex CPU bound analysis like a image processing application would need. Financial application only cares about LATENCY and network I/O. Not all Financial application care only about latency. For some system it is important to scale out to very large number of subscribers and large volume of messages. When comparing ZeroMQ to raw network IO for one connection, ZeroMQ will have more latency overhead. 
Try your test with many thousands of connections with subscriptions to lots of different topics; then ZeroMQ will start to come out ahead. Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport with two threads in the same JVM instance. What is the problem with inproc? Just use a method call in the same JVM or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication. ZeroMQ's inproc transport can be used in an event loop alongside the TCP and IPC transports. With ZeroMQ you can mix and match transports as needed. If you can do all that with custom code with lower latency, then do it. ZeroMQ is for people who don't have the experience to do that kind of thread-safe programming, or who just want to scale out their application. With ZeroMQ it is easy to do thread-to-thread, process-to-process, and/or server-to-server communication all at the same time using the same interface. This generic API is cool, but it is solving a problem financial systems do not have and creating a bigger problem by adding latency. ZeroMQ is not adding latency for no reason. If you think that the latency can be eliminated, then go ahead and change the core code to not use IO threads. Basically ZeroMQ has a different use case than a simple IO event loop. I thought ZeroMQ flagship customers were financial institutions. Then maybe I was wrong. ZeroMQ is competing with other message-oriented middleware, like RabbitMQ, SwiftMQ, JMS, or other message queuing systems. These systems are popular with financial institutions. -- Robert G. Jakabosky
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Julie Anderson wrote: Nothing is perfect. I am just trying to understand the ZeroMQ approach and its overhead on top of the raw network latency. Maybe a single-threaded ZeroMQ implementation for the future using non-blocking I/O? You might be interested in xsnano [1], which is an experimental project to try different threading models (it should be possible to support a single-thread model). I am not sure how far along it is. 1. https://github.com/sustrik/xsnano -- Robert G. Jakabosky
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On 29 August 2012 17:46, Julie Anderson julie.anderson...@gmail.com wrote: ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is due to passing the messages between the application thread and the IO thread. This kind of thread architecture sucks for latency-sensitive applications. That's why non-blocking I/O exists. Non-blocking IO is a legacy from early structured programming; it just so happens it fits well into a proactor model for IOCP performance. Many people prefer to develop code using the reactor model, and 0mq fits well there. Totally agree, but that has nothing to do with a financial application. Financial applications do not need to do complex CPU-bound analysis like an image processing application would need. Financial applications only care about LATENCY and network I/O. That's not a financial application, that's an automated trading application. 99.9% of financial applications do not remotely care about latency; your sweeping generalisation just wiped out domains such as hedge fund analytics and news. What is the problem with inproc? Just use a method call in the same JVM or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication. This is relying on a proactor model and cores to waste. We had a discussion of Disruptor on the mailing list in the past; it's quite nice but doesn't remotely scale up for complicated applications like 0mq can. This generic API is cool, but it is solving a problem financial systems do not have and creating a bigger problem by adding latency. Financial systems have a huge problem with integration; look at the billion-dollar messaging industry of TIBCO, IBM and Informatica. I thought ZeroMQ flagship customers were financial institutions. Then maybe I was wrong.
I think it's HPC, although I'm certainly using it for financial institutions for applications where latency and network IO are surprisingly irrelevant but memory IO, flexibility, simplicity and a low learning curve are. -- Steve-o
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wed, Aug 29, 2012 at 3:45 PM, Julie Anderson julie.anderson...@gmail.com wrote: I understand your frustration. It's abundantly clear to me that whatever expertise you have on the absolute fastest and most trivial way to send data between two programs, you do not understand Chuck's frustration. I don't put the code here because I don't want to, but because I am legally unable to. If you have a boss or employer you can understand that. We ask the same understanding of you. We are all busy and have limited time, part of which we invest in working on a free library and providing support for free to people like you. If you don't want us to help you, then leave us alone. If you want to help us, then follow the rules. Please demonstrate your test methodology that will allow us to reproduce your performance claims. -Michel
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Not sure I want to step into the middle of this, but here we go. I'd be really hesitant to base any evaluation of ZMQ's suitability for a highly scalable low-latency application on local_lat/remote_lat. They appear to be single-threaded synchronous tests, which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own. As a purely academic discussion, though, I've uploaded raw C socket versions of a client and server that can be used to mimic local_lat and remote_lat -- at least for TCP sockets. On my MacBook, I get ~18 microseconds per 40-byte packet across a test of 100 packets on local loopback. This is indeed about half of what I get with local_lat/remote_lat on tcp://127.0.0.1. http://pastebin.com/4SSKbAgx (echoloopcli.c) http://pastebin.com/rkc6itTg (echoloopsrv.c) There's probably some amount of slop/unfairness in there since I cut a lot of corners, so if folks want to pursue the comparison further, I'm more than willing to bring it closer to apples-to-apples.
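For comparison on the Java side of this thread, Stuart's raw-socket ping-pong can be approximated with NIO channels. This is a hedged sketch (class and method names are mine, and it mirrors the structure of the pastebin programs rather than reproducing them), with TCP_NODELAY set as any latency test should:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TcpPingPong {
    static final int MSG = 40; // same packet size as the tests in this thread

    // Echo server: accept one client and bounce every 40-byte message back.
    static void startEchoServer(ServerSocketChannel srv) {
        Thread t = new Thread(() -> {
            try (SocketChannel c = srv.accept()) {
                c.socket().setTcpNoDelay(true);
                ByteBuffer buf = ByteBuffer.allocate(MSG);
                while (true) {
                    buf.clear();
                    while (buf.hasRemaining())
                        if (c.read(buf) < 0) return; // client closed
                    buf.flip();
                    while (buf.hasRemaining()) c.write(buf);
                }
            } catch (IOException ignored) { }
        });
        t.setDaemon(true);
        t.start();
    }

    // Returns the average round-trip time in nanoseconds over `rounds` messages:
    // each message is sent only after the previous echo came back.
    static long pingPongNs(int rounds) throws IOException {
        ServerSocketChannel srv = ServerSocketChannel.open();
        srv.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral loopback port
        startEchoServer(srv);
        try (SocketChannel c = SocketChannel.open(srv.getLocalAddress())) {
            c.socket().setTcpNoDelay(true);
            ByteBuffer buf = ByteBuffer.allocate(MSG);
            long t0 = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                buf.clear();
                while (buf.hasRemaining()) c.write(buf);
                buf.clear();
                while (buf.hasRemaining()) c.read(buf);
            }
            return (System.nanoTime() - t0) / rounds;
        } finally {
            srv.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("avg round trip ns: " + pingPongNs(100));
    }
}
```

Like the C version, this is synchronous and single-connection, so it measures the best case rather than the "slammed with concurrent sends and recvs" case Stuart warns about.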
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
See my comments below: They appear to be single-threaded synchronous tests, which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own. That's the point of NIO, which attracted me from the very beginning. Of course it is not the solution for everything, but for fast clients you can SERIALIZE them inside a SINGLE thread so all reads and writes from all of them are non-blocking and thread-safe. This is not just faster (debatable, but specifically for fast and not too many clients) but most importantly easier to code, optimize and keep clean/bug-free. Anyone who has done serious multithreading programming in C or even Java knows it is not easy to get it right, and context switches + blocking are the root of all latency and bugs. As a purely academic discussion, though, I've uploaded raw C socket versions of a client and server that can be used to mimic local_lat and remote_lat -- at least for TCP sockets. On my MacBook, I get ~18 microseconds per 40-byte packet across a test of 100 packets on local loopback. This is indeed about half of what I get with local_lat/remote_lat on tcp://127.0.0.1. http://pastebin.com/4SSKbAgx (echoloopcli.c) http://pastebin.com/rkc6itTg (echoloopsrv.c) There's probably some amount of slop/unfairness in there since I cut a lot of corners, so if folks want to pursue the comparison further, I'm more than willing to bring it closer to apples-to-apples. Very awesome!!! Are 18 micros the round-trip time or one-way time? Are you waiting to send the next packet ONLY after you get the ack from the previous one sent? Sorry but C looks like Japanese to me.
:))) -Julie
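Julie's "serialize all clients inside a single thread" point is essentially the reactor pattern. A minimal Selector-based echo loop along those lines might look like the sketch below (class name, buffer sizes, and the single-message convenience wrapper are mine, purely for illustration):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorEcho {
    // Single-threaded reactor: one Selector serializes every accept, read
    // and write, so no handler ever runs concurrently with another and no
    // locking is needed.
    static void loop(ServerSocketChannel srv, int reads) throws IOException {
        Selector sel = Selector.open();
        srv.configureBlocking(false);
        srv.register(sel, SelectionKey.OP_ACCEPT);
        ByteBuffer buf = ByteBuffer.allocate(4096);
        int handled = 0;
        while (handled < reads) {
            sel.select();
            Iterator<SelectionKey> it = sel.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel c = srv.accept();
                    c.configureBlocking(false);
                    c.register(sel, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel c = (SocketChannel) key.channel();
                    buf.clear();
                    if (c.read(buf) < 0) { key.cancel(); c.close(); continue; }
                    buf.flip();
                    while (buf.hasRemaining()) c.write(buf); // echo back
                    handled++;
                }
            }
        }
    }

    // Convenience wrapper: spin up the reactor and bounce one short message
    // off it (assumes the message arrives in a single read, which holds for
    // a few bytes on loopback).
    static String echo(String msg) throws Exception {
        ServerSocketChannel srv = ServerSocketChannel.open();
        srv.bind(new InetSocketAddress("127.0.0.1", 0));
        Thread t = new Thread(() -> {
            try { loop(srv, 1); } catch (IOException ignored) { }
        });
        t.setDaemon(true);
        t.start();
        try (SocketChannel c = SocketChannel.open(srv.getLocalAddress())) {
            byte[] bytes = msg.getBytes("UTF-8");
            c.write(ByteBuffer.wrap(bytes));
            ByteBuffer in = ByteBuffer.allocate(bytes.length);
            while (in.hasRemaining()) c.read(in);
            return new String(in.array(), "UTF-8");
        } finally {
            srv.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("ping"));
    }
}
```

A production reactor would also handle partial reads, per-connection buffers, and OP_WRITE backpressure; this sketch keeps only the structure relevant to the single-thread argument.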
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Stuart Brandt wrote: Not sure I want to step into the middle of this, but here we go. I'd be really hesitant to base any evaluation of ZMQ's suitability for a highly scalable low-latency application on local_lat/remote_lat. They appear to be single-threaded synchronous tests, which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own. local_lat/remote_lat both have two threads (one for the application and one for IO). So each request message goes from:
1. local_lat to IO thread
2. IO thread send to tcp socket - network stack.
3. recv from tcp socket in remote_lat's IO thread
4. from IO thread to remote_lat
5. remote_lat back to IO thread
6. IO thread send to tcp socket - network stack.
7. recv from tcp socket in local_lat's IO thread
8. IO thread to local_lat.
So each message has to pass between threads 4 times (1,4,5,8) and go across the tcp socket 2 times (2-3, 6-7). I think it would be interesting to see how latency is affected when there are many clients sending requests to a server (with one or more worker threads). With ZeroMQ it is very easy to create a server with one or many worker threads and handle many thousands of clients. Doing the same without ZeroMQ is possible, but requires writing a lot more code. But then writing it yourself will allow you to optimize it to your needs (latency vs throughput). As a purely academic discussion, though, I've uploaded raw C socket versions of a client and server that can be used to mimic local_lat and remote_lat -- at least for TCP sockets. On my MacBook, I get ~18 microseconds per 40-byte packet across a test of 100 packets on local loopback. This is indeed about half of what I get with local_lat/remote_lat on tcp://127.0.0.1.
http://pastebin.com/4SSKbAgx (echoloopcli.c) http://pastebin.com/rkc6itTg (echoloopsrv.c) There's probably some amount of slop/unfairness in there since I cut a lot of corners, so if folks want to pursue the comparison further, I'm more than willing to bring it closer to apples-to-apples. echoloop*.c is testing throughput not latency, since it sends all messages at once instead of sending one message and waiting for it to return before sending the next message. Try comparing it with local_thr/remote_thr. -- Robert G. Jakabosky
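The four inter-thread hand-offs Robert enumerates (steps 1, 4, 5, 8) can be priced in isolation with a plain Java ping-pong across two threads, no sockets involved. A rough sketch follows; the class name and the use of SynchronousQueue are mine, a stand-in for libzmq's internal pipes rather than anything from its actual implementation:

```java
import java.util.concurrent.SynchronousQueue;

public class ThreadHopBench {
    // Average cost, in nanoseconds, of one round trip that crosses thread
    // boundaries twice (app thread to "IO" thread and back), i.e. the two
    // hand-offs each direction of a request pays per Robert's breakdown.
    static long benchNs(int rounds) throws InterruptedException {
        SynchronousQueue<byte[]> toIo = new SynchronousQueue<>();
        SynchronousQueue<byte[]> fromIo = new SynchronousQueue<>();
        Thread io = new Thread(() -> {
            try {
                // Stand-in "IO thread": hands each message straight back.
                while (true) fromIo.put(toIo.take());
            } catch (InterruptedException done) { /* benchmark finished */ }
        });
        io.setDaemon(true);
        io.start();
        byte[] msg = new byte[40]; // same size as the thread's test packets
        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            toIo.put(msg);
            fromIo.take();
        }
        long avg = (System.nanoTime() - t0) / rounds;
        io.interrupt();
        return avg;
    }

    public static void main(String[] args) throws InterruptedException {
        benchNs(10_000); // warm-up so the JIT compiles the hot path
        System.out.println("ns per two-hop round trip: " + benchNs(100_000));
    }
}
```

Whatever number this prints on a given box is roughly the floor that any "app thread plus IO thread" design adds on top of the raw socket cost, which is the overhead being debated in this thread.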
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
2) Do people agree that 11 microseconds are just too much? Nope. Once you go cross-machine, those 11 microseconds become irrelevant. The fastest exchange I'm aware of for high-frequency trading is 80 microseconds (+ transport costs) best case, so who are you talking to? And if you're not doing high-frequency trading, then milliseconds are fine. The rest of your system and algorithms are far more crucial, so IMHO you're wasting time in the wrong place. For example, you can use ZeroMQ to build an async pub/sub solution that can do market scanning in parallel from different machines a lot faster than if you did all the TCP/IP yourself. ZeroMQ uses a different system for messages of less than 30 bytes, e.g. they are copied. I'm also unaware of any messages so small in the financial industry. Cross-machine traffic will add the TCP/IP header, which some transports optimize out on the same machine, so unless you're looking only at the IPC case I would re-run your tests with 100M 64- and 256-byte messages cross-machine. As far as interprocess communication goes there are better ways (e.g. writing directly to the destination's semi-polled lockless buffer using 256/512-bit SIMD non-temporal writes would blow away anything Java can do), but they are all dedicated solutions and don't play nicely with other messages coming from the IP stack, and that is the challenge for communication frameworks. If you keep reinventing the wheel with custom solutions, sure, you can get better results, but at what cost, and will you finish? And obviously tuning your higher-level algorithms gets better results than the low-level stuff. Once your whole system with business logic is sub-millisecond and that is not enough, then I would revisit the lower-level transport.
Lastly, building a low-latency message system on Java is dangerous. Java creates messages very quickly, but if they are not disposed of quickly, e.g. under peak load or when some receivers are slower, then you get a big permanent memory pool and you are in trouble; you won't see this in micro benchmarks. I had one complete system that worked great and fast and then had huge GC pauses, and we're talking almost seconds here, pretty much defeating any gains. So unless you manage the memory yourself (e.g. a byte array you serialize into so the GC is not aware of it) you are better off using a system that stores the messages outside of Java's knowledge, and C++ / ZeroMQ is good for that. Ben
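Ben's workaround, keeping message bytes where the collector cannot see them, can be sketched with a direct ByteBuffer. This is an illustrative single-threaded ring (names are mine), not production code; a real single-producer/single-consumer queue would also need memory barriers on head/tail:

```java
import java.nio.ByteBuffer;

public class OffHeapRing {
    // All messages live in one pre-allocated direct (off-heap) buffer, so
    // the GC never scans or moves the payload bytes: the "byte array the
    // GC is not aware of" approach Ben describes.
    private final ByteBuffer ring;
    private final int slotSize;
    private final int slots;
    private long head, tail;

    OffHeapRing(int slots, int slotSize) {
        this.slots = slots;
        this.slotSize = slotSize;
        this.ring = ByteBuffer.allocateDirect(slots * slotSize);
    }

    // Copy a message into the next free slot; false if full or too large.
    boolean offer(byte[] msg) {
        if (tail - head == slots || msg.length > slotSize - 4) return false;
        int base = (int) (tail % slots) * slotSize;
        ring.putInt(base, msg.length); // 4-byte length header per slot
        for (int i = 0; i < msg.length; i++) ring.put(base + 4 + i, msg[i]);
        tail++;
        return true;
    }

    // Copy the oldest message back out; null if the ring is empty.
    byte[] poll() {
        if (head == tail) return null;
        int base = (int) (head % slots) * slotSize;
        byte[] out = new byte[ring.getInt(base)];
        for (int i = 0; i < out.length; i++) out[i] = ring.get(base + 4 + i);
        head++;
        return out;
    }
}
```

The copies in and out still allocate on the consumer side, but the queued backlog itself, the part that balloons under peak load in Ben's scenario, stays off-heap and invisible to the GC.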
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Thu, Aug 30, 2012 at 10:35 AM, Julie Anderson julie.anderson...@gmail.com wrote: See my comments below: They appear to be single-threaded synchronous tests, which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own. That's the point of NIO, which attracted me from the very beginning. Of course it is not the solution for everything, but for fast clients you can SERIALIZE them inside a SINGLE thread so all reads and writes from all of them are non-blocking and thread-safe. This is not just faster (debatable, but specifically for fast and not too many clients) but most importantly easier to code, optimize and keep clean/bug-free. Anyone who has done serious multithreading programming in C or even Java knows it is not easy to get it right, and context switches + blocking are the root of all latency and bugs. If you can't handle state that is true, but if each thread has its own state or caches state then you don't have to deal with it. If you want scalable high performance you want async, many threads. Putting them all on one thread will eventually give you grief: either capacity, or you will hit some slow work and then be forced to offload stuff onto threads, and that's where the bugs come in, as it was not your design. Good async designs which minimize state have few bugs; web servers are a great example. Re "context switches + blocking are the root of all latency and bugs": context switches are less of an issue these days due to processor improvements, especially when you have more cores than active threads. Use your cores; in 4 years you will have 100-thread standard servers and you're using one. Blocking is an issue, but putting it all on one thread is IMHO more risky for a complex / high-performance app.
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Inline On 8/29/2012 10:37 PM, Robert G. Jakabosky wrote: echoloop*.c is testing throughput not latency, since it sends all messages at once instead of sending one message and waiting for it to return before sending the next message. Try comparing it with local_thr/remote_thr. Echoloopcli does a synchronous send, then a synchronous recv, then does it all again. Echoloopsrv does a synchronous recv, then a synchronous send, then does it all again. I stuck a while loop around the send call because it isn't guaranteed to complete with all bytes of my 40-byte packet having been sent. But since my send queue never maxes out, the 'while' around send is overkill -- I get exactly 100 sends interleaved with 100 recvs. On 8/29/2012 10:35 PM, Julie Anderson wrote: Very awesome!!! Are 18 micros the round-trip time or one-way time? Are you waiting to send the next packet ONLY after you get the ack from the previous one sent? Sorry but C looks like Japanese to me. :))) Round-trip.
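The partial-write guard Stuart describes has a direct Java NIO analogue: write() on a channel may accept fewer bytes than requested, so the same loop is needed. A small helper (hypothetical name) for blocking channels:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.WritableByteChannel;

public class WriteFully {
    // Loop until the whole buffer is drained, the same guard Stuart put
    // around send() in echoloopcli.c. Intended for blocking channels; on a
    // non-blocking channel this would busy-spin when the buffer is full.
    static int writeFully(WritableByteChannel ch, ByteBuffer buf) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) total += ch.write(buf);
        return total;
    }

    public static void main(String[] args) throws IOException {
        Pipe p = Pipe.open(); // in-process channel pair for a quick demo
        int n = writeFully(p.sink(), ByteBuffer.wrap(new byte[40]));
        System.out.println("bytes written: " + n);
    }
}
```

As Stuart notes, for a 40-byte packet against a non-full send queue the loop rarely iterates more than once, but omitting it is exactly the kind of roll-your-own mistake the thread warns about.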
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Stuart Brandt wrote: Inline On 8/29/2012 10:37 PM, Robert G. Jakabosky wrote: echoloop*.c is testing throughput not latency, since it sends all messages at once instead of sending one message and waiting for it to return before sending the next message. Try comparing it with local_thr/remote_thr. Echoloopcli does a synchronous send, then a synchronous recv, then does it all again. Echoloopsrv does a synchronous recv, then a synchronous send, then does it all again. I stuck a while loop around the send call because it isn't guaranteed to complete with all bytes of my 40-byte packet having been sent. But since my send queue never maxes out, the 'while' around send is overkill -- I get exactly 100 sends interleaved with 100 recvs. Ah, sorry, I overlooked the outer loop. So it is doing request/response, instead of bulk send/recv like I had thought. -- Robert G. Jakabosky