Re: [zeromq-dev] Using libsdp and zeromq
Being clever in zeromq and unsetting HAVE_SOCK_CLOEXEC will not help, as the zeromq server will crash sooner or later when exiting a client.

Nothing will crash. It will leak sockets if you call exec, or fork and then exec, which is usually avoidable in zeromq apps. To avoid the leak you can use one of the techniques described here: http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor

Dear Paul, Thanks for the quick answer! Unfortunately, the zeromq server does crash, with a thrown error: Invalid argument (stream_engine.cpp:323) Aborted (core dumped) or with this error: Assertion failed: !more (lb.cpp:95) Aborted (core dumped)

I use a simple PUSH server and a PULL client. What I did was comment out //#define ZMQ_HAVE_SOCK_CLOEXEC 1 in platform.hpp, compile, start the server and the client, and kill the client (with Ctrl-C). There is no fork or exec in my code.

Best regards, Michael

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
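For reference, a minimal sketch of the descriptor-cleanup technique described at that Stack Overflow link, assuming a plain POSIX environment: after fork() and before exec(), close every descriptor above stderr so that inherited zeromq/SDP sockets do not leak into the child. This is illustrative only, not part of the zeromq API, and the fallback limit below is made up.

#include <unistd.h>

/* Close every inherited descriptor above stderr before calling exec().
 * Sketch of the technique from the Stack Overflow link above; the upper
 * bound comes from sysconf(). Call this in the child, between fork()
 * and exec(). */
static void close_inherited_fds(void)
{
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0)
        maxfd = 1024;            /* made-up fallback when the limit is unknown */
    for (long fd = 3; fd < maxfd; fd++)
        close((int) fd);         /* close() simply fails with EBADF for unused slots */
}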
[zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Just tested ZeroMQ and Java NIO on the same machine. The results:

- ZeroMQ: message size: 13 [B], roundtrip count: 10, average latency: 19.620 [us] == ONE-WAY LATENCY

- Java NIO Selector (epoll): average RTT (round-trip time) latency of a 13-byte message: 15.342 [us], Min Time: 11.664 [us], 99.999% percentile: 15.340 [us] == RTT LATENCY

Conclusion: That's 39.240 versus 15.340, so the ZeroMQ overhead on top of TCP is 156%, or 23,900 nanoseconds!!! That's excessive. I would expect 1 or 2 microseconds there.

So my questions are: 1) What does ZeroMQ do under the hood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 23 microseconds are just too much?

-Julie

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
As far as I see you haven't included your test methodology or your test code. Without any information about your test I can't have any opinion on your results. Maybe I missed an earlier email where you included information about your test environment and methodology? Brian On Wed, Aug 29, 2012 at 11:13 AM, Julie Anderson julie.anderson...@gmail.com wrote: Just tested ZeroMQ and Java NIO in the same machine. The results: * - ZeroMQ:* message size: 13 [B] roundtrip count: 10 average latency: *19.620* [us] *== ONE-WAY LATENCY* *- Java NIO Selector:* (EPoll) Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us] Min Time: 11.664 [us] 99.999% percentile: *15.340* [us] *== RTT LATENCY* *Conclusion:* That's *39.240 versus 15.340* so ZeroMQ overhead on top of TCP is *156%* or *23.900 nanoseconds* !!! That's excessive. I would expect 1 or 2 microseconds there. So my questions are: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 23 microseconds are just too much? -Julie ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Aug 29, 2012, at 10:13 AM, Julie Anderson wrote: Just tested ZeroMQ and Java NIO in the same machine. The results: - ZeroMQ: message size: 13 [B] roundtrip count: 10 average latency: 19.620 [us] == ONE-WAY LATENCY - Java NIO Selector: (EPoll) Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us] Min Time: 11.664 [us] 99.999% percentile: 15.340 [us] == RTT LATENCY Conclusion: That's 39.240 versus 15.340 so ZeroMQ overhead on top of TCP is 156% or 23.900 nanoseconds !!! That's excessive. I would expect 1 or 2 microseconds there. So my questions are: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 23 microseconds are just too much? As a favor to me, please rerun the tests so that at least 1 million (10 million is better) messages are sent. This shouldn't take more than a few minutes to run. Thanks. Secondly, are you using the local_lat and remote_lat programs that are included with zeromq or did you write your own? If you wrote your own, please share the code. Thirdly, a pastie containing the code for both tests so others could independently reproduce your results would be very handy. cr ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
[zeromq-dev] High throughput Zero MQ messaging pattern.
Hi All, I am looking for a messaging pattern for the following scenario. I have a Java NIO based server X, which has some threads processing client requests. These threads receive events asynchronously. Now, I want to send some of those events to another service (another server) Y in an asynchronous fashion. Please suggest a scalable messaging pattern for the above scenario. -- With Best Regards, Girish ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
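One common ZeroMQ answer to this kind of fan-in/fan-out is a PUSH/PULL pipeline: the worker threads in server X push events, and service Y pulls them. Below is a minimal, hypothetical C sketch of the sending side only; the endpoint, port and payload are invented, and the same pattern is available through the Java bindings if X must stay a Java NIO server.

/* Hypothetical sketch of the PUSH side of a PUSH/PULL pipeline:
 * hand events to service Y without blocking the worker threads on it. */
#include <string.h>
#include <zmq.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(push, "tcp://server-y.example.com:5557");  /* made-up endpoint */

    const char *event = "order-created:42";                /* made-up payload */
    for (int i = 0; i < 10; i++)
        zmq_send(push, event, strlen(event), 0);           /* queued here, sent by the IO thread */

    zmq_close(push);
    zmq_ctx_destroy(ctx);
    return 0;
}

Service Y would bind a PULL socket to the same endpoint and simply receive; because a PUSH socket round-robins across everything it is connected to, X can later connect to more than one Y to scale out.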
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
New numbers (fun!).

Firstly, to make sure I was comparing apples with apples, I modified my tests to compute the one-way trip instead of the round-trip. I can't paste code, but I am simply using Java NIO (non-blocking I/O) optimized with busy spinning to send and receive tcp data. This is standard Java NIO code, nothing too fancy. You can google around for Java NIO. I found this link that shows the basics: http://www.cordinc.com/blog/2010/08/java-nio-server-example.html You can also do the same thing in C, as you can see here: http://stackoverflow.com/questions/27247/could-you-recommend-some-guides-about-epoll-on-linux/6150841#6150841

My test now consists of:
- JVM A sends a message which consists of the ascii representation of a timestamp in nanos.
- JVM B receives this message, parses the long, computes the one-way latency and echoes back the message to JVM A.
- JVM A receives the echo, parses the ascii long and makes sure that it matches the one it sent out.
- Loop back and send the next message.

So now I have both times: one-way and round-trip. I ran my test for 1 million messages over loopback. For ZeroMQ I am using the local_lat and remote_lat programs included with the latest zeromq from here: git://github.com/zeromq/libzmq.git

The results:

- ZeroMQ: ./local_lat tcp://lo: 13 100 ./remote_lat tcp://127.0.0.1: 13 100 message size: 13 [B] roundtrip count: 100 average latency: 19.674 [us] (this is one-way)

- Java NIO (EPoll with busy spinning): Round-trip: Iterations: 1,000,000 | Avg Time: 16552.15 nanos | Min Time: 12515 nanos | Max Time: 129816 nanos | 75%: 16290 nanos | 90%: 16369 nanos | 99%: 16489 nanos | 99.999%: 16551 nanos One-way trip: Iterations: 1,110,000 | Avg Time: 8100.12 nanos | Min Time: 6150 nanos | Max Time: 118035 nanos | 75%: 7966 nanos | 90%: 8010 nanos | 99%: 8060 nanos | 99.999%: 8099 nanos

Conclusions: That's 19.674 versus 8.100, so the ZeroMQ overhead on top of TCP is 142%, or 11,574 nanoseconds!!! That's excessive. I would expect 1 microsecond of overhead there.

So questions remain: 1) What does ZeroMQ do under the hood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 11 microseconds are just too much?

My rough guess: ZeroMQ uses threads? (the beauty of NIO is that it is single-threaded, so there is always only one thread reading and writing to the network)

-Julie

On Wed, Aug 29, 2012 at 10:24 AM, Chuck Remes li...@chuckremes.com wrote: On Aug 29, 2012, at 10:13 AM, Julie Anderson wrote: Just tested ZeroMQ and Java NIO in the same machine. The results: - ZeroMQ: message size: 13 [B] roundtrip count: 10 average latency: 19.620 [us] == ONE-WAY LATENCY - Java NIO Selector: (EPoll) Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us] Min Time: 11.664 [us] 99.999% percentile: 15.340 [us] == RTT LATENCY Conclusion: That's 39.240 versus 15.340 so ZeroMQ overhead on top of TCP is 156% or 23.900 nanoseconds !!! That's excessive. I would expect 1 or 2 microseconds there. So my questions are: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 23 microseconds are just too much? As a favor to me, please rerun the tests so that at least 1 million (10 million is better) messages are sent. This shouldn't take more than a few minutes to run. Thanks. Secondly, are you using the local_lat and remote_lat programs that are included with zeromq or did you write your own?
If you wrote your own, please share the code. Thirdly, a pastie containing the code for both tests so others could independently reproduce your results would be very handy. cr ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
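Julie's Java test code is not public, so for readers who want something concrete to compare against, here is a minimal, hypothetical C sketch of the same ping-pong methodology over a plain blocking TCP socket: send an ASCII nanosecond timestamp, wait for the echo, and average the round-trip time. The peer is assumed to be a trivial echo server; the address, port and message size are made up.

/* pingpong_client.c - hypothetical sketch of the latency methodology
 * described above: send the current CLOCK_MONOTONIC time as ASCII
 * nanoseconds, wait for the echo, and accumulate RTT. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MSG_LEN 32          /* fixed-size ASCII timestamp, zero-padded (made up) */
#define ITERATIONS 1000000

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);  /* disable Nagle for ping-pong */

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5555);                 /* hypothetical echo-server port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *) &addr, sizeof addr) != 0) {
        perror("connect");
        return 1;
    }

    char buf[MSG_LEN] = {0};
    long long total_rtt = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        long long t0 = now_ns();
        snprintf(buf, sizeof buf, "%lld", t0);   /* ASCII timestamp payload */
        if (send(fd, buf, MSG_LEN, 0) != MSG_LEN) {  /* a 32-byte blocking send will not split in practice */
            perror("send");
            return 1;
        }
        size_t got = 0;                          /* read back the full echo */
        while (got < MSG_LEN) {
            ssize_t n = recv(fd, buf + got, MSG_LEN - got, 0);
            if (n <= 0) { perror("recv"); return 1; }
            got += (size_t) n;
        }
        total_rtt += now_ns() - t0;
    }
    printf("average RTT: %.3f us (one-way ~ RTT/2 on loopback)\n",
           (double) total_rtt / ITERATIONS / 1000.0);
    close(fd);
    return 0;
}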
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Julie, it is a little exasperating that you keep posting these numbers (and related questions) but, to date, have not shown the CODE used to get them. It is not possible to give a meaningful answer to your questions without looking at the EXACT code you are using. Furthermore, it would be very useful to be able to RUN the same code in one's machine, to ascertain whether the behavior is the same as you are reporting, and maybe fix something in 0MQ. Best regards, -- Gonzalo Diethelm DCV Chile From: zeromq-dev-boun...@lists.zeromq.org [mailto:zeromq-dev-boun...@lists.zeromq.org] On Behalf Of Julie Anderson Sent: Wednesday, August 29, 2012 1:19 PM To: ZeroMQ development list Subject: Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements) New numbers (fun!). Firstly, to make sure I was comparing apples with apples, I modified my tests to compute one-way trip instead of round-trip. I can't paste code, but I am simply using a Java NIO (non-blocking I/O) optimized with busy spinning to send and receive tcp data. This is *standard* Java NIO code, nothing too fancy. You can google around for Java NIO. I found this linkhttp://www.cordinc.com/blog/2010/08/java-nio-server-example.html that shows the basics. You can also do the same thing in C as you can see herehttp://stackoverflow.com/questions/27247/could-you-recommend-some-guides-about-epoll-on-linux/6150841#6150841. My test now consists of: - JVM A sends a message which consist of the ascii representation of a timestamp in nanos. - JVM B receives this message, parses the long, computer the one-way latency and echoes back the message to JVM A. - JVM A receives the echo, parses the ascii long and makes sure that it matches the one it sent out. - Loop back and send the next message. So now I have both times: one-way and round-trip. I ran my test for 1 million messages over loopback. For ZeroMQ I am using the local_lat and remote_lat programs included with latest zeromq from here: git://github.com/zeromq/libzmq.githttp://github.com/zeromq/libzmq.git The results: - ZeroMQ: ./local_lat tcp://lo: 13 100 ./remote_lat tcp://127.0.0.1:http://127.0.0.1: 13 100 message size: 13 [B] roundtrip count: 100 average latency: 19.674 [us] this is one-way - Java NIO: (EPoll with busy spinning) Round-trip: Iterations: 1,000,000 | Avg Time: 16552.15 nanos | Min Time: 12515 nanos | Max Time: 129816 nanos | 75%: 16290 nanos | 90%: 16369 nanos | 99%: 16489 nanos | 99.999%: 16551 nanos One-way trip: Iterations: 1,110,000 | Avg Time: 8100.12 nanos | Min Time: 6150 nanos | Max Time: 118035 nanos | 75%: 7966 nanos | 90%: 8010 nanos | 99%: 8060 nanos | 99.999%: 8099 nanos Conclusions: That's 19.674 versus 8.100 so ZeroMQ overhead on top of TCP is 142% or 11.574 nanoseconds !!! That's excessive. I would expect 1 microsecond overhead there. So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 11 microseconds are just too much? My rough guess: ZeroMQ uses threads? (the beauty of NIO is that it is single-threaded, so there is always only one thread reading and writing to the network) -Julie On Wed, Aug 29, 2012 at 10:24 AM, Chuck Remes li...@chuckremes.commailto:li...@chuckremes.com wrote: On Aug 29, 2012, at 10:13 AM, Julie Anderson wrote: Just tested ZeroMQ and Java NIO in the same machine. 
The results: - ZeroMQ: message size: 13 [B] roundtrip count: 10 average latency: 19.620 [us] == ONE-WAY LATENCY - Java NIO Selector: (EPoll) Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us] Min Time: 11.664 [us] 99.999% percentile: 15.340 [us] == RTT LATENCY Conclusion: That's 39.240 versus 15.340 so ZeroMQ overhead on top of TCP is 156% or 23.900 nanoseconds !!! That's excessive. I would expect 1 or 2 microseconds there. So my questions are: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 23 microseconds are just too much? As a favor to me, please rerun the tests so that at least 1 million (10 million is better) messages are sent. This shouldn't take more than a few minutes to run. Thanks. Secondly, are you using the local_lat and remote_lat programs that are included with zeromq or did you write your own? If you wrote your own, please share the code. Thirdly, a pastie containing the code for both tests so others could independently reproduce your results would be very handy. cr ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Here are the UDP numbers for whom it may concern. As one would expect much better than TCP. RTT: (round-trip time) Iterations: 1,000,000 | Avg Time: *10373.9 nanos* | Min Time: 8626 nanos | Max Time: 136269 nanos | 75%: 10186 nanos | 90%: 10253 nanos | 99%: 10327 nanos | 99.999%: 10372 nanos OWT: (one-way time) Iterations: 2,221,118 | Avg Time: *5095.66 nanos* | Min Time: 4220 nanos | Max Time: 135584 nanos | 75%: 5001 nanos | 90%: 5037 nanos | 99%: 5071 nanos | 99.999%: 5094 nanos -Julie On Wed, Aug 29, 2012 at 12:18 PM, Julie Anderson julie.anderson...@gmail.com wrote: New numbers (fun!). Firstly, to make sure I was comparing apples with apples, I modified my tests to compute one-way trip instead of round-trip. I can't paste code, but I am simply using a Java NIO (non-blocking I/O) optimized with busy spinning to send and receive tcp data. This is *standard* Java NIO code, nothing too fancy. You can google around for Java NIO. I found this linkhttp://www.cordinc.com/blog/2010/08/java-nio-server-example.htmlthat shows the basics. You can also do the same thing in C as you can see herehttp://stackoverflow.com/questions/27247/could-you-recommend-some-guides-about-epoll-on-linux/6150841#6150841 . My test now consists of: - JVM A sends a message which consist of the ascii representation of a timestamp in nanos. - JVM B receives this message, parses the long, computer the one-way latency and echoes back the message to JVM A. - JVM A receives the echo, parses the ascii long and makes sure that it matches the one it sent out. - Loop back and send the next message. So now I have both times: one-way and round-trip. I ran my test for 1 million messages over loopback. For ZeroMQ I am using the local_lat and remote_lat programs included with latest zeromq from here: git://github.com/zeromq/libzmq.git The results: *- ZeroMQ:* ./local_lat tcp://lo: 13 100 ./remote_lat tcp://127.0.0.1: 13 100 message size: 13 [B] roundtrip count: 100 average latency: *19.674* [us] * this is one-way* *- Java NIO:* (EPoll with busy spinning) Round-trip: Iterations: 1,000,000 | Avg Time: *16552.15 nanos* | Min Time: 12515 nanos | Max Time: 129816 nanos | 75%: 16290 nanos | 90%: 16369 nanos | 99%: 16489 nanos | 99.999%: *16551 nanos* One-way trip: Iterations: 1,110,000 | Avg Time: *8100.12 nanos* | Min Time: 6150 nanos | Max Time: 118035 nanos | 75%: 7966 nanos | 90%: 8010 nanos | 99%: 8060 nanos | 99.999%: *8099 nanos* *Conclusions:* That's *19.674 versus 8.100* so ZeroMQ overhead on top of TCP is *142%* or *11.574 nanoseconds* !!! That's excessive. I would expect 1 microsecond overhead there. So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 11 microseconds are just too much? My rough guess: ZeroMQ uses threads? (the beauty of NIO is that it is single-threaded, so there is always only one thread reading and writing to the network) -Julie On Wed, Aug 29, 2012 at 10:24 AM, Chuck Remes li...@chuckremes.comwrote: On Aug 29, 2012, at 10:13 AM, Julie Anderson wrote: Just tested ZeroMQ and Java NIO in the same machine. The results: * - ZeroMQ:* message size: 13 [B] roundtrip count: 10 average latency: *19.620* [us] *== ONE-WAY LATENCY* *- Java NIO Selector:* (EPoll) Average RTT (round-trip time) latency of a 13-byte message: 15.342 [us] Min Time: 11.664 [us] 99.999% percentile: *15.340* [us] *== RTT LATENCY* *Conclusion:* That's *39.240 versus 15.340* so ZeroMQ overhead on top of TCP is *156%* or *23.900 nanoseconds* !!! 
That's excessive. I would expect 1 or 2 microseconds there. So my questions are: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) 2) Do people agree that 23 microseconds are just too much? As a favor to me, please rerun the tests so that at least 1 million (10 million is better) messages are sent. This shouldn't take more than a few minutes to run. Thanks. Secondly, are you using the local_lat and remote_lat programs that are included with zeromq or did you write your own? If you wrote your own, please share the code. Thirdly, a pastie containing the code for both tests so others could independently reproduce your results would be very handy. cr ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] zmq3 'make check' fails on rt-preempt kernel - test_pair_tcp hangs
On Tue, Aug 28, 2012 at 2:16 PM, Ian Barber ian.bar...@gmail.com wrote: Ah, I fixed a similar issue in master the other day, may well be the same thing. I'll check and send a pull req when I get home. Ian That's all merged in now by the way, so give it another go. Ian ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] cannot open snapshot.zero.mq
On Wed, Aug 29, 2012 at 2:20 AM, lzqsmst lzqs...@qq.com wrote: What's wrong with the site snapshot.zero.mq? I want to get the 0MQ PHP DLL for Windows, please~

The machine it's on has died - Mikko is looking into it, but the server seems to have become rather unhappy. Ian

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Julie Anderson wrote: So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know)

ZeroMQ is using background IO threads to do the sending/receiving, so the extra latency is due to passing the messages between the application thread and the IO thread.

2) Do people agree that 11 microseconds are just too much?

No. A simple IO event loop using epoll is fine for an IO (network) bound application, but if you need to do complex work (cpu bound) mixed with non-blocking IO, then ZeroMQ can make it easy to scale.

Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport with two threads in the same JVM instance. With ZeroMQ it is easy to do thread to thread, process to process, and/or server to server communication all at the same time using the same interface. Basically ZeroMQ has a different use-case than a simple IO event loop.

-- Robert G. Jakabosky

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
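To make Robert's suggestion easy to try, here is a minimal, hypothetical sketch of an inproc latency test using libzmq's C API: a PAIR socket in each of two threads inside one process, bouncing a 13-byte message back and forth. The endpoint name and iteration count are invented; note that for inproc the bind must happen before the connect.

/* inproc_lat.c - hypothetical sketch: measure inproc PAIR-to-PAIR latency
 * between two threads in one process. Build with: gcc inproc_lat.c -lzmq -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <zmq.h>

#define ROUNDTRIPS 100000
#define MSG_SIZE 13

static void *echo_thread(void *ctx)
{
    void *s = zmq_socket(ctx, ZMQ_PAIR);
    zmq_connect(s, "inproc://lat");          /* made-up endpoint name */
    char buf[MSG_SIZE];
    for (int i = 0; i < ROUNDTRIPS; i++) {
        zmq_recv(s, buf, sizeof buf, 0);     /* bounce every message straight back */
        zmq_send(s, buf, sizeof buf, 0);
    }
    zmq_close(s);
    return NULL;
}

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *s = zmq_socket(ctx, ZMQ_PAIR);
    zmq_bind(s, "inproc://lat");             /* inproc: bind before the connect side starts */

    pthread_t t;
    pthread_create(&t, NULL, echo_thread, ctx);

    char buf[MSG_SIZE] = "hello, world";
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDTRIPS; i++) {
        zmq_send(s, buf, sizeof buf, 0);
        zmq_recv(s, buf, sizeof buf, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / 1e3;
    printf("average round-trip latency: %.3f us\n", us / ROUNDTRIPS);

    pthread_join(t, NULL);
    zmq_close(s);
    zmq_ctx_destroy(ctx);
    return 0;
}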
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
See my comments below:

On Wed, Aug 29, 2012 at 4:06 PM, Robert G. Jakabosky bo...@sharedrealm.com wrote: On Wednesday 29, Julie Anderson wrote: So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is do to passing the messages between the application thread and the IO thread.

This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists. That's my humble opinion and the numbers support it.

2) Do people agree that 11 microseconds are just too much? No. A simple IO event loop using epoll is fine for a IO (network) bound application, but if you need to do complex work (cpu bound) mixed with non- blocking IO, then ZeroMQ can make it easy to scale.

Totally agree, but that has nothing to do with a financial application. Financial applications do not need to do complex CPU-bound analysis like an image processing application would. A financial application only cares about LATENCY and network I/O.

Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance.

What is the problem with inproc? Just use a method call in the same JVM, or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication.

With ZeroMQ it is easy to do thread to thread, process to process, and/or server to server communication all at the same time using the same interface.

This generic API is cool, but it is solving a problem financial systems do not have and creating a bigger problem by adding latency.

Basically ZeroMQ has different use-case then a simple IO event loop.

I thought ZeroMQ's flagship customers were financial institutions. Then maybe I was wrong.

-- Robert G. Jakabosky

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Aug 29, 2012, at 4:46 PM, Julie Anderson wrote: See my comments below: And mine too. On Wed, Aug 29, 2012 at 4:06 PM, Robert G. Jakabosky bo...@sharedrealm.com wrote: On Wednesday 29, Julie Anderson wrote: So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is do to passing the messages between the application thread and the IO thread. This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists. That's my humble opinion and the numbers support it. What numbers? Only you have produced them so far. We have been quite patient with you. It appears you have some experience, so I'm confused as to why you refuse to provide any code for the rest of us to run to duplicate your results. If the roles were reversed I am certain you would want to run it yourself. If you want our help, don't tell us to google some code to run. If it's really that easy then provide a link and make sure that your numbers are coming from the exact same code. Until someone else can independently verify your numbers then everything you have written is just smoke. Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance. What is the problem with inproc? Just use a method call in the same JVM or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication. I would be surprised too. Zeromq doesn't solve the same problem as Disruptor. Basically ZeroMQ has different use-case then a simple IO event loop. I thought ZeroMQ flagship customers were financial institutions. Then maybe I was wrong. Don't be insulting. It doesn't help you or inspire anyone here to look into your claims. cr ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
See my comments below:

On Wed, Aug 29, 2012 at 5:28 PM, Chuck Remes li...@chuckremes.com wrote: On Aug 29, 2012, at 4:46 PM, Julie Anderson wrote: See my comments below: And mine too.

On Wed, Aug 29, 2012 at 4:06 PM, Robert G. Jakabosky bo...@sharedrealm.com wrote: On Wednesday 29, Julie Anderson wrote: So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is do to passing the messages between the application thread and the IO thread. This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists. That's my humble opinion and the numbers support it.

What numbers? Only you have produced them so far. We have been quite patient with you. It appears you have some experience, so I'm confused as to why you refuse to provide any code for the rest of us to run to duplicate your results. If the roles were reversed I am certain you would want to run it yourself. If you want our help, don't tell us to google some code to run. If it's really that easy then provide a link and make sure that your numbers are coming from the exact same code. Until someone else can independently verify your numbers then everything you have written is just smoke.

I understand your frustration. I don't put the code here not because I don't want to, but because I am legally unable to. If you have a boss or employer you can understand that. :) I will try to come up with a simple version that does the same thing. It should not be hard to do a ping-pong in Java.

Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance. What is the problem with inproc? Just use a method call in the same JVM or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication.

I would be surprised too. Zeromq doesn't solve the same problem as Disruptor.

Disruptor solves inter-thread communication without synchronization latency (light blocking using memory barriers). So if you have two threads and need them to talk to each other as fast as possible, you would use Disruptor. That's what I thought the colleague was addressing here: "Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance."

Basically ZeroMQ has different use-case then a simple IO event loop. I thought ZeroMQ flagship customers were financial institutions. Then maybe I was wrong.

Don't be insulting. It doesn't help you or inspire anyone here to look into your claims.

Insulting? I really think I was not insulting anyone or anything, but if you got that impression please accept my sincere apologies. Nothing is perfect. I am just trying to understand the ZeroMQ approach and its overhead on top of the raw network latency. Maybe a single-threaded ZeroMQ implementation for the future, using non-blocking I/O?

cr

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Julie Anderson wrote: See my comments below:

On Wed, Aug 29, 2012 at 4:06 PM, Robert G. Jakabosky bo...@sharedrealm.com wrote: On Wednesday 29, Julie Anderson wrote: So questions remain: 1) What does ZeroMQ do under the rood that justifies so many extra clock cycles? (I am really curious to know) ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is do to passing the messages between the application thread and the IO thread.

This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists. That's my humble opinion and the numbers support it.

If low latency is the most important thing for your application, then use a custom protocol and highly tuned network code. ZeroMQ is not a low-level networking library; it provides some high-level features that are not available with raw sockets. If you are planning on doing high-frequency trading, then you will need to write your own networking code (or FPGA logic) to squeeze out every last micro/nanosecond. ZeroMQ is not going to be the right solution for every use-case.

2) Do people agree that 11 microseconds are just too much? No. A simple IO event loop using epoll is fine for a IO (network) bound application, but if you need to do complex work (cpu bound) mixed with non- blocking IO, then ZeroMQ can make it easy to scale.

Totally agree, but that has nothing to do with a financial application. Financial applications do not need to do complex CPU bound analysis like a image processing application would need. Financial application only cares about LATENCY and network I/O.

Not all financial applications care only about latency. For some systems it is important to scale out to a very large number of subscribers and a large volume of messages. When comparing ZeroMQ to raw network IO for one connection, ZeroMQ will have more latency overhead. Try your test with many thousands of connections with subscriptions to lots of different topics; then ZeroMQ will start to come out ahead.

Also try comparing the latency of Java NIO using TCP/UDP against ZeroMQ using the inproc transport using two threads in the same JVM instance.

What is the problem with inproc? Just use a method call in the same JVM or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication.

ZeroMQ's inproc transport can be used in an event loop alongside the TCP and IPC transports. With ZeroMQ you can mix and match transports as needed. If you can do all that with custom code with lower latency, then do it. ZeroMQ is for people who don't have the experience to do that kind of thread-safe programming, or who just want to scale out their application.

With ZeroMQ it is easy to do thread to thread, process to process, and/or server to server communication all at the same time using the same interface.

This generic API is cool, but it is solving a problem financial systems do not have and creating a bigger problem by adding latency.

ZeroMQ is not adding latency for no reason. If you think that the latency can be eliminated, then go ahead and change the core code to not use IO threads.

Basically ZeroMQ has different use-case then a simple IO event loop. I thought ZeroMQ flagship customers were financial institutions. Then maybe I was wrong.
ZeroMQ is competing with other message-oriented middleware, like RabbitMQ, SwiftMQ, JMS, or other message queuing systems. These systems are popular with financial institutions. -- Robert G. Jakabosky ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
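As an illustration of the mix-and-match point, here is a hypothetical sketch of one zmq_poll() loop draining an inproc socket (events from in-process worker threads) and a tcp socket (events from remote peers) at the same time; the socket types, endpoints and port are made up.

/* Hypothetical sketch: one event loop polling an inproc PULL socket and a
 * tcp PULL socket through the same zmq_poll() call. Runs forever as written. */
#include <stdio.h>
#include <zmq.h>

int main(void)
{
    void *ctx = zmq_ctx_new();

    void *local = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(local, "inproc://events");      /* worker threads would connect here */

    void *remote = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(remote, "tcp://*:5556");        /* remote producers would connect here */

    zmq_pollitem_t items[] = {
        { local,  0, ZMQ_POLLIN, 0 },
        { remote, 0, ZMQ_POLLIN, 0 },
    };

    char buf[256];
    while (1) {
        zmq_poll(items, 2, -1);              /* block until either socket is readable */
        if (items[0].revents & ZMQ_POLLIN) {
            int n = zmq_recv(local, buf, sizeof buf, 0);
            printf("in-process event, %d bytes\n", n);
        }
        if (items[1].revents & ZMQ_POLLIN) {
            int n = zmq_recv(remote, buf, sizeof buf, 0);
            printf("network event, %d bytes\n", n);
        }
    }
}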
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Julie Anderson wrote: Nothing is perfect. I am just trying to understand ZeroMQ approach and its overhead on top of the raw network latency. Maybe a single-threaded ZeroMQ implementation for the future using non-blocking I/O? You might be interested in xsnano [1] which is an experimental project to try different threading models (should be possible to support single-thread model). I am not sure how far along it is. 1. https://github.com/sustrik/xsnano -- Robert G. Jakabosky ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On 29 August 2012 17:46, Julie Anderson julie.anderson...@gmail.com wrote: ZeroMQ is using background IO threads to do the sending/receiving. So the extra latency is do to passing the messages between the application thread and the IO thread. This kind of thread architecture sucks for latency sensitive applications. That's why non-blocking I/O exists.

Non-blocking IO is a legacy from early structured programming; it just so happens that it fits well into a proactor model for IOCP performance. Many people prefer to develop code using the reactor model, and 0mq fits well there.

Totally agree, but that has nothing to do with a financial application. Financial applications do not need to do complex CPU bound analysis like a image processing application would need. Financial application only cares about LATENCY and network I/O.

That's not a financial application, that's an automated trading application. 99.9% of financial applications do not remotely care about latency; your sweeping generalisation just wiped out domains such as hedge fund analytics and news.

What is the problem with inproc? Just use a method call in the same JVM or shared memory for different JVMs. If you want inter-thread communication there are blazing-fast solutions in Java for that too. For example, I would be surprised if ZeroMQ can come close to Disruptor for inter-thread communication.

This relies on a proactor model and cores to waste. We had a discussion of Disruptor on the mailing list in the past; it's quite nice, but it doesn't remotely scale up for complicated applications the way 0mq can.

This generic API is cool, but it is solving a problem financial systems do not have and creating a bigger problem by adding latency.

Financial systems have a huge problem with integration; look at the billion-dollar messaging industry of TIBCO, IBM and Informatica.

I thought ZeroMQ flagship customers were financial institutions. Then maybe I was wrong.

I think it's HPC, although I'm certainly using it for financial institutions for applications where latency and network IO are surprisingly irrelevant but memory IO, flexibility, simplicity and a low learning curve are not.

-- Steve-o

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wed, Aug 29, 2012 at 3:45 PM, Julie Anderson julie.anderson...@gmail.com wrote: I understand your frustration. It's abundantly clear to me that whatever expertise you have on the absolute fastest and most trivial way to send data between two programs, you do not understand Chuck's frustration. I don't put the code here because I don't want to, but because I am legally unable to. If you have a boss or employer you can understand that. We ask the same understanding of you. We are all busy and have limited time, part of which we invest in working on a free library and providing support for free to people like you. If you don't want us to help you, then leave us alone. If you want to help us, then follow the rules. Please demonstrate your test methodology that will allow us to reproduce your performance claims. -Michel ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
[zeromq-dev] ipc sockets and multiple connections
If the ipc transport is used on unix, can I have one bind and multiple connects, similar to how I would with the tcp transport? For some reason I have this idea that unix shared pipes can only be 1 to 1, but I am not totally sure on that. Justin ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] ipc sockets and multiple connections
On Aug 29, 2012, at 6:52 PM, Justin Karneges wrote: If the ipc transport is used on unix, can I have one bind and multiple connects, similar to how I would with the tcp transport? For some reason I have this idea that unix shared pipes can only be 1 to 1, but I am not totally sure on that.

Try it, it just works. Also, you shouldn't equate ipc with pipes, because the two are not one and the same. Pipes don't really exist on Windows, yet the ipc transport is still supported there. (Granted, ipc on Windows is emulated with tcp sockets, but that just reinforces the idea that ipc is not pipes.)

cr

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
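To make "it just works" concrete, a hypothetical sketch: one PULL socket bound to an ipc endpoint, to which any number of PUSH clients can connect at the same time, exactly as with tcp. The path and message handling below are invented for illustration.

/* Hypothetical sketch of the bound side: one ipc bind, many connects. */
#include <stdio.h>
#include <zmq.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *sink = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(sink, "ipc:///tmp/events.ipc");    /* made-up path: ONE bind... */

    char buf[64];
    for (int i = 0; i < 10; i++) {              /* ...fed by MANY connected PUSH peers */
        int n = zmq_recv(sink, buf, sizeof buf - 1, 0);
        if (n < 0)
            break;                              /* interrupted or error */
        if (n > (int) (sizeof buf - 1))
            n = sizeof buf - 1;                 /* message longer than buffer: truncated */
        buf[n] = '\0';
        printf("got: %s\n", buf);
    }

    zmq_close(sink);
    zmq_ctx_destroy(ctx);
    return 0;
}

Each client simply calls zmq_connect() on a PUSH socket with the same "ipc:///tmp/events.ipc" endpoint and sends; several connected clients are multiplexed onto the single bound socket, which is the one-bind/many-connects behaviour Justin asked about.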
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Not sure I want to step into the middle of this, but here we go. I'd be really hesitant to base any evaluation of ZMQ's suitability for a highly scalable low latency application on local_lat/remote_lat. They appear to be single-threaded synchronous tests, which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own.

As a purely academic discussion, though, I've uploaded raw C socket versions of a client and server that can be used to mimic local_lat and remote_lat -- at least for TCP sockets. On my MacBook, I get ~18 microseconds per 40-byte packet across a test of 100 packets on local loopback. This is indeed about half of what I get with local_lat/remote_lat on tcp://127.0.0.1. http://pastebin.com/4SSKbAgx (echoloopcli.c) http://pastebin.com/rkc6itTg (echoloopsrv.c)

There's probably some amount of slop/unfairness in there since I cut a lot of corners, so if folks want to pursue the comparison further, I'm more than willing to bring it closer to apples-to-apples.

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
See my comments below:

They appear to be single threaded synchronous tests which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own.

That's the point of NIO, which attracted me from the very beginning. Of course it is not the solution for everything, but for fast clients you can SERIALIZE them inside a SINGLE thread so all reads and writes from all of them are non-blocking and thread-safe. This is not just faster (debatable, but specifically for fast and not too many clients) but, most importantly, easier to code, optimize and keep clean/bug-free. Anyone who has done serious multithreading programming in C or even Java knows it is not easy to get right, and context switches + blocking are the root of all latency and bugs.

As a purely academic discussion, though, I've uploaded raw C socket versions of a client and server that can be used to mimic local_lat and remote_lat -- at least for TCP sockets. On my MacBook, I get ~18 microseconds per 40 byte packet across a test of 100 packets on local loopback. This is indeed about half of what I get with local_lat/remote_lat on tcp://127.0.0.1. http://pastebin.com/4SSKbAgx (echoloopcli.c) http://pastebin.com/rkc6itTg (echoloopsrv.c) There's probably some amount of slop/unfairness in there since I cut a lot of corners, so if folks want to pursue the comparison further, I'm more than willing to bring it closer to apples-to-apples.

Very awesome!!! Are 18 micros the round-trip time or the one-way time? Are you waiting to send the next packet ONLY after you get the ack from the previous one sent? Sorry, but C looks like Japanese to me. :)))

-Julie

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
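Since the single-threaded NIO pattern keeps coming up, here is roughly what the equivalent looks like in C with epoll: a minimal, hypothetical echo server in which one thread owns every client socket, so no locking is needed. The port, buffer size and backlog are made up, and real code would also have to handle a full send buffer rather than calling write() blindly.

/* Hypothetical sketch: all clients served from ONE thread via epoll.
 * Non-blocking accept/read/write, no locks, no context switches. */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static void set_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

int main(void) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5555);                     /* made-up port */
    bind(listener, (struct sockaddr *) &addr, sizeof addr);
    listen(listener, 128);
    set_nonblocking(listener);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
    epoll_ctl(ep, EPOLL_CTL_ADD, listener, &ev);

    struct epoll_event events[64];
    char buf[4096];
    while (1) {
        int n = epoll_wait(ep, events, 64, -1);      /* one thread, many sockets */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listener) {                    /* new client connection */
                int client = accept(listener, NULL, NULL);
                if (client < 0) continue;
                set_nonblocking(client);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
            } else {                                 /* data from an existing client */
                ssize_t r = read(fd, buf, sizeof buf);
                if (r <= 0) { close(fd); continue; } /* peer closed or error */
                write(fd, buf, (size_t) r);          /* echo back (ignores a full send buffer) */
            }
        }
    }
}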
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Stuart Brandt wrote: Not sure I want to step into the middle of this, but here we go. I'd be really hesitant to base any evaluation of ZMQ's suitability for a highly scalable low latency application on local_lat/remote_lat. They appear to be single threaded synchronous tests which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own.

local_lat/remote_lat both have two threads (one for the application and one for IO). So each request message goes from:
1. local_lat to its IO thread
2. IO thread send to the tcp socket -> network stack
3. recv from the tcp socket in remote_lat's IO thread
4. from the IO thread to remote_lat
5. remote_lat back to its IO thread
6. IO thread send to the tcp socket -> network stack
7. recv from the tcp socket in local_lat's IO thread
8. IO thread to local_lat

So each message has to pass between threads 4 times (1, 4, 5, 8) and go across the tcp socket 2 times (2-3, 6-7).

I think it would be interesting to see how latency is affected when there are many clients sending requests to a server (with one or more worker threads). With ZeroMQ it is very easy to create a server with one or many worker threads and handle many thousands of clients. Doing the same without ZeroMQ is possible, but requires writing a lot more code. But then writing it yourself will allow you to optimize it to your needs (latency vs throughput).

As a purely academic discussion, though, I've uploaded raw C socket versions of a client and server that can be used to mimic local_lat and remote_lat -- at least for TCP sockets. On my MacBook, I get ~18 microseconds per 40 byte packet across a test of 100 packets on local loopback. This is indeed about half of what I get with local_lat/remote_lat on tcp://127.0.0.1. http://pastebin.com/4SSKbAgx (echoloopcli.c) http://pastebin.com/rkc6itTg (echoloopsrv.c) There's probably some amount of slop/unfairness in there since I cut a lot of corners, so if folks want to pursue the comparison further, I'm more than willing to bring it closer to apples-to-apples.

echoloop*.c is testing throughput not latency, since it sends all messages at once instead of sending one message and waiting for it to return before sending the next message. Try comparing it with local_thr/remote_thr.

-- Robert G. Jakabosky

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
2) Do people agree that 11 microseconds are just too much?

Nope. Once you go cross-machine, those 11 microseconds become irrelevant. The fastest exchange I'm aware of for frequent trading is 80 microseconds (+ transport costs) best case, so who are you talking to? And if you're not doing frequent trading, then milliseconds are fine. The rest of your system and algorithms are far more crucial, so IMHO you're wasting time in the wrong place. For example, you can use ZeroMQ to build an async pub-sub solution that can do market scanning in parallel from different machines a lot faster than if you did all the tcp/ip yourself.

ZeroMQ uses a different system for messages of less than 30 bytes, e.g. they are copied. I'm also unaware of any messages so small in the financial industry. Cross-machine will add the TCP/IP header, which some transports optimize out on the same machine, unless you're looking only at the IPC case. I would re-run your tests with 100M 64- and 256-byte messages cross-machine.

As far as interprocess communication goes there are better ways (e.g. writing directly to the destination's semi-polled lockless buffer using 256/512-bit SIMD non-temporal writes would blow away anything Java can do), but they are all dedicated solutions and don't play nicely with other messages coming from the IP stack, and that is the challenge for communication frameworks. If you keep reinventing the wheel with custom solutions, sure, you can get better results, but at what cost, and will you finish? And obviously tuning your higher-level algorithms gets better results than the low-level stuff. Once your whole system with business logic is sub-millisecond and that is not enough, then I would revisit the lower-level transport.

Lastly, building a low latency message system on Java is dangerous. Java creates messages very quickly, but if they are not disposed of quickly (e.g. under peak load, or when some receivers are slower) and you get a big permanent memory pool, then you are in trouble - you won't see this in micro benchmarks. I had one complete system that worked great and fast and then had huge GC pauses, and we're talking almost seconds here, pretty much defeating any gains. So unless you manage the memory yourself (e.g. a byte array you serialise into so the GC is not aware of it), you are better off using a system that stores the messages outside of Java's knowledge, and C++ / ZeroMQ is good for that.

Ben

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Thu, Aug 30, 2012 at 10:35 AM, Julie Anderson julie.anderson...@gmail.com wrote: See my comments below: They appear to be single threaded synchronous tests which seems very unlike the kinds of applications being discussed (esp. if you're using NIO). More realistic is a network connection getting slammed with lots of concurrent sends and recvs, which is where lots of mistakes can be made if you roll your own. That's the point of NIO, which attracted me from the very beginning. Of course it is not the solution for everything but for fast clients you can SERIALIZE them inside a SINGLE thread so all reads and writes from all of them are non-blocking and thread-safe. This is not just faster (debatable, but specifically for fast and not too many clients) but most importantly easier to code, optimize and keep it clean/bug free. Anyone who has done serious multithreading programming in C or even Java knows it is not easy to get it right and context-switches + blocking is the root of all latency and bugs.

If you can't handle state, that is true.. but if each thread has its own state, or caches state, then you don't have to deal with it. If you want scalable high performance you want async, many threads. Putting them all on one thread will eventually give you grief, either on capacity or because you will hit some slow work and then be forced to offload stuff onto threads, and that's where the bugs come in, as it was not your design. Good async designs which minimize state have few bugs; web servers are a great example.

Re "context-switches + blocking is the root of all latency and bugs": context switches are less of an issue these days due to processor improvements, especially when you have more cores than active threads. Use your cores; in 4 years you will have 100-thread standard servers and you're using one. Blocking is an issue, but putting it all on one thread is IMHO more risky for a complex / high-performance app.

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
Inline

On 8/29/2012 10:37 PM, Robert G. Jakabosky wrote: echoloop*.c is testing throughput not latency, since it sends all messages at once instead of sending one message and waiting for it to return before sending the next message. Try comparing it with local_thr/remote_thr.

Echoloopcli does a synchronous send, then a synchronous recv, then does it all again. Echoloopsrv does a synchronous recv, then a synchronous send, then does it all again. I stuck a while loop around the send call because it isn't guaranteed to complete with all bytes of my 40-byte packet having been sent. But since my send queue never maxes out, the 'while' around send is overkill -- I get exactly 100 sends interleaved with 100 recvs.

On 8/29/2012 10:35 PM, Julie Anderson wrote: Very awesome!!! Are 18 micros the round-trip time or one-way time? Are you waiting to send the next packet ONLY after you get the ack from the previous one sent? Sorry but C looks like japanese to me. :)))

Round-trip.

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
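For readers following along without the pastebin, a hypothetical sketch of the "while around send" idiom Stuart describes: on a stream socket, send() may accept fewer bytes than requested, so the loop keeps going until the whole packet has been handed to the kernel.

#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes on a (blocking) TCP socket, retrying when
 * send() accepts only part of the buffer. Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: just retry */
            return -1;               /* real error */
        }
        sent += (size_t) n;
    }
    return 0;
}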
Re: [zeromq-dev] zmq3 'make check' fails on rt-preempt kernel - test_pair_tcp hangs
On 29.08.2012 at 22:54, Ian Barber wrote: On Tue, Aug 28, 2012 at 2:16 PM, Ian Barber ian.bar...@gmail.com wrote: Ah, I fixed a similar issue in master the other day, may well be the same thing. I'll check and send a pull req when I get home. Ian That's all merged in now by the way, so give it another go.

thanks - I did (commit f79e1f8). 'make check' now runs the test_pair_tcp test fine. However, later I get:

test_shutdown_stress running...
/bin/bash: line 5: 23543 Segmentation fault ${dir}$tst
FAIL: test_shutdown_stress

If I change test_shutdown_stress to start the program under gdb, like so:

#exec $progdir/$program ${1+$@}
gdb $progdir/$program ${1+$@}

it seems to run ok: lots of

[Thread 0xa6de7b70 (LWP 26506) exited]
[New Thread 0xa65e6b70 (LWP 26507)]
...

but then: Program exited normally.

Looks like a bit of a Heisenbug - or a race condition - to me.

- Michael

Ian ___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] Too much ZeroMQ overhead versus plain TCP Java NIO Epoll (with measurements)
On Wednesday 29, Stuart Brandt wrote: Inline On 8/29/2012 10:37 PM, Robert G. Jakabosky wrote: echoloop*.c is testing throughput not latency, since it sends all messages at once instead of sending one message and waiting for it to return before sending the next message. Try comparing it with local_thr/remote_thr. Echoloopcli does a synchronous send, then a synchronous recv, then does it all again. Echoloopsrv does a synchronous recv, then a synchronous send, then does it all again. I stuck a while loop around the send call because it isn't guaranteed to complete with all bytes of my 40 byte packet having been sent. But since my send queue never maxes out, the 'while' around send is overkill -- I get exactly 100 sends interleaved with 100 recvs.

Ah, sorry, I overlooked the outer loop. So it is doing request/response, instead of bulk send/recv like I had thought.

-- Robert G. Jakabosky

___ zeromq-dev mailing list zeromq-dev@lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev