Re: proton-c event test stable and fast for 5 billion messages
On 11/24/2014 04:13 PM, Alan Conway wrote:
> On Thu, 2014-11-20 at 14:10 -0500, Michael Goulish wrote:
>> I recently finished switching over my proton-c programs psend and precv to the new event-based interface, and my first test of them was a 5 billion message soak test. The programs survived this test with no memory growth and no gradual slowdown. This test is meant to find the fastest possible speed of the proton-c code itself. (In future, we could make other similar tests designed to mimic realistic user scenarios.)
>>
>> In this test, I run both sender and receiver on one box, over the loopback interface. I have MTU == 64K, and I use a credit scheme of 600 initial credits, with 300 new credits whenever credit falls below 300. The messages are small: exactly 100 bytes long. I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144 KB cache. (Letting the OS decide which processors to use for my two processes.)
>>
>> On that system, with the above credit scheme, the test is sustaining throughput of 408,500 messages per second. That's over a single link, between two singly-threaded processes.
>
> That is an excellent result. It sets the context for doing performance work on proton-based systems (which is nearly everything we do at this point). At that rate, proton certainly doesn't sound like it's the bottleneck for any of the stuff I've been looking at, but I'd be interested in seeing results for a range of larger message sizes.
> [...]
> First thing I would suggest is adding command line parameters for connection info, message size, credit etc. Simple send/receive programs like this, when parameterized flexibly, are *extremely* useful building blocks for a huge range of performance experiments.

At present the code just 'sends' the same chunk of raw memory allocated at the start, and on the receiver side the data is never actually read by the application. This is certainly useful for isolating the performance of different layers. However, assessing the impact of sending and receiving real messages is also important. The rates you are seeing are indeed very impressive. The next step on this track is to figure out where performance is lost in more realistic and richer scenarios, and ways to reduce that.
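The suggestion above to parameterize the send/receive programs could look something like the following minimal Python sketch. The flag names (`--size`, `--initial-credit`, etc.) are illustrative, not taken from psend/precv; the defaults mirror the test configuration described in the thread:

```python
import argparse

def parse_test_args(argv=None):
    """Parse the knobs suggested above: connection info, message size,
    message count, and the credit scheme."""
    p = argparse.ArgumentParser(description="parameterized soak-test client")
    p.add_argument("--host", default="127.0.0.1")
    p.add_argument("--port", type=int, default=5672)
    p.add_argument("--size", type=int, default=100,
                   help="message body size in bytes")
    p.add_argument("--count", type=int, default=5_000_000_000,
                   help="total messages to send")
    p.add_argument("--initial-credit", type=int, default=600,
                   help="credit issued when the link opens")
    p.add_argument("--credit-low", type=int, default=300,
                   help="threshold below which credit is replenished")
    p.add_argument("--credit-batch", type=int, default=300,
                   help="credit granted on each replenishment")
    return p.parse_args(argv)
```

With a harness like this, a whole matrix of message-size and credit-window experiments can be run from a shell loop without recompiling.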
Re: proton-c event test stable and fast for 5 billion messages
On 11/24/2014 06:54 PM, Fraser Adams wrote:
> That said, with talk of new APIs I think that we should have a reasonably clear roadmap. We've already got qpid::messaging and messenger, two separate AMQP 1.0 JMS clients, not to mention potential confusion on the native python versus the python qpid::messaging binding (and don't get me started on QMF - three separate APIs depending on the language :'( ). I don't think we've done a great job clearing up confusion around the differing APIs that we have.

I agree. I've been working on some examples of using the engine API in python, along with some utility code to simplify the common cases and remove boilerplate. Though I mentioned this work on this list at the start, any conversation since has been on the user list (cc:ed, sorry for cross-posting) as it is of more general interest (so again I'd urge everyone who hasn't already to subscribe to that!).

The work has been done on the 'examples' branch in subversion and now git, and I'm gearing up to submit a patch for inclusion on trunk very soon. It was inspired by Ken's work on pyngus and Rafi's examples on github, and has also benefited greatly from very helpful feedback from Alan, Justin and Ted.

The engine API is not new, though it has recently been enhanced by the addition of events. It has suffered from being hard (or tedious) to use, though. The 'engine' on its own is not sufficient either and requires some IO mechanism; this is a strength in cases where an existing IO framework is in use, but a gap that needs to be plugged for the general use case.

The qpid messaging API does provide a simpler API than the engine (the intricacies of the address syntax perhaps excepted!), but it does so at the expense of more limited control and capabilities. It also has some weaknesses when you want to build more genuinely asynchronous, reactive programs. To me, the messenger API is similarly limiting and in reality not that simple either. It also falls short for more completely reactive programs.

For the examples I've been working on, rather than hiding the engine behind the 'hard shell' of a new API, I've tried to supplement the existing API with some additional (optional) utilities, to which more can be added, allowing evolution to cover broader cases. The aim is to simplify the common cases while retaining the full flexibility of the underlying engine where needed. Certainly for python, this is the approach I would choose myself and therefore the one I'd recommend for first consideration. I don't consider it a new API as such - though clearly there are some new exposed classes - and I do think it has the potential to provide a less confusing face to programming with AMQP.

Feedback of all kinds is welcomed as always, via any channel, though I'd like to conduct as much of the discussion as possible on the user list, as I believe this is of general interest to the whole community.
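The event-driven style described above - application code reacting to engine events rather than driving a blocking API - can be illustrated with a toy dispatcher. This is a schematic sketch only, not the actual proton API; the class and method names here are invented:

```python
class Handler:
    """Base handler: events without a matching method are ignored,
    so handlers only implement the callbacks they care about."""
    def on_unhandled(self, event, payload):
        pass

def dispatch(handler, event, payload=None):
    """Route an event name like 'message' to handler.on_message,
    falling back to on_unhandled. This is the pattern that lets
    optional utility layers add behaviour without wrapping the
    engine in a rigid API 'shell'."""
    method = getattr(handler, "on_" + event, None)
    if method is not None:
        return method(payload)
    return handler.on_unhandled(event, payload)

class Counter(Handler):
    """Example handler that only cares about message events."""
    def __init__(self):
        self.received = 0
    def on_message(self, payload):
        self.received += 1
```

Because unknown events fall through harmlessly, utility code can introduce new event types without breaking existing handlers - one reason the supplement-rather-than-wrap approach stays flexible.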
Re: proton-c event test stable and fast for 5 billion messages
On Thu, 2014-11-20 at 14:10 -0500, Michael Goulish wrote:
> I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144 KB cache. (Letting the OS decide which processors to use for my two processes.) On that system, with the above credit scheme, the test is sustaining throughput of 408,500 messages per second. That's over a single link, between two singly-threaded processes. This is significantly faster than my previous, non-event-based code, and I find the code *much* easier to understand.

Out of curiosity, on the same hardware how does that test perform relative to the messenger based soak tests msgr-send/msgr-recv? And how about if you tweaked msgr-send/msgr-recv to use non-blocking and passive mode? I'm curious about where the messenger bottlenecks might be.

FWIW I definitely think there's mileage in event based operation - I'm also pretty interested in the best way to have things scale across loads of cores. That's one worry I have with qpidd and the traditional clients: do we know when lock contention starts to limit throughput? Given initiatives in ActiveMQ Apollo for more async, lock-free operation (I think it uses hawt-dispatch, but I'm no expert), I suspect that now is a good time to think about how qpid based systems might scale across loads of cores.

That said, with talk of new APIs I think that we should have a reasonably clear roadmap. We've already got qpid::messaging and messenger, two separate AMQP 1.0 JMS clients, not to mention potential confusion on the native python versus the python qpid::messaging binding (and don't get me started on QMF - three separate APIs depending on the language :'( ). I don't think we've done a great job clearing up confusion around the differing APIs that we have.

I could have predicted a change brewing, 'cause I've finally (just about) got my head around Messenger :-D

Frase
proton-c event test stable and fast for 5 billion messages
I recently finished switching over my proton-c programs psend and precv to the new event-based interface, and my first test of them was a 5 billion message soak test. The programs survived this test with no memory growth and no gradual slowdown.

This test is meant to find the fastest possible speed of the proton-c code itself. (In future, we could make other similar tests designed to mimic realistic user scenarios.) In this test, I run both sender and receiver on one box, over the loopback interface. I have MTU == 64K, and I use a credit scheme of 600 initial credits, with 300 new credits whenever credit falls below 300. The messages are small: exactly 100 bytes long. I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144 KB cache. (Letting the OS decide which processors to use for my two processes.)

On that system, with the above credit scheme, the test is sustaining throughput of 408,500 messages per second. That's over a single link, between two singly-threaded processes. This is significantly faster than my previous, non-event-based code, and I find the code *much* easier to understand.

This may still not be the maximum possible speed on my box. It looks like the limiting factor will be the receiver, and right now it is using only 74% of its CPU -- so if we could get it to use 100% we *might* see a performance gain to the neighborhood of 550,000 messages per second. But I have not been able to get closer to 100% just by fooling with the credit scheme. Hmm.

If you'd like to take a look, the code is here: https://github.com/mick-goulish/proton_c_clients.git
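The credit scheme described above (600 initial credits, topped back up whenever outstanding credit falls below 300) and the CPU-headroom projection (408,500 msg/s at 74% receiver CPU, roughly 550,000 msg/s at full utilization) can be sketched as follows. The function names are illustrative, not taken from psend/precv:

```python
def credit_top_up(current_credit, low_water=300, batch=300):
    """Return the extra credit to grant on the receiving link:
    replenish by `batch` whenever outstanding credit falls below
    `low_water`, otherwise grant nothing."""
    return batch if current_credit < low_water else 0

def projected_rate(measured_rate, cpu_fraction):
    """Naive projection of the receiver-bound throughput ceiling:
    scale the measured rate up to 100% CPU, assuming throughput is
    linear in receiver CPU usage."""
    return measured_rate / cpu_fraction
```

The projection is optimistic (it assumes no other bottleneck appears as the receiver approaches saturation), which is consistent with the hedged "*might*" above: 408,500 / 0.74 is about 552,000 messages per second.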