Re: proton-c event test stable and fast for 5 billion messages

2014-11-25 Thread Gordon Sim

On 11/24/2014 04:13 PM, Alan Conway wrote:

On Thu, 2014-11-20 at 14:10 -0500, Michael Goulish wrote:

I recently finished switching over my proton-c programs psend  precv
to the new event-based interface, and my first test of them was a
5 billion message soak test.

The programs survived this test with no memory growth, and no gradual
slowdown.

This test is meant to find the fastest possible speed of the proton-c
code itself. (In future, we could make other similar tests designed
to mimic realistic user scenarios.) In this test, I run both sender
and receiver on one box, with the loopback interface. I have MTU ==
64K, I use a credit scheme of 600 initial credits, and 300 new credits
whenever credit falls below 300. The messages are small: exactly 100
bytes long.

I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144
KB cache. (Letting the OS decide which processors to use for my two
processes.)

On that system, with the above credit scheme, the test is sustaining
throughput of 408,500 messages per second. That's over a single link,
between two singly-threaded processes.



That is an excellent result. It sets the context for doing performance
work on proton-based systems (which is nearly everything we do at this
point). At that rate, proton certainly doesn't sound like it's the
bottleneck for any of the stuff I've been looking at, but I'd be
interested in seeing results for a range of larger message sizes.


[...]


First thing I would suggest is adding command line parameters for
connection info, message size, credit, etc. Simple send/receive
programs like this, when parameterized flexibly, are *extremely* useful
building blocks for a huge range of performance experiments.


At present the code just 'sends' the same chunk of raw memory allocated 
at the start. On the receiver side the data is never actually read by 
the application. This is certainly useful for isolating the performance of 
the different layers. However, assessing the impact of sending and receiving 
real messages is also important.


The rates you are seeing are indeed very impressive. The next step on 
this track is to figure out where performance is lost in more realistic 
and richer scenarios, and ways to reduce that.




Re: proton-c event test stable and fast for 5 billion messages

2014-11-25 Thread Gordon Sim

On 11/24/2014 06:54 PM, Fraser Adams wrote:

That said with talk of new APIs I think that we should have a reasonably
clear roadmap, we've already got qpid::messaging and messenger, two
separate AMQP 1.0 JMS clients not to mention potential confusion on the
native python versus the python qpid::messaging binding (and don't get
me started on QMF - three separate APIs depending on the language :'( )

I don't think we've done a great job clearing up confusion around the
differing APIs that we have.


I agree.

I've been working on some examples of using the engine API in python, 
along with some utility code to simplify the common cases and remove 
boilerplate. Though I mentioned this work on this list at the start, 
any conversation since has been on the user list (cc:ed, sorry for 
cross-posting) as it is of more general interest (so again I'd urge 
everyone who hasn't already to subscribe to that!). The work has been 
done on the 'examples' branch in subversion and now git, and I'm gearing 
up to submit a patch for inclusion on trunk very soon. It was 
inspired by Ken's work on pyngus and Rafi's examples on github and has 
also benefited greatly from very helpful feedback from Alan, Justin and Ted.


The engine API is not new, though it has been recently enhanced by the 
addition of events. It has suffered from being hard (or tedious) to use 
though. The 'engine' on its own is not sufficient either and requires 
some IO mechanism; this is a strength in cases where an existing IO 
framework is in use, but a gap that needs to be plugged for the general 
use case.


The qpid messaging API does provide a simpler API than the previous 
engine API (the intricacies of the address syntax perhaps excepted!), 
but it does so at the expense of more limited control/capabilities. It 
also has some weaknesses when you want to build more genuinely 
asynchronous, reactive programs.


To me, the messenger api is similarly limiting and is in reality not 
that simple either. It also falls short for more completely reactive 
programs.


For the examples I've been working on, rather than hiding the engine 
behind the 'hard shell' of a new API, I've tried to supplement the 
existing API with some additional (optional) utilities (to which more 
can be added, allowing evolution to cover broader cases). The aim is 
to simplify the common cases while retaining the full flexibility of the 
underlying engine where needed.
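The handler-dispatch style Gordon describes (utilities that route engine events to named callbacks, so applications implement only what they care about) can be sketched in plain Python. The class and method names below are purely illustrative, not the actual proton utility code:

```python
class Event:
    """Minimal stand-in for an engine event (illustrative only)."""
    def __init__(self, type_, **context):
        self.type = type_
        self.context = context

def dispatch(event, handler):
    # Route an event to a handler method named on_<type>, if defined;
    # unhandled events fall through to an optional on_unhandled.
    method = getattr(handler, "on_" + event.type, None)
    if method is None:
        method = getattr(handler, "on_unhandled", lambda e: None)
    return method(event)

class Receiver:
    """Application code implements only the callbacks it cares about."""
    def __init__(self):
        self.received = []
    def on_message(self, event):
        self.received.append(event.context["body"])

recv = Receiver()
dispatch(Event("message", body=b"hello"), recv)
dispatch(Event("link_flow"), recv)  # silently ignored: no on_link_flow defined
```

The point of the pattern is that the boilerplate (event polling, type switching) lives in the shared utility, while the full event stream remains available to any handler that wants it.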


Certainly for python, this is the approach I would choose myself and 
therefore the one I'd recommend for first consideration. I don't 
consider it a new API as such - though clearly there are some new 
exposed classes. I do think that it has the potential to provide a less 
confusing face to programming with AMQP.


Feedback of all kinds is welcomed as always, via any channel, though I'd 
like to conduct as much of the discussion as possible on the user list, 
as I believe this is of general interest to the whole community.


Re: proton-c event test stable and fast for 5 billion messages

2014-11-24 Thread Fraser Adams



On Thu, 2014-11-20 at 14:10 -0500, Michael Goulish wrote:

I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144
KB cache. (Letting the OS decide which processors to use for my two
processes.)

On that system, with the above credit scheme, the test is sustaining
throughput of 408,500 messages per second. That's over a single link,
between two singly-threaded processes.



This is significantly faster than my previous, non-event-based code,
and I find the code *much* easier to understand.



Out of curiosity, on the same hardware how does that test perform 
relative to the messenger based soak tests msgr-send/msgr-recv, and how 
about if you tweaked msgr-send/msgr-recv to use non-blocking and passive 
mode? I'm curious about where the messenger bottlenecks might be.



FWIW I definitely think there's mileage in event based operation - I'm 
also pretty interested in the best way to have things scale across loads 
of cores too; I think that's one worry I have with qpidd and the 
traditional clients. Do we know when lock contention starts to limit 
throughput? Given initiatives in ActiveMQ Apollo for more async, 
lock-free operation (I think it uses hawt-dispatch, but I'm no expert) I 
suspect that now is a good time to think about how qpid based systems 
might scale across loads of cores.


That said with talk of new APIs I think that we should have a reasonably 
clear roadmap, we've already got qpid::messaging and messenger, two 
separate AMQP 1.0 JMS clients not to mention potential confusion on the 
native python versus the python qpid::messaging binding (and don't get 
me started on QMF - three separate APIs depending on the language :'( )


I don't think we've done a great job clearing up confusion around the 
differing APIs that we have.


I could have predicted a change brewing, 'cause I've finally (just 
about) got my head around Messenger :-D


Frase


proton-c event test stable and fast for 5 billion messages

2014-11-20 Thread Michael Goulish

I recently finished switching over my proton-c programs psend and precv 
to the new event-based interface, and my first test of them was a 
5 billion message soak test. 

The programs survived this test with no memory growth, and no gradual 
slowdown. 

This test is meant to find the fastest possible speed of the proton-c 
code itself. (In future, we could make other similar tests designed 
to mimic realistic user scenarios.) In this test, I run both sender 
and receiver on one box, with the loopback interface. I have MTU == 
64K, I use a credit scheme of 600 initial credits, and 300 new credits 
whenever credit falls below 300. The messages are small: exactly 100 
bytes long. 
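The credit scheme above (600 initial credits, topping up by 300 whenever outstanding credit falls below 300) can be sketched as a simple replenishment loop. This is an illustrative model of the receiver's flow-control behaviour, not the actual psend/precv code; in proton-c the grants would be issued with pn_link_flow():

```python
INITIAL_CREDIT = 600
LOW_WATERMARK = 300
TOP_UP = 300

def run(total_messages):
    """Simulate credit replenishment; return final credit and flow-frame count."""
    credit = INITIAL_CREDIT   # initial grant, e.g. pn_link_flow(link, 600)
    flow_frames = 1           # count the initial grant as one flow frame
    for _ in range(total_messages):
        credit -= 1           # each received message consumes one credit
        if credit < LOW_WATERMARK:
            credit += TOP_UP  # top-up grant, e.g. pn_link_flow(link, 300)
            flow_frames += 1
    return credit, flow_frames
```

With these numbers the receiver sends roughly one flow frame per 300 messages after the initial grant, so flow-control overhead stays small relative to the message traffic.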

I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144 
KB cache. (Letting the OS decide which processors to use for my two 
processes.) 

On that system, with the above credit scheme, the test is sustaining 
throughput of 408,500 messages per second. That's over a single link, 
between two singly-threaded processes. 

This is significantly faster than my previous, non-event-based code, 
and I find the code *much* easier to understand. 

This may still not be the maximum possible speed on my box. It looks 
like the limiting factor will be the receiver, and right now it is 
using only 74% of its CPU -- so if we could get it to use 100% we *might* 
see a performance gain to the neighborhood of 550,000 messages per second. 
But I have not been able to get closer to 100% just by fooling with the 
credit scheme. Hmm. 
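The 550,000 figure follows from simple linear scaling: if the receiver is the bottleneck and is only 74% busy, full utilization would give proportionally higher throughput (assuming throughput scales linearly with receiver CPU, which is an assumption):

```python
observed_rate = 408_500   # messages per second, as measured
receiver_cpu = 0.74       # fraction of one core the receiver is using

# If throughput scales linearly with receiver CPU utilization:
projected = observed_rate / receiver_cpu
print(round(projected))   # roughly 552,000 msg/s -- "the neighborhood of 550,000"
```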



If you'd like to take a look, the code is here: 
https://github.com/mick-goulish/proton_c_clients.git