I want to draw your attention to a type of performance testing that we
haven't paid much attention to in the past.
(The goals for #1 may include this, but I just want to clarify whether that's the case.)
In the past we have focused a lot on performance tests that are
somewhat artificial and say little about real-world use cases. This
type of testing gives us a false sense of security until things
blow up in production.
For example, creating a message once and then sending/receiving it
repeatedly. Such tests gloss over message creation overhead, memory
issues, etc. Caching and other optimizations may also make the numbers
look more impressive than they really are.
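To make the difference concrete, here is a minimal Python sketch that compares the two profiles: one message built once and sent repeatedly, versus a new message built for every send. It does not use proton at all. `build_message` and `send` are hypothetical stand-ins, and JSON is used in place of a real codec only so that the sketch runs on its own.

```python
import json
import time


def build_message(seq, body_size=256):
    # Hypothetical stand-in for real message construction; JSON replaces the
    # actual codec purely so that the sketch runs without any dependencies.
    msg = {"id": seq, "properties": {"seq": str(seq)}, "body": "x" * body_size}
    return json.dumps(msg).encode("utf-8")


def send(payload, sink):
    # Hypothetical stand-in for a send call: it only records the byte count.
    sink.append(len(payload))


def run_reused(count):
    """Artificial profile: build one message, then send it `count` times."""
    sink = []
    start = time.perf_counter()
    payload = build_message(0)
    for _ in range(count):
        send(payload, sink)
    return len(sink), time.perf_counter() - start


def run_fresh(count):
    """Closer to real use: build a new message for every send."""
    sink = []
    start = time.perf_counter()
    for seq in range(count):
        send(build_message(seq), sink)
    return len(sink), time.perf_counter() - start


if __name__ == "__main__":
    n = 100_000
    for name, fn in (("reused", run_reused), ("fresh", run_fresh)):
        sent, elapsed = fn(n)
        print(f"{name:>6}: {sent / elapsed:,.0f} msgs/sec")
```

The "reused" figure leaves out per-message construction cost entirely, which is the overhead the artificial test hides. A real comparison would swap the stand-ins for actual client code.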
Same goes for stability testing.
We should look at some select *real world use cases* and see how
proton could handle them. We could try to set up identical deployments
(or at least something that closely resembles them) and see if we could
meet the requirements associated with those use cases.
This type of test could give us good feedback about whether the
product is "production ready".
These tests are not easy to automate and I don't think they need to
be. They are the kind of tests that would need to be run before every
P.S. Artificial tests also have a place. They are good at identifying
regressions and could be easily automated.
Both types of tests are important.
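As a rough illustration of how an artificial test can be automated to flag regressions, here is a minimal Python sketch. The function names, the 10% tolerance, and the idea of comparing against a stored baseline rate are all my own assumptions, not anything that exists in the proton tree.

```python
import time


def ops_per_second(operation, iterations=50_000, repeats=5):
    """Return the best observed rate over several repeats to dampen noise."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            operation()
        # Guard against a zero reading on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
        best = max(best, iterations / elapsed)
    return best


def is_regression(current, baseline, tolerance=0.10):
    """True if `current` is more than `tolerance` below `baseline`."""
    return current < baseline * (1.0 - tolerance)


if __name__ == "__main__":
    # Hypothetical operation under test and a hypothetical stored baseline.
    rate = ops_per_second(lambda: "x" * 64)
    baseline = rate  # in practice, loaded from a previous run
    print(f"{rate:,.0f} ops/sec, regression: {is_regression(rate, baseline)}")
```

Taking the best of several repeats is only one way to reduce noise. The tolerance would need tuning per test and per machine.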
On Mon, Feb 4, 2013 at 11:36 AM, Ken Giusti <kgiu...@redhat.com> wrote:
> I like what I'm hearing - here are some objectives based on what has been
> proposed so far:
> 1) Messenger-based scale and soak tests.
> These would be 'black-box' type tests that would mimic simple deployment
> scenarios, using clients based on messenger. The goal would be to stress
> messenger's features under scale over time, e.g: # of connections, links,
> window sizes, header count, etc.
> I think we should leverage Hiram's amqp benchmarking project here - it
> appears to offer all the high-level test setup and control that we'd need for
> these types of tests. I'm assuming we'd need to develop the messenger-based
> clients for sending/receiving traffic. In the near term we could run the
> benchmarks against the existing QPID test clients and QPID broker, leveraging
> Gordon's 1.0 integration work. But that would forgo messenger test coverage.
> 2) Static component performance tests.
> These would be the self-contained performance tests as described previously
> by Rafi. Each test would exercise one specific aspect of
> proton/messenger/driver, limiting the impact of any non-relevant factors.
> Each test would provide an "operations-per-time-unit" metric that we could
> 3) Client language interop tests.
> These tests would guarantee that the type encodings work across
> implementation languages. We'd need to develop a simple message creator and
> consumer in each supported language binding. The test would run all
> combinations of creator vs consumer and verify that types encoded in one
> language can be decoded in another (as best as can be done given a target
> language's idiosyncrasies).
> I think #3 is probably the 'low-hanging' fruit of the three - at least it's
> bounded by the number of available bindings and supported types.
> #2 is a bit more open-ended, and would require some duplication of effort
> assuming separate tests for each implementation language.
> I'd need to spend some time getting familiar with the benchmarking project,
> but it seems like it would make writing tests for #1 a lot easier.
> ----- Original Message -----
>> On Thu, Jan 31, 2013 at 9:41 AM, Ken Giusti <kgiu...@redhat.com>
>> > Hi Folks,
>> > I'd like to solicit some ideas regarding $SUBJECT.
>> > I'm thinking we could take an approach similar to what is done on
>> > the C++
>> > broker tests now. That is we should develop a set of "native" send
>> > and
>> > receive programs that can be used to profile various performance
>> > characteristics (msgs/sec with varying size, header content
>> > encode/decode
>> > etc). By "native" I mean implementations in Java and C.
>> > I've hacked our C "send" and "recv" examples to provide a rough
>> > swag at
>> > measuring msgs/sec performance. I use these to double check that
>> > any
>> > changes I make to the proton C codebase do not have an unexpected
>> > impact on
>> > performance. This really belongs somewhere in our source tree, but
>> > for now
>> > you can grab the source here:
>> > https://github.com/kgiusti/proton-tools.git
>> > We do something similar for the QPID broker - simple native clients
>> > (qpid-send, qpid-receive) that do the performance sensitive message
>> > generation/consumption. We've written python scripts that drive
>> > these
>> > clients for various test cases.
>> > If we follow that approach, not only could we create a canned set
>> > of basic
>> > benchmarks that we could distribute, but we could also build
>> > interop
>> > tests by running one native client against the other. E.g. C sender
>> > vs Java
>> > receiver. That could be a useful addition to the current "unit"
>> > test
>> > framework - I don't believe we do any canned interop testing yet.
>> > Thoughts?
>> This is a good start at performance measurements for messenger,
>> however I
>> think it's too indirect when it comes to measuring engine
>> performance. An
>> end-to-end measure like this is going to be significantly influenced
>> by both the driver and aspects of the messenger implementation. This
>> could be
>> a problem because people directly embedding the engine might not be
>> using the driver and might be using the engine differently from messenger.
>> I think it would be good to include some performance metrics that
>> isolate the various components of proton. For example having a metric that
>> repeatedly encodes/decodes a message would be quite useful in
>> isolating the
>> message implementation. Setting up two engines in memory and using
>> them to
>> blast zero sized messages back and forth as fast as possible would
>> tell us
>> how much protocol overhead the engine is adding. Using the codec
>> to encode/decode data would also be a useful measure. Each of these
>> probably wants to have multiple profiles, different message content,
>> different acknowledgement/flow control patterns, and different kinds
>> I think breaking out the different dimensions of the implementation
>> above would provide a very useful tool to run before/after any
>> sensitive changes to detect and isolate regressions, or to test