Folks,

Regarding system level tests, including tests for scale and soak:

The existing system tests for QPID are, naturally, tests of a
broker-centric model.  Since AMQP 1.0 allows for messaging models that
go beyond the simple broker-based approach, we need a system test bed
that can do the same.  This would include support for testing
peer-to-peer patterns, as well as testing distributed messaging
systems yet to be defined.

But rather than propose specific tests, I'd like to take a step back
and propose that we start by creating a proton-based traffic generator
tool.

I'd like to see us create something along the lines of
qpid-cpp-benchmark/qpid-send/qpid-receive, but derived from proton.

For those unfamiliar with qpid-cpp-benchmark et al.: it is a tool
that can be used to generate message loads and various traffic
patterns for testing the broker.  It consists of a control program,
and a set of clients that generate/consume message flows and output
performance metrics.  The clients are created, configured and
controlled by the control program.  There are a ton of options that
can be used to configure the traffic patterns, including scaling the
number of clients during the test run.
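
For example, a typical run looks something like this (I'm going from
memory here - check qpid-cpp-benchmark --help for the exact option
names):

    $ qpid-cpp-benchmark --broker localhost:5672 --queues 4 \
          --senders 2 --receivers 2 --messages 100000

That starts 2 senders and 2 receivers against each of 4 queues, runs
100k messages through them, and prints the aggregate throughput and
latency numbers once the clients report back.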

Running a system test involves setting up the broker(s), then using
the tool to generate a traffic pattern against the broker under test.
The broker configuration and traffic flow characteristics are dictated
by the goals of the test.  At the end of the test, the clients report
their performance metrics back to the control program which presents
them as the results of the test.

We can evolve this approach to better match proton's vision of a
distributed messaging world.  This would require us to replace the
existing 0-10 based test clients with proton-based clients,
specifically clients based on messenger.  For extra coverage, we would implement
these clients in all the languages supported by messenger.  These
clients would support an identical management interface to the control
program.  That way we could easily swap client implementations for any
given test.  E.g., run a test with Java message producers and PHP-based
consumers, repeat with C consumers and Ruby producers, etc.
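
To give a feel for what I mean, here's a rough sketch of the send
side of such a client, using the python messenger binding (untested,
no error handling, and the management/control plumbing is left out):

    import time
    from proton import Messenger, Message

    def produce(address, count):
        mng = Messenger()
        mng.start()
        msg = Message()
        msg.address = address              # e.g. "amqp://broker:5672/queue"
        start = time.time()
        for i in range(count):
            msg.body = u"test-payload-%d" % i
            mng.put(msg)                   # queue the message for delivery
        mng.send()                         # block until everything is sent
        mng.stop()
        return count / (time.time() - start)

    rate = produce("amqp://localhost/test", 10000)
    print("sent %.0f msgs/sec" % rate)

The real client would report that rate back to the control program
rather than printing it.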

Unlike the 0-10 clients, the proton clients would support the option
of listening for incoming connections - ta-da - point-to-point,
cross-language system tests!
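
Something like this on the receiving side - note the '~' prefix in
the address, which tells messenger to listen for an incoming
connection rather than connect out (same caveats as above):

    from proton import Messenger, Message

    def consume(address, count):
        mng = Messenger()
        mng.start()
        mng.subscribe(address)         # "amqp://~0.0.0.0:5672" == listen
        msg = Message()
        received = 0
        while received < count:
            mng.recv(100)              # fetch up to 100 messages
            while mng.incoming:        # drain the incoming queue
                mng.get(msg)
                received += 1
        mng.stop()
        return received

    consume("amqp://~0.0.0.0:5672", 10000)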

And, of course, the clients would be usable stand-alone - without the
control program.

This traffic generator wouldn't be involved in setting up the
system(s) under test, at least not in the short term.  This could
change as we develop our stable of proton-based products.  For now,
assume each test is set up and configured prior to running the traffic
generator. Ideally, the traffic generator would accept "scripts" that
would describe the traffic flow for a given test.
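
As a straw-man, a script could be as simple as a declarative
description of the desired flows - something along these lines (every
field name below is invented, purely to illustrate):

    # straw-man traffic "script" - all of these keys are hypothetical
    scenario = {
        "duration":  "4h",                    # soak: run for four hours
        "producers": {"language": "java",     # which client binding to run
                      "count": 100,           # scale: # of sending clients
                      "rate": 1000,           # target msgs/sec per producer
                      "message-size": 1024},
        "consumers": {"language": "php",
                      "count": 50},
        "addresses": ["amqp://broker-a/test", # where the traffic flows
                      "amqp://broker-b/test"],
    }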

Thoughts?

-K

----- Original Message -----
> > Personally I feel like #3 is really a bit of a different animal
> > from the others.  It's a functional test rather than a performance
> > test, and I'm not sure how possible/desirable it is to cover both
> > with the same test code.
> 
> Agreed - it isn't a perf/soak test.  But it is on my mental "glaring
> holes in our proton testing" list that often prevents me from
> getting a decent evening's sleep.  You've done a good job describing
> an approach to solving this particular issue - I've taken the
> liberty of capturing it as a JIRA:
> 
> https://issues.apache.org/jira/browse/PROTON-215
> 
> -K
> 
> ----- Original Message -----
> > On Mon, Feb 4, 2013 at 11:36 AM, Ken Giusti <kgiu...@redhat.com>
> > wrote:
> > 
> > > I like what I'm hearing - here are some objectives based on what
> > > has been proposed so far:
> > >
> > >
> > > 1) Messenger-based scale and soak tests.
> > >
> > >    These would be 'black-box' type tests that would mimic simple
> > > deployment scenarios, using clients based on messenger.  The goal
> > > would be to stress messenger's features under scale over time,
> > > e.g. # of connections, links, window sizes, header count, etc.
> > >
> > > I think we should leverage Hiram's amqp benchmarking project here -
> > > it appears to offer all the high-level test setup and control that
> > > we'd need for these types of tests.  I'm assuming we'd need to
> > > develop the messenger-based clients for sending/receiving traffic.
> > > In the near term we could run the benchmarks against the existing
> > > QPID test clients and QPID broker, leveraging Gordon's 1.0
> > > integration work.  But that would forgo messenger test coverage.
> > >
> > > 2) Static component performance tests.
> > >
> > > These would be the self-contained performance tests as described
> > > previously by Rafi.  Each test would exercise one specific aspect
> > > of proton/messenger/driver, limiting the impact of any
> > > non-relevant factors.  Each test would provide an
> > > "operations-per-time-unit" metric that we could track.
> > >
> > > 3) Client language interop tests.
> > >
> > > These tests would guarantee that the type encodings work across
> > > implementation languages.  We'd need to develop a simple message
> > > creator and consumer in each supported language binding.  The test
> > > would run all combinations of creator vs consumer and verify that
> > > types encoded in one language can be decoded in another (as best
> > > as can be done given a target language's idiosyncrasies).
> > >
> > > Opinions?
> > >
> > >
> > > I think #3 is probably the 'low-hanging' fruit of the three - at
> > > least it's bounded by the number of available bindings and
> > > supported types.
> > >
> > > #2 is a bit more open-ended, and would require some duplication of
> > > effort assuming separate tests for each implementation language.
> > >
> > 
> > Why do you say this is open-ended?  Are you thinking of metrics
> > other than the 3 I suggested?
> > 
> > I think there are probably ways to address the duplication here.
> > For example, both the message and the codec metrics could load the
> > test data from a file.  This would limit the duplication to a very
> > simple driver loop and allow the actual test data to be shared.
> > This would both minimize duplication and provide an easy way to
> > parameterize the metrics.
> > 
> > > I'd need to spend some time getting familiar with the
> > > benchmarking project, but it seems like it would make writing
> > > tests for #1 a lot easier.
> > >
> > 
> > Personally I feel like #3 is really a bit of a different animal
> > from the others.  It's a functional test rather than a performance
> > test, and I'm not sure how possible/desirable it is to cover both
> > with the same test code.  Also, I don't think it's necessary or
> > sufficient to test all combinations of clients across the wire.
> > That doesn't actually rule out compensating bugs between bindings
> > speaking to each other, something that is actually quite likely
> > given the way type information is often lost when rendering into
> > higher level languages.  What's important is to test that each
> > binding correctly renders to/from AMQP encoded messages.  If we
> > can do this rigorously then we know they will interoperate with
> > each other.
> > 
> > I believe we can achieve this by defining a specifically formatted
> > message that includes every single AMQP data type.  Each binding
> > should then load a number of alternative AMQP-encoded
> > representations of this message from a file and check, using
> > assertions written in the host language of the binding, that each
> > property/object/etc. is correctly rendered.  We should also define
> > tests in which each binding authors that specifically formatted
> > message and verifies that the encoded representation is included
> > as one of the alternatives.  I think this scheme provides the same
> > coverage as over-the-wire N-way interop tests (when it comes to
> > data-type coverage at least).  However, it is more complete, as we
> > can include representations of this message as generated by
> > non-proton based clients.  And by virtue of not running over the
> > wire it would be simpler/easier to run as part of the standard
> > test suite, which is where this kind of test really should live,
> > rather than being part of a performance suite that takes longer
> > and gets run less frequently.
> > 
> > For priorities, I'd personally vote #3 (as I've described it) as
> > highest priority regardless of low-hangingness, as it is really
> > filling in missing functional test coverage for each binding.
> > 
> > --Rafael
> > 
> 
