On Mon, Feb 4, 2013 at 11:36 AM, Ken Giusti <kgiu...@redhat.com> wrote:
> I like what I'm hearing - here are some objectives based on what has been
> proposed so far:
> 1) Messenger-based scale and soak tests.
> These would be 'black-box' type tests that would mimic simple
> deployment scenarios, using clients based on messenger. The goal would be
> to stress messenger's features under scale over time, e.g., # of
> connections, links, window sizes, header count, etc.
> I think we should leverage Hiram's amqp benchmarking project here - it
> appears to offer all the high-level test setup and control that we'd need
> for these types of tests. I'm assuming we'd need to develop the
> messenger-based clients for sending/receiving traffic. In the near term we
> could run the benchmarks against the existing QPID test clients and QPID
> broker, leveraging Gordon's 1.0 integration work. But that would forgo
> messenger test coverage.
> 2) Static component performance tests.
> These would be the self-contained performance tests as described
> previously by Rafi. Each test would exercise one specific aspect of
> proton/messenger/driver, limiting the impact of any non-relevant factors.
> Each test would provide an "operations-per-time-unit" metric that we could
> 3) Client language interop tests.
> These tests would guarantee that the type encodings work across
> implementation languages. We'd need to develop a simple message creator
> and consumer in each supported language binding. The test would run all
> combinations of creator vs consumer and verify that types encoded in one
> language can be decoded in another (as best as can be done given a target
> language's idiosyncrasies).
> I think #3 is probably the 'low-hanging' fruit of the three - at least
> it's bounded by the number of available bindings and supported types.
> #2 is a bit more open-ended, and would require some duplication of effort
> assuming separate tests for each implementation language.
Why do you say this is open ended? Are you thinking of metrics other than
the 3 I suggested?
I think there are probably ways to address the duplication here. For
example both the message and the codec metrics could load the test data
from a file. This would limit the duplication to a very simple driver loop
and allow the actual test data to be shared. This would both minimize
duplication and provide an easy way to parameterize the metrics.
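As a rough sketch of that shared-data idea (not actual proton code): the metric below reads its cases from a JSON file and feeds them through a generic driver loop that reports operations per second. The file layout, the `load_cases` and `run_metric` helpers, and the use of `json.dumps` as a stand-in for a binding's real encode call are all assumptions made for illustration.

```python
import json
import os
import tempfile
import time


def load_cases(path):
    """Load shared test data: a JSON list of {"name", "body", "iterations"}."""
    with open(path) as f:
        return json.load(f)


def run_metric(cases, operation):
    """Driver loop: time `operation` on each case and return ops/second by name."""
    results = {}
    for case in cases:
        n = case["iterations"]
        start = time.perf_counter()
        for _ in range(n):
            operation(case["body"])
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse clocks.
        results[case["name"]] = n / elapsed if elapsed > 0 else float("inf")
    return results


def stand_in_encode(body):
    # Hypothetical placeholder for the binding's message/codec encode call.
    return json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    # Write a small data file so the sketch runs on its own; a real suite
    # would keep this file in the source tree and share it across bindings.
    cases = [
        {"name": "small-map", "body": {"a": 1}, "iterations": 10000},
        {"name": "long-list", "body": list(range(1000)), "iterations": 1000},
    ]
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(cases, f)
    try:
        for name, rate in run_metric(load_cases(f.name), stand_in_encode).items():
            print("%s: %.0f ops/sec" % (name, rate))
    finally:
        os.remove(f.name)
```

Only `stand_in_encode` and the few lines of `run_metric` would need rewriting per language; the data file, and with it the parameterization, stays common.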
> I'd need to spend some time getting familiar with the benchmarking project,
> but it seems like it would make writing tests for #1 a lot easier.
Personally I feel like #3 is really a bit of a different animal from the
others. It's a functional test rather than a performance test, and I'm not
sure how possible/desirable it is to cover both with the same test code.
Also, I don't think it's necessary or sufficient to test all combinations
of clients across the wire. This doesn't actually verify that certain
bindings don't have compensating bugs when speaking to each other,
something that is actually quite likely given the way type information is
often lost when rendering into higher level languages. What's important is
to test that each binding correctly renders to/from AMQP encoded messages.
If we can do this rigorously then we know they will interoperate with each
other.
I believe we can achieve this by defining a specifically formatted message
that includes every single AMQP data type. Each binding should then load a
number of alternative AMQP-encoded representations of this message from a
file and check using assertions written in the host language of the binding
that each property/object/etc. is correctly rendered. We should also define
tests for each binding that author that specifically formatted message
using that binding and verify that the encoded representation is included
as one of the alternatives. I think this scheme provides the same coverage
as over-the-wire N-way interop tests (when it comes to data-type coverage
at least). However, it is more complete, as we can include representations
of this message as generated by non-proton-based clients. And by virtue of
not running over the wire, it would be simpler to run as part of the
standard test suite, which is where this kind of test really should live,
rather than in a performance suite that takes longer and gets run less
frequently.
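To make the scheme concrete, here is a minimal sketch for a single type rather than the full message. AMQP 1.0 allows a uint to be encoded three ways (format codes 0x70, 0x52 and 0x43), so it is a convenient case of "alternative representations". The `decode_uint` and `encode_uint` functions are hypothetical stand-ins for a binding's codec, and the fixtures are written inline here where a real suite would load them from a shared file.

```python
import struct

# AMQP 1.0 format codes for the "uint" primitive type.
UINT = 0x70       # 4-byte unsigned integer, network byte order
SMALLUINT = 0x52  # 1-byte unsigned integer (values 0..255)
UINT0 = 0x43      # the value zero, no payload


def decode_uint(data):
    """Stand-in for a binding's decoder: accept any valid uint encoding."""
    code, payload = data[0], data[1:]
    if code == UINT and len(payload) == 4:
        return struct.unpack(">I", payload)[0]
    if code == SMALLUINT and len(payload) == 1:
        return payload[0]
    if code == UINT0 and len(payload) == 0:
        return 0
    raise ValueError("not a well-formed AMQP uint: %r" % (data,))


def encode_uint(value):
    """Stand-in for a binding's encoder; this one picks the most compact form."""
    if value == 0:
        return bytes([UINT0])
    if value <= 0xFF:
        return bytes([SMALLUINT, value])
    return struct.pack(">BI", UINT, value)


# Expected value -> every acceptable encoding of it. Encodings produced by
# non-proton clients could be added to these lists as well.
FIXTURES = {
    0: [bytes.fromhex("43"), bytes.fromhex("5200"), bytes.fromhex("7000000000")],
    7: [bytes.fromhex("5207"), bytes.fromhex("7000000007")],
    70000: [bytes.fromhex("7000011170")],
}


def check_fixtures(fixtures):
    for expected, alternatives in fixtures.items():
        # Decode direction: every alternative must render to the same value.
        for encoded in alternatives:
            assert decode_uint(encoded) == expected
        # Encode direction: the binding's output must be a known alternative.
        assert encode_uint(expected) in alternatives


if __name__ == "__main__":
    check_fixtures(FIXTURES)
    print("all uint fixtures pass")
```

Nothing here touches the network, so a check like this can sit alongside each binding's ordinary unit tests.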
For priorities I'd personally vote #3 (as I've described it) as highest
priority regardless of low-hangingness as it is really filling in missing
functional test coverage for each binding.