Distributed Testing
Created by Rupert Smith (Aug 15, 2007).
Testing Proposal.
Use Cases.
The following usage scenarios are covered by this test framework design proposal:
Performance Testing.
Want to be able to distribute performance tests across many machines in parallel, in order to more accurately simulate real usage scenarios and to be able to fully stress test the broker under load. A concrete worked example is given under 'Probing for the available test topology' below.
System Testing.
Configurable framework, capable of exercising every imaginable combination of options, both in-VM broker and standalone, across one client/test circuit up to many clients/test circuits running in parallel. Tests are built out of a standardized construction block.
Publisher/Receiver pair.
The standard construction block for a test is a test circuit. This consists of a publisher and a receiver. The publisher and receiver may reside on the same machine, or may be distributed. A standard set of properties defines the desired circuit topology. Tests are always controlled from the publishing side only. The receiving end of the circuit is exposed to the test code through an interface that abstracts, as much as possible, the receiving end of the test. The interface exposes a set of 'assertions' that may be applied to the receiving end of the test circuit. Where the receiving end of the circuit resides in the same JVM, the assertions call the receiver's code locally. Where the receiving end is distributed across one or more machines, the assertions are applied to a test report gathered from all of the receivers. Test code is written against the assertions, making as few assumptions as possible about the exact test topology. A test circuit defines a test topology: M producers, N consumers, and Z outgoing routes between them. A sketch of this abstraction follows.
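A minimal Java sketch of what this circuit abstraction might look like is given below. All of the names (Circuit, CircuitFactory, Assertion and so on) are illustrative assumptions made for this sketch, not part of an existing Qpid API.

    import java.util.List;
    import java.util.Properties;

    // Hypothetical sketch of the test circuit abstraction described above.
    // The test controls the publishing side only; the receiving side is
    // reached through assertions, which are either checked locally (in-VM)
    // or evaluated against reports gathered from remote receivers.
    interface Assertion { }                      // a checkable condition on the circuit

    interface AssertionResult {
        boolean passed();
        String description();
    }

    interface Publisher {
        Assertion messageReturnedAssertion();    // e.g. for mandatory && no_route
    }

    interface Receiver {
        Assertion messageNotReceivedAssertion();
        Assertion allMessagesReceivedAssertion(int expectedCount);
    }

    interface Circuit {
        void start();                            // begin publishing
        List<AssertionResult> applyAssertions(List<Assertion> assertions);
        void close();
    }

    // A factory builds a concrete circuit, in-VM or distributed, from the
    // standard set of topology properties: M producers, N consumers, Z routes.
    interface CircuitFactory {
        Circuit createCircuit(Properties topologyProperties);
    }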
Probing for the available test topology.
When the test distribution framework starts up, it broadcasts an 'enlist' request on a known topic. All available nodes in the network reply, to make it known that they are available to carry out tests. For the requested test case, C test circuits are to be run in parallel. Each test defines its desired M by N topology for each circuit. The entire network may be available to run both roles, or the test case may have specified a limit on the number of publishing nodes and set the 'single_role' flag. If the number of publishing nodes exhausts the available network and the single role flag is on, then there are no nodes available to run the receiver roles, and the test fails with an error at this point. Suppose there are P nodes available to run the publisher roles, and R nodes available to run the receiver roles. The C test circuits are divided up as evenly as possible amongst the P nodes, and the C * N receivers are divided up as evenly as possible amongst the R nodes.

A more concrete example. There are 10 test machines available. We want to run a pub/sub test with 2 publishers, publishing to 50 topics, with 250 subscribers, measuring total throughput. The distribution framework probes to find the ten machines. The test parameters specify a concurrency level of 2 circuits, limited to 2 nodes, with the single role flag set, which leaves 8 nodes to play the receiver role. The test parameters specify each circuit as having 25 topics, unique to the circuit, and 125 receivers. The total of 250 receivers is distributed amongst the 8 available nodes: six nodes get 31 each and two get 32. The test specifies a duration of 10 minutes, sending messages 500 bytes in size in test batches of 10,000 messages, as fast as possible. The distribution framework sends a start signal to each of the publishers. The publishers run for 10,000 messages, then request a report from each receiver on their circuit. The receivers send back to the publishers a report on the number of messages received in the batch. The publishers assert that the correct number for the batch were indeed received, and log a time sample for the batch. This continues for 10 minutes. At the end of the 10 minutes, the publishers collate all of their timings, failures and errors into a log message. The distribution framework requests the test report from each publishing node, and these logs are combined to produce a single log for the entire run. Some statistics, such as total time taken, total messages through the system and total throughput, are calculated and added as a summary to the log, along with a record of the requested and actual topology used to run the test. The receiver allocation arithmetic is sketched below.
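The "as evenly as possible" division can be made concrete with a short calculation; the helper below is hypothetical and shown only to illustrate the arithmetic.

    // Hypothetical helper illustrating the "as evenly as possible" division.
    class ReceiverAllocation {
        // allocate(250, 8) gives six nodes 31 receivers and two nodes 32,
        // matching the example above.
        static int[] allocate(int receivers, int nodes) {
            int[] allocation = new int[nodes];
            int base = receivers / nodes;       // 250 / 8 = 31
            int remainder = receivers % nodes;  // 250 % 8 = 2 nodes get one extra
            for (int i = 0; i < nodes; i++) {
                allocation[i] = base + (i < remainder ? 1 : 0);
            }
            return allocation;
        }
    }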
Test Procedures.
A variety of different tests can be written against a standard test circuit, and many of these follow a common pattern. One of the aims of using a common test circuit configured by a number of test parameters is to be able to automate the generation of all possible test cases that can be produced from the circuit combined with the common testing pattern; an outline of a procedure for doing this is described here. The typical test sequence is outlined below.
A typical test sequence.
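One plausible shape for this sequence, reusing the hypothetical Circuit, CircuitFactory and Assertion types sketched earlier together with the batch/report pattern described above, is shown below; it is an assumption about the intended flow, not a definitive procedure.

    import java.util.List;
    import java.util.Properties;

    // Hypothetical runner for the typical test sequence: build the circuit
    // from the test parameters, publish, then check that every expected
    // assertion holds on the receiving end.
    class TypicalTestSequence {
        void run(CircuitFactory factory, Properties parameters,
                 List<Assertion> expectedAssertions) {
            Circuit circuit = factory.createCircuit(parameters);
            try {
                circuit.start();                       // send the messages
                for (AssertionResult result : circuit.applyAssertions(expectedAssertions)) {
                    if (!result.passed()) {
                        throw new AssertionError(result.description());
                    }
                }
            } finally {
                circuit.close();
            }
        }
    }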
The thorough test procedure.
The thorough test procedure uses the typical test sequence described above, but generates all combinations of test parameters and the corresponding assertions against the results. The all_combinations function produces all combinations of the test parameters described in Appendix A:
    all_combinations : List<Properties>
The expected_results function produces a list of assertions, given a set of test parameters. For example, mandatory && no_route -> assertions.add(producer.assertMessageReturned), assertions.add(receiver.assertMessageNotReceived).
    expected_results : Properties -> List<Assertions>
The procedure is then:
    For parameters : all_combinations
        Send messages.
        For assertion : expected_results(parameters)
            Apply the assertion.
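In Java terms the thorough procedure might look like the sketch below, where allCombinations and expectedResults stand in for the all_combinations and expected_results functions. The bodies are deliberately left as stubs, since the full enumeration over the Appendix A parameters is not reproduced here.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;

    // Hypothetical driver for the thorough procedure: run the typical test
    // sequence once for every combination of test parameters.
    class ThoroughTestProcedure {
        // all_combinations : List<Properties>
        List<Properties> allCombinations() {
            return Collections.emptyList();  // enumeration of Appendix A parameters
        }

        // expected_results : Properties -> List<Assertions>
        List<Assertion> expectedResults(Properties parameters) {
            List<Assertion> assertions = new ArrayList<Assertion>();
            // e.g. mandatory && no_route -> producer message returned,
            //      receiver message not received
            return assertions;
        }

        void run(CircuitFactory factory) {
            TypicalTestSequence sequence = new TypicalTestSequence();
            for (Properties parameters : allCombinations()) {
                sequence.run(factory, parameters, expectedResults(parameters));
            }
        }
    }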
Appendix A - Test Parameters.
Total combinations over all test parameters: 4 * 48 * 16 * 3 = 9216 combinations. Defaults give an in-VM broker, 1:1 P2P topology, no tx, auto ack, no flags, a publisher -> receiver route configured, and no return route.
Appendix B - Clock Synchronization Algorithm.
On connection/initialization of the framework, synchronize clocks between all nodes in the available topology. For in-VM tests, the clock delta and error will automatically be zero. For throughput measurements, the overall test times will be long enough that the error does not need to be particularly small. For latency measurements, accurate clock synchronization is wanted; this should not be too hard to achieve over a quiet local network. After determining the list of clients available to conduct tests against, the Coordinator synchronizes the clocks of each in turn. The synchronization is done against one client at a time, at a fairly low messaging rate, over the Qpid broker. If needed, a more accurate mechanism, using something like NTP over UDP, could be used. Ensure the clock synchronization is captured by an interface, to allow better solutions to be added at a later date. Here is a simple algorithm to get started with:
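A minimal Cristian-style sketch, consistent with the description above (one request/reply exchange per client, routed over the broker), is given below. The TestClient and ClockOffset types and the requestLocalTime method are assumptions made for this sketch, not a real Qpid API.

    // Hypothetical Cristian-style exchange between the Coordinator and one
    // client, carried as a request/reply over the Qpid broker.
    interface TestClient {
        long requestLocalTime();   // send a time request over the broker, block for the reply
    }

    class ClockOffset {
        final long deltaMillis;    // estimated client clock minus coordinator clock
        final long errorMillis;    // bound on the estimation error

        ClockOffset(long deltaMillis, long errorMillis) {
            this.deltaMillis = deltaMillis;
            this.errorMillis = errorMillis;
        }
    }

    class ClockSynchronizer {
        // t0: coordinator clock when the request is sent
        // tc: client clock when it builds the reply
        // t1: coordinator clock when the reply arrives
        // offset = tc - (t0 + t1) / 2, error bounded by half the round trip time
        ClockOffset synchronize(TestClient client) {
            long t0 = System.currentTimeMillis();
            long tc = client.requestLocalTime();
            long t1 = System.currentTimeMillis();
            return new ClockOffset(tc - (t0 + t1) / 2, (t1 - t0) / 2);
        }
    }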
The above algorithm includes broker latency, two network hops each way, plus possible effects of buffering/resends on the TCP protocol. A fairly easy improvement on it might be:
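One common refinement, offered here only as an illustrative possibility rather than the improvement originally intended, is to repeat the exchange several times and keep the sample with the smallest round trip, since that sample is least affected by queuing, buffering and retransmission delays. As a method that could be added to the hypothetical ClockSynchronizer sketch above:

    // Take several samples and keep the one with the tightest error bound,
    // i.e. the smallest round trip.
    ClockOffset synchronizeBestOf(TestClient client, int attempts) {
        ClockOffset best = null;
        for (int i = 0; i < attempts; i++) {
            ClockOffset sample = synchronize(client);
            if (best == null || sample.errorMillis < best.errorMillis) {
                best = sample;
            }
        }
        return best;
    }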
