On 12/05/2013 07:22 PM, Ademar de Souza Reis Jr. wrote:
On Wed, Dec 04, 2013 at 02:54:07PM -0500, Chris Evich wrote:
All,

Currently virt-test attempts to detect when the current virtualization
environment can be recycled for use by the next test.  Measurements show
this optimization saves a significant amount of testing time.  However,
I believe the practical cost is a significant additional maintenance
burden, and (perhaps worse) a nonzero number of questionable test results.

On the maintenance front, environment-cleanliness detection complexity
grows in proportion to the hardware (and configuration) support added
to both the harness and the tests.  This leads to the harness
requiring a lot of "magic" (complicated and distributed logic) code to
support cleanliness detection.  I'm sure most seasoned developers here
have encountered failures in this area on more than a few occasions, and
have felt the pressure to apply complicated or messy fixes and
workarounds.

On the results front, for all but the simplest tests using the
default/automatic environment, PASS/FAIL trustworthiness is tied
directly to:

* Trust that the harness has managed the environment state precisely for
an unknown number of preceding tests.

* The assumption that the harness usually does the right thing over the
long term, with tests forcing an environment reset/rebuild only when it
is critical.

After a lengthy discussion with lmr on the topic, we are questioning the
practical benefit of the time savings versus the maintenance cost and
the importance of long-term result trust and reproducibility.

I believe we can significantly increase result trust and reduce the
maintenance burden (including "magic" code) if the harness takes an
"environment is always dirty" stance.  In other words, take steps to
rebuild a known "pristine" environment between every test and
remove/reduce most of the cleanliness detection.  This places more of
the setup burden on the tests, which are closer to the state
requirements anyway.

However, we feel it's important to also get the community's input on
this topic.  Are most of you already using a combination of
'--restore-image-between-tests' and 'kill_vm = yes' anyway? Or do you
see large benefits from the harness doing cleanliness detection despite
the costs?  What is your opinion or feedback on this topic?
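
For reference, the combination I have in mind looks roughly like this;
take it as a sketch only, with the config file name purely illustrative
and the parameter placed wherever you normally override the cartesian
config:

$ ./run -t qemu --restore-image-between-tests

# plus, in a local cartesian config override (file name illustrative):
kill_vm = yes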

Hi there.

This is a very nice topic to discuss; I'm glad you brought it up.

As a wise man once said, "(Early) optimization is the root of all
evil".  I hold the opinion that the *default* should be
'--restore-image-between-tests' and 'kill_vm = yes'.

Users who are concerned about testing time and know what they're
doing (or the risk they're taking) would enable the optimizations
in their test environments.
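
To make that concrete, I'd expect such a user to flip the relevant
parameters back in their own setup, something along these lines (the
file name is illustrative, and I'm assuming kill_vm accepts the usual
yes/no values):

# local cartesian config override, e.g. tests.cfg:
kill_vm = no

and to simply not pass '--restore-image-between-tests' on the command
line.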

Do you have any numbers on the amount of time saved on a typical
run of virt-test with/without these optimizations? In other
words, what's the magnitude of the problem we're solving by
turning these optimizations on by default?

I've discovered a very brain-damaged bug in the whole restart-vm logic that was causing test failures. In short, restart_vm was rebooting the VM at the beginning of a test, when it makes sense to just shut down all the VMs at the end of the test. I've removed restart_vm, and now the option kill_vm = yes is set by default. Don't worry, kill_vm_gracefully also defaults to yes, so machines that can be shut down through the guest OS will be.
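
Spelled out in cartesian config notation, the defaults after the fix
are effectively:

kill_vm = yes
kill_vm_gracefully = yes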

I've fixed that, and now we can make a better comparison. Of course, since I've only measured the time once I can't consider it 'scientific'; I can do more measurements later. But the first numbers are:

Current default (time optimized)

$ ./run -t qemu
SETUP: PASS (1.60 s)
DATA DIR: /home/lmr/virt_test
DEBUG LOG: /home/lmr/Code/virt-test.git/logs/run-2013-12-09-22.33.34/debug.log
TESTS: 10
(1/10) type_specific.migrate.default.tcp: PASS (36.95 s)
(2/10) type_specific.migrate.default.unix: PASS (28.04 s)
(3/10) type_specific.migrate.default.exec.default_exec: PASS (28.19 s)
(4/10) type_specific.migrate.default.exec.gzip_exec: PASS (30.07 s)
(5/10) type_specific.migrate.default.fd: PASS (28.09 s)
(6/10) type_specific.migrate.with_set_speed.tcp: PASS (24.15 s)
(7/10) type_specific.migrate.with_set_speed.unix: PASS (23.98 s)
(8/10) type_specific.migrate.with_set_speed.exec.default_exec: PASS (24.04 s)
(9/10) type_specific.migrate.with_set_speed.exec.gzip_exec: PASS (27.97 s)
(10/10) type_specific.migrate.with_set_speed.fd: PASS (23.97 s)
TOTAL TIME: 276.08 s (04:36)
TESTS PASSED: 10
TESTS FAILED: 0
SUCCESS RATE: 100.00 %

With restart-vm (no time optimizations)

$ ./run -t qemu --restart-vm
SETUP: PASS (1.50 s)
DATA DIR: /home/lmr/virt_test
DEBUG LOG: /home/lmr/Code/virt-test.git/logs/run-2013-12-09-22.24.43/debug.log
TESTS: 10
(1/10) type_specific.migrate.default.tcp: PASS (37.95 s)
(2/10) type_specific.migrate.default.unix: PASS (37.00 s)
(3/10) type_specific.migrate.default.exec.default_exec: PASS (37.59 s)
(4/10) type_specific.migrate.default.exec.gzip_exec: PASS (39.16 s)
(5/10) type_specific.migrate.default.fd: PASS (38.02 s)
(6/10) type_specific.migrate.with_set_speed.tcp: PASS (33.03 s)
(7/10) type_specific.migrate.with_set_speed.unix: PASS (92.96 s)
(8/10) type_specific.migrate.with_set_speed.exec.default_exec: PASS (32.68 s)
(9/10) type_specific.migrate.with_set_speed.exec.gzip_exec: PASS (39.67 s)
(10/10) type_specific.migrate.with_set_speed.fd: PASS (33.51 s)
TOTAL TIME: 421.69 s (07:01)
TESTS PASSED: 10
TESTS FAILED: 0
SUCCESS RATE: 100.00 %

So, on a typical virt-test workload, we save about 34 % of the time needed to run the suite (276.08 s vs. 421.69 s).

I was talking to Lukas, and we think tests such as virtio_console could be impacted much more by this. Last time I ran those tests, they already took up to 3.5 hours.

Let me come up with more data so we can better think about this.




_______________________________________________
Virt-test-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/virt-test-devel
