Re: [openstack-dev] Migrating to testr parallel in tempest

Ben Nemec Wed, 14 Aug 2013 09:16:36 -0700

On 2013-08-13 16:39, Clark Boylan wrote:

On Tue, Aug 13, 2013 at 1:25 PM, Matthew Treinish <[email protected]> wrote:
Hi everyone,
So for the past month or so I've been working on getting tempest to work stably with testr in parallel. As part of this you may have noticed the testr-full jobs that get run on the zuul check queue. I was using that job to debug some of the more obvious race conditions and stability issues with running tempest in parallel. After a bunch of fixes to tempest and finding some real bugs in some of the projects, things seem to have smoothed out.
So I pushed the testr-full run to the gate queue earlier today. I'll be keeping track of the success rate of this job vs. the serial job and will use this as the determining factor before we push this live as the default for all tempest runs. Assuming the success rate matches up well enough with the serial job on the gate queue, I will push out the change that migrates all the voting jobs to run in parallel, hopefully either Friday afternoon or early next week. Also, if anyone has input on what threshold they feel is good enough for this, I'd welcome it. For example, do we want to ensure a >= 1:1 match for job success? Or would something like 90% as stable as the serial job be good enough, considering the speed advantage? (The parallel runs take about half as much time as a full serial run; the parallel job normally finishes in ~25-30 min.) Since this affects almost every project, I don't want to define this threshold without input from everyone.
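To make the proposed threshold concrete, here is a minimal sketch, assuming "90% as stable" means the parallel job's success rate divided by the serial job's success rate. The helper names `relative_stability` and `meets_threshold` are hypothetical, and the pass/run counts below are made up for illustration, not real gate data.

```python
# Hypothetical helpers: compare gate success rates of the parallel (testr)
# job and the serial job against a relative-stability threshold.


def relative_stability(parallel_passes, parallel_runs, serial_passes, serial_runs):
    """Return the parallel success rate divided by the serial success rate."""
    if parallel_runs == 0 or serial_runs == 0:
        raise ValueError("need at least one run of each job")
    parallel_rate = parallel_passes / parallel_runs
    serial_rate = serial_passes / serial_runs
    if serial_rate == 0:
        raise ValueError("serial job never passed; ratio is undefined")
    return parallel_rate / serial_rate


def meets_threshold(ratio, threshold=0.9):
    """True if the parallel job is at least `threshold` times as stable as serial.

    threshold=1.0 corresponds to the ">= 1:1 match" option; 0.9 to "90% as stable".
    """
    return ratio >= threshold


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    ratio = relative_stability(parallel_passes=85, parallel_runs=100,
                               serial_passes=90, serial_runs=100)
    print(f"{ratio:.3f}", meets_threshold(ratio), meets_threshold(ratio, 1.0))
```

With these illustrative counts the parallel job is about 94% as stable as the serial one, so it would pass a 90% bar but not a strict 1:1 bar.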
After there is some more data for the gate queue's parallel job, I'll have some pretty Graphite graphs that I can share comparing the success trends between the parallel and serial jobs.
So at this point we're in the home stretch, and I'm asking for everyone's help in getting this merged. If everyone who is reviewing and pushing commits could watch the results from these non-voting jobs and, when something fails on the parallel job but not on the serial job, investigate the failure and open a bug if necessary, that would be a big help. If it turns out to be a bug in tempest, please link it against this blueprint:

https://blueprints.launchpad.net/tempest/+spec/speed-up-tempest
so that I'll give it the attention it deserves. I'd hate to get this close to getting this merged and have a bit of racy code get merged at the last second and block us for another week or two.
I feel that we need to get this in before the H3 rush starts up, as it will help everyone get through the extra review load faster.
Getting this in before the H3 rush would be very helpful. When we made
the switch with Nova's unit tests, we fixed as many of the test bugs
as we could find, merged the change to switch the test runner, then
treated all failures as very high priority bugs that received
immediate attention. Getting this in before H3 will give everyone a
little more time to debug any potential new issues exposed by Jenkins
or people running the tests locally.

I think we should be bold here and merge this as soon as we have good
numbers that indicate the trend is for these tests to pass. Graphite
can give us the pass-to-fail ratios over time; as long as these trends
are similar for both the old nosetest jobs and the new testr job, I say
we go for it. (Disclaimer: most of the projects I work on are not
affected by the tempest jobs; however, I am often called upon to help
sort out issues in the gate).

I'm inclined to agree. It's not as if we don't have transient failures now, and if we're looking at a 50% speedup in recheck/verify times, then as long as the new version isn't significantly less stable, it should be a net improvement.

Of course, without hard numbers we're kind of discussing in a vacuum here.
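One rough way to see why a somewhat less stable but faster job can still come out ahead is to estimate the expected time until a run passes. This is a minimal sketch under a simplifying assumption not stated in the thread: each run passes independently with a fixed probability, and a failed run is simply rechecked, so the expected number of attempts is 1/p. The function name is hypothetical, and the 55-minute serial duration and 0.9 success rate are illustrative guesses (the thread only says the parallel job takes about half as long and finishes in ~25-30 min).

```python
# Hypothetical model: expected wall-clock minutes until a job passes,
# assuming each run passes independently with probability `success_rate`
# and a failed run is simply rechecked.


def expected_minutes_to_pass(run_minutes, success_rate):
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # The number of attempts until the first pass is geometric with mean 1/p.
    return run_minutes / success_rate


if __name__ == "__main__":
    # Illustrative numbers only: a 55 min serial run, a parallel run taking
    # half as long, and the parallel job 90% as stable as the serial job.
    serial_rate = 0.9
    serial = expected_minutes_to_pass(55, serial_rate)
    parallel = expected_minutes_to_pass(27.5, 0.9 * serial_rate)
    print(round(serial, 1), round(parallel, 1))
```

Under these assumed numbers the parallel job still needs well under the serial job's expected time (roughly 34 vs. 61 minutes). Real gate data would be needed to check whether the independence assumption holds.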


-Ben

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
