Hi.
I've been discussing this with Cleber and Lucas (in private just
because I'm their manager at Red Hat) and decided to open this to
the general audience of autotest, in the hope that we'll get more
ideas and refine the brainstorm:
We have an internal testgrid that runs some of the virt-tests,
but for each test that passes, we've been getting at least two
notifications of failures. The vast majority of them are due
to infrastructure problems (network outages, a repository being
offline, a full disk, NFS failures, Cobbler failures, aborted
jobs, etc).
This is a historical problem in autotest, and we want a clean
solution that fixes it for good, without kludges or ugly hacks.
So I propose we think outside of the box: what would be the ideal
solution to this problem, without the limitations imposed by the
current autotest architecture or backwards compatibility?
Once we have this ideal solution as a goal, we start thinking of
what needs to be sacrificed because of the autotest architecture,
not the other way around.
Naturally, the solution can be implemented in phases.
Here is my proposal, at the requirements level:
Definitions:
- Testgrid: the infrastructure used to run tests. It's composed
of test runner(s), scheduler(s), RPC server(s), database(s),
infrastructure for provisioning, etc.
- Autotest user: submits jobs to the testgrid and/or monitors the
output of the jobs run;
- Testgrid admin: responsible for the maintenance of the
testgrid, fixing the infrastructure and the services that it
depends on;
Requirements (as user stories):
- As an autotest user, I want to be able to declare
requirements for my test to be run. For example, I may need a
specific package installed, or a specific service to be
online. Besides, the test runner should automatically find
out some requirements based on the test code I write. For
example: if I use a method exposed by autotest that has a
dependency on a particular service or package, the test
runner should automatically consider it a requirement of my
test as well.
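To make the idea concrete, here's a minimal sketch of what
declared plus inferred requirements could look like. The
@requires decorator, the REQUIREMENTS registry and the
HELPER_DEPS table are all hypothetical names, not part of any
existing autotest API:

```python
# Registry of explicitly declared requirements (illustrative).
REQUIREMENTS = {}

def requires(*deps):
    """Decorator: record packages/services a test declares it needs."""
    def wrapper(func):
        REQUIREMENTS[func.__name__] = list(deps)
        return func
    return wrapper

@requires("package:qemu-kvm", "service:libvirtd")
def test_vm_boot():
    pass

# Map of helper methods to the dependencies they imply; the test
# runner could build this from the framework's own code.
HELPER_DEPS = {"copy_from_nfs": ["service:nfs"]}

def inferred_requirements(test_name, helpers_used):
    """Combine declared requirements with ones implied by helper usage."""
    deps = list(REQUIREMENTS.get(test_name, []))
    for helper in helpers_used:
        for dep in HELPER_DEPS.get(helper, []):
            if dep not in deps:
                deps.append(dep)
    return deps
```

With that in place, a test calling copy_from_nfs() would
automatically pick up "service:nfs" as a requirement without the
test author having to declare it.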
- As an autotest user, I want two primary kinds of
notifications sent to me over e-mail: either the test ran and
passed, or the test ran and failed (note: in both cases the
test did run). Receiving a notification of a test failure
should be like an alarm: it means there's something broken
with *my code* that needs immediate attention. False
positives should be a very
rare exception. Test jobs that failed due to broken
infrastructure or broken services should be kept in a queue
for a (long) period of time until the infrastructure gets
fixed. After that period, they should be aborted,
potentially sending me a notification e-mail.
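The hold-then-abort behavior could be sketched roughly like
this. The error classification, the week-long timeout and the
queue itself are assumptions for illustration only:

```python
import time
from collections import deque

# Failure kinds treated as infrastructure problems (illustrative list).
INFRA_ERRORS = ("NFSError", "CobblerError", "NetworkError", "DiskFullError")
HOLD_TIMEOUT = 7 * 24 * 3600  # hold for a week before aborting (assumed)

held_jobs = deque()  # (job, timestamp) pairs waiting for infra to be fixed

def handle_failure(job, error_kind, now=None):
    """Notify only on real test failures; hold infra failures for retry."""
    now = time.time() if now is None else now
    if error_kind in INFRA_ERRORS:
        held_jobs.append((job, now))
        return "held"
    return "notify-failure"

def expire_held_jobs(now=None):
    """Abort (and return) jobs that were held longer than HOLD_TIMEOUT."""
    now = time.time() if now is None else now
    aborted = []
    while held_jobs and now - held_jobs[0][1] > HOLD_TIMEOUT:
        job, _ = held_jobs.popleft()
        aborted.append(job)  # caller could send the abort notification here
    return aborted
```

The key property is that a user never gets a "failure" e-mail
for an NFS outage; at worst they get an "aborted after waiting"
e-mail a week later.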
- As an autotest user, I want the e-mail that notifies me of
the job status to be consistent and clear about what went
wrong. It should include links to more detailed information,
log snippets, version of the components run, failure rate,
etc. I don't want e-mails with missing fields or inconsistent
reports.
- As an autotest user, I should be able to query the testgrid
queue, my job status and the testgrid status via some sort of
webservice API. A dashboard and rpc-client using this
API would be great.
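For the sake of discussion, the webservice side could be as
simple as JSON method calls dispatched to a small API object.
The method names, the request format and the status values here
are invented for illustration:

```python
import json

class TestgridAPI:
    """Hypothetical query API a dashboard or rpc-client could consume."""

    def __init__(self, jobs):
        self._jobs = jobs  # job_id -> status ("queued", "running", ...)

    def get_job_status(self, job_id):
        return {"job_id": job_id,
                "status": self._jobs.get(job_id, "unknown")}

    def get_queue(self):
        return [j for j, s in sorted(self._jobs.items()) if s == "queued"]

def handle_request(api, payload):
    """Tiny JSON dispatcher standing in for the real RPC layer."""
    req = json.loads(payload)
    method = getattr(api, req["method"])
    return json.dumps(method(*req.get("params", [])))
```

A dashboard would then just poll get_queue() and
get_job_status() over HTTP; the transport is orthogonal to the
API shape.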
- As a testgrid admin, I want to be notified if a service is
broken or offline. I want to have scripts or tests that
monitor these services and pause the testgrid if something is
wrong, putting the test jobs on hold.
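A monitor like that could be little more than a set of named
health checks and a pause flag, something like the sketch below
(check functions and the Testgrid class are illustrative):

```python
def check_services(checks):
    """Run each named health check; return the list of broken services."""
    return [name for name, check in checks.items() if not check()]

class Testgrid:
    """Minimal stand-in for the grid's scheduling state."""

    def __init__(self):
        self.paused = False

    def monitor(self, checks):
        """Pause the grid while any monitored service is down."""
        broken = check_services(checks)
        self.paused = bool(broken)  # jobs stay on hold while paused
        return broken
```

Each check would in practice ping NFS, Cobbler, the database,
etc; the point is that a failing check pauses scheduling instead
of letting jobs run and fail.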
- As a testgrid admin, I want to be able to tell how long the
testgrid was offline due to broken infrastructure and list which
services went down and when, to have a general idea of what's
broken and needs to be fixed in the long term.
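If the monitor above logged down/up events, the accounting could
be a simple aggregation over that log. The (service, state,
timestamp) event format is an assumption:

```python
def summarize_outages(events):
    """Return {service: total_seconds_down} from (service, state, ts) events.

    'state' is "down" or "up"; unmatched "down" events (still
    ongoing outages) are ignored here for simplicity.
    """
    down_since = {}
    totals = {}
    for service, state, ts in sorted(events, key=lambda e: e[2]):
        if state == "down":
            down_since.setdefault(service, ts)
        elif state == "up" and service in down_since:
            start = down_since.pop(service)
            totals[service] = totals.get(service, 0) + ts - start
    return totals
```

Sorting those totals gives exactly the "what's broken most
often" view an admin needs for long-term fixes.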
- As a testgrid admin, I want to be able to select which tests
should run on which platform/hardware/OS. For example, I want
to blacklist some tests (or variants) from a particular
machine in the virtlab, or from a particular version of the
OS.
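One possible shape for such a blacklist is a list of rules where
each rule matches on test name, host and/or OS version; the rule
format and field names below are made up for illustration:

```python
import re

# Hypothetical blacklist rules: a job is skipped if every pattern
# in some rule matches the corresponding attribute.
BLACKLIST = [
    {"test": "block_stream", "host": "virtlab-03"},
    {"test": "migrate.*", "os": "RHEL-5"},
]

def is_blacklisted(test, host, os_version, rules=BLACKLIST):
    """True if any rule matches this (test, host, os) combination."""
    ctx = {"test": test, "host": host, "os": os_version}
    for rule in rules:
        if all(re.fullmatch(pat, ctx[key]) for key, pat in rule.items()):
            return True
    return False
```

Rules that omit a field (like "host" in the second rule) simply
don't constrain it, which keeps per-machine and per-OS filtering
in the same mechanism.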
Comments?
Thanks.
- Ademar
--
Ademar de Souza Reis Jr.
Red Hat
_______________________________________________
Virt-test-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/virt-test-devel