On 10/22/2014 06:07 AM, Thierry Carrez wrote:
Ihar Hrachyshka wrote:
For stable branches, we have so-called periodic jobs that are
triggered once in a while against the current code in a stable branch,
and report to the openstack-stable-maint@ mailing list. An example of
a failing periodic job report can be found at [2]. I envision that
a similar approach can be applied to test auxiliary features in the gate. So
once something is broken in master, the interested parties behind the
auxiliary feature will be informed in due time.

The main issue with periodic jobs is that since they are non-blocking,
they can get ignored really easily. It takes a bit of organization and
process to get those failures addressed.

It's only recently (and a lot thanks to you) that failures in the
periodic jobs for stable branches are being taken into account quickly
and seriously. For years the failures just lingered until they blocked
someone's work enough for that person to go and fix them.

So while I think periodic jobs are a good way to increase corner case
testing coverage, I am skeptical of our collective ability to have the
discipline necessary for them not to become a pain. We'll need a strict
process around them: identified groups of people signed up to act on
failure, and failure stats so that we can remove jobs that don't get
enough attention.

While I share some of your skepticism, we have to find a way to make this work. It is not plausible to say we are doing our best to ensure the quality of upstream OpenStack with a single tier of testing (the gate) that is limited to 40-minute runs. Of course a lot more testing happens downstream, but we can do better as a community. I think we should rephrase this subject as "non-gating" jobs. We could have various kinds of stress and longevity jobs running to good effect if we can solve this process problem.

Following on from your process suggestion, in practice the most likely way this could actually work is to have a rotation of "build guardians" who agree to keep an eye on jobs for a short period of time. There would need to be a separate rotation list for each project that has non-gating, project-specific jobs. This will likely happen as we move towards deeper functional testing in projects. The QA team would be the logical pool for a rotation covering the more global jobs of the kind I think Ihar was referring to.
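To make the rotation idea concrete, here is a rough sketch of how per-project guardian schedules could be generated. Everything in it is an assumption for illustration: the project keys, the people's names, the one-week shift length, and the start date are all hypothetical and not an existing tool or agreed process.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def build_rotation(guardians, start, weeks):
    """Return (week_start, guardian) pairs, cycling through the list in order."""
    if not guardians:
        raise ValueError("need at least one guardian")
    names = islice(cycle(guardians), weeks)
    return [(start + timedelta(weeks=i), name) for i, name in enumerate(names)]


# Hypothetical rotation lists: one per project with project-specific
# non-gating jobs, plus a QA pool for the more global jobs.
rotations = {
    "qa": ["alice", "bob", "carol"],
    "neutron": ["dave", "erin"],
}

start = date(2014, 11, 3)  # arbitrary Monday chosen for the example
schedule = {
    project: build_rotation(people, start, weeks=4)
    for project, people in rotations.items()
}

for project, shifts in schedule.items():
    for week_start, guardian in shifts:
        print(project, week_start.isoformat(), guardian)
```

Short shifts and a simple round-robin keep the commitment small for each person, which seems to matter more here than any clever scheduling.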

As for failure stats, each of these non-gating jobs would have its own name, so logstash could be used to debug failures. Do we already have anything that tracks failure rates of jobs?
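If nothing like that exists yet, the core of it could be quite small. The sketch below assumes we can obtain a flat list of (job name, result) pairs from somewhere; how that list would be pulled from logstash or any other source is not shown. The job names, the "SUCCESS"/"FAILURE" strings, and the 0.5 threshold are made up for the example.

```python
from collections import defaultdict


def failure_rates(runs):
    """Map each job name to the fraction of its runs that did not succeed."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job, status in runs:
        totals[job] += 1
        if status != "SUCCESS":
            failures[job] += 1
    return {job: failures[job] / totals[job] for job in totals}


def neglected_jobs(rates, threshold=0.5):
    """Return, sorted by name, the jobs failing at or above the threshold."""
    return sorted(job for job, rate in rates.items() if rate >= threshold)


# Hypothetical sample data; real input would come from job result records.
runs = [
    ("periodic-stress-full", "SUCCESS"),
    ("periodic-stress-full", "FAILURE"),
    ("periodic-stress-full", "FAILURE"),
    ("periodic-stress-full", "FAILURE"),
    ("periodic-longevity", "SUCCESS"),
    ("periodic-longevity", "SUCCESS"),
]

rates = failure_rates(runs)
candidates = neglected_jobs(rates)
print(rates)
print(candidates)
```

A list like `candidates` is what a guardian rotation would review, and a job that stays on it for too long would be a candidate for removal, per Thierry's suggestion.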


OpenStack-dev mailing list
