On 10/28/2014 02:15 PM, Aleksandra Fedorova wrote:
Vitaly,

though comments like this are definitely better than nothing, I think
we should address these issues in a more formal way.

For random failures we have to retrigger the build until it passes.
Yes, it could take some time (two or three rebuilds?), but it is the
only reliable way to show that the failure is indeed random and hasn't
suddenly become permanent. If it fails three times in a row, the issue
is probably bigger than you think. Should we really ignore/postpone it
then?
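
The retrigger rule above can be sketched roughly as follows. This is
only an illustration, not part of any existing CI tooling: the function
name classify_failure, the run_build callable and the three-attempt
limit are all assumptions taken from the paragraph above.

```python
from typing import Callable


def classify_failure(run_build: Callable[[], bool], max_attempts: int = 3) -> str:
    """Retrigger a build up to max_attempts times and classify the outcome.

    run_build is a hypothetical callable that triggers one build and
    returns True if it passed. It stands in for whatever the CI offers.
    """
    for attempt in range(1, max_attempts + 1):
        if run_build():
            # A pass on the first attempt means there was no failure at all;
            # a pass after earlier failures suggests the failure was random.
            return "passed" if attempt == 1 else "random"
    # Every attempt failed, so the failure should be treated as persistent.
    return "persistent"


if __name__ == "__main__":
    # Simulated build results: two failures followed by a pass.
    results = iter([False, False, True])
    print(classify_failure(lambda: next(results)))
```

A "persistent" result is the case where the failure should not be
ignored or postponed.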

And if it really is a known issue, we need to fix or disable this
particular test. And I think that this fix should be merged into the
repo via the general workflow.

Not only does this make everything pass the CI properly, it also adds
the necessary step where you announce the issue publicly and it gets
approved as the "official" known issue. I would even add a certain
keyword to the commit message to mark these temporary fixes and
simplify tracking.

Aleksandra,

You are 100% correct here. Under no circumstances should any human be able to merge code into a master source tree. Only the CI system, after a successful run of tests, should be able to merge code into master. If there are, as Vitaly says, issues with a nailgun test that cause random failures, then the test (or nailgun, whichever is the cause) should be fixed ASAP.

We deal with similar issues in the main OpenStack gate, and luckily there we don't allow humans to merge code directly into a branch. Only the CI system can do that, which means that although at times we get frustrated developers who must "do the recheck dance" a bit, there is a forcing function to have developers fix bugs in tests and server code that trigger false failures.

All the best, and keep up the good work.

-jay
