On Sat, Aug 17, 2013 at 9:46 AM, Robert Collins wrote:
> On 17 August 2013 23:49, Salvatore Orlando wrote:
> > I tend to agree that when the gate for a project is broken, nothing
> > should be merged for that project until the gate jobs are green again.
> > In the case of Neutron, making the job non-voting only caused more bugs
> > to slip through, and that meant more work for the developers
> > themselves, and more headaches for developers of other projects relying
> > on it.
> >
> > When dealing with intermittent failures, like the bug which probably
> > started the issues we've been witnessing in the past 3 weeks, I think it
> > might be a sensible idea to make the job non-voting only for projects
> > which surely can't be the cause of the gate failure, or perhaps to skip
> > only the offending test.
> >
> > This, however, means asymmetrical gating, and from Monty's post it seems
> > there's something quite wrong with it. However, due to my lack of
> > expertise on the subject, I am unable to see the issue with it.
> >
> > Salvatore
>
> The asymmetry we should fear is when project A can land something
> which will break project B. In this case the proposal is to say 'B is
> broken already, permit A to land things without remorse until B is
> unbroken'.
>
> The problem is, if A makes the breakage of B worse, B ends up in
> catch-up mode, which is most unfun.
>
> Concretely, take Heat for A and Neutron for B. Tempest d-g jobs start
> failing in Neutron, so they are made skips. Now Heat could make the
> Neutron tests in Tempest worse, and we won't know - or if we do know,
> the changes will still land.
>
> Previous discussion here has endorsed 'revert problematic commits,
> it's not blame on the developer, just do it', so I'm not going to
> mention that.
>
> What I will suggest we do is start running some number - let's say 20 -
> of midnight state jobs, all identical.
> Ignoring datetime-sensitive tests, which are fortunately rare, this
> should identify tests that fail 5% of the time, independent of
> incoming commits. We can use this to generate a baseline reference for
> which tests fail intermittently in trunk, and when something breaks
> intermittently outside of that set, we can be pretty *sure* it's in
> the last day's commits.

+1, although we already have a manual, vaguely similar version of this
(http://status.openstack.org/rechecks/).

> Secondly, in principle it should be straightforward to do this for
> any point in time, so when a new problem shows its head, we can start
> up a bisection programmatically - independent of the dev analysis - to
> find where it was introduced. If we have resources, we could even do
> N-section rather than bisection.

+1

> Killing all intermittent issues in test suites is /hard/, so I think we
> need a belt-and-braces approach and to engineer a rapid response
> system for spikes in intermittent failures, in addition to working on
> the failures themselves.
> -Rob
> --
> Robert Collins
> Distinguished Technologist
> HP Converged Cloud
>
> _______________________________________________
> OpenStack-dev mailing list
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
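Robert's nightly-baseline idea can be sketched in Python. This is a minimal illustration under stated assumptions, not existing OpenStack tooling: the function names (`detection_probability`, `runs_needed`, `flaky_baseline`, `new_failures`) and the run format (a mapping from test name to a pass/fail boolean) are hypothetical. It also puts a number on the "5%" figure: assuming runs are independent, a test that fails 5% of the time fails at least once in 20 identical runs only with probability 1 - 0.95^20, roughly 64%, so catching such tests reliably takes more runs or a baseline accumulated over several nights.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Iterable, Mapping


def detection_probability(fail_rate: float, runs: int) -> float:
    """Probability that a test failing independently with `fail_rate`
    fails at least once across `runs` identical runs."""
    return 1.0 - (1.0 - fail_rate) ** runs


def runs_needed(fail_rate: float, confidence: float) -> int:
    """Smallest number of identical runs that catches a test with the given
    failure rate at least once with probability >= `confidence`."""
    if not (0.0 < fail_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("fail_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


def flaky_baseline(runs: Iterable[Mapping[str, bool]]) -> dict[str, float]:
    """Observed failure rate of every test that failed in at least one run.

    Each run is a hypothetical mapping of test name -> passed (True/False).
    """
    failures: Counter[str] = Counter()
    total = 0
    for run in runs:
        total += 1
        for test, passed in run.items():
            if not passed:
                failures[test] += 1
    return {test: count / total for test, count in failures.items()}


def new_failures(run: Mapping[str, bool], baseline: Mapping[str, float]) -> set[str]:
    """Failures in `run` that are not in the known-intermittent baseline."""
    return {test for test, passed in run.items() if not passed and test not in baseline}


if __name__ == "__main__":
    print(round(detection_probability(0.05, 20), 2))  # 0.64
    print(runs_needed(0.05, 0.95))  # 59
    nightly = [
        {"test_a": True, "test_b": False},
        {"test_a": True, "test_b": True},
    ]
    baseline = flaky_baseline(nightly)  # {'test_b': 0.5}
    print(new_failures({"test_a": False, "test_b": False}, baseline))  # {'test_a'}
```

Because the baseline keeps observed failure rates rather than just test names, it could also be used to notice a known-flaky test whose rate suddenly jumps, which is the kind of spike the rapid response system would need to catch.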
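The programmatic bisection can also be sketched in Python. Because the failure is intermittent, each probed commit has to be run several times before it is called good. The sketch below is hypothetical, not existing tooling: it takes a caller-supplied `run_once(commit)` that runs the suspect test once at that commit and returns True on a pass, and it assumes the first commit is good, the last is bad, and the failure persists once introduced. `probes` sets how many evenly spaced commits are tested per round (1 gives plain bisection, larger values give the N-section Robert mentions), and `repeats` is an illustrative number of reruns per commit.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def looks_bad(commit: C, run_once: Callable[[C], bool], repeats: int) -> bool:
    """Treat a commit as bad if the suspect test fails in any of `repeats` runs.

    A bad commit can still pass every run (probability (1 - p) ** repeats
    for a test failing a fraction p of the time), so this is a heuristic.
    """
    return any(not run_once(commit) for _ in range(repeats))


def n_section(
    commits: Sequence[C],
    run_once: Callable[[C], bool],
    probes: int = 1,
    repeats: int = 10,
) -> int:
    """Index of the first bad commit, testing `probes` points per round.

    Assumes commits[0] is good, commits[-1] is bad, and that the failure
    persists once introduced. probes=1 is ordinary bisection.
    """
    if probes < 1:
        raise ValueError("probes must be >= 1")
    if len(commits) < 2:
        raise ValueError("need at least a known-good and a known-bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        step = (bad - good) / (probes + 1)
        # Evenly spaced interior points; in CI these could run in parallel.
        points = sorted({good + max(1, round(step * i)) for i in range(1, probes + 1)})
        interior = [p for p in points if good < p < bad]
        for p in interior:
            if looks_bad(commits[p], run_once, repeats):
                bad = p  # first bad probe bounds the range from above
                break
            good = p  # every probe so far in this round passed
    return bad


if __name__ == "__main__":
    history = [f"rev{i}" for i in range(20)]
    # Simulated (deterministic) breakage introduced at rev13.
    passes = lambda rev: int(rev[3:]) < 13
    print(n_section(history, passes))  # 13
    print(n_section(history, passes, probes=3))  # 13
```

With more probes per round, each round shrinks the suspect range to roughly 1/(probes + 1) of its size, at the cost of probes × repeats test runs per round, which is the resources trade-off behind choosing N-section over bisection.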
