Re: [openstack-dev] [gate] Automatic elastic rechecks

2014-07-24 Thread Jeremy Stanley
On 2014-07-18 15:09:34 +0100 (+0100), Daniel P. Berrange wrote:
> [...]
> If there were multiple failures and only some were identified, it would
> be reasonable to *not* automatically recheck.
> [...]

Another major blocker is that we often add signatures for failures
which occur 100% of the time, and while those tend to get fixed a
bit faster than 1% failures, automatic rechecks would mean that for
some period while we're investigating, the gate would just be
spinning over and over, running jobs which had no chance of passing.

I suppose it could be argued that elastic-recheck needs a
categorization mechanism so that it also won't recommend rechecking
for those sorts of scenarios (all discussion of automated rechecks
aside).
-- 
Jeremy Stanley

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] Automatic elastic rechecks

2014-07-24 Thread Daniel P. Berrange
On Thu, Jul 24, 2014 at 04:31:05PM +, Jeremy Stanley wrote:
> On 2014-07-18 15:09:34 +0100 (+0100), Daniel P. Berrange wrote:
> [...]
> > If there were multiple failures and only some were identified, it would
> > be reasonable to *not* automatically recheck.
> [...]
>
> Another major blocker is that we often add signatures for failures
> which occur 100% of the time, and while those tend to get fixed a
> bit faster than 1% failures, automatic rechecks would mean that for
> some period while we're investigating the gate would just be
> spinning over and over running jobs which had no chance of passing.
>
> I suppose it could be argued that elastic-recheck needs a
> categorization mechanism so that it also won't recommend rechecking
> for those sorts of scenarios (all discussion of automated rechecks
> aside).

Yep, if there's a bug which is known to hit 90%+ of the time due
to some known problem, the Launchpad bug could be tagged with NoRecheck
and e-r taught to avoid re-queuing such failures.
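
As a rough illustration of that idea, here is a minimal Python sketch.
Only the NoRecheck tag name comes from this thread; the function name
and the plain dict standing in for Launchpad bug tags are hypothetical,
and nothing here reflects how elastic-recheck is actually implemented.

```python
# Tag name as proposed in this thread; everything else is an assumption.
NO_RECHECK_TAG = "NoRecheck"


def recommend_recheck(matched_bugs, bug_tags):
    """Return True if a recheck should be suggested for these bugs.

    matched_bugs: iterable of bug numbers matched by fingerprints.
    bug_tags: dict mapping bug number -> set of tag strings
              (a stand-in for whatever the bug tracker reports).
    """
    matched = list(matched_bugs)
    if not matched:
        # Nothing was identified, so there is nothing to recheck against.
        return False
    # Any matched bug carrying the tag vetoes the recommendation.
    return not any(
        NO_RECHECK_TAG in bug_tags.get(bug, set()) for bug in matched
    )
```

With tags of {1334550: {"NoRecheck"}}, a failure matched to bug 1334550
would not get a recheck suggestion until the tag is removed.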

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [gate] Automatic elastic rechecks

2014-07-18 Thread Matt Riedemann



On 7/17/2014 9:01 AM, Matthew Booth wrote:

> Elastic recheck is a great tool. It leaves me messages like this:
>
> ===
> I noticed jenkins failed, I think you hit bug(s):
>
> check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
> gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550
>
> We don't automatically recheck or reverify, so please consider doing
> that manually if someone hasn't already. For a code review which is not
> yet approved, you can recheck by leaving a code review comment with just
> the text:
>
>  recheck bug 1334550
>
> For bug details see: http://status.openstack.org/elastic-recheck/
> ===
>
> In an ideal world, every person seeing this would diligently check that
> the fingerprint match was accurate before submitting a recheck request.
>
> In the real world, how about we just do it automatically?
>
> Matt



We don't want automatic rechecks because then we're just piling on to
races. You can have jenkins failures where we have a fingerprint for one
job failure, but some other job is failing on your patch with an
unrecognized failure (no e-r fingerprint query yet). If we never force
people to investigate the failures and write fingerprints, because we're
always automatically rechecking things for them, our categorization rates
will drop and we'll most likely end up with a locked gate once 2-3 really
nasty races hit at the same time.


So the best way to avoid a locked gate is to stay on top of managing the 
worst offenders and making sure everyone is actually looking at what 
failed so we can quickly identify new races.
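
To make "categorization rate" concrete, here is a minimal Python sketch,
assuming it means the fraction of failed job runs that matched at least
one fingerprint. The function name and input shape are hypothetical and
not taken from elastic-recheck itself.

```python
def categorization_rate(failures):
    """Fraction of failed job runs matched by at least one fingerprint.

    failures: list with one entry per failed job run; each entry is the
              list of bug numbers whose fingerprints matched that run
              (empty list = unrecognized failure).
    Returns None when there are no failures to categorize.
    """
    if not failures:
        return None
    classified = sum(1 for bugs in failures if bugs)
    return classified / len(failures)
```

Every unrecognized failure that gets blindly rechecked is an empty entry
in that list, which is how the rate drifts down over time.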


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [gate] Automatic elastic rechecks

2014-07-18 Thread Daniel P. Berrange
On Fri, Jul 18, 2014 at 09:06:45AM -0500, Matt Riedemann wrote:
>
>
> On 7/17/2014 9:01 AM, Matthew Booth wrote:
> Elastic recheck is a great tool. It leaves me messages like this:
>
> ===
> I noticed jenkins failed, I think you hit bug(s):
>
> check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
> gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550
>
> We don't automatically recheck or reverify, so please consider doing
> that manually if someone hasn't already. For a code review which is not
> yet approved, you can recheck by leaving a code review comment with just
> the text:
>
>  recheck bug 1334550
>
> For bug details see: http://status.openstack.org/elastic-recheck/
> ===
>
> In an ideal world, every person seeing this would diligently check that
> the fingerprint match was accurate before submitting a recheck request.
>
> In the real world, how about we just do it automatically?
>
> Matt
>
>
> We don't want automatic rechecks because then we're just piling on to races,
> because you can have jenkins failures where we have a fingerprint for one
> job failure but there is some other job failing on your patch which is an
> unrecognized failure (no e-r fingerprint query yet).  If we never force
> people to investigate the failures and write fingerprints because we're just
> always automatically rechecking things for them, we'll drop our
> categorization rates and most likely eventually fall into a locked gate once
> we hit 2-3 really nasty races hitting at the same time.

If there were multiple failures and only some were identified, it would
be reasonable to *not* automatically recheck. 

Given that we have issues with the resources available to the gate, it
would also seem like a benefit to allow us to recheck only the actual jobs
which fail, i.e. if 1 job fails, don't recheck all 8 jobs, because that
just wastes resources and increases the chances of failing again, and
again and again, which wastes more resources and everyone's time.
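
Both rules can be sketched in a few lines of Python. This is only an
illustration under assumed inputs: the function names, the job-to-bug
dict, and the "SUCCESS" status string are hypothetical, and it says
nothing about whether the gate can currently re-run a subset of jobs.

```python
def should_auto_recheck(failed_jobs, identified):
    """True only if there is a failure and every failed job is identified.

    failed_jobs: iterable of job names that failed.
    identified: dict mapping job name -> matched bug number.
    """
    failed = set(failed_jobs)
    # Subset test: any failed job without a match blocks the recheck.
    return bool(failed) and failed <= set(identified)


def jobs_to_rerun(results):
    """Return only the jobs that did not succeed, sorted by name.

    results: dict mapping job name -> status string.
    """
    return sorted(job for job, status in results.items()
                  if status != "SUCCESS")
```

So a patch with one identified failure and one unidentified failure is
left for a human, and a patch with 1 failure out of 8 jobs would re-run
1 job rather than 8.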

Regards,
Daniel



[openstack-dev] [gate] Automatic elastic rechecks

2014-07-17 Thread Matthew Booth
Elastic recheck is a great tool. It leaves me messages like this:

===
I noticed jenkins failed, I think you hit bug(s):

check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550

We don't automatically recheck or reverify, so please consider doing
that manually if someone hasn't already. For a code review which is not
yet approved, you can recheck by leaving a code review comment with just
the text:

recheck bug 1334550

For bug details see: http://status.openstack.org/elastic-recheck/
===

In an ideal world, every person seeing this would diligently check that
the fingerprint match was accurate before submitting a recheck request.

In the real world, how about we just do it automatically?
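
For illustration, a minimal Python sketch of what "automatically" might
involve: read the job/bug lines out of a comment like the one above and
build the matching "recheck bug N" text. The regex and function names
are hypothetical, and this only builds strings; it does not post
anything to a code review.

```python
import re

# Matches lines such as
#   check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
BUG_LINE = re.compile(
    r"^\s*(?P<job>[\w.-]+):\s+"
    r"https://bugs\.launchpad\.net/bugs/(?P<bug>\d+)\s*$"
)


def parse_matches(comment):
    """Return a dict of job name -> bug number found in the comment."""
    matches = {}
    for line in comment.splitlines():
        m = BUG_LINE.match(line)
        if m:
            matches[m.group("job")] = int(m.group("bug"))
    return matches


def recheck_comments(matches):
    """Return one 'recheck bug N' string per distinct bug, sorted."""
    return ["recheck bug %d" % bug for bug in sorted(set(matches.values()))]


EXAMPLE = """I noticed jenkins failed, I think you hit bug(s):

check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550
"""

print(recheck_comments(parse_matches(EXAMPLE)))
```

Both jobs in the example point at the same bug, so only one recheck
comment is produced.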

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
