Re: [openstack-dev] [gate] Automatic elastic rechecks
On 2014-07-18 15:09:34 +0100 (+0100), Daniel P. Berrange wrote:
[...]
> If there were multiple failures and only some were identified, it
> would be reasonable to *not* automatically recheck.
[...]

Another major blocker is that we often add signatures for failures
which occur 100% of the time, and while those tend to get fixed a bit
faster than 1% failures, automatic rechecks would mean that for some
period while we're investigating the gate would just be spinning over
and over running jobs which had no chance of passing.

I suppose it could be argued that elastic-recheck needs a
categorization mechanism so that it also won't recommend rechecking
for those sorts of scenarios (all discussion of automated rechecks
aside).
--
Jeremy Stanley

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [gate] Automatic elastic rechecks
On Thu, Jul 24, 2014 at 04:31:05PM +, Jeremy Stanley wrote:
> On 2014-07-18 15:09:34 +0100 (+0100), Daniel P. Berrange wrote:
> [...]
> > If there were multiple failures and only some were identified, it
> > would be reasonable to *not* automatically recheck.
> [...]
>
> Another major blocker is that we often add signatures for failures
> which occur 100% of the time, and while those tend to get fixed a bit
> faster than 1% failures, automatic rechecks would mean that for some
> period while we're investigating the gate would just be spinning over
> and over running jobs which had no chance of passing.
>
> I suppose it could be argued that elastic-recheck needs a
> categorization mechanism so that it also won't recommend rechecking
> for those sorts of scenarios (all discussion of automated rechecks
> aside).

Yep, if there's a bug which is known to hit 90%+ of the time due to
some known problem, launchpad could be tagged with NoRecheck and e-r
taught to avoid re-queuing such failures.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
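
A minimal sketch of the NoRecheck idea above, for illustration only. It assumes the tags of each matched Launchpad bug have already been fetched into a dict; the function name, the dict shape, and the `gate-failure` tag are hypothetical, and `NoRecheck` is only the tag name proposed in this thread, not an existing elastic-recheck feature.

```python
# Tag name proposed in the thread; not an existing convention.
NO_RECHECK_TAG = "NoRecheck"


def recommend_recheck(bug_tags):
    """Return True only if a recheck is worth recommending.

    bug_tags: dict mapping a matched bug number to the set of tags on
    that bug (hypothetical shape; assumed to be fetched elsewhere).
    """
    if not bug_tags:
        # No fingerprint matched, so there is nothing to recheck against.
        return False
    # A single bug tagged NoRecheck (for example, one that fails nearly
    # every run) is enough to suppress the recommendation.
    return all(NO_RECHECK_TAG not in tags for tags in bug_tags.values())


print(recommend_recheck({1334550: {"gate-failure"}}))               # True
print(recommend_recheck({1334550: {"gate-failure", "NoRecheck"}}))  # False
```

Keeping the decision in the bug tracker rather than in the fingerprint query means a tag can be added or removed while the failure is being investigated, without touching the queries themselves.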
Re: [openstack-dev] [gate] Automatic elastic rechecks
On 7/17/2014 9:01 AM, Matthew Booth wrote:
> Elastic recheck is a great tool. It leaves me messages like this:
>
> ===
> I noticed jenkins failed, I think you hit bug(s):
>
> check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
> gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550
>
> We don't automatically recheck or reverify, so please consider doing
> that manually if someone hasn't already. For a code review which is
> not yet approved, you can recheck by leaving a code review comment
> with just the text:
>
>     recheck bug 1334550
>
> For bug details see: http://status.openstack.org/elastic-recheck/
> ===
>
> In an ideal world, every person seeing this would diligently check
> that the fingerprint match was accurate before submitting a recheck
> request.
>
> In the real world, how about we just do it automatically?
>
> Matt

We don't want automatic rechecks because then we're just piling on to
races, because you can have jenkins failures where we have a
fingerprint for one job failure but there is some other job failing on
your patch which is an unrecognized failure (no e-r fingerprint query
yet).

If we never force people to investigate the failures and write
fingerprints because we're just always automatically rechecking things
for them, we'll drop our categorization rates and most likely
eventually fall into a locked gate once we hit 2-3 really nasty races
hitting at the same time.

So the best way to avoid a locked gate is to stay on top of managing
the worst offenders and making sure everyone is actually looking at
what failed so we can quickly identify new races.

--
Thanks,
Matt Riedemann
Re: [openstack-dev] [gate] Automatic elastic rechecks
On Fri, Jul 18, 2014 at 09:06:45AM -0500, Matt Riedemann wrote:
> On 7/17/2014 9:01 AM, Matthew Booth wrote:
> > Elastic recheck is a great tool. It leaves me messages like this:
> >
> > ===
> > I noticed jenkins failed, I think you hit bug(s):
> >
> > check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
> > gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550
> >
> > We don't automatically recheck or reverify, so please consider doing
> > that manually if someone hasn't already. For a code review which is
> > not yet approved, you can recheck by leaving a code review comment
> > with just the text:
> >
> >     recheck bug 1334550
> >
> > For bug details see: http://status.openstack.org/elastic-recheck/
> > ===
> >
> > In an ideal world, every person seeing this would diligently check
> > that the fingerprint match was accurate before submitting a recheck
> > request.
> >
> > In the real world, how about we just do it automatically?
> >
> > Matt
>
> We don't want automatic rechecks because then we're just piling on to
> races, because you can have jenkins failures where we have a
> fingerprint for one job failure but there is some other job failing on
> your patch which is an unrecognized failure (no e-r fingerprint query
> yet).
>
> If we never force people to investigate the failures and write
> fingerprints because we're just always automatically rechecking things
> for them, we'll drop our categorization rates and most likely
> eventually fall into a locked gate once we hit 2-3 really nasty races
> hitting at the same time.

If there were multiple failures and only some were identified, it
would be reasonable to *not* automatically recheck.

Given that we have issues with resources available to the gate, it
would also seem like a benefit to allow us to recheck only the jobs
which actually failed. I.e. if 1 job fails, don't recheck all 8 jobs,
because that just wastes resources and increases the chances of
failing again, and again, and again, which wastes more resources and
everyone's time.

Regards,
Daniel
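
The two conditions Daniel describes (recheck automatically only when every failure is identified, and rerun only the jobs that failed) can be sketched as a small decision function. This is an illustrative sketch, not elastic-recheck or gate code: the function name, the input dicts, and the job name `some-passing-job` are hypothetical, and whether the gate tooling can rerun individual jobs is not established in this thread.

```python
def plan_recheck(job_results, matched_bugs):
    """Decide whether to recheck automatically and which jobs to rerun.

    job_results: dict mapping job name -> True if the job passed.
    matched_bugs: dict mapping failed job name -> set of bug numbers
        whose fingerprint matched that job (missing or empty means the
        failure is unidentified).
    Returns a (should_recheck, jobs_to_rerun) tuple.
    """
    failed = [job for job, passed in job_results.items() if not passed]
    if not failed:
        return False, []
    # If any failure has no fingerprint match, leave it for a human to
    # investigate instead of rechecking automatically.
    if any(not matched_bugs.get(job) for job in failed):
        return False, []
    # Every failure maps to a known bug: rerun only the failed jobs.
    return True, failed


results = {
    "check-devstack-dsvm-cells": False,
    "gate-tempest-dsvm-large-ops": False,
    "some-passing-job": True,  # hypothetical job name
}
all_known = {
    "check-devstack-dsvm-cells": {1334550},
    "gate-tempest-dsvm-large-ops": {1334550},
}
partly_known = {"check-devstack-dsvm-cells": {1334550}}

print(plan_recheck(results, all_known))
# (True, ['check-devstack-dsvm-cells', 'gate-tempest-dsvm-large-ops'])
print(plan_recheck(results, partly_known))
# (False, [])
```

Returning an empty job list in the unidentified case keeps the concern raised earlier in the thread intact: an unrecognized failure still has to be looked at by someone before anything is re-queued.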
[openstack-dev] [gate] Automatic elastic rechecks
Elastic recheck is a great tool. It leaves me messages like this:

===
I noticed jenkins failed, I think you hit bug(s):

check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550

We don't automatically recheck or reverify, so please consider doing
that manually if someone hasn't already. For a code review which is
not yet approved, you can recheck by leaving a code review comment
with just the text:

    recheck bug 1334550

For bug details see: http://status.openstack.org/elastic-recheck/
===

In an ideal world, every person seeing this would diligently check
that the fingerprint match was accurate before submitting a recheck
request.

In the real world, how about we just do it automatically?

Matt
--
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
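
For reference, the manual step the quoted message asks for can be sketched as follows: pull the distinct bug numbers out of the elastic-recheck comment and build the matching `recheck bug N` review comments. This is a hypothetical helper, not part of elastic-recheck; it assumes only the bug URL format and the comment text shown in the message above, and it does not post anything to the review system.

```python
import re

# Launchpad bug links as they appear in the quoted elastic-recheck comment.
BUG_URL = re.compile(r"https://bugs\.launchpad\.net/bugs/(\d+)")


def recheck_comments(message):
    """Return one 'recheck bug N' string per distinct bug, in first-seen order."""
    seen = []
    for bug in BUG_URL.findall(message):
        if bug not in seen:
            seen.append(bug)
    return ["recheck bug " + bug for bug in seen]


example = """I noticed jenkins failed, I think you hit bug(s):
check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550
gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550
"""

print(recheck_comments(example))  # ['recheck bug 1334550']
```

Both failed jobs in the example point at the same bug, so only one comment is produced.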