Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On 25/11/13 21:24 -0600, Dolph Mathews wrote:
> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>> This has been mentioned in other threads, but I thought I'd call it out and make it an explicit topic.
>>
>> We have over 100 recheck bugs open on http://status.openstack.org/rechecks/ - there is quite a bit of variation in how frequently they are seen :(. In a way that's good, but stuff that has been open for months and not seen is likely noise (in /rechecks). The rest - the ones we see happening - are noise in the gate.
>>
>> The lower we can drive the spurious failure rate, the less repetitive analysing a failure will be, and the more obvious new ones will be - it forms a virtuous circle.
>>
>> However, many of these bugs - a random check of the first 5 listed found /none/ that had been triaged - are not prioritised for fixing.
>>
>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>
> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.

I agree with Dolph. I'd rather tag them instead of marking them as critical. It is also true that it's not possible to land a patch if the gate fails, which means these bugs can be interpreted as critical as well. However, I personally don't think we should let the gate mark those bugs as critical.

Would a combination of High + tag - elastic-recheck - make sense? With the above it would be easier to triage them, to know where they came from and to prioritise them correctly.
Cheers,
FF

--
@flaper87
Flavio Percoco

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
Dolph Mathews wrote:
> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>
> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.

It's a classic bugtracking dilemma where the Importance field is both used to describe bug impact and priority... while they don't always match.

That said, the impact of those bugs, considering potential development activity breakage, *is* quite critical (they all are timebombs which will create future gate fails if not handled at top priority).

So I think marking them Critical + tagging them is not that much of an abuse, if we start including the gate impact in our bug Impact assessments. That said, I'm also fine with High+Tag, as long as it triggers the appropriate fast response everywhere.

--
Thierry Carrez (ttx)
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On Tue, Nov 26, 2013 at 7:37 PM, Flavio Percoco fla...@redhat.com wrote:
> On 25/11/13 21:24 -0600, Dolph Mathews wrote:
>> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>>
>> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.
>
> I agree with Dolph. I'd rather tag them instead of marking them as critical. It is also true that it's not possible to land a patch if the gate fails, which means these bugs can be interpreted as critical as well. However, I personally don't think we should let the gate mark those bugs as critical.
>
> Would a combination of High + tag - elastic-recheck - make sense? With the above it would be easier to triage them, to know where they came from and to prioritise them correctly.

Given that they potentially block not only critical bugs from the same project from being fixed, but critical bugs from all projects being fixed (and at the very least they slow the process of fixing them down), I think it's quite reasonable to mark them as critical.

I think it'd also be useful if there was a convention of manually tagging the bugs as gate (or something similar) when submitting ones which were the result of transient failures. It would make them easier to find and reduce duplicated bug reports, which can hide the apparent regularity of the bug occurring.

Chris
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On 11/25/2013 10:24 PM, Dolph Mathews wrote:
> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>> This has been mentioned in other threads, but I thought I'd call it out and make it an explicit topic.
>>
>> We have over 100 recheck bugs open on http://status.openstack.org/rechecks/ - there is quite a bit of variation in how frequently they are seen :(. In a way that's good, but stuff that has been open for months and not seen is likely noise (in /rechecks). The rest - the ones we see happening - are noise in the gate.
>>
>> The lower we can drive the spurious failure rate, the less repetitive analysing a failure will be, and the more obvious new ones will be - it forms a virtuous circle.
>>
>> However, many of these bugs - a random check of the first 5 listed found /none/ that had been triaged - are not prioritised for fixing.
>>
>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>
> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.

A race condition in a project which causes it to act with undefined behavior some statistically significant part of the time in a relatively small, single-node, not highly parallel (at most 4 simultaneous requests) environment seems catastrophically breaking to me.

-Sean

--
Sean Dague
http://dague.net
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On Tue, Nov 26, 2013 at 5:23 AM, Thierry Carrez thie...@openstack.org wrote:
> Dolph Mathews wrote:
>> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>>
>> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.
>
> It's a classic bugtracking dilemma where the Importance field is both used to describe bug impact and priority... while they don't always match.

++

> That said, the impact of those bugs, considering potential development activity breakage, *is* quite critical (they all are timebombs which will create future gate fails if not handled at top priority).

I generally agree, but I don't think it's fair to say that the impact of a transient is universally a single priority, either. Some transient issues occur more frequently and therefore have higher impact.

> So I think marking them Critical + tagging them is not that much of an abuse, if we start including the gate impact in our bug Impact assessments. That said, I'm also fine with High+Tag, as long as it triggers the appropriate fast response everywhere.

I'm fine with starting them at High, and elevating to Critical as appropriate.

Is the idea here to automatically apply a tag + priority as a result of recheck/reverify bug X? (as long as existing priority isn't overwritten!)

--
-Dolph
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
Excerpts from Thierry Carrez's message of 2013-11-26 03:23:51 -0800:
> Dolph Mathews wrote:
>> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>>
>> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.
>
> It's a classic bugtracking dilemma where the Importance field is both used to describe bug impact and priority... while they don't always match.

If I'm on the fence between one importance and the other, I look at the bug lists for the two importance levels. For instance, there are 122 High importance bugs in Nova, and 6 Critical bugs. If we are comfortable with developers choosing to fix all 122 of the other High bugs before this bug, then make it High. If not, make it Critical. Likewise, if we are uncomfortable with this bug being chosen before any of the 6 Critical bugs, then make it High.

I realize those two choices could make a person uncomfortable and wish for something in-between like Hitical or Criticigh, but micro-management is no way to actually get things done, and it only takes a few seconds to reprioritize as we add insight and data over time.

> That said, the impact of those bugs, considering potential development activity breakage, *is* quite critical (they all are timebombs which will create future gate fails if not handled at top priority).
>
> So I think marking them Critical + tagging them is not that much of an abuse, if we start including the gate impact in our bug Impact assessments. That said, I'm also fine with High+Tag, as long as it triggers the appropriate fast response everywhere.

IMO the tags are a distraction to triage. Critical or High is enough of a conundrum to resolve. The tags will certainly help guide trackers, and they should add them, but the person doing triage should mostly focus on "will the patient die?" type questions.
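For anyone who finds it easier to read as code, the High-versus-Critical test described in this message can be sketched roughly as below. This is only an illustration of the reasoning, not an existing tool: the function name and both parameters are made up for the example, and the two answers still have to come from a human looking at the project's current High and Critical lists.

```python
def choose_importance(ok_behind_all_high: bool, ok_ahead_of_critical: bool) -> str:
    """Pick between High and Critical for a bug that is on the fence.

    Hypothetical helper, for illustration only.

    ok_behind_all_high: are we comfortable if every other High bug
        is fixed before this one?
    ok_ahead_of_critical: are we comfortable if this bug is picked up
        before any of the existing Critical bugs?
    """
    # Fine to wait behind the whole High list: it is a High bug.
    if ok_behind_all_high:
        return "High"
    # Not fine to wait, but also not fine to jump ahead of the
    # existing Critical bugs: it stays High.
    if not ok_ahead_of_critical:
        return "High"
    return "Critical"


if __name__ == "__main__":
    # A gate-blocking race we would not want to sit behind 122 High bugs.
    print(choose_importance(ok_behind_all_high=False, ok_ahead_of_critical=True))
```

The point of keeping it this small is the same as in the message: the decision is cheap, and it can be revisited in seconds as more data comes in.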
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On Nov 26, 2013 8:48 AM, Dolph Mathews dolph.math...@gmail.com wrote:
> On Tue, Nov 26, 2013 at 5:23 AM, Thierry Carrez thie...@openstack.org wrote:
>> Dolph Mathews wrote:
>>> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>>>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>>>
>>> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.
>>
>> It's a classic bugtracking dilemma where the Importance field is both used to describe bug impact and priority... while they don't always match.
>
> ++
>
>> That said, the impact of those bugs, considering potential development activity breakage, *is* quite critical (they all are timebombs which will create future gate fails if not handled at top priority).
>
> I generally agree, but I don't think it's fair to say that the impact of a transient is universally a single priority, either. Some transient issues occur more frequently and therefore have higher impact.
>
>> So I think marking them Critical + tagging them is not that much of an abuse, if we start including the gate impact in our bug Impact assessments. That said, I'm also fine with High+Tag, as long as it triggers the appropriate fast response everywhere.
>
> I'm fine with starting them at High, and elevating to Critical as appropriate.
>
> Is the idea here to automatically apply a tag + priority as a result of recheck/reverify bug X? (as long as existing priority isn't overwritten!)

I certainly hope we don't automatically set priority based on raw recheck data.

We have a second list of bugs that we feed to elastic-recheck; this list is reviewed for duplicates and includes fingerprints so we can better assess the bug frequency. I think the idea is to mark bugs from that list as critical. I also think it should be a manual process, as a bug should be reviewed (does it have enough detail, etc.) before setting it to critical.
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On Tue, 2013-11-26 at 12:29 -0800, Joe Gordon wrote:
> On Nov 26, 2013 8:48 AM, Dolph Mathews dolph.math...@gmail.com wrote:
>> On Tue, Nov 26, 2013 at 5:23 AM, Thierry Carrez thie...@openstack.org wrote:
>>> Dolph Mathews wrote:
>>>> On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
>>>>> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.
>>>>
>>>> I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.
>>>
>>> It's a classic bugtracking dilemma where the Importance field is both used to describe bug impact and priority... while they don't always match.
>>
>> ++
>>
>>> That said, the impact of those bugs, considering potential development activity breakage, *is* quite critical (they all are timebombs which will create future gate fails if not handled at top priority).
>>
>> I generally agree, but I don't think it's fair to say that the impact of a transient is universally a single priority, either. Some transient issues occur more frequently and therefore have higher impact.
>>
>>> So I think marking them Critical + tagging them is not that much of an abuse, if we start including the gate impact in our bug Impact assessments. That said, I'm also fine with High+Tag, as long as it triggers the appropriate fast response everywhere.
>>
>> I'm fine with starting them at High, and elevating to Critical as appropriate.
>>
>> Is the idea here to automatically apply a tag + priority as a result of recheck/reverify bug X? (as long as existing priority isn't overwritten!)
>
> I certainly hope we don't automatically set priority based on raw recheck data.
>
> We have a second list of bugs that we feed to elastic-recheck; this list is reviewed for duplicates and includes fingerprints so we can better assess the bug frequency. I think the idea is to mark bugs from that list as critical. I also think it should be a manual process, as a bug should be reviewed (does it have enough detail, etc.) before setting it to critical.

[Just to circle back and clarify my €0.02 during the TC and project meetings tonight]

Any recheck bug which appears regularly in the graphs here:

http://status.openstack.org/elastic-recheck/

means that a human has looked at it and determined a fingerprint for it, that we have a bunch of details about it, and that we have data as to its regularity. Any such bug is fair game to be marked Critical.

If it is still there a month later, but no-one is making any progress on it and it's happening pretty irregularly ... then I think we'll see a desire to move it back from Critical to High again so that the Critical list isn't cluttered with stuff people are no longer paying close attention to.

So, yeah - the intent sounds good to me.

Thanks,
Mark.
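A rough sketch of the promote/demote rule described in this message, in case it helps the discussion. Everything named here is an assumption made for the example: the `RecheckBug` fields, the `REGULAR_HITS_PER_WEEK` threshold and the 30-day reading of "a month" do not come from elastic-recheck or Launchpad. It only returns a suggestion, in keeping with the point above that setting the importance should stay a manual step.

```python
from dataclasses import dataclass

# Assumed thresholds, purely illustrative.
REGULAR_HITS_PER_WEEK = 5   # what counts as "appears regularly"
STALE_AFTER_DAYS = 30       # "still there a month later"


@dataclass
class RecheckBug:
    importance: str                   # current importance, e.g. "High"
    has_fingerprint: bool             # a human reviewed it and wrote a fingerprint
    hits_last_week: int               # recent matches of that fingerprint
    days_as_critical: int = 0         # how long it has been marked Critical
    has_recent_progress: bool = False


def suggest_importance(bug: RecheckBug) -> str:
    """Return a suggested importance for a human triager to confirm."""
    # No fingerprint means nobody has reviewed it yet: leave it alone.
    if not bug.has_fingerprint:
        return bug.importance
    # Reviewed and showing up regularly: fair game for Critical.
    if bug.hits_last_week >= REGULAR_HITS_PER_WEEK:
        return "Critical"
    # Critical for a month, irregular, and nobody working on it:
    # move it back so the Critical list stays meaningful.
    stale = (
        bug.importance == "Critical"
        and bug.days_as_critical >= STALE_AFTER_DAYS
        and not bug.has_recent_progress
    )
    if stale:
        return "High"
    return bug.importance


if __name__ == "__main__":
    bug = RecheckBug("Critical", True, hits_last_week=1, days_as_critical=45)
    print(suggest_importance(bug))
```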
[openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
This has been mentioned in other threads, but I thought I'd call it out and make it an explicit topic.

We have over 100 recheck bugs open on http://status.openstack.org/rechecks/ - there is quite a bit of variation in how frequently they are seen :(. In a way that's good, but stuff that has been open for months and not seen is likely noise (in /rechecks). The rest - the ones we see happening - are noise in the gate.

The lower we can drive the spurious failure rate, the less repetitive analysing a failure will be, and the more obvious new ones will be - it forms a virtuous circle.

However, many of these bugs - a random check of the first 5 listed found /none/ that had been triaged - are not prioritised for fixing.

So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.

Thoughts?

-Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board
On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net wrote:
> This has been mentioned in other threads, but I thought I'd call it out and make it an explicit topic.
>
> We have over 100 recheck bugs open on http://status.openstack.org/rechecks/ - there is quite a bit of variation in how frequently they are seen :(. In a way that's good, but stuff that has been open for months and not seen is likely noise (in /rechecks). The rest - the ones we see happening - are noise in the gate.
>
> The lower we can drive the spurious failure rate, the less repetitive analysing a failure will be, and the more obvious new ones will be - it forms a virtuous circle.
>
> However, many of these bugs - a random check of the first 5 listed found /none/ that had been triaged - are not prioritised for fixing.
>
> So my proposal is that we make it part of the base hygiene for a project that any recheck bugs being seen (either by elastic-recheck or manual inspection) be considered critical and prioritised above feature work.

I agree with the notion here (that fixing transient failures is critically high priority work for the community) -- but marking the bug as critical priority is just a subjective abuse of the priority field. A non-critical bug is not necessarily non-critical work. The critical status should be reserved for issues that are actually non-shippable, catastrophically breaking issues.

> Thoughts?
>
> -Rob
>
> --
> Robert Collins rbtcoll...@hp.com
> Distinguished Technologist
> HP Converged Cloud

--
-Dolph