Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Flavio Percoco

On 25/11/13 21:24 -0600, Dolph Mathews wrote:


On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins robe...@robertcollins.net
wrote:

   This has been mentioned in other threads, but I thought I'd call it
   out and make it an explicit topic.

   We have over 100 recheck bugs open on
   http://status.openstack.org/rechecks/ - there is quite a bit of
   variation in how frequently they are seen :(. In a way that's good, but
   stuff that has been open for months and not seen is likely noise (in
   /rechecks). The rest - the ones we see happening - are noise in the
   gate.

   The lower we can drive the spurious failure rate, the less repetitive
   analysing a failure will be, and the more obvious new ones will be -
   it forms a virtuous circle.

   However, many of these bugs - a random check of the first 5 listed
   found /none/ that had been triaged - are not prioritised for fixing.

   So my proposal is that we make it part of the base hygiene for a
   project that any recheck bugs being seen (either by elastic-recheck or
   manual inspection) be considered critical and prioritised above
   feature work.


I agree with the notion here (that fixing transient failures is critically high
priority work for the community) -- but marking the bug as critical priority
is just a subjective abuse of the priority field. A non-critical bug is not
necessarily non-critical work. The critical status should be reserved for
issues that are actually non-shippable, catastrophically breaking issues.



I agree with Dolph. I'd rather tag them instead of marking them as
critical. It is also true that it's not possible to land a patch if
the gate fails, which means these bugs can be interpreted as critical
as well. However, I personally don't think we should let the gate mark
those bugs as critical.

Would a combination of High + tag - elastic-recheck - make sense?

With the above it would be easier to triage them, to know where they
came from and to prioritise them correctly.

Cheers,
FF

--
@flaper87
Flavio Percoco


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Thierry Carrez
Dolph Mathews wrote:
 On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
 robe...@robertcollins.net wrote:
 
 So my proposal is that we make it part of the base hygiene for a
 project that any recheck bugs being seen (either by elastic-recheck or
 manual inspection) be considered critical and prioritised above
 feature work.
 
 I agree with the notion here (that fixing transient failures is
 critically high priority work for the community) -- but marking the bug
 as critical priority is just a subjective abuse of the priority field.
 A non-critical bug is not necessarily non-critical work. The critical
 status should be reserved for issues that are actually non-shippable,
 catastrophically breaking issues.

It's a classic bugtracking dilemma where the Importance field is both
used to describe bug impact and priority... while they don't always match.

That said, the impact of those bugs, considering potential development
activity breakage, *is* quite critical (they all are timebombs which
will create future gate fails if not handled at top priority).

So I think marking them Critical + tagging them is not that much of an
abuse, if we start including the gate impact in our bug Impact
assessments. That said, I'm also fine with High+Tag, as long as it
triggers the appropriate fast response everywhere.

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Christopher Yeoh
On Tue, Nov 26, 2013 at 7:37 PM, Flavio Percoco fla...@redhat.com wrote:

 On 25/11/13 21:24 -0600, Dolph Mathews wrote:


 On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins 
 robe...@robertcollins.net
 wrote:
 I agree with the notion here (that fixing transient failures is
 critically high priority work for the community) -- but marking the bug
 as critical priority is just a subjective abuse of the priority field.
 A non-critical bug is not necessarily non-critical work. The critical
 status should be reserved for issues that are actually non-shippable,
 catastrophically breaking issues.



 I agree with Dolph. I'd rather tag them instead of marking them as
 critical. It is also true that it's not possible to land a patch if
 the gate fails, which means these bugs can be interpreted as critical
 as well. However, I personally don't think we should let the gate mark
 those bugs as critical.

 Would a combination of High + tag - elastic-recheck - make sense?

 With the above it would be easier to triage them, to know where they
 came from and to prioritise them correctly.


Given that they potentially block not only critical bugs from the same
project from being fixed, but critical bugs from all projects being fixed
(and at the very least they slow the process of fixing them down), I think
it's quite reasonable to mark them as critical.

I think it'd also be useful if there was a convention of manually tagging
the bugs as gate (or something similar) when submitting ones which were
the result of transient failures. It would make them easier to find and
reduce duplicated bug reports, which can hide the apparent regularity of
the bug occurring.

 Chris


Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Sean Dague
On 11/25/2013 10:24 PM, Dolph Mathews wrote:
 
 On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
 robe...@robertcollins.net wrote:
 
 This has been mentioned in other threads, but I thought I'd call it
 out and make it an explicit topic.
 
 We have over 100 recheck bugs open on
 http://status.openstack.org/rechecks/ - there is quite a bit of
 variation in how frequently they are seen :(. In a way that's good, but
 stuff that has been open for months and not seen is likely noise (in
 /rechecks). The rest - the ones we see happening - are noise in the
 gate.
 
 The lower we can drive the spurious failure rate, the less repetitive
 analysing a failure will be, and the more obvious new ones will be -
 it forms a virtuous circle.
 
 However, many of these bugs - a random check of the first 5 listed
 found /none/ that had been triaged - are not prioritised for fixing.
 
 So my proposal is that we make it part of the base hygiene for a
 project that any recheck bugs being seen (either by elastic-recheck or
 manual inspection) be considered critical and prioritised above
 feature work.
 
 
 I agree with the notion here (that fixing transient failures is
 critically high priority work for the community) -- but marking the bug
 as critical priority is just a subjective abuse of the priority field.
 A non-critical bug is not necessarily non-critical work. The critical
 status should be reserved for issues that are actually non-shippable,
 catastrophically breaking issues.

A race condition in a project which causes it to act with undefined
behavior some statistically significant part of the time in a relatively
small, single-node, not highly parallel environment (at most 4 simultaneous
requests) seems catastrophically breaking to me.

-Sean

-- 
Sean Dague
http://dague.net





Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Dolph Mathews
On Tue, Nov 26, 2013 at 5:23 AM, Thierry Carrez thie...@openstack.org wrote:

 Dolph Mathews wrote:
  On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
  robe...@robertcollins.net wrote:
 
  So my proposal is that we make it part of the base hygiene for a
  project that any recheck bugs being seen (either by elastic-recheck or
  manual inspection) be considered critical and prioritised above
  feature work.
 
  I agree with the notion here (that fixing transient failures is
  critically high priority work for the community) -- but marking the bug
  as critical priority is just a subjective abuse of the priority field.
  A non-critical bug is not necessarily non-critical work. The critical
  status should be reserved for issues that are actually non-shippable,
  catastrophically breaking issues.

 It's a classic bugtracking dilemma where the Importance field is both
 used to describe bug impact and priority... while they don't always match.


++


 That said, the impact of those bugs, considering potential development
 activity breakage, *is* quite critical (they all are timebombs which
 will create future gate fails if not handled at top priority).


I generally agree, but I don't think it's fair to say that the impact of a
transient is universally a single priority, either. Some transient issues
occur more frequently and therefore have higher impact.


 So I think marking them Critical + tagging them is not that much of an
 abuse, if we start including the gate impact in our bug Impact
 assessments. That said, I'm also fine with High+Tag, as long as it
 triggers the appropriate fast response everywhere.


I'm fine with starting them at High, and elevating to Critical as
appropriate.

Is the idea here to automatically apply a tag + priority as a result of
recheck/reverify bug X ? (as long as existing priority isn't overwritten!)
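
The "apply a tag and a starting priority, but never overwrite an existing one" idea can be sketched roughly as below. This is only an illustration under stated assumptions: the `Bug` class and `apply_recheck_triage` function are hypothetical and not part of elastic-recheck or any bug tracker API, and "Undecided" is assumed to mean that nobody has triaged the bug yet.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    """Hypothetical stand-in for a bug tracker record (not a real API)."""
    title: str
    importance: str = "Undecided"  # assumed default for untriaged bugs
    tags: set = field(default_factory=set)


def apply_recheck_triage(bug, tag="elastic-recheck", default_importance="High"):
    """Tag a recheck bug and give it a starting importance.

    An importance that someone has already set is left untouched, so a
    human decision (higher or lower) is never overwritten.
    """
    bug.tags.add(tag)
    if bug.importance == "Undecided":
        bug.importance = default_importance
    return bug
```

Starting at High and leaving any later move to Critical to a person keeps the automated part limited to filling in blanks.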



 --
 Thierry Carrez (ttx)





-- 

-Dolph


Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Clint Byrum
Excerpts from Thierry Carrez's message of 2013-11-26 03:23:51 -0800:
 Dolph Mathews wrote:
  On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
  robe...@robertcollins.net wrote:
  
  So my proposal is that we make it part of the base hygiene for a
  project that any recheck bugs being seen (either by elastic-recheck or
  manual inspection) be considered critical and prioritised above
  feature work.
  
  I agree with the notion here (that fixing transient failures is
  critically high priority work for the community) -- but marking the bug
  as critical priority is just a subjective abuse of the priority field.
  A non-critical bug is not necessarily non-critical work. The critical
  status should be reserved for issues that are actually non-shippable,
  catastrophically breaking issues.
 
 It's a classic bugtracking dilemma where the Importance field is both
 used to describe bug impact and priority... while they don't always match.
 

If I'm on the fence between one importance and the other, I look at the
bug lists for the two importance levels.

For instance, there are 122 High importance bugs in Nova, and 6
Critical bugs.

If we are comfortable with developers choosing to fix all 122 of the other
High bugs before this bug, then make it High. If not, make it Critical.
Likewise, if we are uncomfortable with this bug being chosen before any
of the 6 Critical bugs, then make it High.

I realize those two choices could make a person uncomfortable and wish for
something in-between like Hitical or Criticigh, but micro-management
is no way to actually get things done and it does only take a few seconds
to reprioritize as we add insight and data over time.
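
The two questions above can be written down as a small decision helper. This is a hedged sketch only: the function name and its two boolean parameters are hypothetical, and the answers still come from a person looking at the two bug lists. Where the two questions pull in opposite directions, the sketch assumes High wins, since reprioritizing later is cheap.

```python
def choose_importance(ok_behind_all_high: bool, ok_ahead_of_any_critical: bool) -> str:
    """Pick High or Critical from the two comfort questions.

    ok_behind_all_high: are we comfortable with developers fixing every
        other High bug before this one?
    ok_ahead_of_any_critical: are we comfortable with this bug being
        picked up before any of the existing Critical bugs?
    """
    if ok_behind_all_high:
        return "High"
    if not ok_ahead_of_any_critical:
        # Assumption: when the answers conflict, stay at High for now.
        return "High"
    return "Critical"
```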

 That said, the impact of those bugs, considering potential development
 activity breakage, *is* quite critical (they all are timebombs which
 will create future gate fails if not handled at top priority).
 
 So I think marking them Critical + tagging them is not that much of an
 abuse, if we start including the gate impact in our bug Impact
 assessments. That said, I'm also fine with High+Tag, as long as it
 triggers the appropriate fast response everywhere.
 

IMO the tags are a distraction to triage. Critical or High is enough of
a conundrum to resolve. The tags will certainly help guide trackers, and
they should add them, but the person doing triage should mostly focus on
"will the patient die?" type questions.



Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Joe Gordon
On Nov 26, 2013 8:48 AM, Dolph Mathews dolph.math...@gmail.com wrote:


 On Tue, Nov 26, 2013 at 5:23 AM, Thierry Carrez thie...@openstack.org wrote:

 Dolph Mathews wrote:
  On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
  robe...@robertcollins.net wrote:
 
  So my proposal is that we make it part of the base hygiene for a
  project that any recheck bugs being seen (either by elastic-recheck or
  manual inspection) be considered critical and prioritised above
  feature work.
 
  I agree with the notion here (that fixing transient failures is
  critically high priority work for the community) -- but marking the bug
  as critical priority is just a subjective abuse of the priority field.
  A non-critical bug is not necessarily non-critical work. The critical
  status should be reserved for issues that are actually non-shippable,
  catastrophically breaking issues.

 It's a classic bugtracking dilemma where the Importance field is both
 used to describe bug impact and priority... while they don't always match.


 ++


 That said, the impact of those bugs, considering potential development
 activity breakage, *is* quite critical (they all are timebombs which
 will create future gate fails if not handled at top priority).


 I generally agree, but I don't think it's fair to say that the impact of a
 transient is universally a single priority, either. Some transient issues
 occur more frequently and therefore have higher impact.


 So I think marking them Critical + tagging them is not that much of an
 abuse, if we start including the gate impact in our bug Impact
 assessments. That said, I'm also fine with High+Tag, as long as it
 triggers the appropriate fast response everywhere.


 I'm fine with starting them at High, and elevating to Critical as
 appropriate.

 Is the idea here to automatically apply a tag + priority as a result of
 recheck/reverify bug X ? (as long as existing priority isn't overwritten!)

I certainly hope we don't automatically set priority based on raw recheck
data. We have a second list of bugs that we feed to elastic-recheck; this
list is reviewed for duplicates and includes fingerprints so we can better
assess the bug frequency.  I think the idea is to mark bugs from that list
as critical.  I also think it should be a manual process, as a bug should
be reviewed (does it have enough detail, etc.) before setting it to critical.




 --
 Thierry Carrez (ttx)





 --

 -Dolph



Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-26 Thread Mark McLoughlin
On Tue, 2013-11-26 at 12:29 -0800, Joe Gordon wrote:
 On Nov 26, 2013 8:48 AM, Dolph Mathews dolph.math...@gmail.com wrote:
 
 
  On Tue, Nov 26, 2013 at 5:23 AM, Thierry Carrez thie...@openstack.org wrote:
 
  Dolph Mathews wrote:
   On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
   robe...@robertcollins.net wrote:
  
   So my proposal is that we make it part of the base hygiene for a
   project that any recheck bugs being seen (either by elastic-recheck or
   manual inspection) be considered critical and prioritised above
   feature work.
  
   I agree with the notion here (that fixing transient failures is
   critically high priority work for the community) -- but marking the bug
   as critical priority is just a subjective abuse of the priority field.
   A non-critical bug is not necessarily non-critical work. The critical
   status should be reserved for issues that are actually non-shippable,
   catastrophically breaking issues.
 
  It's a classic bugtracking dilemma where the Importance field is both
  used to describe bug impact and priority... while they don't always match.
 
 
  ++
 
 
  That said, the impact of those bugs, considering potential development
  activity breakage, *is* quite critical (they all are timebombs which
  will create future gate fails if not handled at top priority).
 
 
  I generally agree, but I don't think it's fair to say that the impact of a
  transient is universally a single priority, either. Some transient issues
  occur more frequently and therefore have higher impact.
 
 
  So I think marking them Critical + tagging them is not that much of an
  abuse, if we start including the gate impact in our bug Impact
  assessments. That said, I'm also fine with High+Tag, as long as it
  triggers the appropriate fast response everywhere.
 
 
  I'm fine with starting them at High, and elevating to Critical as
  appropriate.
 
  Is the idea here to automatically apply a tag + priority as a result of
  recheck/reverify bug X ? (as long as existing priority isn't overwritten!)
 
 I certainly hope we don't automatically set priority based on raw recheck
 data. We have a second list of bugs that we feed to elastic-recheck; this
 list is reviewed for duplicates and includes fingerprints so we can better
 assess the bug frequency.  I think the idea is to mark bugs from that list
 as critical.  I also think it should be a manual process, as a bug should
 be reviewed (does it have enough detail, etc.) before setting it to critical.

[Just to circle back and clarify my €0.02c during the TC and project
meetings tonight]

Any recheck bug which appears regularly in the graphs here:

  http://status.openstack.org/elastic-recheck/

means that a human has looked at it, determined a fingerprint for it, we
have a bunch of details about it and we have data as to its regularity.
Any such bug is fair game to be marked Critical.

If it is still there a month later, but no-one is making any progress on
it and it's happening pretty irregularly ... then I think we'll see a
desire to move it back from Critical to High again so that the Critical
list isn't cluttered with stuff people are no longer paying close
attention to.
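
As a rough illustration of that lifecycle, the sketch below promotes a fingerprinted, regularly seen bug to Critical and suggests dropping it back to High once it has gone stale. Everything in it is an assumption made for the example: the function name, its parameters, and the 30-day threshold standing in for "a month later" are hypothetical, and the inputs (whether a bug is seen regularly, whether progress is being made) remain human judgments.

```python
from datetime import date, timedelta
from typing import Optional


def review_importance(
    current: str,
    has_fingerprint: bool,
    seen_regularly: bool,
    progress_made: bool,
    marked_critical_on: Optional[date],
    today: date,
    stale_after: timedelta = timedelta(days=30),  # assumed meaning of "a month"
) -> str:
    """Suggest an importance for a recheck bug; a person still decides."""
    # A fingerprinted bug that shows up regularly is fair game for Critical.
    if has_fingerprint and seen_regularly:
        return "Critical"
    # A Critical bug that is old, irregular, and not being worked on
    # goes back to High so the Critical list stays meaningful.
    stale = (
        current == "Critical"
        and marked_critical_on is not None
        and today - marked_critical_on >= stale_after
    )
    if stale and not progress_made:
        return "High"
    return current
```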

So, yeah - the intent sounds good to me.

Thanks,
Mark.




[openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-25 Thread Robert Collins
This has been mentioned in other threads, but I thought I'd call it
out and make it an explicit topic.

We have over 100 recheck bugs open on
http://status.openstack.org/rechecks/ - there is quite a bit of
variation in how frequently they are seen :(. In a way that's good, but
stuff that has been open for months and not seen is likely noise (in
/rechecks). The rest - the ones we see happening - are noise in the
gate.

The lower we can drive the spurious failure rate, the less repetitive
analysing a failure will be, and the more obvious new ones will be -
it forms a virtuous circle.

However, many of these bugs - a random check of the first 5 listed
found /none/ that had been triaged - are not prioritised for fixing.

So my proposal is that we make it part of the base hygiene for a
project that any recheck bugs being seen (either by elastic-recheck or
manual inspection) be considered critical and prioritised above
feature work.

Thoughts?

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all project] Treating recently seen recheck bugs as critical across the board

2013-11-25 Thread Dolph Mathews
On Mon, Nov 25, 2013 at 8:12 PM, Robert Collins
robe...@robertcollins.net wrote:

 This has been mentioned in other threads, but I thought I'd call it
 out and make it an explicit topic.

 We have over 100 recheck bugs open on
 http://status.openstack.org/rechecks/ - there is quite a bit of
 variation in how frequently they are seen :(. In a way that's good, but
 stuff that has been open for months and not seen is likely noise (in
 /rechecks). The rest - the ones we see happening - are noise in the
 gate.

 The lower we can drive the spurious failure rate, the less repetitive
 analysing a failure will be, and the more obvious new ones will be -
 it forms a virtuous circle.

 However, many of these bugs - a random check of the first 5 listed
 found /none/ that had been triaged - are not prioritised for fixing.

 So my proposal is that we make it part of the base hygiene for a
 project that any recheck bugs being seen (either by elastic-recheck or
 manual inspection) be considered critical and prioritised above
 feature work.


I agree with the notion here (that fixing transient failures is critically
high priority work for the community) -- but marking the bug as critical
priority is just a subjective abuse of the priority field. A non-critical
bug is not necessarily non-critical work. The critical status should be
reserved for issues that are actually non-shippable, catastrophically
breaking issues.



 Thoughts?

 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud





-- 

-Dolph