Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-10 Thread James Polley
There was some discussion of this topic during today's meeting.

Full notes are at
http://eavesdrop.openstack.org/meetings/tripleo/2014/tripleo.2014-09-09-19.04.log.html
starting around 19:24

To summarise, we had one action item, and one comment from dprince which
attracted broad agreement, and which I think is worth repeating here:

19:34:26 tchaypo #action bnemec to look at adding a report of items that
have a -1 from a core but no response for 14 days, as a first step towards
possibly auto-WIPing these patches

19:41:54 dprince I sort of feel like all the focus on stats in our
meetings is going to encourage people to game them (i.e. fly-by reviews)
19:42:13 dprince when what we really need is ownership to push through
the important things...
19:42:45 * dprince thinks stats are becoming a TripleO waste of time
19:44:12 jp_at_hp stats not important, being active and responsive enough
to not push new contributors away does matter
19:44:51 lifeless jp_at_hp: I agree; the stats are a tool, not the thing
itself.
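
For concreteness, the report in that action item could look something like
the sketch below. This is not bnemec's actual implementation; the project
filter, the 14-day idle threshold, and skipping the core-membership check
are all assumptions.

import datetime
import json

import requests

GERRIT = 'https://review.openstack.org'
QUERY = 'status:open project:^openstack/tripleo-.* label:Code-Review<=-1'
IDLE_DAYS = 14


def fetch_changes():
    # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI.
    resp = requests.get(GERRIT + '/changes/',
                        params={'q': QUERY, 'o': 'DETAILED_LABELS'})
    resp.raise_for_status()
    return json.loads(resp.text[resp.text.index('\n'):])


def idle_days(change):
    # 'updated' looks like "2014-08-27 03:12:55.000000000"
    updated = datetime.datetime.strptime(change['updated'][:19],
                                         '%Y-%m-%d %H:%M:%S')
    return (datetime.datetime.utcnow() - updated).days


def main():
    for change in fetch_changes():
        votes = change.get('labels', {}).get('Code-Review', {}).get('all', [])
        # NOTE: deciding whether a -1 came from a *core* would need the
        # tripleo-core group membership; that check is omitted here.
        negative = any(v.get('value', 0) < 0 for v in votes)
        if negative and idle_days(change) >= IDLE_DAYS:
            print('%s  idle %d days  %s' % (change['_number'],
                                            idle_days(change),
                                            change['subject']))


if __name__ == '__main__':
    main()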

On Fri, Sep 5, 2014 at 9:47 PM, Sullivan, Jon Paul jonpaul.sulli...@hp.com
wrote:

  -Original Message-
  From: Jay Dobies [mailto:jason.dob...@redhat.com]
  Sent: 04 September 2014 18:24
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [TripleO] Review metrics - what do we want
  to measure?
 
   It can, by running your own... but again it seems far better for core
   reviewers to decide if a change has potential or needs to be
   abandoned--that way there's an accountable human making that
   deliberate choice rather than the review team hiding behind an
   automated process so that no one is to blame for hurt feelings
   besides the infra operators who are enforcing this draconian measure
   for you.
  
   The thing is that it's also pushing more work onto already overloaded
   core review teams.  Maybe submitters don't like auto-abandon, but I
   bet they like having a core reviewer spending time cleaning up dead
   reviews instead of reviewing their change even less.
  
   TBH, if someone's offended by the bot then I can't imagine how
   incensed they must be when a human does the same thing.  The bot
   clearly isn't making it personal, and even if the human isn't either
   it's much easier to have misunderstandings (see also every
   over-reaction to a -1 ever).
  
   I suppose it makes it easier for cores to ignore reviews, but from the
   other discussions I've read that hasn't gone away just because
   auto-abandon did, so I'm not convinced that's a solution anyway.
 
  +1, I don't think it'll come as much of a shock if a -1 review gets
  closed due to time without progress.

 +1 to auto-abandon.

 Taking an excerpt from Derek's mail:

  A patch got negative feedback and we're automating the process
  to prompt the submitter to deal with it. It may be more friendly if it
  was a 2 step process
    1. (after a few days of inactivity) Add a comment saying you got
  negative feedback with suggestions of how to proceed and information
  that the review will be auto-abandoned if nothing is done in X number of
  days.
2. Auto abandon patch, with as much information as possible on how to
  reopen if needed.

 This sounds like the best solution: a warning after one week, and
 abandoning after two.

 It is about as welcoming as this sort of thing could be for a new
 committer, whilst also avoiding an ever-growing backlog of changes that
 will never see attention because the submitter has moved on to other things.

 There is also the benefit of prompting submitters into action when things
 get auto-abandoned, rather than leaving them sitting at priority #2 or #3
 forever.

 I was never particularly upset over the auto-abandon; just click a single
 button in the UI and get on with it.

 
   /2 cents
  
  
   To make the whole process a little friendlier we could increase
   the time frame from 1 week to 2.
  
    <snark>How about just automatically abandon any new change as soon
    as it's published, and if the contributor really feels it's
    important they'll unabandon it.</snark>
  
  
  
 Thanks,
 Jon-Paul Sullivan ☺ Cloud Services - @hpcloud


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-10 Thread Steven Hardy
On Thu, Sep 04, 2014 at 01:54:20PM +, Jeremy Stanley wrote:
 On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
 [...]
  How would people feel about turning [auto-abandon] back on?
 
 A lot of reviewers (myself among them) feel auto-abandon was a
 cold and emotionless way to provide feedback on a change. Especially
 on high-change-volume projects where core reviewers may at times get
 sucked into triaging other problems for long enough that the
 auto-abandoner kills lots of legitimate changes (possibly from
 new contributors who will get even more disgusted by this than the
 silence itself and walk away indefinitely with the impression that
 we really aren't a welcoming development community at all).

I can understand this argument, and perhaps an auto-abandon with a long
period, say a month, for the submitter to address comments and reset the
clock would be a reasonable compromise?

The problem we have now is that there is a constantly expanding queue of
zombie reviews, where the submitter got some negative feedback and, for
whatever reason, has chosen not to address it.

For example, in my Incoming reviews queue on the gerrit dashboard, I've
got reviews I (or someone) -1'd over four months ago, with zero feedback
from the submitter. What value is there in these reviews cluttering up the
dashboard for every reviewer?

To make matters worse, Jenkins comes along periodically and rechecks, or
figures out the patch merge failed, and bumps the zombie back up to the top
of the queue - obviously I don't realise that it's not a human comment I
need to read until I've expended effort clicking on the review and reading
it :(

From a reviewer perspective this is unmanageable, and it means the review
dashboard is basically unusable without complex queries to weed out the
active reviews from the zombies.

  Can it be done on a per project basis?
 
 It can, by running your own... but again it seems far better for
 core reviewers to decide if a change has potential or needs to be
 abandoned--that way there's an accountable human making that
 deliberate choice rather than the review team hiding behind an
 automated process so that no one is to blame for hurt feelings
 besides the infra operators who are enforcing this draconian measure
 for you.

With all the threads about core-reviewer overload, I think it's
unreasonable to expect us to scrub the review queue making daily judgements
on whether a patch should be abandoned, and I'd argue that a human
abandoning another human's patch has a much more demotivating impact on
contributors than a clearly documented automated process requiring that
negative review comments be addressed within a set period or the review
will expire.

If you think about mailing list patch workflow, it's similar - you post
your patch, get email review feedback, then post new reviews with fixes.
If you fail to post new reviews with fixes, your review falls out the
bottom of people's inboxes and you don't get your patch merged.

 
  To make the whole process a little friendlier we could increase
  the time frame from 1 week to 2.
 
 <snark>How about just automatically abandon any new change as soon
 as it's published, and if the contributor really feels it's
 important they'll unabandon it.</snark>

I think that's a pretty unfair comment - all reviewers and most
contributors are working really hard; all we're asking is that
contributors work with reviewers to get their patch into shape within a
reasonable time. :(

As someone who's drinking from the firehose every day with a seemingly
insurmountable review queue, I'd rather we worked towards processes which
help us manage the workload in a sustainable way, rather than turning the
review queue into the zombie-soup-of-despair it currently is.

All we need is two things:

1. A way to automatically escalate reviews which have no feedback at all
from core reviewers within a reasonable time (say a week or two)

2. A way to automatically remove reviews from the queue which have core
negative feedback with no resolution within a reasonable time (say a month
or several weeks, so it's not perceived as contributor-hostile).

Note (2) still allows core reviewers to decide if a change has potential,
it just becomes opt-in, e.g. we have to address the issues which prevent us
giving it positive feedback, either by working directly with the
contributor, or by delegating the work to an active contributor if the
original patch author has decided not to continue work on the patch.
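
For illustration, here's a rough sketch of what (1) and (2) could look like
against Gerrit change data fetched with o=DETAILED_LABELS. The thresholds,
the placeholder core account ids, and using idle-time-since-last-update as a
proxy for "no resolution" are all assumptions, not an agreed policy.

import datetime

ESCALATE_AFTER = datetime.timedelta(days=14)   # (1) no core feedback yet
EXPIRE_AFTER = datetime.timedelta(days=30)     # (2) unaddressed core -1/-2
CORE_ACCOUNT_IDS = {1001, 1002}                # placeholder account ids


def classify(change, now=None):
    # 'change' is a Gerrit change dict fetched with o=DETAILED_LABELS.
    # Time since the change was last touched is used as a rough proxy
    # for "no resolution".
    now = now or datetime.datetime.utcnow()
    updated = datetime.datetime.strptime(change['updated'][:19],
                                         '%Y-%m-%d %H:%M:%S')
    idle = now - updated
    votes = change.get('labels', {}).get('Code-Review', {}).get('all', [])
    core_votes = [v for v in votes
                  if v.get('_account_id') in CORE_ACCOUNT_IDS]

    if not core_votes and idle > ESCALATE_AFTER:
        return 'escalate'   # (1) surface it to the core team
    if any(v.get('value', 0) < 0 for v in core_votes) and idle > EXPIRE_AFTER:
        return 'expire'     # (2) candidate for auto-WIP/abandon
    return 'ok'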

Ultimately, it's not just about reviews - who's going to maintain all these
features if the author is not committed enough to post just one patch a month
while getting it through the review process? Oh wait, that'll be another
job for the core team, won't it?

Steve



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-10 Thread Ben Nemec
On 09/10/2014 03:57 AM, Steven Hardy wrote:
 On Thu, Sep 04, 2014 at 01:54:20PM +, Jeremy Stanley wrote:
 On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
 [...]
 How would people feel about turning [auto-abandon] back on?

 A lot of reviewers (myself among them) feel auto-abandon was a
 cold and emotionless way to provide feedback on a change. Especially
 on high-change-volume projects where core reviewers may at times get
 sucked into triaging other problems for long enough that the
 auto-abandoner kills lots of legitimate changes (possibly from
 new contributors who will get even more disgusted by this than the
 silence itself and walk away indefinitely with the impression that
 we really aren't a welcoming development community at all).
 
 I can understand this argument, and perhaps an auto-abandon with a long
 period like say a month for the submitter to address comments and reset the
 clock would be a reasonable compromise?
 
 The problem we have now, is there is a constantly expanding queue of zombie
 reviews, where the submitter got some negative feedback and, for whatever
 reason, has chosen not to address it.
 
 For example, in my Incoming reviews queue on the gerrit dashboard, I've
 got reviews I (or someone) -1'd over four months ago, with zero feedback
 from the submitter, what value is there to these reviews cluttering up the
 dashboard for every reviewer?
 
 To make matters worse Jenkins comes along periodically and rechecks or
 figures out the patch merge failed and bumps the zombie back up to the top
 of the queue - obviously I don't realise that it's not a human comment I
 need to read until I've expended effort clicking on the review and reading
 it :(
 
 From a reviewer perspective, it's impossible, and means the review
 dashboard is basically unusable without complex queries to weed out the
 active reviews from the zombies.
 
 Can it be done on a per project basis?

 It can, by running your own... but again it seems far better for
 core reviewers to decide if a change has potential or needs to be
 abandoned--that way there's an accountable human making that
 deliberate choice rather than the review team hiding behind an
 automated process so that no one is to blame for hurt feelings
 besides the infra operators who are enforcing this draconian measure
 for you.
 
 With all the threads about core-reviewer overload, I think it's
 unreasonable to expect us to scrub the review queue making daily judgements
 on whether a patch should be abandoned, and I'd argue that a human
 abandoning another human's patch has much more demotivating impact on
 contributors than a clearly documented automated process that you must
 address negative review comments within a set period or your review will
 expire.
 
 If you think about mailing list patch workflow, it's similar - you post
 your patch, get email review feedback, then post new reviews with fixes.
 If you fail to post new reviews with fixes, your review falls out the
 bottom of people's inboxes and you don't get your patch merged.
 

 To make the whole process a little friendlier we could increase
 the time frame from 1 week to 2.

 <snark>How about just automatically abandon any new change as soon
 as it's published, and if the contributor really feels it's
 important they'll unabandon it.</snark>
 
 I think that's a pretty unfair comment - all reviewers and most
 contributors are working really hard, all we're asking for is that
 contributors work with reviewers to get their patch into shape within a
 reasonable time. :(
 
 As someone who's drinking from the firehose every day with a seemingly
 insurmountable review queue, I'd rather we worked towards processes which
 help us manage the workload in a sustainable way, rather than turning the
 review queue into the zombie-soup-of-despair it currently is.
 
 All we need is two things:
 
 1. A way to automatically escalate reviews which have no feedback at all
 from core reviewers within a reasonable time (say a week or two)
 
 2. A way to automatically remove reviews from the queue which have core
 negative feedback with no resolution within a reasonable time (say a month
 or several weeks, so it's not perceived as contributor-hostile).

A suggestion Robert made during that discussion was to have auto-WIP
instead of auto-abandon.  That should be less hostile to contributors
while still making it easy to filter out the reviews that aren't ready to
merge anyway.  For me personally, and for the sake of tracking the
stats, that would be sufficient to address the problem.

 
 Note (2) still allows core reviewers to decide if a change has potential,
 it just becomes opt-in, e.g we have to address the issues which prevent us
 giving it positive feedback, either by directly working with the
 contributor, or delegating the work to an active contributor if the
 original patch author has decided not to continue work on the patch.
 
 Ultimately, it's not just about reviews - who's going to maintain all 

Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Derek Higgins
On 14/08/14 00:03, James Polley wrote:
 In recent history, we've been looking each week at stats
 from http://russellbryant.net/openstack-stats/tripleo-openreviews.html
 to get a gauge on how our review pipeline is tracking.
 
 The main stats we've been tracking have been the since the last
 revision without -1 or -2. I've included some history at [1], but the
 summary is that our 3rd quartile has slipped from 13 days to 16 days
 over the last 4 weeks or so. Our 1st quartile is fairly steady lately,
 around 1 day (down from 4 a month ago) and median is unchanged around 7
 days.
 
 There was lots of discussion in our last meeting about what could be
 causing this[2]. However, the thing we wanted to bring to the list for
 the discussion is:
 
 Are we tracking the right metric? Should we be looking to something else
 to tell us how well our pipeline is performing?
 
 The meeting logs have quite a few suggestions about ways we could tweak
 the existing metrics, but if we're measuring the wrong thing that's not
 going to help.
 
 I think that what we are looking for is a metric that lets us know
 whether the majority of patches are getting feedback quickly. Maybe
 there's some other metric that would give us a good indication?

Bring back auto-abandon...

Gerrit at one stage not so long ago used to auto-abandon patches that
had negative feedback and were over a week without activity. I believe
this was removed when a gerrit upgrade gave core reviewers the ability
to abandon other people's patches.

This was the single best part of the entire process to keep things moving:
o submitters were forced to keep patches current
o reviewers were not looking at stale or already known broken patches
o if something wasn't important it got abandoned and was never heard of
again; if it was important it would be reopened
o patch submitters were forced to engage with the reviewers quickly on
negative feedback instead of leaving a patch sitting there indefinitely

Here is the number I think we should be looking at
http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
  Queue growth in the last 30 days: 72 (2.4/day)

http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt
  Queue growth in the last 90 days: 132 (1.5/day)

Obviously this isn't sustainable.

Re-enabling auto-abandon will ensure the majority of the patches we are
looking at are current/good quality and not lost in a sea of -1's.

How would people feel about turning it back on? Can it be done on a per
project basis?

To make the whole process a little friendlier we could increase the time
frame from 1 week to 2.

Derek.

 
 
 
 
 [1] Current Stats since the last revision without -1 or -2 :
 
 Average wait time: 10 days, 17 hours, 6 minutes
 1st quartile wait time: 1 days, 1 hours, 36 minutes
 Median wait time: 7 days, 5 hours, 33 minutes
 3rd quartile wait time: 16 days, 8 hours, 16 minutes
 
 At last week's meeting we had: 3rd quartile wait time: 15 days, 13
 hours, 47 minutes
 A week before that: 3rd quartile wait time: 13 days, 9 hours, 11 minutes
 The week before that was the mid-cycle, but the week before that:
 
 19:53:38 lifeless Stats since the last revision without -1 or -2 :
 19:53:38 lifeless Average wait time: 10 days, 17 hours, 49 minutes
 19:53:38 lifeless 1st quartile wait time: 4 days, 7 hours, 57 minutes
 19:53:38 lifeless Median wait time: 7 days, 10 hours, 52 minutes
 19:53:40 lifeless 3rd quartile wait time: 13 days, 13 hours, 25 minutes
 
 [2] Some of the things suggested as potential causes of the long 3rd
 quartile wait times:
 
 * We have small number of really old reviews that have only positive
 scores but aren't being landed
 * Some reviews get a -1 but then sit for a long time waiting for the
 author to reply
 * We have some really old reviews that suddenly get revived after a long
 period being in WIP or abandoned, which reviewstats seems to miscount
 * Reviewstats counts weekends, we don't (so a change that gets pushed at
 5pm US Friday and gets reviewed at 9am Aus Monday would be seen by us as
 having no wait time, but by reviewstats as ~36 hours)
 
 
 


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Jeremy Stanley
On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
[...]
 How would people feel about turning [auto-abandon] back on?

A lot of reviewers (myself among them) feel auto-abandon was a
cold and emotionless way to provide feedback on a change. Especially
on high-change-volume projects where core reviewers may at times get
sucked into triaging other problems for long enough that the
auto-abandoner kills lots of legitimate changes (possibly from
new contributors who will get even more disgusted by this than the
silence itself and walk away indefinitely with the impression that
we really aren't a welcoming development community at all).

 Can it be done on a per project basis?

It can, by running your own... but again it seems far better for
core reviewers to decide if a change has potential or needs to be
abandoned--that way there's an accountable human making that
deliberate choice rather than the review team hiding behind an
automated process so that no one is to blame for hurt feelings
besides the infra operators who are enforcing this draconian measure
for you.

 To make the whole process a little friendlier we could increase
 the time frame from 1 week to 2.

<snark>How about just automatically abandon any new change as soon
as it's published, and if the contributor really feels it's
important they'll unabandon it.</snark>
-- 
Jeremy Stanley



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Derek Higgins
On 04/09/14 14:54, Jeremy Stanley wrote:
 On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
 [...]
 How would people feel about turning [auto-abandon] back on?
 
 A lot of reviewers (myself among them) feel auto-abandon was a
 cold and emotionless way to provide feedback on a change. Especially
 on high-change-volume projects where core reviewers may at times get
 sucked into triaging other problems for long enough that the
 auto-abandoner kills lots of legitimate changes (possibly from
 new contributors who will get even more disgusted by this than the
 silence itself and walk away indefinitely with the impression that
 we really aren't a welcoming development community at all).

Ok, I see how this may be unwelcoming to a new contributor, a feeling
that could be justified in some cases. Any established contributor
should (I know I did when it was enforced) see it as part of the process.

Perhaps we exempt new users?

On the other hand I'm not talking about abandoning a change because
there was silence for a fixed period of time, I'm talking about
abandoning it because it got negative feedback and it wasn't addressed
either through discussion or a new patch.

I have no problem if we push the inactivity period out to a month or
more; I just think there needs to be a cutoff at some stage.


 
 Can it be done on a per project basis?
 
 It can, by running your own... but again it seems far better for
 core reviewers to decide if a change has potential or needs to be
 abandoned--that way there's an accountable human making that
 deliberate choice rather than the review team hiding behind an
 automated process so that no one is to blame for hurt feelings
 besides the infra operators who are enforcing this draconian measure
 for you.

There are plenty of examples of places where we have automated processes
in the community (some of which may hurt feelings) in order to take load
off specific individuals or the community in general. In fact, automating
processes in places where people don't scale or are bottlenecks seems to
be a common theme.

We automate CI and give people negative feedback
We expire bugs in some projects that are Incomplete and are 60 days inactive

I really don't see this as the review team hiding behind an automated
process. A patch got negative feedback and we're automating the process
to prompt the submitter to deal with it. It may be more friendly if it
were a 2 step process (sketched below):
  1. (after a few days of inactivity) Add a comment saying you got
negative feedback with suggestions of how to proceed and information
that the review will be auto-abandoned if nothing is done in X number of
days.
  2. Auto-abandon the patch, with as much information as possible on how to
reopen if needed.
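
A minimal sketch of that two-step flow against the standard Gerrit REST
endpoints; the bot account, the auth scheme, the timings and the message
wording are all placeholders, not a concrete proposal.

import requests

GERRIT = 'https://review.openstack.org'
AUTH = ('autoabandon-bot', 'secret')   # hypothetical bot account; the auth
                                       # scheme depends on the Gerrit install

WARNING = ('This change has had unaddressed negative feedback for a week. '
           'Please respond to the comments or upload a new patch set; '
           'otherwise it will be auto-abandoned in X days.')

ABANDON_MSG = ('Auto-abandoned: negative feedback was not addressed. '
               'If you still want to pursue this change, click "Restore" '
               'and address the review comments.')


def warn(change_id):
    # Step 1: post a plain review comment (no vote) warning the submitter.
    url = '%s/a/changes/%s/revisions/current/review' % (GERRIT, change_id)
    requests.post(url, json={'message': WARNING}, auth=AUTH).raise_for_status()


def abandon(change_id):
    # Step 2: abandon the change, explaining how to reopen it.
    url = '%s/a/changes/%s/abandon' % (GERRIT, change_id)
    requests.post(url, json={'message': ABANDON_MSG},
                  auth=AUTH).raise_for_status()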

 
 To make the whole process a little friendlier we could increase
 the time frame from 1 week to 2.
 
 <snark>How about just automatically abandon any new change as soon
 as it's published, and if the contributor really feels it's
 important they'll unabandon it.</snark>
 




Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Ben Nemec
On 09/04/2014 08:54 AM, Jeremy Stanley wrote:
 On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
 [...]
 How would people feel about turning [auto-abandon] back on?
 
 A lot of reviewers (myself among them) feel auto-abandon was a
 cold and emotionless way to provide feedback on a change. Especially
 on high-change-volume projects where core reviewers may at times get
 sucked into triaging other problems for long enough that the
 auto-abandoner kills lots of legitimate changes (possibly from
 new contributors who will get even more disgusted by this than the
 silence itself and walk away indefinitely with the impression that
 we really aren't a welcoming development community at all).
 
 Can it be done on a per project basis?
 
 It can, by running your own... but again it seems far better for
 core reviewers to decide if a change has potential or needs to be
 abandoned--that way there's an accountable human making that
 deliberate choice rather than the review team hiding behind an
 automated process so that no one is to blame for hurt feelings
 besides the infra operators who are enforcing this draconian measure
 for you.

The thing is that it's also pushing more work onto already overloaded
core review teams.  Maybe submitters don't like auto-abandon, but I bet
they like having a core reviewer spending time cleaning up dead reviews
instead of reviewing their change even less.

TBH, if someone's offended by the bot then I can't imagine how incensed
they must be when a human does the same thing.  The bot clearly isn't
making it personal, and even if the human isn't either it's much easier
to have misunderstandings (see also every over-reaction to a -1 ever).

I suppose it makes it easier for cores to ignore reviews, but from the
other discussions I've read that hasn't gone away just because
auto-abandon did, so I'm not convinced that's a solution anyway.

/2 cents

 
 To make the whole process a little friendlier we could increase
 the time frame from 1 week to 2.
 
 <snark>How about just automatically abandon any new change as soon
 as it's published, and if the contributor really feels it's
 important they'll unabandon it.</snark>
 




Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Jay Dobies

It can, by running your own... but again it seems far better for
core reviewers to decide if a change has potential or needs to be
abandoned--that way there's an accountable human making that
deliberate choice rather than the review team hiding behind an
automated process so that no one is to blame for hurt feelings
besides the infra operators who are enforcing this draconian measure
for you.


The thing is that it's also pushing more work onto already overloaded
core review teams.  Maybe submitters don't like auto-abandon, but I bet
they like having a core reviewer spending time cleaning up dead reviews
instead of reviewing their change even less.

TBH, if someone's offended by the bot then I can't imagine how incensed
they must be when a human does the same thing.  The bot clearly isn't
making it personal, and even if the human isn't either it's much easier
to have misunderstandings (see also every over-reaction to a -1 ever).

I suppose it makes it easier for cores to ignore reviews, but from the
other discussions I've read that hasn't gone away just because
auto-abandon did, so I'm not convinced that's a solution anyway.


+1, I don't think it'll come as much of a shock if a -1 review gets 
closed due to time without progress.



/2 cents




To make the whole process a little friendlier we could increase
the time frame from 1 week to 2.


 <snark>How about just automatically abandon any new change as soon
 as it's published, and if the contributor really feels it's
 important they'll unabandon it.</snark>






Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-03 Thread Jesus M. Gonzalez-Barahona
On Wed, 2014-09-03 at 12:58 +1200, Robert Collins wrote:
 On 14 August 2014 11:03, James Polley j...@jamezpolley.com wrote:
  In recent history, we've been looking each week at stats from
  http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
  gauge on how our review pipeline is tracking.
 
  The main stats we've been tracking have been the since the last revision
  without -1 or -2. I've included some history at [1], but the summary is
  that our 3rd quartile has slipped from 13 days to 16 days over the last 4
  weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
  from 4 a month ago) and median is unchanged around 7 days.
 
  There was lots of discussion in our last meeting about what could be causing
  this[2]. However, the thing we wanted to bring to the list for the
  discussion is:
 
  Are we tracking the right metric? Should we be looking to something else to
  tell us how well our pipeline is performing?
 
  The meeting logs have quite a few suggestions about ways we could tweak the
  existing metrics, but if we're measuring the wrong thing that's not going to
  help.
 
  I think that what we are looking for is a metric that lets us know whether
  the majority of patches are getting feedback quickly. Maybe there's some
  other metric that would give us a good indication?
 
 If we review all patches quickly and land none, that's bad too :).
 
 For the reviewers specifically i think we need a metric(s) that:
  - doesn't go bad when submitters go awol, don't respond etc
- including when they come back - our stats shouldn't jump hugely
 because an old review was resurrected
  - when good means submitters will be getting feedback
  - flag inventory- things we'd be happy to have landed that haven't
- including things with a -1 from non-core reviewers (*)
 
 (*) I often see -1's on things core wouldn't -1 due to the learning
 curve involved in becoming core
 
 So, as Ben says, I think we need to address the its-not-a-vote issue
 as a priority, that has tripped us up in lots of ways
 
 I think we need to discount -workflow patches where that was set by
 the submitter, which AFAICT we don't do today.
 
 Looking at current stats:
 Longest waiting reviews (based on oldest rev without -1 or -2):
 
 54 days, 2 hours, 41 minutes https://review.openstack.org/106167
 (Keystone/LDAP integration)
 That patch had a -1 on Aug 16 1:23 AM, but it was quickly turned to +2.
 
 So this patch had a -1 then after discussion it became a +2. And its
 evolved multiple times.
 
 What should we be saying here? Clearly its had little review input
 over its life, so I think its sadly accurate.
 
 I wonder if a big chunk of our sliding quartile is just us not
 reviewing the oldest reviews.

I've been researching the review process in OpenStack and other projects
for a while, and my impression is that at least three timing metrics are
relevant:

(1) Total time from submitting a patch to final closing of the review
process (landing that, or a subsequent patch, or finally abandoning).
This gives an idea of how the whole process is working.

(2) Time from submitting a patch to that patch being approved (+2 in
OpenStack, I guess) or declined (and a new patch is requested). This
gives an idea of how quickly reviewers are providing definite feedback to
patch submitters, and is a metric for each patch cycle.

(3) Time from a patch being reviewed, with a new patch being requested,
to a new patch being submitted. This gives an idea of the reaction
time of the patch submitter.

Usually, you want to keep (1) low, while (2) and (3) give you an idea of
what is happening if (1) gets high.

There is another relevant metric in some cases, which is

(4) The number of patch cycles per review cycle (that is, how many
patches are needed per patch landing in master). In some cases, that may
help to explain how (2) and (3) contribute to (1).

And a fifth metric gives you a throughput metric:

(5) BMI (backlog management index), the number of new review processes
divided by the number of closed review processes for a certain period. It
gives an idea of whether the backlog is going up (>1) or down (<1), and is
usually very interesting when seen over time.
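
For illustration, metrics (1) and (5) are easy to compute from a list of
review records; the record shape (created/closed datetimes) is just an
assumption here.

import datetime


def mean_total_days(reviews):
    # Metric (1): mean days from submission to final close, over closed
    # reviews only ('created'/'closed' are datetime objects).
    closed = [r for r in reviews if r.get('closed')]
    if not closed:
        return 0.0
    total = sum((r['closed'] - r['created']).total_seconds() for r in closed)
    return total / len(closed) / 86400.0


def bmi(reviews, start, end):
    # Metric (5): reviews opened divided by reviews closed in [start, end).
    # > 1 means the backlog grew over the period, < 1 means it shrank.
    opened = sum(1 for r in reviews if start <= r['created'] < end)
    closed = sum(1 for r in reviews
                 if r.get('closed') and start <= r['closed'] < end)
    return float(opened) / closed if closed else float('inf')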

(1) alone is not enough to assess how well the review process is working,
because it could be low while (5) shows an increasing backlog simply
because new review requests come in too quickly (e.g. in periods when
developers are submitting a lot of patch proposals after a freeze). (1)
could also be high while (5) shows a decreasing backlog, because for
example reviewers or submitters are overworked or slowly scheduled but the
project still copes with the backlog. Depending on the relationship
of (1) and (5), maybe you need more reviewers, or reviewers scheduling
their reviews with more priority wrt other actions, or something else.

Note for example that in a project with a low BMI (<1) for a long period,
but with a high total delay in reviews (1), usually putting more
reviewers doesn't reduce 

Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-03 Thread Joshua Hesketh

On 9/3/14 10:43 AM, Jeremy Stanley wrote:

On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:

I thought there was now a thing where zuul can use a different account
per pipeline?

That was the most likely solution we discussed at the summit, but I
don't believe we've implemented it yet (or if we have then it isn't
yet being used for any existing pipelines).
It's currently in review[0], although I think from discussions with 
other zuul devs we're likely to refactor large parts of how we connect 
to systems and hence this may be delayed. If it's something that's 
needed soon(erish) we could probably do this step before the refactoring.


Cheers,
Josh

[0] https://review.openstack.org/#/c/97391/



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-03 Thread Joshua Hesketh

I've moved it back up the review chain for you :-)

Rackspace Australia

On 9/3/14 6:34 PM, Robert Collins wrote:

We would benefit a great deal from having this sooner.

On 3 September 2014 20:11, Joshua Hesketh joshua.hesk...@rackspace.com wrote:

On 9/3/14 10:43 AM, Jeremy Stanley wrote:

On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:

I thought there was now a thing where zuul can use a different account
per pipeline?

That was the most likely solution we discussed at the summit, but I
don't believe we've implemented it yet (or if we have then it isn't
yet being used for any existing pipelines).

It's currently in review[0], although I think from discussions with other
zuul devs we're likely to refactor large parts of how we connect to systems
and hence this may be delayed. If it's something that's needed soon(erish)
we could probably do this step before the refactoring.

Cheers,
Josh

[0] https://review.openstack.org/#/c/97391/




Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Robert Collins
On 16 August 2014 02:43, Jeremy Stanley fu...@yuggoth.org wrote:
 On 2014-08-13 19:51:52 -0500 (-0500), Ben Nemec wrote:
 [...]
 make the check-tripleo job leave an actual vote rather than just a
 comment.
 [...]

 That, as previously discussed, will require some design work in
 Zuul. Gerrit uses a single field per account for verify votes, which
 means that if you want to have multiple concurrent votes for
 different pipelines/job collections then we either need to use more
 than one account for those or add additional columns for each.

 There have already been discussions around how to implement this,
 but in the TripleO case it might make more sense to revisit why we
 have those additional pipelines and instead focus on resolving the
 underlying issues which led to their use as a stop-gap.

I thought there was now a thing where zuul can use a different account
per pipeline?

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Jeremy Stanley
On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:
 I thought there was now a thing where zuul can use a different account
 per pipeline?

That was the most likely solution we discussed at the summit, but I
don't believe we've implemented it yet (or if we have then it isn't
yet being used for any existing pipelines).
-- 
Jeremy Stanley



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Robert Collins
On 14 August 2014 11:03, James Polley j...@jamezpolley.com wrote:
 In recent history, we've been looking each week at stats from
 http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
 gauge on how our review pipeline is tracking.

 The main stats we've been tracking have been the since the last revision
 without -1 or -2. I've included some history at [1], but the summary is
 that our 3rd quartile has slipped from 13 days to 16 days over the last 4
 weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
 from 4 a month ago) and median is unchanged around 7 days.

 There was lots of discussion in our last meeting about what could be causing
 this[2]. However, the thing we wanted to bring to the list for the
 discussion is:

 Are we tracking the right metric? Should we be looking to something else to
 tell us how well our pipeline is performing?

 The meeting logs have quite a few suggestions about ways we could tweak the
 existing metrics, but if we're measuring the wrong thing that's not going to
 help.

 I think that what we are looking for is a metric that lets us know whether
 the majority of patches are getting feedback quickly. Maybe there's some
 other metric that would give us a good indication?

If we review all patches quickly and land none, that's bad too :).

For the reviewers specifically i think we need a metric(s) that:
 - doesn't go bad when submitters go awol, don't respond etc
   - including when they come back - our stats shouldn't jump hugely
because an old review was resurrected
 - when good means submitters will be getting feedback
 - flag inventory - things we'd be happy to have landed that haven't
   - including things with a -1 from non-core reviewers (*)

(*) I often see -1's on things core wouldn't -1 due to the learning
curve involved in becoming core

So, as Ben says, I think we need to address the it's-not-a-vote issue
as a priority; that has tripped us up in lots of ways.

I think we need to discount -workflow patches where that was set by
the submitter, which AFAICT we don't do today.
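
A possible filter for that, assuming change dicts fetched from Gerrit with
o=DETAILED_LABELS (the field names are Gerrit's; the policy itself is just
a sketch):

def self_wipped(change):
    # True when the Workflow -1 (work in progress) was set by the change
    # owner themselves; such changes shouldn't count against reviewers.
    owner = change['owner']['_account_id']
    votes = change.get('labels', {}).get('Workflow', {}).get('all', [])
    return any(v.get('value', 0) < 0 and v.get('_account_id') == owner
               for v in votes)


def reviewable(changes):
    return [c for c in changes if not self_wipped(c)]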

Looking at current stats:
Longest waiting reviews (based on oldest rev without -1 or -2):

54 days, 2 hours, 41 minutes https://review.openstack.org/106167
(Keystone/LDAP integration)
That patch had a -1 on Aug 16 1:23 AM, but it was quickly turned to +2.

So this patch had a -1, then after discussion it became a +2. And it's
evolved multiple times.

What should we be saying here? Clearly it's had little review input
over its life, so I think it's sadly accurate.

I wonder if a big chunk of our sliding quartile is just us not
reviewing the oldest reviews.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-08-15 Thread Jeremy Stanley
On 2014-08-13 19:51:52 -0500 (-0500), Ben Nemec wrote:
[...]
 make the check-tripleo job leave an actual vote rather than just a
 comment.
[...]

That, as previously discussed, will require some design work in
Zuul. Gerrit uses a single field per account for verify votes, which
means that if you want to have multiple concurrent votes for
different pipelines/job collections then we either need to use more
than one account for those or add additional columns for each.

There have already been discussions around how to implement this,
but in the TripleO case it might make more sense to revisit why we
have those additional pipelines and instead focus on resolving the
underlying issues which led to their use as a stop-gap.
-- 
Jeremy Stanley



Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-08-13 Thread Ben Nemec
One thing I am very interested in finally following up on, especially in
light of the snazzy new Gerrit separation for CI jobs, is to make the
check-tripleo job leave an actual vote rather than just a comment.  This
would clean up the (usually) many reviews sitting with a failing CI run,
for the purposes of stats anyway.  It would also make it easier to find
reviews that need a recheck or are legitimately breaking CI.

On 08/13/2014 06:03 PM, James Polley wrote:
 In recent history, we've been looking each week at stats from
 http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
 gauge on how our review pipeline is tracking.
 
 The main stats we've been tracking have been the since the last revision
 without -1 or -2. I've included some history at [1], but the summary is
 that our 3rd quartile has slipped from 13 days to 16 days over the last 4
 weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
 from 4 a month ago) and median is unchanged around 7 days.
 
 There was lots of discussion in our last meeting about what could be
 causing this[2]. However, the thing we wanted to bring to the list for the
 discussion is:
 
 Are we tracking the right metric? Should we be looking to something else to
 tell us how well our pipeline is performing?
 
 The meeting logs have quite a few suggestions about ways we could tweak the
 existing metrics, but if we're measuring the wrong thing that's not going
 to help.
 
 I think that what we are looking for is a metric that lets us know whether
 the majority of patches are getting feedback quickly. Maybe there's some
 other metric that would give us a good indication?
 
 
 
 
 [1] Current Stats since the last revision without -1 or -2 :
 
 Average wait time: 10 days, 17 hours, 6 minutes
 1st quartile wait time: 1 days, 1 hours, 36 minutes
 Median wait time: 7 days, 5 hours, 33 minutes
 3rd quartile wait time: 16 days, 8 hours, 16 minutes
 
 At last week's meeting we had: 3rd quartile wait time: 15 days, 13 hours,
 47 minutes
 A week before that: 3rd quartile wait time: 13 days, 9 hours, 11 minutes
 The week before that was the mid-cycle, but the week before that:
 
 19:53:38 lifeless Stats since the last revision without -1 or -2 :
 19:53:38 lifeless Average wait time: 10 days, 17 hours, 49 minutes
 19:53:38 lifeless 1st quartile wait time: 4 days, 7 hours, 57 minutes
 19:53:38 lifeless Median wait time: 7 days, 10 hours, 52 minutes
 19:53:40 lifeless 3rd quartile wait time: 13 days, 13 hours, 25 minutes
 
 [2] Some of the things suggested as potential causes of the long 3rd
 quartile wait times:
 
 * We have small number of really old reviews that have only positive scores
 but aren't being landed
 * Some reviews get a -1 but then sit for a long time waiting for the author
 to reply

These aren't reflected in the stats above though.  Anything with a
negative vote gets kicked out of that category, and it's the one I think
we care about.

 * We have some really old reviews that suddenly get revived after a long
 period being in WIP or abandoned, which reviewstats seems to miscount
 * Reviewstats counts weekends, we don't (so a change that gets pushed at
 5pm US Friday and gets reviewed at 9am Aus Monday would be seen by us as
 having no wait time, but by reviewstats as ~36 hours)

I would also add to this list:

* Old reviews with failed CI runs that no one has bothered to
recheck/fix.  I find quite a few of these when I'm going through the
outstanding review list.  These would be addressed by the voting change
I mentioned above because they would have a negative vote then.

 
 
 