On 09/19/2014 08:13 AM, Sean Dague wrote:
> I've spent the better part of the last 2 weeks in the Nova bug tracker
> to try to turn it into something that doesn't cause people to run away
> screaming. I don't remember exactly where we started at open bug count 2
> weeks ago (it was north of 1400, with > 200 bugs in new, but it might
> have been north of 1600), but as of this email we're at < 1000 open bugs
> (I'm counting Fix Committed as closed, even though LP does not), and ~0
> new bugs (depending on the time of the day).
>
> == Philosophy in Triaging ==
>
> I'm going to lay out the philosophy of triaging I've had, because this
> may also set the tone going forward.
>
> A bug tracker is a tool to help us make a better release. It does not
> exist for its own good, it exists to help. Which means when evaluating
> what stays in and what leaves we need to evaluate if any particular
> artifact will help us make a better release. But also, more importantly,
> realize that there is a cost for carrying every artifact in the tracker.
> Resolving duplicates gets non-linearly harder as the number of artifacts
> goes up. Triaging gets non-linearly harder as the number of artifacts
> goes up.
>
> With this I was being somewhat pragmatic about closing bugs. An old bug
> that is just a stacktrace is typically not useful. An old bug that is a
> vague sentence that we should refactor a particular module (with no
> specifics on the details) is not useful. A bug reported against a very
> old version of OpenStack where the code has changed a lot in the
> relevant area, and there aren't responses from the author, is not
> useful. Not useful bugs just add debt, and we should get rid of them.
> That improves the chance that a random bug pulled off the tracker is
> something you could actually look at fixing, instead of mostly just
> stalling out.
>
> So I closed a lot of stuff as Invalid / Opinion that fell into those camps.
>
> == Keeping New Bugs at close to 0 ==
>
> After driving the bugs in the New state down to zero last week, I found
> it's actually pretty easy to keep it at 0.
>
> We get 10 - 20 new bugs a day in Nova (during a weekday). Of those ~20%
> aren't actually a bug, and can be closed immediately. ~30% look like a
> bug, but don't have anywhere near enough information in them, and
> flipping them to Incomplete with questions quickly means we have a real
> chance of getting the right info. ~10% are fixable in < 30 minutes worth
> of work. And the rest are real bugs that seem to have enough to dive
> into, and can be triaged into Confirmed, given a priority, and tagged
> appropriately for the area.
>
> But, more importantly, this means we can filter bug quality on the way
> in. And we can also encourage bug reporters that are giving us good
> stuff, or even easy stuff, as we respond quickly.
>
> Recommendation #1: we adopt a 0 new bugs policy to keep this from
> getting away from us in the future.

We have this policy in TripleO, and to help keep it fresh in people's
minds Roman Podolyaka (IIRC) wrote an untriaged-bot for the IRC channel
that periodically posts a list of any New bugs. I've found it very
helpful, so it's probably worth getting that into infra somewhere so
other people can use it too.

>
> == Our worst bug reporters are often core reviewers ==
>
> I'm going to pick on Dan Prince here, mostly because I have a recent
> concrete example, however in triaging the bug queue much of the core
> team is to blame (including myself).
>
> https://bugs.launchpad.net/nova/+bug/1368773 is a terrible bug. Also, it
> was set to Incomplete and got no response. I'm almost 100% sure it's a
> dupe of the multiprocess bug we've been tracking down but it's so terse
> that you can't get to the bottom of it.
>
> There were a ton of 2012 nova bugs that were basically "post it notes".
> Oh, "we should refactor this function". Full stop. While those are fine
> for personal tracking, their value goes to zero probably 3 months after
> they are filed, especially if the reporter stops working on the issue at
> hand. Nova has plenty of "wouldn't it be great if we... " ideas. I'm not
> convinced using bugs for those is useful unless we go and close them out
> aggressively if they stall.
>
> Also, if Nova core can't file a good bug, it's hard to set the example
> for others in our community.
>
> Recommendation #2: hey, Nova core, let's be better about filing the kinds
> of bugs we want to see! mkay!
>
> Recommendation #3: Let's create a tag for "personal work items" or
> something for this class of TODOs people are leaving themselves, to
> make them a ton easier to cull later when they stall and no one else has
> enough context to pick them up.
>
> == Tags ==
>
> The aggressive tagging that Tracy brought into the project has been
> awesome. It definitely helps slice out into better functional areas.
> Here is the top of our current official tag list (and bug count):
>
> 95 compute
> 83 libvirt
> 74 api
> 68 vmware
> 67 network
> 41 db
> 40 testing
> 40 volumes
> 36 ec2
> 35 icehouse-backport-potential
> 32 low-hanging-fruit
> 31 xenserver
> 25 ironic
> 23 hyper-v
> 16 cells
> 14 scheduler
> 12 baremetal
> 9 ceph
> 9 security
> 8 oslo
> ...
>
> So, good stuff. However I think we probably want to take a further step
> and attempt to get champions for tags, so that tag owners would ensure
> their bug list looks sane, and actually spend some time fixing them.
> It's pretty clear, for instance, that the ec2 bugs are just piling up,
> with very few fixes coming in. Cells seems like it's in the same camp (a
> bunch of recent bugs have been cells related, it looks like a lot more
> deployments are trying it).
>
> Probably the most important job for tag owners would be cleaning up the
> bugs in the tag: realizing that 2 bugs were actually the same bug, and
> cleaning up descriptions / titles / etc so that people can move forward
> on them.
>
> Recommendation #4: create tag champions
>
> == Soft Spots ==
>
> After looking at probably close to 1000 bugs in 2 weeks I have a
> particular impression of soft spots that we have.
>
> Quotas are kind of a mess. It's not clear that we're even eventually
> consistent. There are a lot of bugs about creating servers, deleting
> servers, and leaking quota in the process. I know Jay and Sylvan are
> diving hard on the resource tracker right now. I think this should be a
> Kilo focus area because it creates terrible confusion and bugs for people.
>
> EC2 has definitely regressed, especially after block device mapping
> changes, to the point that it's not clear it's functional outside of the
> most basic server create commands. The EC2 code is largely unchanged
> since 2012, and only lightly tested. We need to decide if this is
> important or not, and either fix it or delete it.
> There have been many hands going up in the past saying they would help,
> and then they never do (you know who you are).
>
> The VM State machine model is .... Well it's at least suboptimal, but
> it's also clear that it's massively leaky, and the way we handle it
> internally means we end up in inconsistent wedges all the time. I expect
> the complexity here causes a ton of bugs. We need some refactoring to
> make things a ton more clear about what's supposed to be happening, and
> how to roll back when things go wrong. I think the Tasks work was headed
> down that path, but that seems stalled now.
>
> Cross interaction with Neutron and Cinder remains racy. We are pretty
> optimistic about when resources will be available. Even the event
> interface with Neutron hasn't fully addressed this. I think a really
> great Design Summit session would be Nova + Neutron + Cinder to figure
> out a shared architecture to address this. I'd expect this to be at
> least a double session.
>
> Recommendation #5 - 8: we should get on those things :)
>
> == Triaging Inconsistencies ==
>
> I found some inconsistencies in how people were triaging bugs, and the
> state inconsistencies probably make the bugs seem more confusing.
> https://wiki.openstack.org/wiki/BugTriage provides some guidance.
>
> Importantly:
>
> Incomplete is an Open state. For bugzilla folks this is NEEDSINFO. I saw
> a bunch of 'closing' comments paired with a move to Incomplete, which
> does not close the bug.
>
> Triaged should be used if the solution to fix the bug is in the bug
> itself. Triaged is Confirmed + Solution with enough detail to fix it.
>
> Incomplete bugs should not have assignees or milestones, otherwise they
> won't time out.
>
> == General Cleanup Rules ==
>
> Here are some general cleanup rules that I was using:
>
> If an Incomplete bug has no response after 30 days it's fair game to
> close (Invalid, Opinion, Won't Fix).
>
> If a bug is In Progress with no patch posted after 30 days, it is not In
> Progress.
> Remove assignee, move back to the last state (probably Confirmed).
> Move to Opinion if it's really a "post it note".
>
> If a bug is In Progress but the patches were abandoned, it's no longer
> In Progress. Remove assignee, move back to the last state (probably
> Confirmed). Move to Opinion if it's really a "post it note".
>
> == Rescuing Stalled Fixes ==
>
> Over the course of this I found a bunch of the In Progress bugs were
> real issues, with real fixes, that had stalled out for one of a number
> of reasons. Often the patch had a -1 'needs unit tests' on it, and it's
> sort of clear the author didn't really know how to do that for this
> patch. Other times the author's first language was not English, and the
> patch commit message was confusing enough that no one understood what it
> was fixing. (One of these bugs I restored, rewrote the commit message,
> and then it sailed through the process.)
>
> Recommendation #9: if you are going to -1 for unit tests, please go the
> extra step of saying 'I think you should write a test that does X, Y, Z'.
>
> Recommendation #10: We need to find a better balance in rewriting commit
> messages. Maybe we should just make it socially acceptable to rewrite
> the commit message as part of review.
>
> ....
>
> I'm sure there are other thoughts, but my brain is running out of steam.
> These were the things that popped to the top of my head. It's definitely
> been really interesting to spend this much time with the tracker to
> build a bigger picture of this feedback channel we have from our users.
> Hopefully other folks found some of this handy.
>
> -Sean
>

_______________________________________________
OpenStack-dev mailing list
OpenStackfirstname.lastname@example.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
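
For anyone who wants to script the general cleanup rules quoted above, here is a minimal Python sketch. The `Bug` record, its field names, and the `cleanup` function are all hypothetical and made up for illustration. It does not talk to Launchpad, "last state" is simplified to Confirmed, and whether something is a "post it note" is assumed to be a human judgment recorded as a flag.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=30)  # the 30-day threshold from the quoted rules


@dataclass
class Bug:
    """Hypothetical in-memory bug record; not a Launchpad API object."""
    status: str                     # e.g. "Incomplete", "In Progress"
    last_activity: date             # last response or status change
    has_patch: bool = False         # a patch has been posted for review
    patch_abandoned: bool = False   # the posted patches were abandoned
    post_it_note: bool = False      # vague personal TODO, judged by a human
    assignee: Optional[str] = None


def cleanup(bug: Bug, today: date) -> Bug:
    """Apply the cleanup rules once and return the (possibly updated) bug."""
    stale = today - bug.last_activity > STALE_AFTER

    if bug.status == "Incomplete" and stale:
        # Fair game to close. Invalid is picked here; Opinion or
        # Won't Fix are equally acceptable under the rules.
        bug.status = "Invalid"
    elif bug.status == "In Progress" and (
        bug.patch_abandoned or (stale and not bug.has_patch)
    ):
        # Not really in progress: drop the assignee and move it back.
        bug.assignee = None
        bug.status = "Opinion" if bug.post_it_note else "Confirmed"
    return bug
```

A stalled bug such as `Bug("In Progress", date(2014, 8, 1), assignee="alice")` checked on 2014-09-19 would come back unassigned and Confirmed, while one with a live patch is left alone.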