Re: [openstack-dev] [Nova] What's holding nova development back?
On Wed, Sep 17, 2014 at 8:03 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 9/16/2014 1:01 PM, Joe Gordon wrote: On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote: On 09/15/2014 08:07 PM, Jeremy Stanley wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) LOL, thanks for making me take the last hour reading Wikipedia pages about computational complexity theory! :P No, in all seriousness, I wasn't actually asking anyone to boil the ocean, mathematically. I think doing a couple of things would help: making the categorization more obvious (a UI thing, really), and doing some (hopefully simple?) inspection of a control group of patches that we know do not introduce any code changes themselves, comparing it to another group of patches that we know *do* introduce code changes to Nova, and then seeing if there is a set of E-R issues that consistently appears in *both* groups. That set of E-R issues has a higher likelihood of not being due to Nova, right? We use Launchpad's affected projects listings on the elastic recheck page to say what may be causing the bug. Tagging projects to bugs is a manual process, but one that works pretty well. UI: The elastic recheck UI definitely could use some improvements. I am very poor at writing UIs, so patches welcome! OK, so perhaps it's not the most scientific or well-thought-out plan, but hey, it's a spark for thought...
;) Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I'm not great with UIs either, but would a dropdown of the affected projects be helpful? People could then filter on their favorite project, and the page would be sorted by top offenders as we have today. There are times when the top bugs are infra issues (pip timeouts, for example), so you have to scroll a ways before finding something for your project (nova isn't the only one). I think that would be helpful. -- Thanks, Matt Riedemann
Re: [openstack-dev] [Nova] What's holding nova development back?
On 9/16/2014 1:01 PM, Joe Gordon wrote: On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote: On 09/15/2014 08:07 PM, Jeremy Stanley wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) LOL, thanks for making me take the last hour reading Wikipedia pages about computational complexity theory! :P No, in all seriousness, I wasn't actually asking anyone to boil the ocean, mathematically. I think doing a couple of things would help: making the categorization more obvious (a UI thing, really), and doing some (hopefully simple?) inspection of a control group of patches that we know do not introduce any code changes themselves, comparing it to another group of patches that we know *do* introduce code changes to Nova, and then seeing if there is a set of E-R issues that consistently appears in *both* groups. That set of E-R issues has a higher likelihood of not being due to Nova, right? We use Launchpad's affected projects listings on the elastic recheck page to say what may be causing the bug. Tagging projects to bugs is a manual process, but one that works pretty well. UI: The elastic recheck UI definitely could use some improvements. I am very poor at writing UIs, so patches welcome! OK, so perhaps it's not the most scientific or well-thought-out plan, but hey, it's a spark for thought...
;) Best, -jay I'm not great with UIs either, but would a dropdown of the affected projects be helpful? People could then filter on their favorite project, and the page would be sorted by top offenders as we have today. There are times when the top bugs are infra issues (pip timeouts, for example), so you have to scroll a ways before finding something for your project (nova isn't the only one). -- Thanks, Matt Riedemann
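Matt's dropdown suggestion amounts to a filter on the affected project followed by a sort on hit count. A minimal Python sketch, assuming each elastic-recheck entry can be reduced to a bug number, a title, a recent hit count, and its Launchpad affected-projects list; the `RecheckBug` record, the `top_offenders` function, and the sample data are all hypothetical and are not elastic-recheck's actual code or data model:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RecheckBug:
    # Hypothetical record shape; elastic-recheck's real data model may differ.
    bug_id: int
    title: str
    hits: int  # number of recent gate failures matched to this bug
    affected_projects: list[str] = field(default_factory=list)


def top_offenders(bugs: list[RecheckBug], project: str | None = None) -> list[RecheckBug]:
    """Sort bugs by hit count, highest first.

    When a project is selected (the proposed dropdown), keep only bugs
    whose Launchpad affected-projects list includes that project.
    """
    if project is not None:
        bugs = [b for b in bugs if project in b.affected_projects]
    return sorted(bugs, key=lambda b: b.hits, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        RecheckBug(1, "pip timeout during devstack setup", 120, ["openstack-ci"]),
        RecheckBug(2, "instance fails to reach ACTIVE", 45, ["nova"]),
        RecheckBug(3, "volume attach race", 30, ["nova", "cinder"]),
    ]
    for bug in top_offenders(sample, project="nova"):
        print(bug.bug_id, bug.hits, bug.title)
```

Selecting `nova` in this sample drops the higher-ranked pip timeout bug, so the first Nova-related bug is visible without scrolling; leaving `project` unset keeps today's unfiltered, sorted view.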
Re: [openstack-dev] [Nova] What's holding nova development back?
And if u can also prove NP = P u get 1 million dollars [1, 2] Let me know when u got the proof, Thanks much, [1] http://www.claymath.org/millenium-problems/p-vs-np-problem [2] http://www.claymath.org/millennium-problems/millennium-prize-problems -Josh On Mon, Sep 15, 2014 at 5:07 PM, Jeremy Stanley fu...@yuggoth.org wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) -- Jeremy Stanley
Re: [openstack-dev] [Nova] What's holding nova development back?
On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we should have formal Bug squash Wednesdays (or pick another day). By this I mean that the core reviewers will focus their attention only on reviews that are related to bug fixing. They will also try to work on bugs if they have time and encourage everyone else involved in Nova to do the same. We'd have a team of people in the Nova IRC channel to publicise and co-ordinate bug squashing, perhaps with a list of the top 20 bugs we want to attack this week. I wouldn't focus just on gate bugs here since many are pretty darn hard and would put off many people. Have a mix of bugs of varying difficulties to point people to. Make this a regular fortnightly or even weekly event which we publicise in advance on mailing lists, etc.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [Nova] What's holding nova development back?
I think bug days are a good idea. We've had them sporadically in the past, but never weekly. We stopped mostly because people stopped showing up. If we think we have critical mass again, or if it makes more sense to run one during the RC period, then let's do it. So... Who would show up for a bug day if we ran one? Michael On Tue, Sep 16, 2014 at 6:12 PM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we should have formal Bug squash Wednesdays (or pick another day). By this I mean that the core reviewers will focus their attention only on reviews that are related to bug fixing. They will also try to work on bugs if they have time and encourage everyone else involved in Nova to do the same.
We'd have a team of people in the Nova IRC channel to publicise and co-ordinate bug squashing, perhaps with a list of the top 20 bugs we want to attack this week. I wouldn't focus just on gate bugs here since many are pretty darn hard and would put off many people. Have a mix of bugs of varying difficulties to point people to. Make this a regular fortnightly or even weekly event which we publicise in advance on mailing lists, etc. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
On 9/16/14, 11:12 AM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we should have formal Bug squash Wednesdays (or pick another day). By this I mean that the core reviewers will focus their attention only on reviews that are related to bug fixing. They will also try to work on bugs if they have time and encourage everyone else involved in Nova to do the same. We'd have a team of people in the Nova IRC channel to publicise and co-ordinate bug squashing, perhaps with a list of the top 20 bugs we want to attack this week. I wouldn't focus just on gate bugs here since many are pretty darn hard and would put off many people. Have a mix of bugs of varying difficulties to point people to.
Make this a regular fortnightly or even weekly event which we publicise in advance on mailing lists, etc. I am in favor of that. This is similar to what I suggested in http://lists.openstack.org/pipermail/openstack-dev/2014-September/045440.html Thanks Gary Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [Nova] What's holding nova development back?
On Tue, Sep 16, 2014 at 06:29:53PM +1000, Michael Still wrote: I think bug days are a good idea. We've had them sporadically in the past, but never weekly. We stopped mostly because people stopped showing up. If we think we have critical mass again, or if it makes more sense to run one during the RC period, then let's do it. So... Who would show up for a bug day if we ran one? I'm not a Nova dev, but FWIW, I can spend time doing triage and root cause analysis of areas involving virt drivers (libvirt, QEMU, KVM) and any other related areas in Nova. PS: The next four weeks are going to be hectic for me personally due to some travel, but I should be more active and available after that. -- /kashyap On Tue, Sep 16, 2014 at 6:12 PM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused.
So, how do we get people to start taking bugs / gate failures more seriously? I think we should have formal Bug squash Wednesdays (or pick another day). By this I mean that the core reviewers will focus their attention only on reviews that are related to bug fixing. They will also try to work on bugs if they have time and encourage everyone else involved in Nova to do the same. We'd have a team of people in the Nova IRC channel to publicise and co-ordinate bug squashing, perhaps with a list of the top 20 bugs we want to attack this week. I wouldn't focus just on gate bugs here since many are pretty darn hard and would put off many people. Have a mix of bugs of varying difficulties to point people to. Make this a regular fortnightly or even weekly event which we publicise in advance on mailing lists, etc. Regards, Daniel
Re: [openstack-dev] [Nova] What's holding nova development back?
Michael Still wrote: Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we need to build a cross-project team working on that. Having gate liaisons designated in every project should help bootstrap that team -- it doesn't mean it's a one-person-per-project job, but at least you have a contact person when you need an expert in some project who is also versed in the arts of the gate. I also think we need to do a slightly better job at visualizing issues. Like Dims said, even with tabs opened to the right places, it's non-trivial to tell the killer bug apart from the ones that aren't. And without carefully checking IRC backlog in 4 different channels, it's also hard to find out that a bug is already being taken care of. I woke up one morning with the gate obviously stuck on some issue, investigated it, only to realize after 30 minutes that the fix was already in the gate queue. That's a bit of a frustrating experience. Finally, it's not completely crazy to use a specific channel (#openstack-gate ?) for that. Yes, there is a lot of overlap with -qa and -infra channels, but those channels aren't dedicated to that problem, so 25% of the issues are discussed on one, 25% on the other, 25% on the project-specific channel, and the remaining 25% on some random channel the right people happen to be in. Having a clear channel where all the gate liaisons hang out and all issues are discussed may go a long way towards establishing a team to work on that (rather than continuing to rely on the same set of willing individuals). -- Thierry Carrez (ttx)
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/16/2014 05:44 AM, Thierry Carrez wrote: Michael Still wrote: Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we need to build a cross-project team working on that. Having gate liaisons designated in every project should help bootstrap that team -- it doesn't mean it's a one-person-per-project job, but at least you have a contact person when you need an expert in some project who is also versed in the arts of the gate. I also think we need to do a slightly better job at visualizing issues. Like Dims said, even with tabs opened to the right places, it's non-trivial to tell the killer bug apart from the ones that aren't. And without carefully checking IRC backlog in 4 different channels, it's also hard to find out that a bug is already being taken care of. I woke up one morning with the gate obviously stuck on some issue, investigated it, only to realize after 30 minutes that the fix was already in the gate queue. That's a bit of a frustrating experience. Finally, it's not completely crazy to use a specific channel (#openstack-gate ?) for that. Yes, there is a lot of overlap with -qa and -infra channels, but those channels aren't dedicated to that problem, so 25% of the issues are discussed on one, 25% on the other, 25% on the project-specific channel, and the remaining 25% on some random channel the right people happen to be in. Having a clear channel where all the gate liaisons hang out and all issues are discussed may go a long way towards establishing a team to work on that (rather than continuing to rely on the same set of willing individuals). Honestly, I'm pretty anti 'add another channel', especially because there seems to be some assumption that you can address this problem without understanding our integration environment (devstack / tempest / d-g).
This is not a problem in isolation; it's a problem about the synthesis of all the parts. The diving on these issues is already happening in one place; we should build on that, and not synthetically create some third-place Esperanto channel thinking that will fix the issue. I've thought about the visualization problem a lot... some of the output included the os-loganalyze and elastic-recheck projects, as well as pretty-tox in tempest, which ensures we see which worker each test is running in so you can figure out what's happening simultaneously. Here's the root problem I ran into: what kinds of visualizations are useful changes at a pretty good clip. These bugs are hard to find and fix because they are typically the interaction of a bunch of moving parts. So the tools you need to fix them are some combination of visualizations, plus a reasonable mental model in your head of how all of OpenStack fits together (and how components expose to operators what they are doing). I actually think part 2 is the weak spot for most folks: for instance, knowing that glanceclient's logging is ridiculous and you should ignore it, because it spews a ton of ERRORS for no good reason. Basically that's the key skill: understanding the request flows that go through OpenStack, understanding how to read OpenStack logs, and being mindful that the issue might be caused by other things happening at the same time that you are trying to do a thing (so keep an eye out for those). -Sean -- Sean Dague http://dague.net
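Sean's point about pretty-tox reporting which worker each test ran in is what makes the question "what else was happening at the same time?" answerable. As a minimal Python sketch, assuming per-test records of worker id, test name, and start/end times have already been extracted from a run, and that each worker runs one test at a time (the `RunRecord` type, the `overlapping_runs` helper, and the test names are hypothetical, not part of pretty-tox or tempest), finding the tests that ran alongside a failure is an interval-overlap check:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical per-test record: which worker ran it and when (seconds).
    worker: int
    name: str
    start: float
    end: float


def overlapping_runs(target: RunRecord, runs: list[RunRecord]) -> list[RunRecord]:
    """Return tests on other workers whose run overlapped the target's.

    Half-open intervals [s1, e1) and [s2, e2) overlap iff s1 < e2 and s2 < e1.
    Tests on the target's own worker are skipped, on the assumption that a
    worker runs its tests one after another.
    """
    hits = [
        r for r in runs
        if r.worker != target.worker
        and r.start < target.end
        and target.start < r.end
    ]
    return sorted(hits, key=lambda r: r.start)


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    runs = [
        RunRecord(0, "test_boot_server", 0.0, 10.0),
        RunRecord(1, "test_create_volume", 5.0, 6.0),
        RunRecord(2, "test_list_images", 10.0, 12.0),
    ]
    for r in overlapping_runs(runs[0], runs):
        print(r.worker, r.name)
```

This only narrows the search; deciding which of the overlapping tests actually interfered still takes the mental model of request flows that Sean describes.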
Re: [openstack-dev] [Nova] What's holding nova development back?
On Tue, Sep 16, 2014 at 06:29:53PM +1000, Michael Still wrote: I think bug days are a good idea. We've had them sporadically in the past, but never weekly. We stopped mostly because people stopped showing up. If we think we have critical mass again, or if it makes more sense to run one during the RC period, then let's do it. So... Who would show up for a bug day if we ran one? IMHO that question is attacking this the wrong way. We should have the nova core and PTL team lead by example, by all agreeing to actively take part in formally scheduled bug days. Use this to set the expectations for the rest of the community, to encourage them to join in too, and not just rely on a handful of people to volunteer. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/16/2014 04:29 AM, Michael Still wrote: I think bug days are a good idea. We've had them sporadically in the past, but never weekly. We stopped mostly because people stopped showing up. If we think we have critical mass again, or if it makes more sense to run one during the RC period, then let's do it. So... Who would show up for a bug day if we ran one? I would. -jay
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/16/2014 04:12 AM, Daniel P. Berrange wrote: On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we should have formal Bug squash Wednesdays (or pick another day). By this I mean that the core reviewers will focus their attention only on reviews that are related to bug fixing. They will also try to work on bugs if they have time and encourage everyone else involved in Nova to do the same. We'd have a team of people in the Nova IRC channel to publicise and co-ordinate bug squashing, perhaps with a list of the top 20 bugs we want to attack this week. I wouldn't focus just on gate bugs here since many are pretty darn hard and would put off many people. Have a mix of bugs of varying difficulties to point people to. Make this a regular fortnightly or even weekly event which we publicise in advance on mailing lists, etc.
+1, I've suggested something similar in the past. -jay
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/16/2014 09:39 AM, Jay Pipes wrote: On 09/16/2014 04:12 AM, Daniel P. Berrange wrote: On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? I think we should have formal Bug squash Wednesdays (or pick another day). By this I mean that the core reviewers will focus their attention only on reviews that are related to bug fixing. They will also try to work on bugs if they have time and encourage everyone else involved in Nova to do the same. We'd have a team of people in the Nova IRC channel to publicise and co-ordinate bug squashing, perhaps with a list of the top 20 bugs we want to attack this week. I wouldn't focus just on gate bugs here since many are pretty darn hard and would put off many people. Have a mix of bugs of varying difficulties to point people to.
Make this a regular fortnightly or even weekly event which we publicise in advance on mailing lists, etc. +1, I've suggested something similar in the past. +1, a weekly event would be great. I've spent the bulk of the last 2 weeks in the Nova bug tracker, and it's pretty interesting what's in there. Lots of stuff we should be fixing. Lots of really old gorp that we should shed because it's not helping. Also lots of inconsistencies in how triage is happening, because it's not happening regularly enough. Plus, now that we are at 0 bugs in the New state in Nova, it's actually kind of sane to stay on top of that and keep our New state empty. Not that it fixes everything, but it does prevent a bunch of gorp from getting added to the pile, as probably 1/2 - 1/3 of inbound bugs... aren't. -Sean -- Sean Dague http://dague.net
Re: [openstack-dev] [Nova] What's holding nova development back?
On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote: On 09/15/2014 08:07 PM, Jeremy Stanley wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) LOL, thanks for making me take the last hour reading Wikipedia pages about computational complexity theory! :P No, in all seriousness, I wasn't actually asking anyone to boil the ocean, mathematically. I think it would help to do a couple of things: make the categorization more obvious (a UI thing, really), and do some (hopefully simple?) inspection of a control group of patches that we know do not introduce any code changes themselves, compare it to another group of patches that we know *do* introduce code changes to Nova, and then see if there is a set of E-R issues that consistently appear in *both* groups. That set of E-R issues has a higher likelihood of not being due to Nova, right? We use Launchpad's affected projects listings on the elastic recheck page to say what may be causing the bug. Tagging projects to bugs is a manual process, but one that works pretty well. UI: The elastic recheck UI definitely could use some improvements. I am very poor at writing UIs, so patches welcome! OK, so perhaps it's not the most scientific or well-thought-out plan, but hey, it's a spark for thought... ;) Best, -jay
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/13/2014 11:07 PM, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. This is exactly what I was saying in my ranty email from 2 weeks ago [1]. Debt is everywhere and, like any debt, it is unlikely to go away on its own. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand walk everything through the gate. The thing is that - without doing work on the code - you cannot know where the real issues are. You cannot look at a codebase as big as Nova and say, hmmm, looks like we need to fix the resource tracker. You can know that only if you are neck-deep in the stuff. And then you need to agree on what is really bad and what is just distasteful, and then focus the efforts on that. None of the things we've put in place (specs, the way we do and organize code review and bugs) acknowledge or help this part of the development process. I tried to explain this in my previous ranty email [1] but I guess I failed due to ranting :) so let me try again: The Nova team needs to act as a development team. We are not in a place (yet?) where we can just oversee the addition of features based on whether they are appropriate for our use case. We have to work together on a set of important things to get Nova to where we think it needs to be and make sure we get it done - by actually doing it! (*) However - I don't think freezing development of features for a cycle is a viable option - this is just not how software in the real world gets done. It will likely be the worst possible thing we can do, no matter how appealing it seems to us as developers. But we do need to be extremely strict on what we let in, and under which conditions!
As I mentioned to sdague on IRC the other day (yes, I am quoting myself :) ): Not all features are the same - there are features that are better, that are coded better, and are integrated better - we should always want those features! Then there are features that are a net negative on the code - we should *never* want those features. And then there are features in the middle - we may want to cut those or push them back depending on a number of things that are important. Things like: code quality, can it fit within the current constraints, can we let it in like that, or does some work need to happen first. Things which we haven't been really good at considering previously IMHO. But you can't really judge that unless you are actively developing Nova yourself, and have a tighter grip on the proposed code than what our current process gives. Peace! N. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/044722.html (*) The only effort like this going on at the moment in Nova is the Objects work done by dansmith (even though several others have been proposed) - I will let the readers judge how much of an impact it has had in only 2 short cycles, from just a single effort. Michael -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/14/2014 12:27 AM, Boris Pavlovic wrote: Michael, I am so glad that you started this topic. I really like the idea of taking a pause with features and concentrating on improving the current code base. Even if the 1k open bugs (https://bugs.launchpad.net/nova) are a vital issue, there are other things that could be addressed to improve Nova team throughput. As was said in another thread: Nova code is currently too big and complex to be understood by one person. This produces 2 issues: A) It is hard to find a person who can oversee the full project and make global architecture decisions, including work on cross-project interactions (so the project doesn't have a clear direction of development). B) It's really hard to find cores, and current cores are under too heavy a load (because of project complexity). I believe that all of the current Nova functionality can be implemented in a much simpler manner. Just a brief comment on the sentence above. This is a common thing to hear from coders, and is very rarely rooted in reality IMHO. Nova does _a lot_ of things. Saying that, given the exhaustive list of features it has, we can implement them in a much simpler manner completely disregards all the complexity of building software that works within real-world constraints. Basically, complexity was added during the process of adding a lot of features over the years that didn't perfectly fit the architecture of Nova. And there wasn't much work on refactoring the architecture to clean up these features. I agree with this of course - fixing architectural flaws is important and needs to be an ongoing part of the process, as I mention in my other mail to the thread. Halting all other development is not the way to do it though. N. So maybe it's the proper time to think about what we are doing, why, and how. That will allow us to find simpler solutions for the current functionality.
Best regards, Boris Pavlovic On Sun, Sep 14, 2014 at 1:07 AM, Michael Still mi...@stillhq.com wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand walk everything through the gate. Michael -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand walk everything through the gate. Unfortunately the reliability of the gate systems has the highest negative impact on productivity right at the point in the dev cycle where we need it to have the least impact too. If we're going to continue to raise the bar in terms of testing coverage then we need to have a serious look at the overall approach we use for testing because what we do today isn't going to scale, even if it is 100% reliable. We can't keep adding new CI jobs for each new nova.conf setting that introduces a new code path, because each job has major implications for resource consumption (number of test nodes, log storage), not to mention reliability. I think we need to figure out a way to get more targeted testing of features, so we can keep the overall number of jobs lower and the tests shorter. Instead of having a single tempest run that exercises all the Nova functionality in one run, we need to figure out how to split it up into independent functional areas.
For example, if we could isolate the tests which are affected by the choice of cinder storage backend, then we could run that subset of tests multiple times, once for each supported cinder backend. Without this, the combinatorial explosion of test jobs is going to kill us. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
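Daniel's combinatorial-explosion argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the configuration axes and option lists in `AXES` are hypothetical, not Nova's real CI job matrix, and the targeted scheme assumes the axes do not interact, so one full baseline run plus one subset run per extra option is enough.

```python
from math import prod

# Hypothetical configuration axes for illustration only; the axis names and
# option lists are assumptions, not Nova's actual CI job matrix.
AXES = {
    "cinder_backend": ["lvm", "ceph", "nfs"],
    "network": ["nova-network", "neutron"],
    "virt_driver": ["libvirt-kvm", "libvirt-lxc"],
}


def full_matrix_jobs(axes):
    """Jobs needed if every combination of settings gets its own full run."""
    return prod(len(options) for options in axes.values())


def targeted_jobs(axes):
    """Jobs needed if one full baseline run covers the first option of every
    axis, and each remaining option only re-runs the subset of tests that
    axis affects (assumes the axes do not interact)."""
    return 1 + sum(len(options) - 1 for options in axes.values())


if __name__ == "__main__":
    print(full_matrix_jobs(AXES))  # 12
    print(targeted_jobs(AXES))     # 5
```

Adding a fourth three-option axis takes the full matrix from 12 to 36 jobs, while the targeted scheme only grows from 5 to 7, and most of those 7 are shorter subset runs. The price is that interactions between axes (say, one particular backend combined with one particular network setup) go untested.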
Re: [openstack-dev] [Nova] What's holding nova development back?
On Mon, Sep 15, 2014 at 7:42 PM, Daniel P. Berrange berra...@redhat.com wrote: Unfortunately the reliability of the gate systems has the highest negative impact on productivity right at the point in the dev cycle where we need it to have the least impact too. Agreed. However, my instinct is that a lot of our CI unreliability isn't from the number of permutations, but from buggy code. We have our users telling us where to look to fix this in the form of many, many bug reports. I find it hard to believe that we couldn't improve our gate reliability by taking the fixing of the bugs we currently have reported more seriously. Michael -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand walk everything through the gate. Unfortunately the reliability of the gate systems has the highest negative impact on productivity right at the point in the dev cycle where we need it to have the least impact too. If we're going to continue to raise the bar in terms of testing coverage then we need to have a serious look at the overall approach we use for testing because what we do today isn't going to scale, even if it is 100% reliable. We can't keep adding new CI jobs for each new nova.conf setting that introduces a new code path, because each job has major implications for resource consumption (number of test nodes, log storage), not to mention reliability. I think we need to figure out a way to get more targeted testing of features, so we can keep the overall number of jobs lower and the tests shorter. Instead of having a single tempest run that exercises all the Nova functionality in one run, we need to figure out how to split it up into independent functional areas.
For example, if we could isolate the tests which are affected by the choice of cinder storage backend, then we could run that subset of tests multiple times, once for each supported cinder backend. Without this, the combinatorial explosion of test jobs is going to kill us. One of the top issues killing Nova patches last week was a unit test race (the wsgi worker one). There is no one to blame but Nova for that. Jay was really the only team member digging into it. I don't disagree on the disaggregation problem; however, as lots of Nova devs are ignoring unit test failures at this point, unless that changes no amount of disaggregation is going to make anything better. -Sean -- Sean Dague http://dague.net
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. -- Russell Bryant
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/15/2014 04:01 AM, Nikola Đipanov wrote: On 09/13/2014 11:07 PM, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. This is exactly what I was saying in my ranty email from 2 weeks ago [1]. Debt is everywhere and, like any debt, it is unlikely to go away on its own. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand walk everything through the gate. The thing is that - without doing work on the code - you cannot know where the real issues are. You cannot look at a codebase as big as Nova and say, hmmm, looks like we need to fix the resource tracker. You can know that only if you are neck-deep in the stuff. And then you need to agree on what is really bad and what is just distasteful, and then focus the efforts on that. None of the things we've put in place (specs, the way we do and organize code review and bugs) acknowledge or help this part of the development process. I tried to explain this in my previous ranty email [1] but I guess I failed due to ranting :) so let me try again: The Nova team needs to act as a development team. We are not in a place (yet?) where we can just oversee the addition of features based on whether they are appropriate for our use case. We have to work together on a set of important things to get Nova to where we think it needs to be and make sure we get it done - by actually doing it! (*) However - I don't think freezing development of features for a cycle is a viable option - this is just not how software in the real world gets done. It will likely be the worst possible thing we can do, no matter how appealing it seems to us as developers.
But we do need to be extremely strict on what we let in, and under which conditions! As I mentioned to sdague on IRC the other day (yes, I am quoting myself :) ): Not all features are the same - there are features that are better, that are coded better, and are integrated better - we should always want those features! Then there are features that are a net negative on the code - we should *never* want those features. And then there are features in the middle - we may want to cut those or push them back depending on a number of things that are important. Things like: code quality, can it fit within the current constraints, can we let it in like that, or does some work need to happen first. Things which we haven't been really good at considering previously IMHO. But you can't really judge that unless you are actively developing Nova yourself, and have a tighter grip on the proposed code than what our current process gives. Peace! N. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/044722.html (*) The only effort like this going on at the moment in Nova is the Objects work done by dansmith (even though several others have been proposed) - I will let the readers judge how much of an impact it has had in only 2 short cycles, from just a single effort. +1 Well said, Nikola. -jay
Re: [openstack-dev] [Nova] What's holding nova development back?
On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? Michael -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
On Mon, Sep 15, 2014 at 4:30 PM, Michael Still mi...@stillhq.com wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? Michael What do you think about having an IRC channel for working through gate bugs? I've always found looking at gate failures frustrating because I seem to be expected to work through these by myself, and maybe somebody's already looking at it or has more information that I don't know about. There have been times already where a gate bug that could have left everything broken for a while wound up fixed pretty quickly because we were able to find the right person hanging out in IRC. Sometimes all it takes is for someone with the right knowledge to be there. A hypothetical exchange: rechecker: I got this error where the tempest-foo test failed ... http://...
tempest-expert: That test calls the compute-bar nova API nova-expert: That API calls the network-baz neutron API neutron-expert: When you call that API you need to also call this other API to poll for it to be done... is nova doing that? nova-expert: Nope. Fix on the way. - Brant
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/15/2014 05:52 PM, Brant Knudson wrote: On Mon, Sep 15, 2014 at 4:30 PM, Michael Still mi...@stillhq.com wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? Michael What do you think about having an IRC channel for working through gate bugs? I've always found looking at gate failures frustrating because I seem to be expected to work through these by myself, and maybe somebody's already looking at it or has more information that I don't know about. There have been times already where a gate bug that could have left everything broken for a while wound up fixed pretty quickly because we were able to find the right person hanging out in IRC. Sometimes all it takes is for someone with the right knowledge to be there. A hypothetical exchange: rechecker: I got this error where the tempest-foo test failed ... http://...
tempest-expert: That test calls the compute-bar nova API nova-expert: That API calls the network-baz neutron API neutron-expert: When you call that API you need to also call this other API to poll for it to be done... is nova doing that? nova-expert: Nope. Fix on the way. Honestly, the #openstack-qa channel is a completely appropriate place for that. Plus it already has a lot of the tempest experts. Realistically, anyone who works on these kinds of fixes tends to be there. -Sean -- Sean Dague http://dague.net
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/15/2014 05:30 PM, Michael Still wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? A few suggestions: 1) Bug bounties. Money talks. I know it sounds silly, but lots of developers get paid to work on features. Not as many have a financial incentive to fix bugs. It doesn't need to be a huge amount. And I think the wall-of-fame respect reward for top bug fixers or gate unblockers would be a good incentive as well. The Foundation has a budget. I can't think of a better way to effect positive change than allocating $10-20K to paying bug bounties. 2) Videos discussing gate tools and diagnostic techniques. I hope I'm not bursting Sean Dague's bubble, but one thing we've been discussing, together with Dan Smith, is having a weekly or bi-weekly YouTube show where we discuss Nova development topics, with deep dives into common but hairy parts of the Nova codebase.
The idea is to grow Nova contributors' knowledge of more parts of Nova than just one particular area they might be paid to work on. I think a weekly or bi-weekly show that focuses on bug and gate issues would be a really great idea, and I'd be happy to play a role in this. The Chef+OpenStack community does weekly YouTube recordings of their status meetings and, AFAICT, it's pretty successful. 3) Provide a clearer way to understand what is a gate/CI/infra issue and what is a project bug. Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. Perhaps there is a way to identify/categorize gate failures (in the form of E-R recheck queries) on some meta status page that would be populated either manually or through some clever analysis, to better direct would-be gate-block fixers to where they need to focus? Anyway, just a few ideas, -jay
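A minimal sketch of the categorization Jay describes in point 3, assuming each E-R query is linked to a bug whose affected projects (including an `infra` tag for CI-platform problems) have been tagged by hand. The `GateFailure` record, the bug numbers and the hit counts are all hypothetical. A bug tagged with several projects deliberately appears in each of their buckets, since the tags only say what *may* be causing it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GateFailure:
    """One E-R tracked failure signature (hypothetical record shape)."""
    bug: int                                                 # bug number
    affected: frozenset = field(default_factory=frozenset)  # tagged projects, e.g. "nova", "infra"
    hits: int = 0                                            # recent gate failures it matched


def categorize(failures):
    """Return {bucket: [failures]}, one bucket per tagged project (with
    "infra" treated like any other tag) plus "uncategorized" for untagged
    bugs; each bucket is sorted by hits, highest first."""
    buckets = defaultdict(list)
    for failure in failures:
        if not failure.affected:
            buckets["uncategorized"].append(failure)
        for project in failure.affected:
            buckets[project].append(failure)
    for bucket in buckets.values():
        bucket.sort(key=lambda f: f.hits, reverse=True)
    return dict(buckets)


if __name__ == "__main__":
    failures = [
        GateFailure(1001, frozenset({"infra"}), 40),
        GateFailure(1002, frozenset({"nova"}), 25),
        GateFailure(1003, frozenset({"nova", "neutron"}), 30),
        GateFailure(1004, hits=5),
    ]
    cats = categorize(failures)
    print({name: [f.bug for f in cats[name]] for name in sorted(cats)})
    # {'infra': [1001], 'neutron': [1003], 'nova': [1003, 1002], 'uncategorized': [1004]}
```

Looking up a single bucket, e.g. `cats["nova"]`, gives a per-project, top-offenders-first view, so a would-be fixer does not have to scroll past infra issues to find their own project's bugs.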
Re: [openstack-dev] [Nova] What's holding nova development back?
Sean, I have tabs open to: http://status.openstack.org/elastic-recheck/gate.html http://status.openstack.org/elastic-recheck/data/uncategorized.html and periodically catch up on openstack-qa on IRC as well; I just did not realize this wsgi gate bug was hurting the gate this much. So, could we somehow indicate (email? or one of the web pages above?) where occasional helpers can watch and pitch in when needed? Thanks, dims On Mon, Sep 15, 2014 at 5:55 PM, Sean Dague s...@dague.net wrote: On 09/15/2014 05:52 PM, Brant Knudson wrote: On Mon, Sep 15, 2014 at 4:30 PM, Michael Still mi...@stillhq.com wrote: On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote: On 09/15/2014 05:42 AM, Daniel P. Berrange wrote: On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I consider both problems to be pretty much equally as important. I don't think solving review latency or test reliability in isolation is enough to save Nova. We need to tackle both problems as a priority. I tried to avoid getting into my concerns about testing in my mail on review team bottlenecks since I think we should address the problems independently / in parallel. Agreed with this. I don't think we can afford to ignore either one of them. Yes, that was my point. I don't mind us debating how to rearrange hypervisor drivers. However, if we think that will solve all our problems we are confused. So, how do we get people to start taking bugs / gate failures more seriously? Michael What do you think about having an IRC channel for working through gate bugs?
I've always found looking at gate failures frustrating because I seem to be expected to work through these by myself, and maybe somebody's already looking at it or has more information that I don't know about. There have been times already where a gate bug that could have left everything broken for a while wound up fixed pretty quickly because we were able to find the right person hanging out in IRC. Sometimes all it takes is for someone with the right knowledge to be there. A hypothetical exchange: rechecker: I got this error where the tempest-foo test failed ... http://... tempest-expert: That test calls the compute-bar nova API nova-expert: That API calls the network-baz neutron API neutron-expert: When you call that API you need to also call this other API to poll for it to be done... is nova doing that? nova-expert: Nope. Fix on the way. Honestly, the #openstack-qa channel is a completely appropriate place for that. Plus it already has a lot of the tempest experts. Realistically, anyone who works on these kinds of fixes tends to be there. -Sean -- Sean Dague http://dague.net -- Davanum Srinivas :: https://twitter.com/dims
Re: [openstack-dev] [Nova] What's holding nova development back?
On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) -- Jeremy Stanley
Re: [openstack-dev] [Nova] What's holding nova development back?
On 09/15/2014 08:07 PM, Jeremy Stanley wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) LOL, thanks for making me take the last hour reading Wikipedia pages about computational complexity theory! :P No, in all seriousness, I wasn't actually asking anyone to boil the ocean, mathematically. I think a couple of things would help: making the categorization more obvious (a UI thing, really), and doing some (hopefully simple?) inspection of a control group of patches that we know do not introduce any code changes themselves, comparing it to another group of patches that we know *do* introduce code changes to Nova, and then seeing whether there is a set of E-R issues that consistently appears in *both* groups. That set of E-R issues has a higher likelihood of not being due to Nova, right? OK, so perhaps it's not the most scientific or well-thought-out plan, but hey, it's a spark for thought... ;) Best, -jay
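Jay's control-group idea reduces to a set intersection over failure signatures. Below is a minimal sketch, assuming you already have, for each group, a flat list of the E-R bug IDs that were hit (one entry per failed run). The function name, the `min_count` threshold, and the placeholder bug labels are hypothetical illustrations, not part of elastic-recheck.

```python
from collections import Counter


def shared_failures(control_hits, treatment_hits, min_count=1):
    """Return E-R bug IDs seen at least min_count times in BOTH groups.

    control_hits:   bug IDs hit by patches that change no code (the control group).
    treatment_hits: bug IDs hit by patches that do change Nova code.
    Bugs common to both groups are more likely to lie outside Nova itself.
    """
    control = Counter(control_hits)
    treatment = Counter(treatment_hits)
    # dict_keys supports set operations, so & yields the IDs present in both groups.
    common = control.keys() & treatment.keys()
    return sorted(
        bug for bug in common
        if control[bug] >= min_count and treatment[bug] >= min_count
    )


# Usage with placeholder labels rather than real Launchpad bug numbers:
# shared_failures(["bug-a", "bug-b", "bug-a"], ["bug-a", "bug-c"]) returns ["bug-a"]
```

Raising `min_count` trades recall for confidence: a bug that shows up only once in the control group may just be noise.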
[openstack-dev] [Nova] What's holding nova development back?
Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand-walk everything through the gate. Michael -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
Michael, I am so glad that you started this topic. I really like the idea of taking a pause on features and concentrating on improving the current code base. Even though the 1k open bugs (https://bugs.launchpad.net/nova) are a vital issue, there are other things that could be addressed to improve Nova team throughput. As was said in another thread: Nova's code is currently too big and complex to be understood by one person. This produces two issues:
A) It is hard to find a person who can oversee the full project and make global architecture decisions, including work on cross-project interactions (so the project doesn't have a clear direction of development).
B) It's really hard to find cores, and current cores are under too heavy a load (because of the project's complexity).
I believe that all of Nova's current functionality could be implemented in a much simpler manner. Basically, complexity accumulated over years of adding many features that didn't fit Nova's architecture perfectly, and there wasn't much work on refactoring the architecture to clean up these features. So maybe it's the right time to think about what we are doing, why, and how. That will allow us to find simpler solutions for the current functionality. Best regards, Boris Pavlovic On Sun, Sep 14, 2014 at 1:07 AM, Michael Still mi...@stillhq.com wrote: Just an observation from the last week or so... The biggest problem nova faces at the moment isn't code review latency. Our biggest problem is failing to fix our bugs so that the gate is reliable. The number of rechecks we've done in the last week to try and land code is truly startling. I know that some people are focused by their employers on feature work, but those features aren't going to land in a world in which we have to hand-walk everything through the gate.
Michael -- Rackspace Australia