Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-29 Thread Joe Gordon
On Wed, Sep 17, 2014 at 8:03 AM, Matt Riedemann wrote:

>
>
> On 9/16/2014 1:01 PM, Joe Gordon wrote:
>
>>
>> On Sep 15, 2014 8:31 PM, "Jay Pipes" wrote:
>>  >
>>  > On 09/15/2014 08:07 PM, Jeremy Stanley wrote:
>>  >>
>>  >> On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
>>  >> [...]
>>  >>>
>>  >>> Sometimes it's pretty hard to determine whether something in the
>>  >>> E-R check page is due to something in the infra scripts, some
>>  >>> transient issue in the upstream CI platform (or part of it), or
>>  >>> actually a bug in one or more of the OpenStack projects.
>>  >>
>>  >> [...]
>>  >>
>>  >> Sounds like an NP-complete problem, but if you manage to solve it
>>  >> let me know and I'll turn it into the first line of triage for Infra
>>  >> bugs. ;)
>>  >
>>  >
>>  > LOL, thanks for making me take the last hour reading Wikipedia pages
>> about computational complexity theory! :P
>>  >
>>  > No, in all seriousness, I wasn't actually asking anyone to boil the
>> ocean, mathematically. I think doing a couple things just making the
>> categorization more obvious (a UI thing, really) and doing some
>> (hopefully simple?) inspection of some control group of patches that we
>> know do not introduce any code changes themselves and comparing to
>> another group of patches that we know *do* introduce code changes to
>> Nova, and then seeing if there are a set of E-R issues that consistently
>> appear in *both* groups. That set of E-R issues has a higher likelihood
>> of not being due to Nova, right?
>>
>> We use launchpad's affected projects listings on the elastic recheck
>> page to say what may be causing the bug.  Tagging projects to bugs is a
>> manual process, but one that works pretty well.
>>
>> UI: The elastic recheck UI definitely could use some improvements. I am
>> very poor at writing UIs, so patches welcome!
>>
>>  >
>>  > OK, so perhaps it's not the most scientific or well-thought out plan,
>> but hey, it's a spark for thought... ;)
>>  >
>>  > Best,
>>  > -jay
>>  >
>>  >
>>  > ___
>>  > OpenStack-dev mailing list
>>  > OpenStack-dev@lists.openstack.org
>> 
>>  > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> I'm not great with UIs either but would a dropdown of the affected
> projects be helpful and then people can filter on their "favorite" project
> and then the page is sorted by top offenders as we have today?
>
> There are times when the top bugs are infra issues (pip timeouts for
> example) so you have to scroll a ways before finding something for your
> project (nova isn't the only one).



I think that would be helpful.


>
>
> --
>
> Thanks,
>
> Matt Riedemann


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-17 Thread Matt Riedemann



On 9/16/2014 1:01 PM, Joe Gordon wrote:


On Sep 15, 2014 8:31 PM, "Jay Pipes" wrote:
 >
 > On 09/15/2014 08:07 PM, Jeremy Stanley wrote:
 >>
 >> On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
 >> [...]
 >>>
 >>> Sometimes it's pretty hard to determine whether something in the
 >>> E-R check page is due to something in the infra scripts, some
 >>> transient issue in the upstream CI platform (or part of it), or
 >>> actually a bug in one or more of the OpenStack projects.
 >>
 >> [...]
 >>
 >> Sounds like an NP-complete problem, but if you manage to solve it
 >> let me know and I'll turn it into the first line of triage for Infra
 >> bugs. ;)
 >
 >
 > LOL, thanks for making me take the last hour reading Wikipedia pages
about computational complexity theory! :P
 >
 > No, in all seriousness, I wasn't actually asking anyone to boil the
ocean, mathematically. I think doing a couple things just making the
categorization more obvious (a UI thing, really) and doing some
(hopefully simple?) inspection of some control group of patches that we
know do not introduce any code changes themselves and comparing to
another group of patches that we know *do* introduce code changes to
Nova, and then seeing if there are a set of E-R issues that consistently
appear in *both* groups. That set of E-R issues has a higher likelihood
of not being due to Nova, right?

We use launchpad's affected projects listings on the elastic recheck
page to say what may be causing the bug.  Tagging projects to bugs is a
manual process, but one that works pretty well.

UI: The elastic recheck UI definitely could use some improvements. I am
very poor at writing UIs, so patches welcome!

 >
 > OK, so perhaps it's not the most scientific or well-thought out plan,
but hey, it's a spark for thought... ;)
 >
 > Best,
 > -jay



I'm not great with UIs either but would a dropdown of the affected 
projects be helpful and then people can filter on their "favorite" 
project and then the page is sorted by top offenders as we have today?


There are times when the top bugs are infra issues (pip timeouts for
example) so you have to scroll a ways before finding something for your
project (nova isn't the only one).
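
A minimal sketch of the filter-then-sort behaviour described above, assuming each elastic-recheck bug is available as a record carrying its affected projects and a hit count. The `Bug` type, its field names, the `top_offenders` helper, and the sample numbers are all hypothetical; this is not elastic-recheck's actual data model or code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bug:
    # Hypothetical record; field names are illustrative only.
    number: int
    title: str
    projects: List[str]  # affected projects, as tagged manually in Launchpad
    hits: int            # how many recent failures matched this bug


def top_offenders(bugs: List[Bug], project: Optional[str] = None) -> List[Bug]:
    """Return bugs sorted by hit count, optionally limited to one project."""
    if project is not None:
        bugs = [b for b in bugs if project in b.projects]
    return sorted(bugs, key=lambda b: b.hits, reverse=True)


# Made-up sample data, for illustration only.
bugs = [
    Bug(1, "pip timeout", ["openstack-ci"], 120),
    Bug(2, "instance fails to boot", ["nova"], 45),
    Bug(3, "volume detach race", ["nova", "cinder"], 60),
]

# What a "nova" entry in the dropdown would show.
for bug in top_offenders(bugs, project="nova"):
    print(bug.number, bug.title, bug.hits)
```

With no project selected, the page keeps today's ordering (all bugs, top offenders first), so the filter only narrows the existing view rather than replacing it.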


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Joe Gordon
On Sep 15, 2014 8:31 PM, "Jay Pipes"  wrote:
>
> On 09/15/2014 08:07 PM, Jeremy Stanley wrote:
>>
>> On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
>> [...]
>>>
>>> Sometimes it's pretty hard to determine whether something in the
>>> E-R check page is due to something in the infra scripts, some
>>> transient issue in the upstream CI platform (or part of it), or
>>> actually a bug in one or more of the OpenStack projects.
>>
>> [...]
>>
>> Sounds like an NP-complete problem, but if you manage to solve it
>> let me know and I'll turn it into the first line of triage for Infra
>> bugs. ;)
>
>
> LOL, thanks for making me take the last hour reading Wikipedia pages
about computational complexity theory! :P
>
> No, in all seriousness, I wasn't actually asking anyone to boil the
ocean, mathematically. I think doing a couple things just making the
categorization more obvious (a UI thing, really) and doing some (hopefully
simple?) inspection of some control group of patches that we know do not
introduce any code changes themselves and comparing to another group of
patches that we know *do* introduce code changes to Nova, and then seeing
if there are a set of E-R issues that consistently appear in *both* groups.
That set of E-R issues has a higher likelihood of not being due to Nova,
right?

We use launchpad's affected projects listings on the elastic recheck page
to say what may be causing the bug.  Tagging projects to bugs is a manual
process, but one that works pretty well.

UI: The elastic recheck UI definitely could use some improvements. I am
very poor at writing UIs, so patches welcome!
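
Jay's control-group comparison, quoted above, could be sketched roughly as follows. This is only an illustration under assumptions: each failed run is represented as the set of E-R bug IDs it matched, "consistently" is approximated by a minimum match rate, and the function names, threshold, and sample data are all made up for the example.

```python
from collections import Counter
from typing import Iterable, Set


def frequent_bugs(runs: Iterable[Set[str]], min_rate: float) -> Set[str]:
    """Bugs that matched in at least `min_rate` of the given runs."""
    runs = list(runs)
    if not runs:
        return set()
    # Each run is a set, so a bug is counted at most once per run.
    counts = Counter(bug for run in runs for bug in run)
    return {bug for bug, n in counts.items() if n / len(runs) >= min_rate}


def likely_not_nova(control, changed, min_rate=0.2):
    """Bugs seen consistently in both groups of runs.

    `control` holds runs of patches with no code changes; `changed` holds
    runs of patches that do change Nova code.
    """
    return frequent_bugs(control, min_rate) & frequent_bugs(changed, min_rate)


# Made-up data: each set is the bug IDs matched in one failed run.
control = [{"bug-A"}, {"bug-A", "bug-B"}, set(), {"bug-A"}]
changed = [{"bug-A", "bug-C"}, {"bug-C"}, {"bug-A"}, {"bug-B"}]

print(sorted(likely_not_nova(control, changed, min_rate=0.5)))
```

A bug that shows up in both groups is only a hint that Nova changes are not the trigger; it would complement the manual Launchpad project tagging rather than replace it.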

>
> OK, so perhaps it's not the most scientific or well-thought out plan, but
hey, it's a spark for thought... ;)
>
> Best,
> -jay


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Sean Dague
On 09/16/2014 09:39 AM, Jay Pipes wrote:
> On 09/16/2014 04:12 AM, Daniel P. Berrange wrote:
>> On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
>>> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant wrote:
>>>> On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
>>>>> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>>>>>> Just an observation from the last week or so...
>>>>>>
>>>>>> The biggest problem nova faces at the moment isn't code review
>>>>>> latency. Our biggest problem is failing to fix our bugs so that
>>>>>> the gate is reliable. The number of rechecks we've done in the
>>>>>> last week to try and land code is truly startling.
>>>>>
>>>>> I consider both problems to be pretty much equally as important. I
>>>>> don't think solving review latency or test reliability in
>>>>> isolation is enough to save Nova. We need to tackle both problems
>>>>> as a priority. I tried to avoid getting into my concerns about
>>>>> testing in my mail on review team bottlenecks since I think we
>>>>> should address the problems independently / in parallel.
>>>>
>>>> Agreed with this.  I don't think we can afford to ignore either one
>>>> of them.
>>>
>>> Yes, that was my point. I don't mind us debating how to rearrange
>>> hypervisor drivers. However, if we think that will solve all our
>>> problems we are confused.
>>>
>>> So, how do we get people to start taking bugs / gate failures more
>>> seriously?
>>
>> I think we should have formal "Bug squash wednesdays"  (or pick another
>> day). By this I mean that the core reviewers will focus their attention
>> on just reviews that are related to bug fixing. They will also try to
>> work on bugs if they have time and encourage everyone else involved in
>> Nova to do the same. We'd have a team of people in the Nova IRC channel
>> to publicise & co-ordinate bug squashing, perhaps  with a list of top
>> 20 bugs we want to attack this week. I wouldn't focus just on gate bugs
>> here since many are pretty darn hard & so would put off many people. Have
>> a mix of bugs of varying difficulties to point people to. Make this a
>> regular fortnightly or even weekly event which we publicise in advance
>> on mailing lists, etc.
> 
> +1, I've suggested similar in the past.

+1 a weekly event would be great.

I've spent the bulk of the last 2 weeks in the Nova bug tracker, and
it's pretty interesting what's in there. Lots of stuff we should be
fixing. Lots of really old gorp that we should shed because it's not
helping. Also lots of inconsistencies in how triage is happening because
it's not happening regularly enough.

Plus, now that we are at 0 bugs in the New state in Nova, it's actually
kind of sane to stay on top of that, and keep our New state empty. Not
that it fixes everything, but it does prevent a bunch of gorp getting
added to the pile as probably 1/2 - 1/3 of inbound bugs... aren't.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Jay Pipes

On 09/16/2014 04:12 AM, Daniel P. Berrange wrote:
> On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
>> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant wrote:
>>> On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
>>>> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>>>>> Just an observation from the last week or so...
>>>>>
>>>>> The biggest problem nova faces at the moment isn't code review latency. Our
>>>>> biggest problem is failing to fix our bugs so that the gate is reliable.
>>>>> The number of rechecks we've done in the last week to try and land code is
>>>>> truly startling.
>>>>
>>>> I consider both problems to be pretty much equally as important. I don't
>>>> think solving review latency or test reliability in isolation is enough to
>>>> save Nova. We need to tackle both problems as a priority. I tried to avoid
>>>> getting into my concerns about testing in my mail on review team bottlenecks
>>>> since I think we should address the problems independently / in parallel.
>>>
>>> Agreed with this.  I don't think we can afford to ignore either one of them.
>>
>> Yes, that was my point. I don't mind us debating how to rearrange
>> hypervisor drivers. However, if we think that will solve all our
>> problems we are confused.
>>
>> So, how do we get people to start taking bugs / gate failures more
>> seriously?
>
> I think we should have formal "Bug squash wednesdays"  (or pick another
> day). By this I mean that the core reviewers will focus their attention
> on just reviews that are related to bug fixing. They will also try to
> work on bugs if they have time and encourage everyone else involved in
> Nova to do the same. We'd have a team of people in the Nova IRC channel
> to publicise & co-ordinate bug squashing, perhaps  with a list of top
> 20 bugs we want to attack this week. I wouldn't focus just on gate bugs
> here since many are pretty darn hard & so would put off many people. Have
> a mix of bugs of varying difficulties to point people to. Make this a
> regular fortnightly or even weekly event which we publicise in advance
> on mailing lists, etc.


+1, I've suggested similar in the past.

-jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Jay Pipes

On 09/16/2014 04:29 AM, Michael Still wrote:

> I think bug days are a good idea. We've had them sporadically in the
> past, but never weekly. We stopped mostly because people stopped
> showing up.
>
> If we think we have critical mass again, or if it makes more sense to
> run one during the RC period, then let's do it.
>
> So... Who would show up for a bug day if we ran one?


I would.

-jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Daniel P. Berrange
On Tue, Sep 16, 2014 at 06:29:53PM +1000, Michael Still wrote:
> I think bug days are a good idea. We've had them sporadically in the
> past, but never weekly. We stopped mostly because people stopped
> showing up.
> 
> If we think we have critical mass again, or if it makes more sense to
> run one during the RC period, then let's do it.
> 
> So... Who would show up for a bug day if we ran one?

IMHO that question is attacking this the wrong way. We should have the
nova core & PTL team lead by example, by all agreeing to actively take
part in formally scheduled bug days. Use this to set the expectations
for the rest of the community, to encourage them to join in too, and
not just rely on a handful of people to volunteer.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Sean Dague
On 09/16/2014 05:44 AM, Thierry Carrez wrote:
> Michael Still wrote:
>> Yes, that was my point. I don't mind us debating how to rearrange
>> hypervisor drivers. However, if we think that will solve all our
>> problems we are confused.
>>
>> So, how do we get people to start taking bugs / gate failures more seriously?
> 
> I think we need to build a cross-project team working on that. Having
> gate liaisons designated in every project should help bootstrap that
> team -- it doesn't mean it's a one-person-per-project job, but at least
> you have a contact person when you need an expert in some project that
> is also versed in the arts of the gate.
> 
> I also think we need to do a slightly better job at visualizing issues.
> Like Dims said, even with tabs opened to the right places, it's
> non-trivial to determine which is the killer bug from which isn't. And
> without carefully checking IRC backlog in 4 different channels, it's
> also hard to find out that a bug is already taken care of. I woke up one
> morning with gate being obviously stuck on some issue, investigated it,
> only to realize after 30 minutes that the fix was already in the gate
> queue. That's a bit of a frustrating experience.
>
> Finally, it's not completely crazy to use a specific channel
> (#openstack-gate ?) for that. Yes, there is a lot of overlap with -qa
> and -infra channels, but those channels aren't dedicated to that
> problem, so 25% of the issues are discussed on one, 25% on the other,
> 25% on the project-specific channel, and the remaining 25% on some
> random channel the right people happen to be in. Having a clear channel
> where all the gate liaisons hang out and all issues are discussed may go
> a long way into establishing a team to work on that (rather than
> continue to rely on the same set of willing individuals).

Honestly, I'm pretty anti 'add another channel'. Especially because
there seems to be some assumption that you can address this problem
without understanding our integration environment (devstack / tempest /
d-g). This is not a problem in isolation, it's a problem about the
synthesis of all the parts. The diving on these issues is already
happening in a place, we should build on that, and not synthetically
create some 3rd place esperanto channel thinking that will fix the issue.

I've thought about the visualization problem a lot... some of the output
included the os-loganalyze and elastic-recheck projects as well as
pretty-tox in tempest to ensure we see which worker each test is running
in so you can figure out what's happening simultaneously.

Here's the root problem I ran into: which kinds of visualizations are
useful changes at a pretty good clip. These bugs are hard to find and
fix because they are typically the interaction of a bunch of moving parts.

So the tools you need to fix them are some combination of
visualizations, plus a reasonable mental model in your head of how all
of OpenStack fits together (and how components expose to operators what
they are doing). I think part 2 is actually the weak spot for most
folks: knowing, for instance, that glanceclient's logging is ridiculous
and that you should ignore it, because it spews a ton of ERRORS for no
good reason.

Basically that's the key skill. Understanding the request flows that go
through OpenStack, understanding how to read OpenStack logs, and being
mindful that the issue might be caused by other things happening at the
same time that you are trying to do a thing (so keep an eye out for those).

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Thierry Carrez
Michael Still wrote:
> Yes, that was my point. I don't mind us debating how to rearrange
> hypervisor drivers. However, if we think that will solve all our
> problems we are confused.
> 
> So, how do we get people to start taking bugs / gate failures more seriously?

I think we need to build a cross-project team working on that. Having
gate liaisons designated in every project should help bootstrap that
team -- it doesn't mean it's a one-person-per-project job, but at least
you have a contact person when you need an expert in some project that
is also versed in the arts of the gate.

I also think we need to do a slightly better job at visualizing issues.
Like Dims said, even with tabs opened to the right places, it's
non-trivial to determine which is the killer bug from which isn't. And
without carefully checking IRC backlog in 4 different channels, it's
also hard to find out that a bug is already taken care of. I woke up one
morning with gate being obviously stuck on some issue, investigated it,
only to realize after 30 minutes that the fix was already in the gate
queue. That's a bit of a frustrating experience.

Finally, it's not completely crazy to use a specific channel
(#openstack-gate ?) for that. Yes, there is a lot of overlap with -qa
and -infra channels, but those channels aren't dedicated to that
problem, so 25% of the issues are discussed on one, 25% on the other,
25% on the project-specific channel, and the remaining 25% on some
random channel the right people happen to be in. Having a clear channel
where all the gate liaisons hang out and all issues are discussed may go
a long way into establishing a team to work on that (rather than
continue to rely on the same set of willing individuals).

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Kashyap Chamarthy
On Tue, Sep 16, 2014 at 06:29:53PM +1000, Michael Still wrote:
> I think bug days are a good idea. We've had them sporadically in the
> past, but never weekly. We stopped mostly because people stopped
> showing up.
> 
> If we think we have critical mass again, or if it makes more sense to
> run one during the RC period, then let's do it.
> 
> So... Who would show up for a bug day if we ran one?

I'm not a Nova dev, but FWIW, I can spend time doing triage and root
cause analysis of areas involving virt drivers - libvirt, QEMU, KVM and
any other related areas in Nova.

PS: Next four weeks are going to be hectic for me personally due to some
travel, but I should be more active and available after that.

--
/kashyap

> 
> On Tue, Sep 16, 2014 at 6:12 PM, Daniel P. Berrange  
> wrote:
> > On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
> >> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant  
> >> wrote:
> >> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> >> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> >> >>> Just an observation from the last week or so...
> >> >>>
> >> >>> The biggest problem nova faces at the moment isn't code review 
> >> >>> latency. Our
> >> >>> biggest problem is failing to fix our bugs so that the gate is 
> >> >>> reliable.
> >> >>> The number of rechecks we've done in the last week to try and land 
> >> >>> code is
> >> >>> truly startling.
> >> >>
> >> >> I consider both problems to be pretty much equally as important. I don't
> >> >> think solving review latency or test reliability in isolation is enough
> >> >> to save Nova. We need to tackle both problems as a priority. I tried to
> >> >> avoid getting into my concerns about testing in my mail on review team
> >> >> bottlenecks since I think we should address the problems independently /
> >> >> in parallel.
> >> >
> >> > Agreed with this.  I don't think we can afford to ignore either one of 
> >> > them.
> >>
> >> Yes, that was my point. I don't mind us debating how to rearrange
> >> hypervisor drivers. However, if we think that will solve all our
> >> problems we are confused.
> >>
> >> So, how do we get people to start taking bugs / gate failures more
> >> seriously?
> >
> > I think we should have formal "Bug squash wednesdays"  (or pick another
> > day). By this I mean that the core reviewers will focus their attention
> > on just reviews that are related to bug fixing. They will also try to
> > work on bugs if they have time and encourage everyone else involved in
> > Nova to do the same. We'd have a team of people in the Nova IRC channel
> > to publicise & co-ordinate bug squashing, perhaps  with a list of top
> > 20 bugs we want to attack this week. I wouldn't focus just on gate bugs
> > here since many are pretty darn hard & so would put off many people. Have
> > a mix of bugs of varying difficulties to point people to. Make this a
> > regular fortnightly or even weekly event which we publicise in advance
> > on mailing lists, etc.
> >
> > Regards,
> > Daniel




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Gary Kotton


On 9/16/14, 11:12 AM, "Daniel P. Berrange"  wrote:

>On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
>> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant 
>>wrote:
>> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
>> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> >>> Just an observation from the last week or so...
>> >>>
>> >>> The biggest problem nova faces at the moment isn't code review
>>latency. Our
>> >>> biggest problem is failing to fix our bugs so that the gate is
>>reliable.
>> >>> The number of rechecks we've done in the last week to try and land
>>code is
>> >>> truly startling.
>> >>
>> >> I consider both problems to be pretty much equally as important. I don't
>> >> think solving review latency or test reliability in isolation is enough to
>> >> save Nova. We need to tackle both problems as a priority. I tried to avoid
>> >> getting into my concerns about testing in my mail on review team bottlenecks
>> >> since I think we should address the problems independently / in parallel.
>> >
>> > Agreed with this.  I don't think we can afford to ignore either one
>>of them.
>> 
>> Yes, that was my point. I don't mind us debating how to rearrange
>> hypervisor drivers. However, if we think that will solve all our
>> problems we are confused.
>> 
>> So, how do we get people to start taking bugs / gate failures more
>> seriously?
>
>I think we should have formal "Bug squash wednesdays"  (or pick another
>day). By this I mean that the core reviewers will focus their attention
>on just reviews that are related to bug fixing. They will also try to
>work on bugs if they have time and encourage everyone else involved in
>Nova to do the same. We'd have a team of people in the Nova IRC channel
>to publicise & co-ordinate bug squashing, perhaps  with a list of top
>20 bugs we want to attack this week. I wouldn't focus just on gate bugs
>here since many are pretty darn hard & so would put off many people. Have
>a mix of bugs of varying difficulties to point people to. Make this a
>regular fortnightly or even weekly event which we publicise in advance
>on mailing lists, etc.

I am in favor of that. This is similar to what I suggested in
http://lists.openstack.org/pipermail/openstack-dev/2014-September/045440.html

Thanks
Gary
>
>Regards,
>Daniel
>-- 
>|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
>|: http://libvirt.org  -o- http://virt-manager.org :|
>|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
>|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Michael Still
I think bug days are a good idea. We've had them sporadically in the
past, but never weekly. We stopped mostly because people stopped
showing up.

If we think we have critical mass again, or if it makes more sense to
run one during the RC period, then let's do it.

So... Who would show up for a bug day if we ran one?

Michael

On Tue, Sep 16, 2014 at 6:12 PM, Daniel P. Berrange  wrote:
> On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
>> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant  wrote:
>> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
>> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> >>> Just an observation from the last week or so...
>> >>>
>> >>> The biggest problem nova faces at the moment isn't code review latency. 
>> >>> Our
>> >>> biggest problem is failing to fix our bugs so that the gate is reliable.
>> >>> The number of rechecks we've done in the last week to try and land code 
>> >>> is
>> >>> truly startling.
>> >>
>> >> I consider both problems to be pretty much equally as important. I don't
>> >> think solving review latency or test reliability in isolation is enough to
>> >> save Nova. We need to tackle both problems as a priority. I tried to avoid
>> >> getting into my concerns about testing in my mail on review team
>> >> bottlenecks
>> >> since I think we should address the problems independently / in parallel.
>> >
>> > Agreed with this.  I don't think we can afford to ignore either one of 
>> > them.
>>
>> Yes, that was my point. I don't mind us debating how to rearrange
>> hypervisor drivers. However, if we think that will solve all our
>> problems we are confused.
>>
>> So, how do we get people to start taking bugs / gate failures more
>> seriously?
>
> I think we should have formal "Bug squash wednesdays"  (or pick another
> day). By this I mean that the core reviewers will focus their attention
> on just reviews that are related to bug fixing. They will also try to
> work on bugs if they have time and encourage everyone else involved in
> Nova to do the same. We'd have a team of people in the Nova IRC channel
> to publicise & co-ordinate bug squashing, perhaps  with a list of top
> 20 bugs we want to attack this week. I wouldn't focus just on gate bugs
> here since many are pretty darn hard & so would put off many people. Have
> a mix of bugs of varying difficulties to point people to. Make this a
> regular fortnightly or even weekly event which we publicise in advance
> on mailing lists, etc.
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Daniel P. Berrange
On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant  wrote:
> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> >>> Just an observation from the last week or so...
> >>>
> > >>> The biggest problem nova faces at the moment isn't code review latency.
> > >>> Our biggest problem is failing to fix our bugs so that the gate is
> > >>> reliable. The number of rechecks we've done in the last week to try and
> > >>> land code is truly startling.
> > >>
> > >> I consider both problems to be pretty much equally as important. I don't
> > >> think solving review latency or test reliability in isolation is enough to
> > >> save Nova. We need to tackle both problems as a priority. I tried to avoid
> > >> getting into my concerns about testing in my mail on review team
> > >> bottlenecks since I think we should address the problems independently /
> > >> in parallel.
> > >
> > > Agreed with this.  I don't think we can afford to ignore either one of them.
> 
> Yes, that was my point. I don't mind us debating how to rearrange
> hypervisor drivers. However, if we think that will solve all our
> problems we are confused.
> 
> So, how do we get people to start taking bugs / gate failures more
> seriously?

I think we should have formal "Bug squash Wednesdays" (or pick another
day). By this I mean that the core reviewers will focus their attention
on just reviews that are related to bug fixing. They will also try to
work on bugs if they have time and encourage everyone else involved in
Nova to do the same. We'd have a team of people in the Nova IRC channel
to publicise & co-ordinate bug squashing, perhaps with a list of top
20 bugs we want to attack this week. I wouldn't focus just on gate bugs
here since many are pretty darn hard & so would put off many people. Have
a mix of bugs of varying difficulties to point people to. Make this a
regular fortnightly or even weekly event which we publicise in advance
on mailing lists, etc.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Joshua Harlow

And if you can also prove P = NP you get 1 million dollars [1, 2]

Let me know when you've got the proof,

Thanks much,

[1] http://www.claymath.org/millenium-problems/p-vs-np-problem
[2] http://www.claymath.org/millennium-problems/millennium-prize-problems


-Josh

On Mon, Sep 15, 2014 at 5:07 PM, Jeremy Stanley wrote:

On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
[...]

 Sometimes it's pretty hard to determine whether something in the
 E-R check page is due to something in the infra scripts, some
 transient issue in the upstream CI platform (or part of it), or
 actually a bug in one or more of the OpenStack projects.

[...]

Sounds like an NP-complete problem, but if you manage to solve it
let me know and I'll turn it into the first line of triage for Infra
bugs. ;)
--
Jeremy Stanley



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Kashyap Chamarthy
On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> Just an observation from the last week or so...
> 
> The biggest problem nova faces at the moment isn't code review
> latency. Our biggest problem is failing to fix our bugs so that the
> gate is reliable.

As of writing this, from a quick look at Tracy Jones' Nova bugs page[1]:

  - Total bug count: 877

  - Bugs with no owner: 410

  - Bugs in "undecided" state: 145

  - Bugs in progress: 276
      - Bugs ready for review (fix proposed): 201
      - In progress, but all Abandoned: 7
      - In progress, but all Merged: 2

  - Bugs not updated in the last month: 116

  - Bugs that have never been updated: 74


  [1] http://54.201.139.117/nova-bugs.html

--
/kashyap



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jay Pipes

On 09/15/2014 08:07 PM, Jeremy Stanley wrote:

On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
[...]

Sometimes it's pretty hard to determine whether something in the
E-R check page is due to something in the infra scripts, some
transient issue in the upstream CI platform (or part of it), or
actually a bug in one or more of the OpenStack projects.

[...]

Sounds like an NP-complete problem, but if you manage to solve it
let me know and I'll turn it into the first line of triage for Infra
bugs. ;)


LOL, thanks for making me take the last hour reading Wikipedia pages 
about computational complexity theory! :P


No, in all seriousness, I wasn't actually asking anyone to boil the
ocean, mathematically. I think doing a couple of things would help:
making the categorization more obvious (a UI thing, really), and doing
some (hopefully simple?) inspection of a control group of patches that we
know do not introduce any code changes themselves, comparing it to
another group of patches that we know *do* introduce code changes to
Nova, and then seeing if there is a set of E-R issues that consistently
appears in *both* groups. That set of E-R issues has a higher likelihood
of not being due to Nova, right?
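
As a rough illustration of the comparison described above (a sketch only,
not an existing elastic-recheck feature): treat each failed run as the set
of E-R signatures it matched, keep the signatures that recur in at least
some fraction of the runs in each group, and intersect the two sets. The
function names, the min_rate threshold and the sample data are all
hypothetical.

```python
from collections import Counter


def consistent_signatures(runs, min_rate):
    """Return the signatures seen in at least min_rate of the given runs.

    Each run is a set of E-R signature names (hypothetical labels here).
    """
    if not runs:
        return set()
    counts = Counter(sig for run in runs for sig in set(run))
    return {sig for sig, n in counts.items() if n / len(runs) >= min_rate}


def likely_not_nova(control_runs, nova_runs, min_rate=0.2):
    """Signatures that recur in BOTH groups, so are less likely Nova's fault."""
    return (consistent_signatures(control_runs, min_rate)
            & consistent_signatures(nova_runs, min_rate))


# Made-up data: control patches change no code, nova patches do.
control_runs = [{"bug-A"}, {"bug-A", "bug-B"}, set(), {"bug-A"}]
nova_runs = [{"bug-A", "bug-C"}, {"bug-C"}, {"bug-A"}, {"bug-C", "bug-B"}]

print(sorted(likely_not_nova(control_runs, nova_runs, min_rate=0.5)))
```

With this made-up data only bug-A clears the threshold in both groups;
bug-C recurs only in the Nova group, so it stays a Nova suspect. Picking a
sensible threshold and a truly code-neutral control group is the hard part
and is not addressed here.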


OK, so perhaps it's not the most scientific or well-thought-out plan,
but hey, it's a spark for thought... ;)


Best,
-jay



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jeremy Stanley
On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
[...]
> Sometimes it's pretty hard to determine whether something in the
> E-R check page is due to something in the infra scripts, some
> transient issue in the upstream CI platform (or part of it), or
> actually a bug in one or more of the OpenStack projects.
[...]

Sounds like an NP-complete problem, but if you manage to solve it
let me know and I'll turn it into the first line of triage for Infra
bugs. ;)
-- 
Jeremy Stanley



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Davanum Srinivas
Sean,

I have tabs opened to:
http://status.openstack.org/elastic-recheck/gate.html
http://status.openstack.org/elastic-recheck/data/uncategorized.html

and periodically catch up on #openstack-qa on IRC as well; I just did
not realize this wsgi gate bug was hurting the gate this much.

So, could we somehow indicate (email? or one of the web pages above?)
where occasional helpers can watch and pitch in when needed?

thanks,
dims


On Mon, Sep 15, 2014 at 5:55 PM, Sean Dague  wrote:
> On 09/15/2014 05:52 PM, Brant Knudson wrote:
>>
>>
>> On Mon, Sep 15, 2014 at 4:30 PM, Michael Still wrote:
>>
>> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant wrote:
>> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
>> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> >>> Just an observation from the last week or so...
>> >>>
>> >>> The biggest problem nova faces at the moment isn't code review
>> >>> latency. Our biggest problem is failing to fix our bugs so that the
>> >>> gate is reliable. The number of rechecks we've done in the last week
>> >>> to try and land code is truly startling.
>> >>
>> >> I consider both problems to be pretty much equally as important. I
>> >> don't think solving review latency or test reliability in isolation is
>> >> enough to save Nova. We need to tackle both problems as a priority. I
>> >> tried to avoid getting into my concerns about testing in my mail on
>> >> review team bottlenecks since I think we should address the problems
>> >> independently / in parallel.
>> >
>> > Agreed with this.  I don't think we can afford to ignore either one of
>> > them.
>>
>> Yes, that was my point. I don't mind us debating how to rearrange
>> hypervisor drivers. However, if we think that will solve all our
>> problems we are confused.
>>
>> So, how do we get people to start taking bugs / gate failures more
>> seriously?
>>
>> Michael
>>
>>
>> What do you think about having an irc channel for working through gate
>> bugs? I've always found looking at gate failures frustrating because I
>> seem to be expected to work through these by myself, and maybe
>> somebody's already looking at it or has more information that I don't
>> know about. There have been times already where a gate bug that could
>> have left everything broken for a while wound up fixed pretty quickly
>> because we were able to find the right person hanging out in irc.
>> Sometimes all it takes is for someone with the right knowledge to be
>> there. A hypothetical exchange:
>>
>> rechecker: I got this error where the tempest-foo test failed ... http://...
>> tempest-expert: That test calls the compute-bar nova API
>> nova-expert: That API calls the network-baz neutron API
>> neutron-expert: When you call that API you need to also call this other
>> API to poll for it to be done... is nova doing that?
>> nova-expert: Nope. Fix on the way.
>
> Honestly, the #openstack-qa channel is a completely appropriate place
> for that. Plus it already has a lot of the tempest experts.
> Realistically anyone that works on these kinds of fixes tends to be there.
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>



-- 
Davanum Srinivas :: https://twitter.com/dims



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jay Pipes

On 09/15/2014 05:30 PM, Michael Still wrote:

On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant  wrote:

On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:

On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:

Just an observation from the last week or so...

The biggest problem nova faces at the moment isn't code review latency. Our
biggest problem is failing to fix our bugs so that the gate is reliable.
The number of rechecks we've done in the last week to try and land code is
truly startling.


I consider both problems to be pretty much equally as important. I don't
think solving review latency or test reliability in isolation is enough to
save Nova. We need to tackle both problems as a priority. I tried to avoid
getting into my concerns about testing in my mail on review team bottlenecks
since I think we should address the problems independently / in parallel.


Agreed with this.  I don't think we can afford to ignore either one of them.


Yes, that was my point. I don't mind us debating how to rearrange
hypervisor drivers. However, if we think that will solve all our
problems we are confused.

So, how do we get people to start taking bugs / gate failures more seriously?


A few suggestions:

1) Bug bounties

Money talks. I know it sounds silly, but lots of developers get paid to 
work on features. Not as many have financial incentive to fix bugs.


It doesn't need to be a huge amount. And I think the "wall of fame 
respect" reward for top bug fixers or gate unblockers would be a good 
incentive as well.


The foundation has a budget. I can't think of a better way to effect 
positive change than allocating $10-20K to paying bug bounties.


2) Videos discussing gate tools and diagnostics techniques

I hope I'm not bursting Sean Dague's bubble, but one thing we've
been discussing, together with Dan Smith, is having a weekly or
bi-weekly YouTube show where we discuss Nova development topics, with
deep dives into common but hairy parts of the Nova codebase. The idea is
to grow Nova contributors' knowledge of more parts of Nova than just one
particular area they might be paid to work on.


I think a weekly or bi-weekly show that focuses on bug and gate issues
would be a really great idea, and I'd be happy to play a role in this.
The Chef+OpenStack community does weekly YouTube recordings of their
status meetings and AFAICT, it's pretty successful.


3) Provide a clearer way to understand what is a gate/CI/infra issue and 
what is a project bug


Sometimes it's pretty hard to determine whether something in the E-R 
check page is due to something in the infra scripts, some transient 
issue in the upstream CI platform (or part of it), or actually a bug in 
one or more of the OpenStack projects.


Perhaps there is a way to identify/categorize gate failures (in the form 
of E-R recheck queries) on some "meta status" page, that would either be 
populated manually or through some clever analysis to better direct 
would-be gate block fixers to where they need to focus?


Anyway, just a few ideas,
-jay



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Sean Dague
On 09/15/2014 05:52 PM, Brant Knudson wrote:
> 
> 
> On Mon, Sep 15, 2014 at 4:30 PM, Michael Still wrote:
>
> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant wrote:
> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> >>> Just an observation from the last week or so...
> >>>
> >>> The biggest problem nova faces at the moment isn't code review
> >>> latency. Our biggest problem is failing to fix our bugs so that the
> >>> gate is reliable. The number of rechecks we've done in the last week
> >>> to try and land code is truly startling.
> >>
> >> I consider both problems to be pretty much equally as important. I
> >> don't think solving review latency or test reliability in isolation is
> >> enough to save Nova. We need to tackle both problems as a priority. I
> >> tried to avoid getting into my concerns about testing in my mail on
> >> review team bottlenecks since I think we should address the problems
> >> independently / in parallel.
> >
> > Agreed with this.  I don't think we can afford to ignore either one of
> > them.
> 
> Yes, that was my point. I don't mind us debating how to rearrange
> hypervisor drivers. However, if we think that will solve all our
> problems we are confused.
> 
> So, how do we get people to start taking bugs / gate failures more
> seriously?
> 
> Michael
> 
> 
> What do you think about having an irc channel for working through gate
> bugs? I've always found looking at gate failures frustrating because I
> seem to be expected to work through these by myself, and maybe
> somebody's already looking at it or has more information that I don't
> know about. There have been times already where a gate bug that could
> have left everything broken for a while wound up fixed pretty quickly
> because we were able to find the right person hanging out in irc.
> Sometimes all it takes is for someone with the right knowledge to be
> there. A hypothetical exchange:
> 
> rechecker: I got this error where the tempest-foo test failed ... http://...
> tempest-expert: That test calls the compute-bar nova API
> nova-expert: That API calls the network-baz neutron API
> neutron-expert: When you call that API you need to also call this other
> API to poll for it to be done... is nova doing that?
> nova-expert: Nope. Fix on the way.

Honestly, the #openstack-qa channel is a completely appropriate place
for that. Plus it already has a lot of the tempest experts.
Realistically anyone that works on these kinds of fixes tends to be there.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Brant Knudson
On Mon, Sep 15, 2014 at 4:30 PM, Michael Still  wrote:

> On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant wrote:
> > On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> >> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> >>> Just an observation from the last week or so...
> >>>
> >>> The biggest problem nova faces at the moment isn't code review
> >>> latency. Our biggest problem is failing to fix our bugs so that the
> >>> gate is reliable. The number of rechecks we've done in the last week
> >>> to try and land code is truly startling.
> >>
> >> I consider both problems to be pretty much equally as important. I
> >> don't think solving review latency or test reliability in isolation is
> >> enough to save Nova. We need to tackle both problems as a priority. I
> >> tried to avoid getting into my concerns about testing in my mail on
> >> review team bottlenecks since I think we should address the problems
> >> independently / in parallel.
> >
> > Agreed with this.  I don't think we can afford to ignore either one of
> > them.
>
> Yes, that was my point. I don't mind us debating how to rearrange
> hypervisor drivers. However, if we think that will solve all our
> problems we are confused.
>
> So, how do we get people to start taking bugs / gate failures more
> seriously?
>
> Michael
>
>
What do you think about having an irc channel for working through gate
bugs? I've always found looking at gate failures frustrating because I seem
to be expected to work through these by myself, and maybe somebody's
already looking at it or has more information that I don't know about.
There have been times already where a gate bug that could have left
everything broken for a while wound up fixed pretty quickly because we were
able to find the right person hanging out in irc. Sometimes all it takes is
for someone with the right knowledge to be there. A hypothetical exchange:

rechecker: I got this error where the tempest-foo test failed ... http://...
tempest-expert: That test calls the compute-bar nova API
nova-expert: That API calls the network-baz neutron API
neutron-expert: When you call that API you need to also call this other API
to poll for it to be done... is nova doing that?
nova-expert: Nope. Fix on the way.
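
The "poll for it to be done" step in the exchange above could look roughly
like the sketch below. This is a generic polling loop, not real nova or
neutron client code: wait_until_done, get_status and the status strings
"BUILD", "ACTIVE" and "ERROR" are all hypothetical stand-ins.

```python
import time


def wait_until_done(get_status, timeout=30.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until it reports completion.

    get_status is a hypothetical callable returning "BUILD", "ACTIVE" or
    "ERROR"; a real caller would wrap whatever API reports progress.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status == "ERROR":
            raise RuntimeError("operation failed while waiting")
        if clock() >= deadline:
            raise TimeoutError("gave up waiting for completion")
        sleep(interval)


# Simulated remote side: reports BUILD twice, then ACTIVE.
fake_statuses = iter(["BUILD", "BUILD", "ACTIVE"])
print(wait_until_done(lambda: next(fake_statuses), interval=0))
```

The point is only that the caller must not assume the first call finished
the work; the timeout keeps a stuck operation from hanging a test forever.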

- Brant


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Michael Still
On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant  wrote:
> On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
>> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>>> Just an observation from the last week or so...
>>>
>>> The biggest problem nova faces at the moment isn't code review latency. Our
>>> biggest problem is failing to fix our bugs so that the gate is reliable.
>>> The number of rechecks we've done in the last week to try and land code is
>>> truly startling.
>>
>> I consider both problems to be pretty much equally as important. I don't
>> think solving review latency or test reliability in isolation is enough to
>> save Nova. We need to tackle both problems as a priority. I tried to avoid
>> getting into my concerns about testing in my mail on review team bottlenecks
>> since I think we should address the problems independently / in parallel.
>
> Agreed with this.  I don't think we can afford to ignore either one of them.

Yes, that was my point. I don't mind us debating how to rearrange
hypervisor drivers. However, if we think that will solve all our
problems we are confused.

So, how do we get people to start taking bugs / gate failures more seriously?

Michael

-- 
Rackspace Australia



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jay Pipes

On 09/15/2014 04:01 AM, Nikola Đipanov wrote:

On 09/13/2014 11:07 PM, Michael Still wrote:

Just an observation from the last week or so...

The biggest problem nova faces at the moment isn't code review latency.
Our biggest problem is failing to fix our bugs so that the gate is
reliable. The number of rechecks we've done in the last week to try and
land code is truly startling.



This is exactly what I was saying in my ranty email from 2 weeks ago
[1]. Debt is everywhere and, as with any debt, it is unlikely to go away
on its own.



I know that some people are focused by their employers on feature work,
but those features aren't going to land in a world in which we have to
hand walk everything through the gate.



The thing is that - without doing work on the code - you cannot know
where the real issues are. You cannot look at a codebase as big as Nova
and say, "hmmm looks like we need to fix the resource tracker". You can
know that only if you are neck-deep in the stuff. And then you need to
agree on what is really bad and what is just distasteful, and then focus
the efforts on that. None of the things we've put in place (specs, the
way we do and organize code review and bugs) acknowledge or help this
part of the development process.

I tried to explain this in my previous ranty email [1] but I guess I
failed due to ranting :) so let me try again: "Nova team needs to act as
a development team".

We are not in a place (yet?) where we can just overlook the addition of
features based on whether they are appropriate for our use case. We have
to work together on a set of important things to get Nova to where we
think it needs to be and make sure we get it done - by actually doing
it! (*)

However - I don't think freezing development of features for a cycle is
a viable option - this is just not how software in the real world gets
done. It will likely be the worst possible thing we can do, no matter
how appealing it seems to us as developers.

But we do need to be extremely strict on what we let in, and under which
conditions! As I mentioned to sdague on IRC the other day (yes, I am
quoting myself :) ): "Not all features are the same" - there are
features that are better, that are coded better, and are integrated
better - we should be wanting those features always! Then there are
features that are a net negative on the code - we should *never* want
those features. And then there are features in the middle - we may want
to cut those or push them back depending on a number of things that are
important. Things like: code quality, can it fit within the current
constraints, can we let it in like that, or does some work need to happen
first. Things which we haven't been really good at considering
previously IMHO.

But you can't really judge that unless you are actively developing Nova
yourself, and have a tighter grip on the proposed code than what our
current process gives.

Peace!
N.

[1]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/044722.html

(*) The only effort like this going on at the moment in Nova is the
Objects work done by dansmith (even though there are several others
proposed) - I will let the readers judge how much of an impact it has had
in only 2 short cycles, from just a single effort.


+1 Well said, Nikola.

-jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Russell Bryant
On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> Just an observation from the last week or so...
>>
>> The biggest problem nova faces at the moment isn't code review latency. Our
>> biggest problem is failing to fix our bugs so that the gate is reliable.
>> The number of rechecks we've done in the last week to try and land code is
>> truly startling.
> 
> I consider both problems to be pretty much equally as important. I don't
> think solving review latency or test reliability in isolation is enough to
> save Nova. We need to tackle both problems as a priority. I tried to avoid
> getting into my concerns about testing in my mail on review team bottlenecks
> since I think we should address the problems independently / in parallel.

Agreed with this.  I don't think we can afford to ignore either one of them.

-- 
Russell Bryant



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Sean Dague
On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> Just an observation from the last week or so...
>>
>> The biggest problem nova faces at the moment isn't code review latency. Our
>> biggest problem is failing to fix our bugs so that the gate is reliable.
>> The number of rechecks we've done in the last week to try and land code is
>> truly startling.
> 
> I consider both problems to be pretty much equally as important. I don't
> think solving review latency or test reliability in isolation is enough to
> save Nova. We need to tackle both problems as a priority. I tried to avoid
> getting into my concerns about testing in my mail on review team bottlenecks
> since I think we should address the problems independently / in parallel.
> 
>> I know that some people are focused by their employers on feature work, but
>> those features aren't going to land in a world in which we have to hand
>> walk everything through the gate.
> 
> Unfortunately the reliability of the gate systems has the highest negative
> impact on productivity right at the point in the dev cycle where we need
> it to have the least impact too.
> 
> If we're going to continue to raise the bar in terms of testing coverage
> then we need to have a serious look at the overall approach we use for
> testing because what we do today isn't going to scale, even if it is
> 100% reliable. We can't keep adding new CI jobs for each new nova.conf
> setting that introduces a new code path, because each job has major
> implications for resource consumption (number of test nodes, log storage),
> not to mention reliability. I think we need to figure out a way to get
> more targeted testing of features, so we can keep the overall number
> of jobs lower and the tests shorter.
> 
> Instead of having a single tempest run that exercises all the Nova
> functionality in one run, we need to figure out how to split it up
> into independent functional areas. For example if we could isolate
> tests which are affected by choice of cinder storage backend, then
> we could run that subset of tests multiple times, once for each
> supported cinder backend. Without this, the combinatorial explosion
> of test jobs is going to kill us.

One of the top issues killing Nova patches last week was a unit test
race (the wsgi worker one). There is no one to blame but Nova for that.
Jay was really the only team member digging into it.

I don't disagree on the disaggregation problem. However, as lots of Nova
devs are ignoring unit test failures at this point, unless that changes no
other disaggregation is going to make anything better.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Michael Still
On Mon, Sep 15, 2014 at 7:42 PM, Daniel P. Berrange  wrote:

> Unfortunately the reliability of the gate systems has the highest negative
> impact on productivity right at the point in the dev cycle where we need
> it to have the least impact too.

Agreed.

However, my instinct is that a lot of our CI unreliability isn't from
the number of permutations, but from buggy code. We have our users
telling us where to look to fix this in the form of many many bug
reports. I find it hard to believe that we couldn't improve our gate
reliability by taking fixing the bugs we currently have reported more
seriously.

Michael



-- 
Rackspace Australia



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Daniel P. Berrange
On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> Just an observation from the last week or so...
> 
> The biggest problem nova faces at the moment isn't code review latency. Our
> biggest problem is failing to fix our bugs so that the gate is reliable.
> The number of rechecks we've done in the last week to try and land code is
> truly startling.

I consider both problems to be pretty much equally as important. I don't
think solving review latency or test reliability in isolation is enough to
save Nova. We need to tackle both problems as a priority. I tried to avoid
getting into my concerns about testing in my mail on review team bottlenecks
since I think we should address the problems independently / in parallel.

> I know that some people are focused by their employers on feature work, but
> those features aren't going to land in a world in which we have to hand
> walk everything through the gate.

Unfortunately the reliability of the gate systems has the highest negative
impact on productivity right at the point in the dev cycle where we need
it to have the least impact too.

If we're going to continue to raise the bar in terms of testing coverage
then we need to have a serious look at the overall approach we use for
testing because what we do today isn't going to scale, even if it is
100% reliable. We can't keep adding new CI jobs for each new nova.conf
setting that introduces a new code path, because each job has major
implications for resource consumption (number of test nodes, log storage),
not to mention reliability. I think we need to figure out a way to get
more targeted testing of features, so we can keep the overall number
of jobs lower and the tests shorter.

Instead of having a single tempest run that exercises all the Nova
functionality in one run, we need to figure out how to split it up
into independent functional areas. For example if we could isolate
tests which are affected by choice of cinder storage backend, then
we could run that subset of tests multiple times, once for each
supported cinder backend. Without this, the combinatorial explosion
of test jobs is going to kill us.
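
To put rough numbers on the "combinatorial explosion" argument, here is a
small sketch. It assumes, optimistically, that the tests affected by each
configuration axis can be cleanly separated from the rest; the axis names,
option names and counts are invented for illustration and do not describe
the real gate.

```python
from math import prod

# Hypothetical configuration axes; the option names are placeholders only.
AXES = {
    "cinder_backend": ["backend-a", "backend-b", "backend-c"],
    "virt_driver": ["driver-a", "driver-b"],
    "network_mode": ["mode-a", "mode-b"],
}


def full_matrix_jobs(axes):
    """Jobs needed if every combination of options gets its own full run."""
    return prod(len(options) for options in axes.values())


def targeted_runs(axes):
    """Runs needed if each axis re-runs only its own affected test subset,
    once per option on that axis."""
    return sum(len(options) for options in axes.values())


print(full_matrix_jobs(AXES))  # 3 * 2 * 2 combinations
print(targeted_runs(AXES))     # 3 + 2 + 2 subset runs
```

The full matrix grows as a product of the axis sizes while the targeted
approach grows as a sum, and each targeted run is also shorter because it
only covers the affected subset.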


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Nikola Đipanov
On 09/14/2014 12:27 AM, Boris Pavlovic wrote:
> Michael, 
> 
> I am so glad that you started this topic.
> I really like the idea of taking a pause with features and concentrating
> on improvement of the current code base.
> 
> Even if the >1k open bugs (https://bugs.launchpad.net/nova) are a vital
> issue, there are other things that could be addressed to improve Nova
> team throughput.
> 
> As was said in another thread: "Nova code is currently too big and
> complex to be understood by one person".
> This produces 2 issues:
> A) It is hard to find a person who can observe the full project and make
> global architecture decisions, including work on cross-project interactions
> (so the project doesn't have a straight direction of development)
> B) It's really hard to find cores, and current cores are under too heavy
> a load (because of project complexity)
> 
> I believe that the whole of current Nova functionality can be implemented
> in a much simpler manner.

Just a brief comment on the sentence above.

This is a common thing to hear from coders, and is very rarely rooted in
reality IMHO. Nova does _a lot_ of things. Saying that, given the
exhaustive list of features it has, we can implement them in a much
simpler manner completely disregards all the complexity of building
software that works within real-world constraints.

> Basically, complexity was added over the years in the process of adding
> a lot of features that didn't perfectly fit the architecture of Nova.
> And there wasn't much work on refactoring the architecture to clean up
> these features.
> 

I agree with this of course - fixing architectural flaws is important
and needs to be an ongoing part of the process, as I mention in my other
mail to the thread. Halting all other development is not the way to do
it though.

N.

> So maybe it's the proper time to think about "what" we are doing, and
> "why" and "how".
> That will allow us to find simpler solutions for current functionality.
> 
> 
> Best regards,
> Boris Pavlovic 
> 
> 
> On Sun, Sep 14, 2014 at 1:07 AM, Michael Still wrote:
> 
> Just an observation from the last week or so...
> 
> The biggest problem nova faces at the moment isn't code review
> latency. Our biggest problem is failing to fix our bugs so that the
> gate is reliable. The number of rechecks we've done in the last week
> to try and land code is truly startling.
> 
> I know that some people are focused by their employers on feature
> work, but those features aren't going to land in a world in which we
> have to hand walk everything through the gate.
> 
> Michael
> 
> 
> -- 
> Rackspace Australia




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Nikola Đipanov
On 09/13/2014 11:07 PM, Michael Still wrote:
> Just an observation from the last week or so...
> 
> The biggest problem nova faces at the moment isn't code review latency.
> Our biggest problem is failing to fix our bugs so that the gate is
> reliable. The number of rechecks we've done in the last week to try and
> land code is truly startling.
> 

This is exactly what I was saying in my ranty email from 2 weeks ago
[1]. Debt is everywhere and, like any debt, it is unlikely to go away on
its own.


> I know that some people are focused by their employers on feature work,
> but those features aren't going to land in a world in which we have to
> hand walk everything through the gate.
> 

The thing is that - without doing work on the code - you cannot know
where the real issues are. You cannot look at a codebase as big as Nova
and say, "hmmm looks like we need to fix the resource tracker". You can
know that only if you are neck-deep in the stuff. And then you need to
agree on what is really bad and what is just distasteful, and then focus
the efforts on that. None of the things we've put in place (specs, the
way we do and organize code review and bugs) acknowledge or help this
part of the development process.

I tried to explain this in my previous ranty email [1] but I guess I
failed due to ranting :) so let me try again: "Nova team needs to act as
a development team".

We are not in a place (yet?) where we can just overlook the addition of
features based on whether they are appropriate for our use case. We have
to work together on a set of important things to get Nova to where we
think it needs to be and make sure we get it done - by actually doing
it! (*)

However - I don't think freezing development of features for a cycle is
a viable option - this is just not how software in the real world gets
done. It will likely be the worst possible thing we can do, no matter
how appealing it seems to us as developers.

But we do need to be extremely strict on what we let in, and under which
conditions! As I mentioned to sdague on IRC the other day (yes, I am
quoting myself :) ): "Not all features are the same" - there are
features that are better, that are coded better, and are integrated
better - we should be wanting those features always! Then there are
features that are a net negative on the code - we should *never* want
those features. And then there are features in the middle - we may want
to cut those or push them back depending on a number of things that are
important. Things like: code quality, can it fit within the current
constraints, can we let it in like that, or does some work need to happen
first. Things which we haven't been really good at considering
previously IMHO.

But you can't really judge that unless you are actively developing Nova
yourself, and have a tighter grip on the proposed code than what our
current process gives.

Peace!
N.

[1]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/044722.html

(*) The only effort like this going on at the moment in Nova is the
Objects work done by dansmith (even though there are several others
proposed) - I will let the readers judge how much of an impact it was in
only 2 short cycles, from just a single effort.

> Michael
> 
> 
> -- 
> Rackspace Australia




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-13 Thread Boris Pavlovic
Michael,

I am so glad that you started this topic.
I really like the idea of taking a pause with features and concentrating on
improvement of the current code base.

Even if the >1k open bugs (https://bugs.launchpad.net/nova) are a vital issue,
there are other things that could be addressed to improve Nova team
throughput.

As was said in another thread: "Nova code is currently too big and
complex to be understood by one person".
This produces 2 issues:
A) It is hard to find a person who can observe the full project and make
global architecture decisions, including work on cross-project interactions
(so the project doesn't have a straight direction of development)
B) It's really hard to find cores, and current cores are under too heavy
a load (because of project complexity)

I believe that the whole of current Nova functionality can be implemented in
a much simpler manner.
Basically, complexity was added over the years in the process of adding a
lot of features that didn't perfectly fit the architecture of Nova.
And there wasn't much work on refactoring the architecture to clean up these
features.

So maybe it's the proper time to think about "what" we are doing, and "why"
and "how".
That will allow us to find simpler solutions for current functionality.


Best regards,
Boris Pavlovic


On Sun, Sep 14, 2014 at 1:07 AM, Michael Still wrote:

> Just an observation from the last week or so...
>
> The biggest problem nova faces at the moment isn't code review latency.
> Our biggest problem is failing to fix our bugs so that the gate is
> reliable. The number of rechecks we've done in the last week to try and
> land code is truly startling.
>
> I know that some people are focused by their employers on feature work,
> but those features aren't going to land in a world in which we have to hand
> walk everything through the gate.
>
> Michael
>
>
> --
> Rackspace Australia