Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-29 Thread Joe Gordon
On Wed, Sep 17, 2014 at 8:03 AM, Matt Riedemann mrie...@linux.vnet.ibm.com
wrote:



 On 9/16/2014 1:01 PM, Joe Gordon wrote:


 On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote:
  
   On 09/15/2014 08:07 PM, Jeremy Stanley wrote:
  
   On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
   [...]
  
   Sometimes it's pretty hard to determine whether something in the
   E-R check page is due to something in the infra scripts, some
   transient issue in the upstream CI platform (or part of it), or
   actually a bug in one or more of the OpenStack projects.
  
   [...]
  
   Sounds like an NP-complete problem, but if you manage to solve it
   let me know and I'll turn it into the first line of triage for Infra
   bugs. ;)
  
  
   LOL, thanks for making me take the last hour reading Wikipedia pages
 about computational complexity theory! :P
  
   No, in all seriousness, I wasn't actually asking anyone to boil the
 ocean, mathematically. I think a couple of things would help: making the
 categorization more obvious (a UI thing, really), and doing some
 (hopefully simple?) inspection of a control group of patches that we
 know do not introduce any code changes themselves, comparing them to
 another group of patches that we know *do* introduce code changes to
 Nova, and then seeing if there is a set of E-R issues that consistently
 appears in *both* groups. That set of E-R issues has a higher likelihood
 of not being due to Nova, right?
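
The control-group comparison described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: each patch is
represented by the set of E-R bug numbers its CI runs were classified under,
the function names and the `min_rate` threshold are hypothetical, and the
sample bug numbers are made up. None of it is taken from elastic-recheck
itself.

```python
from collections import Counter


def consistent_bugs(runs, min_rate):
    """Return bugs hit by at least min_rate of the given runs.

    runs is a list of sets; each set holds the E-R bug numbers
    that one patch's CI runs were classified under (assumed format).
    """
    if not runs:
        return set()
    counts = Counter(bug for hits in runs for bug in hits)
    return {bug for bug, n in counts.items() if n / len(runs) >= min_rate}


def likely_not_nova(control_runs, nova_runs, min_rate=0.5):
    """Bugs that show up consistently in *both* groups."""
    return (consistent_bugs(control_runs, min_rate)
            & consistent_bugs(nova_runs, min_rate))


# Hypothetical data: the bug numbers are invented for illustration.
control = [{101, 202}, {101}, {101, 303}]   # patches with no code changes
nova = [{101, 404}, {101, 404}, {404}]      # patches that change Nova code
print(sorted(likely_not_nova(control, nova)))
```

With this sample data only bug 101 is consistent in both groups, so it would
be the candidate for "probably not Nova"; bug 404 appears only when Nova code
changes.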

 We use launchpad's affected projects listings on the elastic recheck
 page to say what may be causing the bug.  Tagging projects to bugs is a
 manual process, but one that works pretty well.

 UI: The elastic recheck UI definitely could use some improvements. I am
 very poor at writing UIs, so patches welcome!

  
   OK, so perhaps it's not the most scientific or well-thought out plan,
 but hey, it's a spark for thought... ;)
  
   Best,
   -jay
  
  
   ___
   OpenStack-dev mailing list
   OpenStack-dev@lists.openstack.org
 mailto:OpenStack-dev@lists.openstack.org
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





 I'm not great with UIs either but would a dropdown of the affected
 projects be helpful and then people can filter on their favorite project
 and then the page is sorted by top offenders as we have today?

 There are times when the top bugs are infra issues (pip timeouts for
 example) so you have to scroll a ways before finding something for your
 project (nova isn't the only one).



I think that would be helpful.




 --

 Thanks,

 Matt Riedemann





Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-17 Thread Matt Riedemann



On 9/16/2014 1:01 PM, Joe Gordon wrote:


On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote:
 
  On 09/15/2014 08:07 PM, Jeremy Stanley wrote:
 
  On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
  [...]
 
  Sometimes it's pretty hard to determine whether something in the
  E-R check page is due to something in the infra scripts, some
  transient issue in the upstream CI platform (or part of it), or
  actually a bug in one or more of the OpenStack projects.
 
  [...]
 
  Sounds like an NP-complete problem, but if you manage to solve it
  let me know and I'll turn it into the first line of triage for Infra
  bugs. ;)
 
 
  LOL, thanks for making me take the last hour reading Wikipedia pages
about computational complexity theory! :P
 
  No, in all seriousness, I wasn't actually asking anyone to boil the
ocean, mathematically. I think a couple of things would help: making the
categorization more obvious (a UI thing, really), and doing some
(hopefully simple?) inspection of a control group of patches that we
know do not introduce any code changes themselves, comparing them to
another group of patches that we know *do* introduce code changes to
Nova, and then seeing if there is a set of E-R issues that consistently
appears in *both* groups. That set of E-R issues has a higher likelihood
of not being due to Nova, right?

We use launchpad's affected projects listings on the elastic recheck
page to say what may be causing the bug.  Tagging projects to bugs is a
manual process, but one that works pretty well.

UI: The elastic recheck UI definitely could use some improvements. I am
very poor at writing UIs, so patches welcome!

 
  OK, so perhaps it's not the most scientific or well-thought out plan,
but hey, it's a spark for thought... ;)
 
  Best,
  -jay
 
 



I'm not great with UIs either but would a dropdown of the affected 
projects be helpful and then people can filter on their favorite 
project and then the page is sorted by top offenders as we have today?


There are times when the top bugs are infra issues (pip timeouts for 
example) so you have to scroll a ways before finding something for your
project (nova isn't the only one).
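
A minimal sketch of what the filter-then-sort behaviour behind such a
dropdown might look like. The dict keys ("id", "projects", "hits"), the
function name, and the sample entries are all assumptions made for
illustration; they are not the elastic-recheck data model.

```python
def top_offenders(bugs, project=None):
    """Return bugs sorted by hit count, highest first.

    bugs is a list of dicts with hypothetical keys "id",
    "projects" (affected projects) and "hits" (failure count).
    When project is given, only bugs tagged with it are kept.
    """
    if project is not None:
        bugs = [b for b in bugs if project in b["projects"]]
    return sorted(bugs, key=lambda b: b["hits"], reverse=True)


# Made-up entries for illustration only.
bugs = [
    {"id": 1, "projects": ["openstack-ci"], "hits": 90},  # e.g. a pip timeout
    {"id": 2, "projects": ["nova"], "hits": 40},
    {"id": 3, "projects": ["nova", "neutron"], "hits": 55},
]
print([b["id"] for b in top_offenders(bugs, project="nova")])
```

Without a project the ordering is the same "top offenders" view as today;
with one selected, the infra-only entry drops out and the project's own bugs
come first.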


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Joshua Harlow

And if u can also prove NP = P u get 1 million dollars[1, 2]

Let me know when u got the proof,

Thanks much,

[1] http://www.claymath.org/millenium-problems/p-vs-np-problem
[2] http://www.claymath.org/millennium-problems/millennium-prize-problems


-Josh

On Mon, Sep 15, 2014 at 5:07 PM, Jeremy Stanley fu...@yuggoth.org 
wrote:

On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
[...]

 Sometimes it's pretty hard to determine whether something in the
 E-R check page is due to something in the infra scripts, some
 transient issue in the upstream CI platform (or part of it), or
 actually a bug in one or more of the OpenStack projects.

[...]

Sounds like an NP-complete problem, but if you manage to solve it
let me know and I'll turn it into the first line of triage for Infra
bugs. ;)
--
Jeremy Stanley



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Daniel P. Berrange
On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:
  On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
  On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
  Just an observation from the last week or so...
 
  The biggest problem nova faces at the moment isn't code review latency. 
  Our
  biggest problem is failing to fix our bugs so that the gate is reliable.
  The number of rechecks we've done in the last week to try and land code is
  truly startling.
 
  I consider both problems to be pretty much equally important. I don't
  think solving review latency or test reliability in isolation is enough to
  save Nova. We need to tackle both problems as a priority. I tried to avoid
  getting into my concerns about testing in my mail on review team
  bottlenecks since I think we should address the problems independently /
  in parallel.
 
  Agreed with this.  I don't think we can afford to ignore either one of them.
 
 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.
 
 So, how do we get people to start taking bugs / gate failures more
 seriously?

I think we should have formal Bug squash Wednesdays (or pick another
day). By this I mean that the core reviewers will focus their attention
on just reviews that are related to bug fixing. They will also try to
work on bugs if they have time and encourage everyone else involved in
Nova to do the same. We'd have a team of people in the Nova IRC channel
to publicise & co-ordinate bug squashing, perhaps with a list of top
20 bugs we want to attack this week. I wouldn't focus just on gate bugs
here since many are pretty darn hard & so would put off many people. Have
a mix of bugs of varying difficulties to point people to. Make this a
regular fortnightly or even weekly event which we publicise in advance
on mailing lists, etc.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Michael Still
I think bug days are a good idea. We've had them sporadically in the
past, but never weekly. We stopped mostly because people stopped
showing up.

If we think we have critical mass again, or if it makes more sense to
run one during the RC period, then let's do it.

So... Who would show up for a bug day if we ran one?

Michael

On Tue, Sep 16, 2014 at 6:12 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:
  On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
  On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
  Just an observation from the last week or so...
 
  The biggest problem nova faces at the moment isn't code review latency. 
  Our
  biggest problem is failing to fix our bugs so that the gate is reliable.
  The number of rechecks we've done in the last week to try and land code 
  is
  truly startling.
 
  I consider both problems to be pretty much equally important. I don't
  think solving review latency or test reliability in isolation is enough to
  save Nova. We need to tackle both problems as a priority. I tried to avoid
  getting into my concerns about testing in my mail on review team
  bottlenecks since I think we should address the problems independently /
  in parallel.
 
  Agreed with this.  I don't think we can afford to ignore either one of 
  them.

 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.

 So, how do we get people to start taking bugs / gate failures more
 seriously?

 I think we should have formal Bug squash Wednesdays (or pick another
 day). By this I mean that the core reviewers will focus their attention
 on just reviews that are related to bug fixing. They will also try to
 work on bugs if they have time and encourage everyone else involved in
 Nova to do the same. We'd have a team of people in the Nova IRC channel
 to publicise & co-ordinate bug squashing, perhaps with a list of top
 20 bugs we want to attack this week. I wouldn't focus just on gate bugs
 here since many are pretty darn hard & so would put off many people. Have
 a mix of bugs of varying difficulties to point people to. Make this a
 regular fortnightly or even weekly event which we publicise in advance
 on mailing lists, etc.

 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



-- 
Rackspace Australia



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Gary Kotton


On 9/16/14, 11:12 AM, Daniel P. Berrange berra...@redhat.com wrote:

On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com
wrote:
  On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
  On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
  Just an observation from the last week or so...
 
  The biggest problem nova faces at the moment isn't code review
latency. Our
  biggest problem is failing to fix our bugs so that the gate is
reliable.
  The number of rechecks we've done in the last week to try and land
code is
  truly startling.
 
  I consider both problems to be pretty much equally important. I don't
  think solving review latency or test reliability in isolation is enough to
  save Nova. We need to tackle both problems as a priority. I tried to avoid
  getting into my concerns about testing in my mail on review team
  bottlenecks since I think we should address the problems independently /
  in parallel.
 
  Agreed with this.  I don't think we can afford to ignore either one
of them.
 
 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.
 
 So, how do we get people to start taking bugs / gate failures more
 seriously?

I think we should have formal Bug squash Wednesdays (or pick another
day). By this I mean that the core reviewers will focus their attention
on just reviews that are related to bug fixing. They will also try to
work on bugs if they have time and encourage everyone else involved in
Nova to do the same. We'd have a team of people in the Nova IRC channel
to publicise & co-ordinate bug squashing, perhaps with a list of top
20 bugs we want to attack this week. I wouldn't focus just on gate bugs
here since many are pretty darn hard & so would put off many people. Have
a mix of bugs of varying difficulties to point people to. Make this a
regular fortnightly or even weekly event which we publicise in advance
on mailing lists, etc.

I am in favor of that. This is similar to what I suggested in
http://lists.openstack.org/pipermail/openstack-dev/2014-September/045440.html

Thanks
Gary

Regards,
Daniel
-- 
|: http://berrange.com  -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Kashyap Chamarthy
On Tue, Sep 16, 2014 at 06:29:53PM +1000, Michael Still wrote:
 I think bug days are a good idea. We've had them sporadically in the
 past, but never weekly. We stopped mostly because people stopped
 showing up.
 
 If we think we have critical mass again, or if it makes more sense to
 run one during the RC period, then let's do it.
 
 So... Who would show up for a bug day if we ran one?

I'm not a Nova dev, but FWIW, I can spend time doing triage and root
cause analysis of areas involving virt drivers - libvirt, QEMU, KVM and
any other related areas in Nova.

PS: Next four weeks are going to be hectic for me personally due to some
travel, but I should be more active and available after that.

--
/kashyap

 
 On Tue, Sep 16, 2014 at 6:12 PM, Daniel P. Berrange berra...@redhat.com 
 wrote:
  On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
  On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com 
  wrote:
   On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
   On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
   Just an observation from the last week or so...
  
   The biggest problem nova faces at the moment isn't code review 
   latency. Our
   biggest problem is failing to fix our bugs so that the gate is 
   reliable.
   The number of rechecks we've done in the last week to try and land 
   code is
   truly startling.
  
   I consider both problems to be pretty much equally important. I don't
   think solving review latency or test reliability in isolation is enough
   to save Nova. We need to tackle both problems as a priority. I tried to
   avoid getting into my concerns about testing in my mail on review team
   bottlenecks since I think we should address the problems independently /
   in parallel.
  
   Agreed with this.  I don't think we can afford to ignore either one of 
   them.
 
  Yes, that was my point. I don't mind us debating how to rearrange
  hypervisor drivers. However, if we think that will solve all our
  problems we are confused.
 
  So, how do we get people to start taking bugs / gate failures more
  seriously?
 
  I think we should have formal Bug squash Wednesdays (or pick another
  day). By this I mean that the core reviewers will focus their attention
  on just reviews that are related to bug fixing. They will also try to
  work on bugs if they have time and encourage everyone else involved in
  Nova to do the same. We'd have a team of people in the Nova IRC channel
  to publicise & co-ordinate bug squashing, perhaps with a list of top
  20 bugs we want to attack this week. I wouldn't focus just on gate bugs
  here since many are pretty darn hard & so would put off many people. Have
  a mix of bugs of varying difficulties to point people to. Make this a
  regular fortnightly or even weekly event which we publicise in advance
  on mailing lists, etc.
 
  Regards,
  Daniel




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Thierry Carrez
Michael Still wrote:
 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.
 
 So, how do we get people to start taking bugs / gate failures more seriously?

I think we need to build a cross-project team working on that. Having
gate liaisons designated in every project should help bootstrap that
team -- it doesn't mean it's a one-person-per-project job, but at least
you have a contact person when you need an expert in some project that
is also versed in the arts of the gate.

I also think we need to do a slightly better job at visualizing issues.
Like Dims said, even with tabs opened to the right places, it's
non-trivial to tell the killer bug from the ones that aren't. And
without carefully checking IRC backlog in 4 different channels, it's
also hard to find out that a bug is already taken care of. I woke up one
morning with gate being obviously stuck on some issue, investigated it,
only to realize after 30 minutes that the fix was already in the gate
queue. That's a bit of a frustrating experience.

Finally, it's not completely crazy to use a specific channel
(#openstack-gate ?) for that. Yes, there is a lot of overlap with -qa
and -infra channels, but those channels aren't dedicated to that
problem, so 25% of the issues are discussed on one, 25% on the other,
25% on the project-specific channel, and the remaining 25% on some
random channel the right people happen to be in. Having a clear channel
where all the gate liaisons hang out and all issues are discussed may go
a long way toward establishing a team to work on that (rather than
continue to rely on the same set of willing individuals).

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Sean Dague
On 09/16/2014 05:44 AM, Thierry Carrez wrote:
 Michael Still wrote:
 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.

 So, how do we get people to start taking bugs / gate failures more seriously?
 
 I think we need to build a cross-project team working on that. Having
 gate liaisons designated in every project should help bootstrap that
 team -- it doesn't mean it's a one-person-per-project job, but at least
 you have a contact person when you need an expert in some project that
 is also versed in the arts of the gate.
 
 I also think we need to do a slightly better job at visualizing issues.
 Like Dims said, even with tabs opened to the right places, it's
 non-trivial to tell the killer bug from the ones that aren't. And
 without carefully checking IRC backlog in 4 different channels, it's
 also hard to find out that a bug is already taken care of. I woke up one
 morning with gate being obviously stuck on some issue, investigated it,
 only to realize after 30 minutes that the fix was already in the gate
 queue. That's a bit of a frustrating experience.

 Finally, it's not completely crazy to use a specific channel
 (#openstack-gate ?) for that. Yes, there is a lot of overlap with -qa
 and -infra channels, but those channels aren't dedicated to that
 problem, so 25% of the issues are discussed on one, 25% on the other,
 25% on the project-specific channel, and the remaining 25% on some
 random channel the right people happen to be in. Having a clear channel
 where all the gate liaisons hang out and all issues are discussed may go
 a long way toward establishing a team to work on that (rather than
 continue to rely on the same set of willing individuals).

Honestly, I'm pretty anti 'add another channel'. Especially because
there seems to be some assumption that you can address this problem
without understanding our integration environment (devstack / tempest /
d-g). This is not a problem in isolation, it's a problem about the
synthesis of all the parts. The diving on these issues is already
happening in a place, we should build on that, and not synthetically
create some 3rd place esperanto channel thinking that will fix the issue.

I've thought about the visualization problem a lot... some of the output
included the os-loganalyze and elastic-recheck projects as well as
pretty-tox in tempest to ensure we see which worker each test is running
in so you can figure out what's happening simultaneously.

Here's the root problem I ran into. The set of useful visualizations
changes at a pretty good clip. These bugs are hard to find and
fix because they are typically the interaction of a bunch of moving parts.

So the tools you need to fix them are some combination of
visualizations, plus a reasonable mental model in your head of how all
of OpenStack fits together (and how components expose to operators what
they are doing). I actually think part 2 is the weak spot for
most folks. Knowing that glanceclient's logging is ridiculous, and you
should ignore it (for instance), because it spews a ton of ERRORS for no
good reason.

Basically that's the key skill. Understanding the request flows that go
through OpenStack, understanding how to read OpenStack logs, and being
mindful that the issue might be caused by other things happening at the
same time that you are trying to do a thing (so keep an eye out for those).
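
As a rough illustration of "keep an eye out for what else was happening at
the same time": given per-test worker and timing records (the kind of
information per-worker test output is meant to expose), the sketch below
lists the tests that overlapped a failing one on other workers. The `Run`
record, its fields, and the test names are hypothetical, not a real tempest
or subunit format.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    worker: int
    start: float  # seconds since the job started (assumed unit)
    stop: float


def running_alongside(failed, runs):
    """Return runs on other workers whose time window overlaps failed's."""
    return [
        r for r in runs
        if r.worker != failed.worker
        and r.start < failed.stop
        and failed.start < r.stop
    ]


# Invented records for illustration only.
failed = Run("test_attach_volume", 0, 10.0, 25.0)
runs = [
    failed,
    Run("test_delete_server", 1, 5.0, 12.0),   # overlaps on another worker
    Run("test_list_images", 2, 30.0, 31.0),    # starts after the failure ended
    Run("test_resize", 0, 12.0, 14.0),         # same worker, so not concurrent
]
print([r.name for r in running_alongside(failed, runs)])
```

The output is only a shortlist of candidates; working out whether any of
them actually interfered still means reading the service logs for that time
window.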

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Daniel P. Berrange
On Tue, Sep 16, 2014 at 06:29:53PM +1000, Michael Still wrote:
 I think bug days are a good idea. We've had them sporadically in the
 past, but never weekly. We stopped mostly because people stopped
 showing up.
 
 If we think we have critical mass again, or if it makes more sense to
 run one during the RC period, then let's do it.
 
 So... Who would show up for a bug day if we ran one?

IMHO that question is attacking this the wrong way. We should have the
nova core & PTL team lead by example, by all agreeing to actively take
part in formally scheduled bug days. Use this to set the expectations
for the rest of the community, to encourage them to join in too, and
not just rely on a handful of people to volunteer.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Jay Pipes

On 09/16/2014 04:29 AM, Michael Still wrote:

I think bug days are a good idea. We've had them sporadically in the
past, but never weekly. We stopped mostly because people stopped
showing up.

If we think we have critical mass again, or if it makes more sense to
run one during the RC period, then let's do it.

So... Who would show up for a bug day if we ran one?


I would.

-jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Jay Pipes

On 09/16/2014 04:12 AM, Daniel P. Berrange wrote:

On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:

On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:

On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:

On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:

Just an observation from the last week or so...

The biggest problem nova faces at the moment isn't code review latency. Our
biggest problem is failing to fix our bugs so that the gate is reliable.
The number of rechecks we've done in the last week to try and land code is
truly startling.


I consider both problems to be pretty much equally important. I don't
think solving review latency or test reliability in isolation is enough to
save Nova. We need to tackle both problems as a priority. I tried to avoid
getting into my concerns about testing in my mail on review team bottlenecks
since I think we should address the problems independently / in parallel.


Agreed with this.  I don't think we can afford to ignore either one of them.


Yes, that was my point. I don't mind us debating how to rearrange
hypervisor drivers. However, if we think that will solve all our
problems we are confused.

So, how do we get people to start taking bugs / gate failures more
seriously?


I think we should have formal Bug squash Wednesdays (or pick another
day). By this I mean that the core reviewers will focus their attention
on just reviews that are related to bug fixing. They will also try to
work on bugs if they have time and encourage everyone else involved in
Nova to do the same. We'd have a team of people in the Nova IRC channel
to publicise & co-ordinate bug squashing, perhaps with a list of top
20 bugs we want to attack this week. I wouldn't focus just on gate bugs
here since many are pretty darn hard & so would put off many people. Have
a mix of bugs of varying difficulties to point people to. Make this a
regular fortnightly or even weekly event which we publicise in advance
on mailing lists, etc.


+1, I've suggested similar in the past.

-jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Sean Dague
On 09/16/2014 09:39 AM, Jay Pipes wrote:
 On 09/16/2014 04:12 AM, Daniel P. Berrange wrote:
 On Tue, Sep 16, 2014 at 07:30:26AM +1000, Michael Still wrote:
 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com
 wrote:
 On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
 On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
 Just an observation from the last week or so...

 The biggest problem nova faces at the moment isn't code review
 latency. Our
 biggest problem is failing to fix our bugs so that the gate is
 reliable.
 The number of rechecks we've done in the last week to try and land
 code is
 truly startling.

 I consider both problems to be pretty much equally important. I don't
 think solving review latency or test reliability in isolation is enough to
 save Nova. We need to tackle both problems as a priority. I tried to avoid
 getting into my concerns about testing in my mail on review team
 bottlenecks since I think we should address the problems independently /
 in parallel.

 Agreed with this.  I don't think we can afford to ignore either one
 of them.

 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.

 So, how do we get people to start taking bugs / gate failures more
 seriously?

 I think we should have formal Bug squash wednesdays  (or pick another
 day). By this I mean that the core reviewers will focus their attention
 on just reviews that are related to bug fixing. They will also try to
 work on bugs if they have time and encourage everyone else involved in
 Nova to do the same. We'd have a team of people in the Nova IRC channel
 to publicise and co-ordinate bug squashing, perhaps with a list of top
 20 bugs we want to attack this week. I wouldn't focus just on gate bugs
 here since many are pretty darn hard and so would put off many people. Have
 a mix of bugs of varying difficulties to point people to. Make this a
 regular fortnightly or even weekly event which we publicise in advance
 on mailing lists, etc.
 
 +1, I've suggested similar in the past.

+1 a weekly event would be great.

I've spent the bulk of the last 2 weeks in the Nova bug tracker, and
it's pretty interesting what's in there. Lots of stuff we should be
fixing. Lots of really old gorp that we should shed because it's not
helping. Also lots of inconsistencies in how triage is happening because
it's not happening regularly enough.

Plus, now that we are at 0 bugs in the New state in Nova, it's actually
kind of sane to stay on top of that, and keep our New state empty. Not
that it fixes everything, but it does prevent a bunch of gorp getting
added to the pile as probably 1/2 - 1/3 of inbound bugs... aren't.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Joe Gordon
On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote:

 On 09/15/2014 08:07 PM, Jeremy Stanley wrote:

 On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
 [...]

 Sometimes it's pretty hard to determine whether something in the
 E-R check page is due to something in the infra scripts, some
 transient issue in the upstream CI platform (or part of it), or
 actually a bug in one or more of the OpenStack projects.

 [...]

 Sounds like an NP-complete problem, but if you manage to solve it
 let me know and I'll turn it into the first line of triage for Infra
 bugs. ;)


 LOL, thanks for making me take the last hour reading Wikipedia pages
about computational complexity theory! :P

 No, in all seriousness, I wasn't actually asking anyone to boil the
ocean, mathematically. I think doing a couple of things would help: making
the categorization more obvious (a UI thing, really), and doing some
(hopefully simple?) inspection of a control group of patches that we know
do not introduce any code changes themselves, comparing it to another
group of patches that we know *do* introduce code changes to Nova, and then
seeing if there is a set of E-R issues that consistently appears in *both*
groups. That set of E-R issues has a higher likelihood of not being due to
Nova, right?
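
A minimal sketch of that comparison, assuming each CI run has already been
reduced to a list of E-R bug signatures. The function names, the sample bug
labels, and the 50% threshold are hypothetical and not part of elastic-recheck:

```python
from collections import Counter


def consistent_signatures(runs, min_rate):
    """Return the failure signatures seen in at least min_rate of the runs."""
    if not runs:
        return set()
    # Count each signature at most once per run.
    counts = Counter(sig for run in runs for sig in set(run))
    return {sig for sig, n in counts.items() if n / len(runs) >= min_rate}


def likely_not_nova(control_runs, nova_runs, min_rate=0.5):
    """Signatures that show up consistently in *both* groups of patches."""
    return (consistent_signatures(control_runs, min_rate)
            & consistent_signatures(nova_runs, min_rate))


# Hypothetical data: one list of signatures per CI run.
control_runs = [["bug-a"], [], ["bug-a", "bug-c"], []]   # no code changes
nova_runs = [["bug-a", "bug-b"], ["bug-b"], [], ["bug-a"]]  # Nova code changes

print(likely_not_nova(control_runs, nova_runs))  # {'bug-a'}
```

Here "bug-b" only appears alongside Nova changes, so it stays a Nova suspect,
while "bug-a" appears in both groups and is a candidate for infra or another
project. A real comparison would need far more runs than this toy input.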

We use launchpad's affected projects listings on the elastic recheck page
to say what may be causing the bug.  Tagging projects to bugs is a manual
process, but one that works pretty well.

UI: The elastic recheck UI definitely could use some improvements. I am
very poor at writing UIs, so patches welcome!


 OK, so perhaps it's not the most scientific or well-thought out plan, but
hey, it's a spark for thought... ;)

 Best,
 -jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Nikola Đipanov
On 09/13/2014 11:07 PM, Michael Still wrote:
 Just an observation from the last week or so...
 
 The biggest problem nova faces at the moment isn't code review latency.
 Our biggest problem is failing to fix our bugs so that the gate is
 reliable. The number of rechecks we've done in the last week to try and
 land code is truly startling.
 

This is exactly what I was saying in my ranty email from 2 weeks ago
[1]. Debt is everywhere and, as with any debt, it is unlikely to go away
on its own.


 I know that some people are focused by their employers on feature work,
 but those features aren't going to land in a world in which we have to
 hand walk everything through the gate.
 

The thing is that - without doing work on the code - you cannot know
where the real issues are. You cannot look at a codebase as big as Nova
and say, hmmm looks like we need to fix the resource tracker. You can
know that only if you are neck-deep in the stuff. And then you need to
agree on what is really bad and what is just distasteful, and then focus
the efforts on that. None of the things we've put in place (specs, the
way we do and organize code review and bugs) acknowledge or help this
part of the development process.

I tried to explain this in my previous ranty email [1] but I guess I
failed due to ranting :) so let me try again: Nova team needs to act as
a development team.

We are not in a place (yet?) where we can just overlook the addition of
features based on whether they are appropriate for our use case. We have
to work together on a set of important things to get Nova to where we
think it needs to be and make sure we get it done - by actually doing
it! (*)

However - I don't think freezing development of features for a cycle is
a viable option - this is just not how software in the real world gets
done. It will likely be the worst possible thing we can do, no matter
how appealing it seems to us as developers.

But we do need to be extremely strict on what we let in, and under which
conditions! As I mentioned to sdague on IRC the other day (yes, I am
quoting myself :) ): Not all features are the same - there are
features that are better, that are coded better, and are integrated
better - we should be wanting those features always! Then there are
features that are a net negative on the code - we should *never* want
those features. And then there are features in the middle - we may want
to cut those or push them back depending on a number of things that are
important. Things like: code quality, can it fit within the current
constraints, can we let it in like that, or some work needs to happen
first. Things which we haven't been really good at considering
previously IMHO.

But you can't really judge that unless you are actively developing Nova
yourself, and have a tighter grip on the proposed code than what our
current process gives.

Peace!
N.

[1]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/044722.html

(*) The only effort like this going on at the moment in Nova is the
Objects work done by dansmith (even though there are several others
proposed) - I will let the readers judge how much of an impact it was in
only 2 short cycles, from just a single effort.

 Michael
 
 
 -- 
 Rackspace Australia
 
 
 




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Nikola Đipanov
On 09/14/2014 12:27 AM, Boris Pavlovic wrote:
 Michael, 
 
 I am so glad that you started this topic.
 I really like the idea of taking a pause with features and concentrating
 on improvement of the current code base.
 
 Even if the 1k open bugs (https://bugs.launchpad.net/nova) are a vital
 issue, there are other things that could be addressed to improve Nova
 team throughput.
 
 As was said in another thread: Nova code is currently too big and
 complex to be understood by one person.
 This produces 2 issues:
 A) It is hard to find a person who can observe the full project and make
 global architecture decisions, including work on cross-project interactions
 (so the project doesn't have a straight direction of development)
 B) It's really hard to find cores, and current cores are under too heavy
 a load (because of project complexity)
 
 I believe that whole current Nova functionality can be implemented in
 much simpler manner.

Just a brief comment on the sentence above.

This is a common thing to hear from coders, and is very rarely rooted in
reality IMHO. Nova does _a lot_ of things. Saying that given an
exhaustive list of features it has, we can implement them in a much
simpler manner is completely disregarding all the complexity of building
software that works within real world constraints.

 Basically, complexity was added during the process of adding a lot of
 features for years, that didn't perfectly fit to architecture of Nova. 
 And there wasn't much work on refactoring the architecture to cleanup
 these features. 
 

I agree with this of course - fixing architectural flaws is important
and needs to be an ongoing part of the process, as I mention in my other
mail to the thread. Halting all other development is not the way to do
it though.

N.

 So maybe it's the proper time to think about what, why and how we are
 doing.
 That will allow us to find simpler solutions for current functionality.
 
 
 Best regards,
 Boris Pavlovic 
 
 
 On Sun, Sep 14, 2014 at 1:07 AM, Michael Still mi...@stillhq.com wrote:
 
 Just an observation from the last week or so...
 
 The biggest problem nova faces at the moment isn't code review
 latency. Our biggest problem is failing to fix our bugs so that the
 gate is reliable. The number of rechecks we've done in the last week
 to try and land code is truly startling.
 
 I know that some people are focused by their employers on feature
 work, but those features aren't going to land in a world in which we
 have to hand walk everything through the gate.
 
 Michael
 
 
 -- 
 Rackspace Australia
 
 




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Daniel P. Berrange
On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
 Just an observation from the last week or so...
 
 The biggest problem nova faces at the moment isn't code review latency. Our
 biggest problem is failing to fix our bugs so that the gate is reliable.
 The number of rechecks we've done in the last week to try and land code is
 truly startling.

I consider both problems to be pretty much equally important. I don't
think solving review latency or test reliability in isolation is enough to
save Nova. We need to tackle both problems as a priority. I tried to avoid
getting into my concerns about testing in my mail on review team bottlenecks
since I think we should address the problems independently / in parallel.

 I know that some people are focused by their employers on feature work, but
 those features aren't going to land in a world in which we have to hand
 walk everything through the gate.

Unfortunately the reliability of the gate systems has the highest negative
impact on productivity right at the point in the dev cycle where we need
it to have the least impact too.

If we're going to continue to raise the bar in terms of testing coverage
then we need to have a serious look at the overall approach we use for
testing because what we do today isn't going to scale, even if it is
100% reliable. We can't keep adding new CI jobs for each new nova.conf
setting that introduces a new code path, because each job has major
implications for resource consumption (number of test nodes, log storage),
not to mention reliability. I think we need to figure out a way to get
more targeted testing of features, so we can keep the overall number
of jobs lower and the tests shorter.

Instead of having a single tempest run that exercises all the Nova
functionality in one run, we need to figure out how to split it up
into independent functional areas. For example if we could isolate
tests which are affected by choice of cinder storage backend, then
we could run that subset of tests multiple times, once for each
supported cinder backend. Without this, the combinatorial explosion
of test jobs is going to kill us.
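
To put rough numbers on the combinatorial explosion, here is a small sketch
comparing a full job matrix with per-area targeted runs. The configuration
axes and option counts are made up for illustration, and the function names
are hypothetical:

```python
from math import prod


def full_matrix_jobs(options_per_axis):
    """Jobs needed if every combination of settings gets its own full run."""
    return prod(options_per_axis.values())


def targeted_runs(options_per_axis):
    """Runs needed if each functional area's subset of tests is run once
    per option on its own axis, independently of the other axes."""
    return sum(options_per_axis.values())


# Hypothetical axes: number of supported choices for each setting.
axes = {"cinder_backend": 4, "hypervisor": 3, "network_backend": 2}

print(full_matrix_jobs(axes))  # 24 full tempest runs
print(targeted_runs(axes))     # 9 smaller, targeted runs
```

Adding one more two-way setting doubles the full matrix to 48 jobs but adds
only 2 targeted runs, which is the scaling difference being argued for here.
This assumes the functional areas really are independent of each other.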


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Michael Still
On Mon, Sep 15, 2014 at 7:42 PM, Daniel P. Berrange berra...@redhat.com wrote:

 Unfortunately the reliability of the gate systems has the highest negative
 impact on productivity right at the point in the dev cycle where we need
 it to have the least impact too.

Agreed.

However, my instinct is that a lot of our CI unreliability isn't from
the number of permutations, but from buggy code. We have our users
telling us where to look to fix this in the form of many many bug
reports. I find it hard to believe that we couldn't improve our gate
reliability by taking fixing the bugs we currently have reported more
seriously.

Michael



-- 
Rackspace Australia



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Sean Dague
On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
 On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
 Just an observation from the last week or so...

 The biggest problem nova faces at the moment isn't code review latency. Our
 biggest problem is failing to fix our bugs so that the gate is reliable.
 The number of rechecks we've done in the last week to try and land code is
 truly startling.
 
 I consider both problems to be pretty much equally important. I don't
 think solving review latency or test reliability in isolation is enough to
 save Nova. We need to tackle both problems as a priority. I tried to avoid
 getting into my concerns about testing in my mail on review team bottlenecks
 since I think we should address the problems independently / in parallel.
 
 I know that some people are focused by their employers on feature work, but
 those features aren't going to land in a world in which we have to hand
 walk everything through the gate.
 
 Unfortunately the reliability of the gate systems has the highest negative
 impact on productivity right at the point in the dev cycle where we need
 it to have the least impact too.
 
 If we're going to continue to raise the bar in terms of testing coverage
 then we need to have a serious look at the overall approach we use for
 testing because what we do today isn't going to scale, even if it is
 100% reliable. We can't keep adding new CI jobs for each new nova.conf
 setting that introduces a new code path, because each job has major
 implications for resource consumption (number of test nodes, log storage),
 not to mention reliability. I think we need to figure out a way to get
 more targeted testing of features, so we can keep the overall number
 of jobs lower and the tests shorter.
 
 Instead of having a single tempest run that exercises all the Nova
 functionality in one run, we need to figure out how to split it up
 into independent functional areas. For example if we could isolate
 tests which are affected by choice of cinder storage backend, then
 we could run that subset of tests multiple times, once for each
 supported cinder backend. Without this, the combinatorial explosion
 of test jobs is going to kill us.

One of the top issues killing Nova patches last week was a unit test
race (the wsgi worker one). There is no one to blame but Nova for that.
Jay was really the only team member digging into it.

I don't disagree on the disaggregation problem. However, lots of Nova
devs are ignoring unit test fails at this point, and unless that changes
no other disaggregation is going to make anything better.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Russell Bryant
On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
 On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
 Just an observation from the last week or so...

 The biggest problem nova faces at the moment isn't code review latency. Our
 biggest problem is failing to fix our bugs so that the gate is reliable.
 The number of rechecks we've done in the last week to try and land code is
 truly startling.
 
 I consider both problems to be pretty much equally important. I don't
 think solving review latency or test reliability in isolation is enough to
 save Nova. We need to tackle both problems as a priority. I tried to avoid
 getting into my concerns about testing in my mail on review team bottlenecks
 since I think we should address the problems independently / in parallel.

Agreed with this.  I don't think we can afford to ignore either one of them.

-- 
Russell Bryant



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jay Pipes

On 09/15/2014 04:01 AM, Nikola Đipanov wrote:

On 09/13/2014 11:07 PM, Michael Still wrote:

Just an observation from the last week or so...

The biggest problem nova faces at the moment isn't code review latency.
Our biggest problem is failing to fix our bugs so that the gate is
reliable. The number of rechecks we've done in the last week to try and
land code is truly startling.



This is exactly what I was saying in my ranty email from 2 weeks ago
[1]. Debt is everywhere and, as with any debt, it is unlikely to go away
on its own.



I know that some people are focused by their employers on feature work,
but those features aren't going to land in a world in which we have to
hand walk everything through the gate.



The thing is that - without doing work on the code - you cannot know
where the real issues are. You cannot look at a codebase as big as Nova
and say, hmmm looks like we need to fix the resource tracker. You can
know that only if you are neck-deep in the stuff. And then you need to
agree on what is really bad and what is just distasteful, and then focus
the efforts on that. None of the things we've put in place (specs, the
way we do and organize code review and bugs) acknowledge or help this
part of the development process.

I tried to explain this in my previous ranty email [1] but I guess I
failed due to ranting :) so let me try again: Nova team needs to act as
a development team.

We are not in a place (yet?) where we can just overlook the addition of
features based on whether they are appropriate for our use case. We have
to work together on a set of important things to get Nova to where we
think it needs to be and make sure we get it done - by actually doing
it! (*)

However - I don't think freezing development of features for a cycle is
a viable option - this is just not how software in the real world gets
done. It will likely be the worst possible thing we can do, no matter
how appealing it seems to us as developers.

But we do need to be extremely strict on what we let in, and under which
conditions! As I mentioned to sdague on IRC the other day (yes, I am
quoting myself :) ): Not all features are the same - there are
features that are better, that are coded better, and are integrated
better - we should be wanting those features always! Then there are
features that are a net negative on the code - we should *never* want
those features. And then there are features in the middle - we may want
to cut those or push them back depending on a number of things that are
important. Things like: code quality, can it fit within the current
constraints, can we let it in like that, or some work needs to happen
first. Things which we haven't been really good at considering
previously IMHO.

But you can't really judge that unless you are actively developing Nova
yourself, and have a tighter grip on the proposed code than what our
current process gives.

Peace!
N.

[1]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/044722.html

(*) The only effort like this going on at the moment in Nova is the
Objects work done by dansmith (even though there are several others
proposed) - I will let the readers judge how much of an impact it was in
only 2 short cycles, from just a single effort.


+1 Well said, Nikola.

-jay




Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Michael Still
On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:
 On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
 On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
 Just an observation from the last week or so...

 The biggest problem nova faces at the moment isn't code review latency. Our
 biggest problem is failing to fix our bugs so that the gate is reliable.
 The number of rechecks we've done in the last week to try and land code is
 truly startling.

 I consider both problems to be pretty much equally important. I don't
 think solving review latency or test reliability in isolation is enough to
 save Nova. We need to tackle both problems as a priority. I tried to avoid
 getting into my concerns about testing in my mail on review team bottlenecks
 since I think we should address the problems independently / in parallel.

 Agreed with this.  I don't think we can afford to ignore either one of them.

Yes, that was my point. I don't mind us debating how to rearrange
hypervisor drivers. However, if we think that will solve all our
problems we are confused.

So, how do we get people to start taking bugs / gate failures more seriously?

Michael

-- 
Rackspace Australia



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Brant Knudson
On Mon, Sep 15, 2014 at 4:30 PM, Michael Still mi...@stillhq.com wrote:

 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com
 wrote:
  On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
  On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
  Just an observation from the last week or so...
 
  The biggest problem nova faces at the moment isn't code review latency.
  Our biggest problem is failing to fix our bugs so that the gate is
  reliable. The number of rechecks we've done in the last week to try and
  land code is truly startling.
 
  I consider both problems to be pretty much equally important. I don't
  think solving review latency or test reliability in isolation is enough
  to save Nova. We need to tackle both problems as a priority. I tried to
  avoid getting into my concerns about testing in my mail on review team
  bottlenecks since I think we should address the problems independently /
  in parallel.
 
  Agreed with this.  I don't think we can afford to ignore either one of
 them.

 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.

 So, how do we get people to start taking bugs / gate failures more
 seriously?

 Michael


What do you think about having an irc channel for working through gate
bugs? I've always found looking at gate failures frustrating because I seem
to be expected to work through these by myself, and maybe somebody's
already looking at it or has more information that I don't know about.
There have been times already where a gate bug that could have left
everything broken for a while wound up fixed pretty quickly because we were
able to find the right person hanging out in irc. Sometimes all it takes is
for someone with the right knowledge to be there. A hypothetical exchange:

rechecker: I got this error where the tempest-foo test failed ... http://...
tempest-expert: That test calls the compute-bar nova API
nova-expert: That API calls the network-baz neutron API
neutron-expert: When you call that API you need to also call this other API
to poll for it to be done... is nova doing that?
nova-expert: Nope. Fix on the way.

- Brant


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Sean Dague
On 09/15/2014 05:52 PM, Brant Knudson wrote:
 
 
 On Mon, Sep 15, 2014 at 4:30 PM, Michael Still mi...@stillhq.com wrote:
 
 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:
  On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
  On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
  Just an observation from the last week or so...
 
  The biggest problem nova faces at the moment isn't code review latency.
  Our biggest problem is failing to fix our bugs so that the gate is
  reliable. The number of rechecks we've done in the last week to try and
  land code is truly startling.
 
  I consider both problems to be pretty much equally important. I don't
  think solving review latency or test reliability in isolation is enough
  to save Nova. We need to tackle both problems as a priority. I tried to
  avoid getting into my concerns about testing in my mail on review team
  bottlenecks since I think we should address the problems independently /
  in parallel.
 
  Agreed with this.  I don't think we can afford to ignore either one of 
 them.
 
 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.
 
 So, how do we get people to start taking bugs / gate failures more
 seriously?
 
 Michael
 
 
 What do you think about having an irc channel for working through gate
 bugs? I've always found looking at gate failures frustrating because I
 seem to be expected to work through these by myself, and maybe
 somebody's already looking at it or has more information that I don't
 know about. There have been times already where a gate bug that could
 have left everything broken for a while wound up fixed pretty quickly
 because we were able to find the right person hanging out in irc.
 Sometimes all it takes is for someone with the right knowledge to be
 there. A hypothetical exchange:
 
 rechecker: I got this error where the tempest-foo test failed ... http://...
 tempest-expert: That test calls the compute-bar nova API
 nova-expert: That API calls the network-baz neutron API
 neutron-expert: When you call that API you need to also call this other
 API to poll for it to be done... is nova doing that?
 nova-expert: Nope. Fix on the way.

Honestly, the #openstack-qa channel is a completely appropriate place
for that. Plus it already has a lot of the tempest experts.
Realistically anyone that works on these kinds of fixes tends to be there.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jay Pipes

On 09/15/2014 05:30 PM, Michael Still wrote:

On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:

On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:

On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:

Just an observation from the last week or so...

The biggest problem nova faces at the moment isn't code review latency. Our
biggest problem is failing to fix our bugs so that the gate is reliable.
The number of rechecks we've done in the last week to try and land code is
truly startling.


I consider both problems to be pretty much equally important. I don't
think solving review latency or test reliability in isolation is enough to
save Nova. We need to tackle both problems as a priority. I tried to avoid
getting into my concerns about testing in my mail on review team bottlenecks
since I think we should address the problems independently / in parallel.


Agreed with this.  I don't think we can afford to ignore either one of them.


Yes, that was my point. I don't mind us debating how to rearrange
hypervisor drivers. However, if we think that will solve all our
problems we are confused.

So, how do we get people to start taking bugs / gate failures more seriously?


A few suggestions:

1) Bug bounties

Money talks. I know it sounds silly, but lots of developers get paid to 
work on features. Not as many have financial incentive to fix bugs.


It doesn't need to be a huge amount. And I think the wall of fame 
respect reward for top bug fixers or gate unblockers would be a good 
incentive as well.


The foundation has a budget. I can't think of a better way to effect 
positive change than allocating $10-20K to paying bug bounties.


2) Videos discussing gate tools and diagnostics techniques

I hope I'm not bursting Sean Dague's bubble, but one thing we've 
been discussing, together with Dan Smith, is having a weekly or 
bi-weekly Youtube show where we discuss Nova development topics, with 
deep dives into common but hairy parts of the Nova codebase. The idea is 
to grow Nova contributors' knowledge of more parts of Nova than just one 
particular area they might be paid to work on.


I think a weekly or bi-weekly show that focuses on bug and gate issues 
would be a really great idea, and I'd be happy to play a role in this. 
The Chef+OpenStack community does weekly Youtube recordings of their 
status meetings and AFAICT, it's pretty successful.


3) Provide a clearer way to understand what is a gate/CI/infra issue and 
what is a project bug


Sometimes it's pretty hard to determine whether something in the E-R 
check page is due to something in the infra scripts, some transient 
issue in the upstream CI platform (or part of it), or actually a bug in 
one or more of the OpenStack projects.


Perhaps there is a way to identify/categorize gate failures (in the form 
of E-R recheck queries) on some meta status page, that would either be 
populated manually or through some clever analysis to better direct 
would-be gate block fixers to where they need to focus?
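
As a rough illustration of the manually populated variant, here is a sketch
that groups bug records by the projects tagged on them, so a status page
could list infra issues separately from project bugs. The bug IDs, the
project names, and the "infra-scripts" label are hypothetical, and this is
not how the existing E-R page is implemented:

```python
from collections import defaultdict


def group_by_category(bugs, infra_projects):
    """Group bug IDs for a status page.

    bugs: iterable of (bug_id, affected_projects) pairs.
    infra_projects: project names that count as gate/CI/infra.
    """
    infra = set(infra_projects)
    page = defaultdict(list)
    for bug_id, projects in bugs:
        tagged = set(projects)
        if not tagged:
            page["uncategorized"].append(bug_id)
        elif tagged <= infra:
            # Only infra projects are tagged: treat as a gate/CI issue.
            page["infra"].append(bug_id)
        else:
            # List the bug under every non-infra project it is tagged with.
            for project in sorted(tagged - infra):
                page[project].append(bug_id)
    return dict(page)


# Hypothetical records, e.g. as exported from the bug tracker by hand.
bugs = [
    (1, ["nova"]),
    (2, ["infra-scripts"]),
    (3, []),
    (4, ["nova", "neutron"]),
]

for category, ids in sorted(group_by_category(bugs, {"infra-scripts"}).items()):
    print(category, ids)
```

The automated "clever analysis" variant would replace the hand-tagged
project lists with something inferred from the failures themselves.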


Anyway, just a few ideas,
-jay



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Davanum Srinivas
Sean,

I have tabs opened to:
http://status.openstack.org/elastic-recheck/gate.html
http://status.openstack.org/elastic-recheck/data/uncategorized.html

and periodically catch up on openstack-qa on IRC as well, I just did
not realize this wsgi gate bug was hurting the gate this much.

So, could we somehow indicate (email? or one of the web pages above?)
where occasional helpers can watch and pitch in when needed?

thanks,
dims


On Mon, Sep 15, 2014 at 5:55 PM, Sean Dague s...@dague.net wrote:
 On 09/15/2014 05:52 PM, Brant Knudson wrote:


 On Mon, Sep 15, 2014 at 4:30 PM, Michael Still mi...@stillhq.com wrote:

 On Tue, Sep 16, 2014 at 12:30 AM, Russell Bryant rbry...@redhat.com wrote:
  On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
  On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
  Just an observation from the last week or so...
 
  The biggest problem nova faces at the moment isn't code review latency.
  Our biggest problem is failing to fix our bugs so that the gate is
  reliable. The number of rechecks we've done in the last week to try and
  land code is truly startling.
 
  I consider both problems to be pretty much equally important. I don't
  think solving review latency or test reliability in isolation is enough
  to save Nova. We need to tackle both problems as a priority. I tried to
  avoid getting into my concerns about testing in my mail on review team
  bottlenecks since I think we should address the problems independently /
  in parallel.
 
  Agreed with this.  I don't think we can afford to ignore either one of
  them.

 Yes, that was my point. I don't mind us debating how to rearrange
 hypervisor drivers. However, if we think that will solve all our
 problems we are confused.

 So, how do we get people to start taking bugs / gate failures more
 seriously?

 Michael


 What do you think about having an irc channel for working through gate
 bugs? I've always found looking at gate failures frustrating because I
 seem to be expected to work through these by myself, and maybe
 somebody's already looking at it or has more information that I don't
 know about. There have been times already where a gate bug that could
 have left everything broken for a while wound up fixed pretty quickly
 because we were able to find the right person hanging out in irc.
 Sometimes all it takes is for someone with the right knowledge to be
 there. A hypothetical exchange:

 rechecker: I got this error where the tempest-foo test failed ... http://...
 tempest-expert: That test calls the compute-bar nova API
 nova-expert: That API calls the network-baz neutron API
 neutron-expert: When you call that API you need to also call this other
 API to poll for it to be done... is nova doing that?
 nova-expert: Nope. Fix on the way.

 Honestly, the #openstack-qa channel is a completely appropriate place
 for that. Plus it already has a lot of the tempest experts.
 Realistically, anyone who works on these kinds of fixes tends to be there.

 -Sean

 --
 Sean Dague
 http://dague.net




-- 
Davanum Srinivas :: https://twitter.com/dims



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jeremy Stanley
On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
[...]
 Sometimes it's pretty hard to determine whether something in the
 E-R check page is due to something in the infra scripts, some
 transient issue in the upstream CI platform (or part of it), or
 actually a bug in one or more of the OpenStack projects.
[...]

Sounds like an NP-complete problem, but if you manage to solve it
let me know and I'll turn it into the first line of triage for Infra
bugs. ;)
-- 
Jeremy Stanley



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-15 Thread Jay Pipes

On 09/15/2014 08:07 PM, Jeremy Stanley wrote:

On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
[...]

Sometimes it's pretty hard to determine whether something in the
E-R check page is due to something in the infra scripts, some
transient issue in the upstream CI platform (or part of it), or
actually a bug in one or more of the OpenStack projects.

[...]

Sounds like an NP-complete problem, but if you manage to solve it
let me know and I'll turn it into the first line of triage for Infra
bugs. ;)


LOL, thanks for making me take the last hour reading Wikipedia pages 
about computational complexity theory! :P


No, in all seriousness, I wasn't actually asking anyone to boil the 
ocean, mathematically. I think a couple of things would help: making the 
categorization more obvious (a UI thing, really), and doing some 
(hopefully simple?) inspection of a control group of patches that we 
know do not introduce any code changes themselves, comparing them to 
another group of patches that we know *do* introduce code changes to 
Nova, and then seeing if there is a set of E-R issues that consistently 
appears in *both* groups. That set of E-R issues has a higher likelihood 
of not being due to Nova, right?
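
A rough sketch of that comparison, purely for illustration. It assumes
each CI run can be reduced to the set of E-R bug IDs it matched. The
function names, the `min_rate` threshold and the sample bug labels are
all hypothetical, not part of elastic-recheck.

```python
from collections import Counter


def failure_rates(runs):
    """Map each bug ID to the fraction of runs in which it was matched."""
    if not runs:
        return {}
    counts = Counter(bug for run in runs for bug in set(run))
    return {bug: n / len(runs) for bug, n in counts.items()}


def likely_not_nova(control_runs, nova_runs, min_rate=0.05):
    """Return bug IDs seen at or above min_rate in BOTH groups.

    control_runs: runs of patches known to contain no code changes.
    nova_runs: runs of patches known to change Nova code.
    min_rate is an assumed cutoff for "consistently appears".
    """
    control = failure_rates(control_runs)
    nova = failure_rates(nova_runs)
    return sorted(
        bug
        for bug in control.keys() & nova.keys()
        if control[bug] >= min_rate and nova[bug] >= min_rate
    )


# Made-up data: each set holds the E-R bugs matched by one CI run.
control_runs = [{"bug-A"}, set(), {"bug-A", "bug-B"}, set()]
nova_runs = [{"bug-A"}, {"bug-C"}, set(), {"bug-C"}]
print(likely_not_nova(control_runs, nova_runs, min_rate=0.25))
# prints ['bug-A']
```

Bugs that show up only in the Nova group (bug-C here) stay candidates for
being Nova's fault. Appearing in both groups only lowers the odds that a
bug is due to Nova, it does not rule it out.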


OK, so perhaps it's not the most scientific or well-thought-out plan, 
but hey, it's a spark for thought... ;)


Best,
-jay



Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-13 Thread Boris Pavlovic
Michael,

I am so glad that you started this topic.
I really like the idea of taking a pause with features and concentrating on
improving the current code base.

Even if the 1k open bugs (https://bugs.launchpad.net/nova) are a vital issue,
there are other things that could be addressed to improve the Nova team's
throughput.

As was said in another thread, the Nova code is currently too big and
complex to be understood by one person.
This produces 2 issues:
A) It is hard to find a person who can oversee the full project and make
global architecture decisions, including work on cross-project interactions
(so the project doesn't have a clear direction of development)
B) It's really hard to find cores, and current cores are under too heavy a
load (because of project complexity)

I believe that all of the current Nova functionality can be implemented in a
much simpler manner.
Basically, the complexity came from years of adding a lot of features that
didn't perfectly fit the architecture of Nova, without much work on
refactoring the architecture to clean up these features.

So maybe it's the proper time to think about what, why and how we are
doing things.
That will allow us to find simpler solutions for the current functionality.


Best regards,
Boris Pavlovic


On Sun, Sep 14, 2014 at 1:07 AM, Michael Still mi...@stillhq.com wrote:

 Just an observation from the last week or so...

 The biggest problem nova faces at the moment isn't code review latency.
 Our biggest problem is failing to fix our bugs so that the gate is
 reliable. The number of rechecks we've done in the last week to try and
 land code is truly startling.

 I know that some people are focused by their employers on feature work,
 but those features aren't going to land in a world in which we have to hand
 walk everything through the gate.

 Michael


 --
 Rackspace Australia


