Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-12 Thread Daniel P. Berrange
On Thu, Sep 11, 2014 at 02:02:00PM -0400, Dan Prince wrote:
> I've always referred to the virt/driver.py API as an internal API
> meaning there are no guarantees about it being preserved across
> releases. I'm not saying this is correct... just that it is what we've
> got.  While OpenStack attempts to do a good job at stabilizing its
> public APIs we haven't done the same for internal APIs. It is actually
> quite painful to be out of tree at this point as I've seen with the
> Ironic driver being out of the Nova tree. (really glad that is back in
> now!)

Oh absolutely, I've always insisted that virt/driver.py is unstable
and that as a result out of tree drivers get to keep both pieces when
it breaks.

> So because we haven't designed things to be split out in this regard we
> can't just go and do it. 

I don't think that conclusion follows directly. We certainly need to
do some prep work to firm up our virt driver interface, as outlined
in my original mail, but if we agreed to push forward on this I think
it is practical to get that done in Kilo and split in L. It is
mostly a matter of having the will to do it IMHO.

> I tinkered with some numbers... not sure if this helps or hurts my
> stance but here goes. By my calculation this is the number of commits
> we've made that touched each virt driver tree for the last 3 releases
> plus stuff done to-date in Juno.
> 
> Created using a command like this in each virt directory for each
> release: git log origin/stable/havana..origin/stable/icehouse
> --no-merges --pretty=oneline . | wc -l
> 
> essex => folsom:
> 
>  baremetal: 26
>  hyperv: 9
>  libvirt: 222
>  vmwareapi: 18
>  xenapi: 164
> * total for above: 439
> 
> folsom => grizzly:
> 
>  baremetal: 83
>  hyperv: 58
>  libvirt: 254
>  vmwareapi: 59
>  xenapi: 126
> * total for above: 580
> 
> grizzly => havana:
> 
>  baremetal: 48
>  hyperv: 55
>  libvirt: 157
>  vmwareapi: 105
>  xenapi: 123
> * total for above: 488
> 
> havana => icehouse:
> 
>  baremetal: 45
>  hyperv: 42
>  libvirt: 212
>  vmwareapi: 121
>  xenapi: 100
> * total for above: 520
> 
> icehouse => master:
> 
>  baremetal: 26
>  hyperv: 32
>  libvirt: 188
>  vmwareapi: 121
>  xenapi: 71
> * total for above: 438
> 
> ---
> 
> A couple of things jump out at me from the numbers:
> 
>  -drivers that are being deprecated (baremetal) still have lots of
> changes. Some of these changes are valid bug fixes for the driver but a
> majority of them are actually related to internal cleanups and interface
> changes. This goes towards the fact that Nova isn't mature enough to do
> a split like this yet.

Our position that the virt driver is internal only has permitted us
to make backwards incompatible changes to it at will. Given that freedom
people inevitably take that route since it is the least effort option.
If our position had been that the virt driver needed to be forwards
compatible, people would have been forced to make the same changes without
breaking existing drivers.  IOW, the fact that we've made lots of changes
to baremetal historically, doesn't imply that we can't decide to make the
virt driver API stable henceforth & thus avoid further changes of that
kind.
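
To make the compatibility point concrete, here is a minimal Python sketch
of extending a driver interface without breaking existing drivers. The
class and method names (ComputeDriver, spawn_with_options) are
hypothetical stand-ins for illustration, not Nova's actual virt driver
API. The idea is simply that a newly added entry point gets a default
implementation in the base class that falls back to the old one.

```python
class ComputeDriver:
    """Hypothetical stand-in for a virt driver base class (not Nova's real API)."""

    def spawn(self, instance):
        """Original entry point that every driver already implements."""
        raise NotImplementedError()

    def spawn_with_options(self, instance, options=None):
        """Entry point added in a later release.

        The default falls back to spawn(), so a driver written against the
        older interface keeps working without modification.
        """
        return self.spawn(instance)


class OldDriver(ComputeDriver):
    """Driver written before spawn_with_options() existed."""

    def spawn(self, instance):
        return f"old:{instance}"


class NewDriver(ComputeDriver):
    """Driver that opts in to the newer entry point."""

    def spawn(self, instance):
        return self.spawn_with_options(instance)

    def spawn_with_options(self, instance, options=None):
        opts = options or {}
        return f"new:{instance}:{sorted(opts)}"


# Common code can always call the new entry point; both drivers work.
old_result = OldDriver().spawn_with_options("vm1", {"numa": True})
new_result = NewDriver().spawn_with_options("vm1", {"numa": True})
```

The cost of this style is that the base class accumulates fallbacks over
time, which is the extra effort people skip when breaking changes are
permitted.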

>  -the number of commits landed isn't growing *that* much across releases
> in the virt driver trees. Presumably we think we were doing a better job
> 2 years ago? But the number of changes in the virt trees is largely the
> same... perhaps this is because people aren't submitting stuff because
> they are frustrated though?

Our core team size & thus review bandwidth has been fairly static over
that time, so the only way virt driver commits could have risen is if
core reviewers increased their focus on virt drivers at the expense of
other parts of nova. I actually read those numbers as showing that as
we've put more effort into reviewing vmware contributions, we've lost
resource going into libvirt contributions.

In addition we're of course missing out on capturing the changes that
were never submitted, or were submitted but abandoned, or were submitted
but slipped across multiple releases waiting for merge. Overall I think
the figures paint a pretty depressing picture of no overall growth,
perhaps even a decline.


> 
> For comparison here are the total number of commits for each Nova
> release (includes the above commits):
> 
> essex -> folsom: 1708
> folsom -> grizzly: 2131
> grizzly -> havana: 2188
> havana -> icehouse: 1696
> icehouse -> master: 1493
> 
> ---

So we've still a way to go for the Juno cycle, but I'd be surprised if we
got beyond the havana numbers given where we are today. Again I think
those numbers show a plateau or even decline, which just reinforces
my point that our model is not scaling today.

> So say around 30% of the commits for a given release touch the virt
> drivers themselves.. many of them aren't specifically related to the
> virt drivers. Rather just general Nova internal cleanups because the
> interfaces aren't stable.
> 
> And while 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Chris Friesen

On 09/11/2014 12:02 PM, Dan Prince wrote:


> Maybe I'm impatient (I totally am!) but I see much of the review
> slowdown as a result of the feedback loop times increasing over the
> years. OpenStack has some really great CI and testing but I think our
> focus on not breaking things actually has us painted into a corner. We
> are losing our agility and the review process is paying the price. At
> this point I think splitting out the virt drivers would be more of a
> distraction than a help.


I think the only solution to feedback loop times increasing is to scale 
the review process, which I think means giving more people 
responsibility for a smaller amount of code.


I don't think it's strictly necessary to split the code out into a 
totally separate repo, but I do think it would make sense to have 
changes that are entirely contained within a virt driver be reviewed 
only by developers of that virt driver rather than requiring review by 
the project as a whole.  And they should only have to pass a subset of 
the CI testing--that way they wouldn't be held up by gating bugs in 
other areas.
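
As a rough illustration of that routing rule, here is a minimal Python
sketch that decides whether a change is entirely contained within one
virt driver. It assumes drivers live under nova/virt/<driver>/ and that
files directly in nova/virt/ (such as driver.py) count as shared code.
The function name owning_driver is hypothetical, and nothing here
reflects how Gerrit or the gate is actually configured.

```python
from pathlib import PurePosixPath

# Assumed repository layout: one sub-directory per driver under nova/virt/.
VIRT_ROOT = PurePosixPath("nova/virt")


def owning_driver(changed_files):
    """Return the driver name if every changed file sits under a single
    nova/virt/<driver>/ directory, otherwise None (needs project-wide review)."""
    drivers = set()
    for name in changed_files:
        try:
            rel = PurePosixPath(name).relative_to(VIRT_ROOT)
        except ValueError:
            return None  # file lives outside nova/virt/ entirely
        if len(rel.parts) < 2:
            return None  # file directly in nova/virt/ is shared code
        drivers.add(rel.parts[0])
    return drivers.pop() if len(drivers) == 1 else None


driver_only = owning_driver([
    "nova/virt/libvirt/driver.py",
    "nova/virt/libvirt/imagebackend.py",
])
shared_api = owning_driver(["nova/virt/driver.py"])
mixed = owning_driver(["nova/virt/libvirt/driver.py", "nova/compute/manager.py"])
```

A real rule would also have to decide how to treat a driver's unit tests,
which this sketch counts as outside the driver because they do not live
under nova/virt/.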


Chris

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Dan Prince
On Thu, 2014-09-04 at 11:24 +0100, Daniel P. Berrange wrote:
> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.
> 


I've always referred to the virt/driver.py API as an internal API
meaning there are no guarantees about it being preserved across
releases. I'm not saying this is correct... just that it is what we've
got.  While OpenStack attempts to do a good job at stabilizing its
public APIs we haven't done the same for internal APIs. It is actually
quite painful to be out of tree at this point as I've seen with the
Ironic driver being out of the Nova tree. (really glad that is back in
now!)

So because we haven't designed things to be split out in this regard we
can't just go and do it. 

I tinkered with some numbers... not sure if this helps or hurts my
stance but here goes. By my calculation this is the number of commits
we've made that touched each virt driver tree for the last 3 releases
plus stuff done to-date in Juno.

Created using a command like this in each virt directory for each
release: git log origin/stable/havana..origin/stable/icehouse
--no-merges --pretty=oneline . | wc -l
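
For anyone wanting to reproduce these counts, the command above can be
wrapped in a short Python script. This is a sketch under assumptions: it
expects a local Nova checkout with the stable branches fetched, and the
helper names and the nova/virt/<driver> path argument are illustrative
only, not anything from the Nova tree.

```python
import subprocess

# Driver directories named in the figures in this mail.
DRIVERS = ["baremetal", "hyperv", "libvirt", "vmwareapi", "xenapi"]


def build_git_log_command(old_ref, new_ref, path="."):
    """Return the argv for the 'git log' invocation quoted above."""
    return [
        "git", "log", f"{old_ref}..{new_ref}",
        "--no-merges", "--pretty=oneline", "--", path,
    ]


def count_commits(log_output):
    """Equivalent of 'wc -l': one non-merge commit per output line."""
    return len(log_output.splitlines())


def commits_touching(repo_dir, old_ref, new_ref, path):
    """Run git inside repo_dir and count the commits that touched path.

    Requires a real checkout; raises CalledProcessError if git fails.
    """
    result = subprocess.run(
        build_git_log_command(old_ref, new_ref, path),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return count_commits(result.stdout)


def per_driver_counts(repo_dir, old_ref, new_ref):
    """Count commits per driver, assuming the nova/virt/<driver> layout."""
    return {
        driver: commits_touching(repo_dir, old_ref, new_ref, f"nova/virt/{driver}")
        for driver in DRIVERS
    }
```

Calling per_driver_counts("/path/to/nova", "origin/stable/havana",
"origin/stable/icehouse") would then give one count per driver for that
release interval, provided those refs exist in the checkout.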

essex => folsom:

 baremetal: 26
 hyperv: 9
 libvirt: 222
 vmwareapi: 18
 xenapi: 164
* total for above: 439

folsom => grizzly:

 baremetal: 83
 hyperv: 58
 libvirt: 254
 vmwareapi: 59
 xenapi: 126
* total for above: 580

grizzly => havana:

 baremetal: 48
 hyperv: 55
 libvirt: 157
 vmwareapi: 105
 xenapi: 123
* total for above: 488

havana => icehouse:

 baremetal: 45
 hyperv: 42
 libvirt: 212
 vmwareapi: 121
 xenapi: 100
* total for above: 520

icehouse => master:

 baremetal: 26
 hyperv: 32
 libvirt: 188
 vmwareapi: 121
 xenapi: 71
* total for above: 438

---

A couple of things jump out at me from the numbers:

 -drivers that are being deprecated (baremetal) still have lots of
changes. Some of these changes are valid bug fixes for the driver but a
majority of them are actually related to internal cleanups and interface
changes. This goes towards the fact that Nova isn't mature enough to do
a split like this yet.

 -the number of commits landed isn't growing *that* much across releases
in the virt driver trees. Presumably we think we were doing a better job
2 years ago? But the number of changes in the virt trees is largely the
same... perhaps this is because people aren't submitting stuff because
they are frustrated though?

---

For comparison here are the total number of commits for each Nova
release (includes the above commits):

essex -> folsom: 1708
folsom -> grizzly: 2131
grizzly -> havana: 2188
havana -> icehouse: 1696
icehouse -> master: 1493

---

So say around 30% of the commits for a given release touch the virt
drivers themselves, and many of them aren't specifically related to the
virt drivers. Rather they are just general Nova internal cleanups because
the interfaces aren't stable.
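
The percentages behind that estimate can be checked directly from the two
sets of figures above. A small Python sketch (the variable and function
names are illustrative only):

```python
# Commits touching the five virt driver trees, per release interval
# (the "total for above" lines in this mail).
VIRT_COMMITS = {
    "essex -> folsom": 439,
    "folsom -> grizzly": 580,
    "grizzly -> havana": 488,
    "havana -> icehouse": 520,
    "icehouse -> master": 438,
}

# Total Nova commits for the same intervals.
TOTAL_COMMITS = {
    "essex -> folsom": 1708,
    "folsom -> grizzly": 2131,
    "grizzly -> havana": 2188,
    "havana -> icehouse": 1696,
    "icehouse -> master": 1493,
}


def virt_share(virt, total):
    """Percentage of all commits that touched a virt driver tree."""
    return 100.0 * virt / total


shares = {
    release: round(virt_share(VIRT_COMMITS[release], TOTAL_COMMITS[release]), 1)
    for release in VIRT_COMMITS
}

for release, share in shares.items():
    print(f"{release}: {share}%")
```

On these numbers the share ranges from roughly 22% (grizzly -> havana) to
roughly 31% (havana -> icehouse).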

And while splitting Nova virt drivers might help out some I'm not sure
it helps the general Nova issue in that we have more reviews with fewer
of the good ones landing. Nova is a weird beast at the moment and just
splitting things like this is probably going to harm as much as it helps
(like we saw with Ironic) unless we stabilize the APIs... and even then
I'm skeptical of death by a million tiny sub-projects. I'm just not
convinced this is the #1 pain point around Nova reviews. What
about the other 70%?

For me a lot of the frustration with reviews is around test/gate time,
pushing things through, rechecks, etc... and if we break something it
takes just as much time to get the revert in. The last point (the
ability to revert code quickly) is a really important one as it
sometimes takes days to get a simple (obvious) revert landed. This
leaves groups like TripleO who have their own CI and 3rd party testing
systems which are also capable of finding many critical issues in the
difficult position of having to revert/cherry pick critical changes for
days at a time in order to keep things running.

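Maybe I'm impatient (I totally am!) but I see much of the review
slowdown as a result of the feedback loop times increasing over the
years. OpenStack has some really great CI and testing but I think our
focus on not breaking things actually has us painted into a corner. We
are losing our agility and the review process is paying the price. At
this point I think splitting out the virt drivers would be more of a
distraction than a help.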

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 05:27:06PM +, Alessandro Pilotti wrote:
> This means that if we reach a point in which we agree to spin off the
> drivers in separate core projects, we need to consider how driver related
> CIs will be still included in the Nova review process, possibly with voting
> rights when the individual CI stability allows it. Having each third party
> CI to vote only on its spin-off driver project is not an option IMO, as it
> won’t catch regressions introduced in Nova that affect the drivers,
> including race conditions [5]

Yes, the 3rd party CI would still need to be run against the nova common
repos to ensure changes there don't cause regressions on the virt drivers
in question. I'd expect them to continue to be non-gating as they are today
though. The 3rd party CI would only be gating on the virt driver repo.

> An interesting area of discussion is who is going to be part of the initial
> core teams for each new subproject. I truly appreciated the experience and
> help of the Nova core guys, so in order to allow a smoother transition I’d
> suggest to have for each new project (e.g. nova-compute-hyperv,
> nova-compute-vmware, etc) an initial core team consisting in one or two
> members of the current Nova sub-team and one Nova core, with ideally each
> patch reviewed by both the domain experts and the Nova core. The team could
> then go on its way by voting its own members as any other OpenStack project
> does.

The question of precisely who should be on the core team of each virt driver
will probably vary depending on the driver. In the Xen & libvirt cases, they
are already privileged to have several nova-core members who would naturally
also be core on the virt drivers. In the VMWare / HyperV cases, the idea
you mention of having a couple of existing nova cores (temporarily) join
their new teams would be a good way to bootstrap the new team.

Beyond those cores though, I think what I'd suggest is that we look at the
list of people who have contributed most code to each driver, and also the
people who have reviewed most code in each driver and finally people active
in the sub-team meetings. From those lists identify approx 5-10 top candidates
to form the nucleus of the new team. Once up & running for a few months they
can then look to promote any other candidates who show commitment to the
driver in question.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Daniel P. Berrange
On Wed, Sep 10, 2014 at 12:41:44PM -0700, Vishvananda Ishaya wrote:
> 
> On Sep 5, 2014, at 4:12 AM, Sean Dague  wrote:
> 
> > On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
> >> 
> >> 
> >> Just some things to think about with regards to the whole idea, by no
> >> means exhaustive.
> > 
> > So maybe the better question is: what are the top sources of technical
> > debt in Nova that we need to address? And if we did, everyone would be
> > more sane, and feel less burnt.
> > 
> > Maybe the drivers are the worst debt, and jettisoning them makes them
> > someone else's problem, so that helps some. I'm not entirely convinced
> > right now.
> > 
> > I think Cells represents a lot of debt right now. It doesn't fully work
> > with the rest of Nova, and produces a ton of extra code paths special
> > cased for the cells path.
> > 
> > The Scheduler has a ton of debt as has been pointed out by the efforts
> > in and around Gantt. The focus has been on the split, but realistically
> > I'm with Jay in that we should focus on the debt, and exposing a REST
> > interface in Nova.
> > 
> > What about the Nova objects transition? That continues to be slow
> > because it's basically Dan (with a few other helpers from time to time).
> > Would it be helpful if we did an all hands on deck transition of the
> > rest of Nova for K1 and just get it done? Would be nice to have the bulk
> > of Nova core working on one thing like this and actually be in shared
> > context with everyone else for a while.
> 
> In my mind, splitting helps with all of these things. A lot of the cleanup
> related work is completely delayed because the review queue starts to seem
> like an insurmountable hurdle. There are various cleanups needed in the
> drivers as well but they are not progressing due to the glacial pace we
> are moving right now. Some examples: VMware spawn refactor, Hyper-V bug
> fixes, Libvirt resize/migrate (this is still using ssh to copy data!)
> 
> People need smaller areas of work. And they need a sense of pride and
> ownership of the things that they work on. In my mind that is the best
> way to ensure success.

I do like to look at past experience for guidance, and with Nova we have
had a history of splitting out pieces of code and I think it is fair to
say that all those splits have been very successful for both sides (the
new project and Nova). E.g. if we look at the size and scope of the cinder
project & team today, I don't think it could ever have grown to that
scale if it had remained part of Nova. Splitting it out unleashed its
latent potential for success.

Regards,
Daniel


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Jeremy Stanley
On 2014-09-10 12:19:08 -0700 (-0700), Vishvananda Ishaya wrote:
> I don’t think this is a viable option for us, but if we were going
> to do it, we would probably be better off using
> https://code.google.com/p/rietveld/ as a base, since it is
> actually written in python.

The proposal floated in Atlanta was to write a new python-based
front-end built on Gerrit's API layer (in fact, at least one such
alternative front-end now exists in the form of gertty, but that's
console-oriented and so probably not to everyone's tastes). I'll let
the vinz developers speak to their plans and current progress
though.
-- 
Jeremy Stanley



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Vishvananda Ishaya

On Sep 5, 2014, at 4:12 AM, Sean Dague  wrote:

> On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
>> 
>> 
>> Just some things to think about with regards to the whole idea, by no
>> means exhaustive.
> 
> So maybe the better question is: what are the top sources of technical
> debt in Nova that we need to address? And if we did, everyone would be
> more sane, and feel less burnt.
> 
> Maybe the drivers are the worst debt, and jettisoning them makes them
> someone else's problem, so that helps some. I'm not entirely convinced
> right now.
> 
> I think Cells represents a lot of debt right now. It doesn't fully work
> with the rest of Nova, and produces a ton of extra code paths special
> cased for the cells path.
> 
> The Scheduler has a ton of debt as has been pointed out by the efforts
> in and around Gantt. The focus has been on the split, but realistically
> I'm with Jay in that we should focus on the debt, and exposing a REST
> interface in Nova.
> 
> What about the Nova objects transition? That continues to be slow
> because it's basically Dan (with a few other helpers from time to time).
> Would it be helpful if we did an all hands on deck transition of the
> rest of Nova for K1 and just get it done? Would be nice to have the bulk
> of Nova core working on one thing like this and actually be in shared
> context with everyone else for a while.

In my mind, splitting helps with all of these things. A lot of the cleanup
related work is completely delayed because the review queue starts to seem
like an insurmountable hurdle. There are various cleanups needed in the
drivers as well but they are not progressing due to the glacial pace we
are moving right now. Some examples: VMware spawn refactor, Hyper-V bug
fixes, Libvirt resize/migrate (this is still using ssh to copy data!)

People need smaller areas of work. And they need a sense of pride and
ownership of the things that they work on. In my mind that is the best
way to ensure success.

Vish







Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Vishvananda Ishaya

On Sep 4, 2014, at 8:33 AM, Daniel P. Berrange  wrote:

> On Thu, Sep 04, 2014 at 01:36:04PM +, Gary Kotton wrote:
>> Hi,
>> I do not think that Nova is in a death spiral. I just think that the
>> current way of working at the moment is strangling the project. I do not
>> understand why we need to split drivers out of the core project. Why not
>> have the ability to provide 'core review' status to people for reviewing
>> those parts of the code? We have enough talented people in OpenStack to be
>> able to write a driver above gerrit to enable that.
> 
> The consensus view at the summit was that, having tried & failed at getting
> useful changes into gerrit, it is not a viable option unless we undertake a
> permanent fork of the code base. There didn't seem to be any appetite for
> maintaining & developing a large java app ourselves. So people were looking
> to start writing a replacement for gerrit from scratch (albeit reusing the
> database schema).

I don’t think this is a viable option for us, but if we were going to do it,
we would probably be better off using https://code.google.com/p/rietveld/
as a base, since it is actually written in python.

Vish





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Vishvananda Ishaya

On Sep 4, 2014, at 3:24 AM, Daniel P. Berrange  wrote:

> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.

I am highly in favor of this approach (and have been for at
least a year). Every time we have brought this up in the past
there has been concern about the shared code, but we have to
make a change. We have tried various other approaches and none
of them have made a dent.

+1000

Vish
> 
> 
> Background information
> ======================
> 
> I see many factors coming together to form the crisis
> 
> - Burn out of core team members from over work 
> - Difficulty bringing new talent into the core team
> - Long delay in getting code reviewed & merged
> - Marginalization of code areas which aren't popular
> - Increasing size of nova code through new drivers
> - Exclusion of developers without corporate backing
> 
> Each item on their own may not seem too bad, but combined they
> add up to a big problem.
> 
> Core team burn out
> ------------------
> 
> Having been involved in Nova for several dev cycles now, it is clear
> that the backlog of code up for review never goes away. Even
> intensive code review efforts at various points in the dev cycle
> make only a small impact on the backlog. This has a pretty
> significant impact on core team members, as their work is never
> done. At best, the dial is sometimes set to 10, instead of 11.
> 
> Many people, myself included, have built tools to help deal with
> the reviews in a more efficient manner than plain gerrit allows
> for. These certainly help, but they can't ever solve the problem
> on their own - just make it slightly more bearable. And this is
> not even considering that core team members might have useful
> contributions to make in ways beyond just code review. Ultimately
> the workload is just too high to sustain the levels of review
> required, so core team members will eventually burn out (as they
> have done many times already).
> 
> Even if one person attempts to take the initiative to heavily
> invest in review of certain features it is often to no avail.
> Unless a second dedicated core reviewer can be found to 'tag
> team' it is hard for one person to make a difference. The end
> result is that a patch is +2d and then sits idle for weeks or
> more until a merge conflict requires it to be reposted at which
> point even that one +2 is lost. This is a pretty demotivating
> outcome for both reviewers & the patch contributor.
> 
> 
> New core team talent
> --------------------
> 
> It can't escape attention that the Nova core team does not grow
> in size very often. When Nova was younger and its code base was
> smaller, it was easier for contributors to get onto core because
> the base level of knowledge required was that much smaller. To
> get onto core today requires a major investment in learning Nova
> over a year or more. Even people who potentially have the latent
> skills may not have the time available to invest in learning the
> entirety of Nova.
> 
> With the number of reviews proposed to Nova, the core team should
> probably be at least double its current size[1]. There is plenty of
> expertise in the project as a whole but it is typically focused
> into specific areas of the codebase. There is nowhere we can find
> 20 more people with broad knowledge of the codebase who could be
> promoted even over the next year, let alone today. This is ignoring
> that many existing members of core are relatively inactive due to
> burnout and so need replacing. That means we really need another
> 25-30 people for core. That's not going to happen.
> 
> 
> Code review delays
> ------------------
> 
> The obvious result of having too much work for too few reviewers
> is that code contributors face major delays in getting their work
> reviewed and merged. From personal experience, during Juno, I've
> probably spent 1 week in aggregate on actual code development vs
> 8 weeks on waiting on code review. You have to constantly be on
> alert for review comments because unless you can respond quickly
> (and repost) while you still have the at

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-09 Thread Gary Kotton


On 9/8/14, 7:23 PM, "Sylvain Bauza"  wrote:

>
>Le 08/09/2014 18:06, Steven Dake a écrit :
>> On 09/05/2014 06:10 AM, Sylvain Bauza wrote:
>>>
>>> Le 05/09/2014 12:48, Sean Dague a écrit :
>>>> On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
>>>>> Le 05/09/2014 01:22, Michael Still a écrit :
>>>>>> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange
>>>>>>  wrote:
>>>>>>
>>>>>> [Heavy snipping because of length]
>>>>>>
>>>>>>> The radical (?) solution to the nova core team bottleneck is thus to
>>>>>>> follow this lead and split the nova virt drivers out into separate
>>>>>>> projects and delegate their maintenance to new dedicated teams.
>>>>>>>
>>>>>>>  - Nova becomes the home for the public APIs, RPC system, database
>>>>>>>    persistence and the glue that ties all this together with the
>>>>>>>    virt driver API.
>>>>>>>
>>>>>>>  - Each virt driver project gets its own core team and is responsible
>>>>>>>    for dealing with review, merge & release of their codebase.
>>>>>> I think this is the crux of the matter. We're not doing a great job of
>>>>>> landing code at the moment, because we can't keep up with the review
>>>>>> workload.
>>>>>>
>>>>>> So far we've had two proposals mooted:
>>>>>>
>>>>>>  - slots / runways, where we try to rate limit the number of things
>>>>>>    we're trying to review at once to maintain focus
>>>>>>  - splitting all the virt drivers out of the nova tree
>>>>> Ahem, IIRC, there is a third proposal for Kilo :
>>>>>   - create subteam's half-cores responsible for reviewing patch's
>>>>> iterations and send to cores approvals requests once they consider the
>>>>> patch enough stable for it.
>>>>>
>>>>> As I explained, it would allow to free up reviewing time for cores
>>>>> without losing the control over what is being merged.
>>>> I don't really understand how the half core idea works outside of a math
>>>> equation, because the point of core is to have trust over the
>>>> judgement of your fellow core members so that they can land code when
>>>> you aren't looking. I'm not sure how I manage to build up half trust in
>>>> someone any quicker.
>>>
>>> Well, this thread is becoming huge so that's becoming hard to follow
>>> all the discussion but I explained the idea elsewhere. Let me just
>>> provide it here too :
>>> The idea is *not* to land patches by the halfcores. Core team will
>>> still be fully responsible for approving patches. The main problem in
>>> Nova is that cores are spending lots of time because they review each
>>> iteration of a patch, and also have to look at if a patch is good or
>>> not.
>>>
>>> That's really time consuming, and for most of the time, quite
>>> frustrating as it requires to follow the patch's life, so there are
>>> high risks that your core attention is becoming distracted over the
>>> life of the patch.
>>>
>>> Here, the idea is to reduce dramatically this time by having teams
>>> dedicated to specific areas (as it's already done anyway for the
>>> vast majority of reviewers) who could on their own take time for
>>> reviewing all the iterations. Of course, that doesn't mean cores
>>> would lose the possibility to specifically follow a patch and bypass
>>> the halfcores, that's just for helping them if they're overwhelmed.
>>>
>>> About the question of trusting cores or halfcores, I can just say
>>> that Nova team is anyway needing to grow up or divide it so the
>>> trusting delegation has to be real anyway.
>>>
>>> This whole process is IMHO very encouraging for newcomers because
>>> that creates dedicated teams that could help them to improve their
>>> changes, and not waiting 2 months for getting a -1 and a frank reply.
>>>
>>>
>> Interesting idea, but having been core on Heat for ~2 years, it is
>> critical to be involved in the review from the beginning of the patch
>> set.  Typically you won't see core reviewers participate in a review
>> that is already being handled by two core reviewers.
>>
>> The reason it is important from the beginning of the change request is
>> that the project core can store the iterations and purpose of the
>> change in their heads.  Delegating all that up front work to a
>> non-core just seems counter to the entire process of code reviews.
>> Better would be reduce the # of reviews in the queue (what is proposed
>> by this change) or trust new reviewers "faster".  I'm not sure how you
>> do that - but this second model is what you're proposing.
>>
>> I think one thing that would be helpful is to point out somehow in the
>> workflow that two core reviewers are involved in the review so core
>> reviewers don't have to sift through 10 pages of reviews to find new
>> work.
>>
>
>Now that the specs repo is in place and has been proved with Juno, most
>of the design stage is approved before the implementation is going. If
>the cores are getting more time because they wouldn't be focused on each
>single patchset, they could really find some pat

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-08 Thread Sylvain Bauza


Le 08/09/2014 18:06, Steven Dake a écrit :

On 09/05/2014 06:10 AM, Sylvain Bauza wrote:


Le 05/09/2014 12:48, Sean Dague a écrit :

On 09/05/2014 03:02 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:22, Michael Still a écrit :

On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote:

[Heavy snipping because of length]


The radical (?) solution to the nova core team bottleneck is thus to
follow this lead and split the nova virt drivers out into separate
projects and delegate their maintenance to new dedicated teams.

   - Nova becomes the home for the public APIs, RPC system, database
     persistence and the glue that ties all this together with the
     virt driver API.

   - Each virt driver project gets its own core team and is responsible
     for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

   - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
   - splitting all the virt drivers out of the nova tree

Ahem, IIRC, there is a third proposal for Kilo :
  - create subteam half-cores responsible for reviewing a patch's
iterations, who send approval requests to cores once they consider the
patch stable enough.

As I explained, it would free up reviewing time for cores
without losing control over what is being merged.

I don't really understand how the half core idea works outside of a math
equation, because the point of core is to have trust over the
judgement of your fellow core members so that they can land code when
you aren't looking. I'm not sure how I manage to build up half trust in
someone any quicker.


Well, this thread is becoming huge so that's becoming hard to follow 
all the discussion but I explained the idea elsewhere. Let me just 
provide it here too :
The idea is *not* to land patches by the halfcores. Core team will 
still be fully responsible for approving patches. The main problem in 
Nova is that cores are spending lots of time because they review each 
iteration of a patch, and also have to look at if a patch is good or 
not.


That's really time consuming, and for most of the time, quite 
frustrating as it requires to follow the patch's life, so there are 
high risks that your core attention is becoming distracted over the 
life of the patch.


Here, the idea is to reduce dramatically this time by having teams 
dedicated to specific areas (as it's already done anyway for the 
vast majority of reviewers) who could on their own take time for
reviewing all the iterations. Of course, that doesn't mean cores
would lose the possibility to specifically follow a patch and bypass
the halfcores, that's just for helping them if they're overwhelmed.


About the question of trusting cores or halfcores, I can just say 
that Nova team is anyway needing to grow up or divide it so the 
trusting delegation has to be real anyway.


This whole process is IMHO very encouraging for newcomers because 
that creates dedicated teams that could help them to improve their 
changes, and not waiting 2 months for getting a -1 and a frank reply.



Interesting idea, but having been core on Heat for ~2 years, it is 
critical to be involved in the review from the beginning of the patch 
set.  Typically you won't see core reviewers participate in a review
that is already being handled by two core reviewers.


The reason it is important from the beginning of the change request is 
that the project core can store the iterations and purpose of the 
change in their heads.  Delegating all that up front work to a 
non-core just seems counter to the entire process of code reviews. 
Better would be to reduce the # of reviews in the queue (what is proposed
by this change) or trust new reviewers "faster".  I'm not sure how you
do that - but this second model is what you're proposing.


I think one thing that would be helpful is to point out somehow in the 
workflow that two core reviewers are involved in the review so core 
reviewers don't have to sift through 10 pages of reviews to find new 
work.




Now that the specs repo is in place and has been proved with Juno, most 
of the design stage is approved before the implementation is going. If 
the cores are getting more time because they wouldn't be focused on each 
single patchset, they could really find some patches they would like to 
look at, or they could just wait for the half-approvals from the halfcores.


If a core thinks that a patch is tricky enough to warrant looking at each
iteration, I don't see anything wrong with that. At least, it's up to the core
reviewer to choose which patches he could look at, and he would be more
free than under the slots proposal.


I'm a core from a tiny project but I know how time consuming it is. I 
would really enjoy if I could delegate

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-08 Thread Steven Dake

On 09/05/2014 06:10 AM, Sylvain Bauza wrote:


Le 05/09/2014 12:48, Sean Dague a écrit :

On 09/05/2014 03:02 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:22, Michael Still a écrit :

On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote:

[Heavy snipping because of length]


The radical (?) solution to the nova core team bottleneck is thus to
follow this lead and split the nova virt drivers out into separate
projects and delegate their maintenance to new dedicated teams.

   - Nova becomes the home for the public APIs, RPC system, database
     persistence and the glue that ties all this together with the
     virt driver API.

   - Each virt driver project gets its own core team and is responsible
     for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

   - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
   - splitting all the virt drivers out of the nova tree

Ahem, IIRC, there is a third proposal for Kilo :
  - create subteam half-cores responsible for reviewing a patch's
iterations, who send approval requests to cores once they consider the
patch stable enough.

As I explained, it would free up reviewing time for cores
without losing control over what is being merged.

I don't really understand how the half core idea works outside of a math
equation, because the point of core is to have trust over the
judgement of your fellow core members so that they can land code when
you aren't looking. I'm not sure how I manage to build up half trust in
someone any quicker.


Well, this thread is becoming huge so that's becoming hard to follow 
all the discussion but I explained the idea elsewhere. Let me just 
provide it here too :
The idea is *not* to land patches by the halfcores. Core team will 
still be fully responsible for approving patches. The main problem in 
Nova is that cores are spending lots of time because they review each 
iteration of a patch, and also have to look at if a patch is good or not.


That's really time consuming, and for most of the time, quite 
frustrating as it requires to follow the patch's life, so there are 
high risks that your core attention is becoming distracted over the 
life of the patch.


Here, the idea is to reduce dramatically this time by having teams 
dedicated to specific areas (as it's already done anyway for the 
vast majority of reviewers) who could on their own take time for
reviewing all the iterations. Of course, that doesn't mean cores would
lose the possibility to specifically follow a patch and bypass the
halfcores, that's just for helping them if they're overwhelmed.


About the question of trusting cores or halfcores, I can just say that 
Nova team is anyway needing to grow up or divide it so the trusting 
delegation has to be real anyway.


This whole process is IMHO very encouraging for newcomers because that 
creates dedicated teams that could help them to improve their changes, 
and not waiting 2 months for getting a -1 and a frank reply.



Interesting idea, but having been core on Heat for ~2 years, it is 
critical to be involved in the review from the beginning of the patch 
set.  Typically you won't see core reviewers participate in a review
that is already being handled by two core reviewers.


The reason it is important from the beginning of the change request is 
that the project core can store the iterations and purpose of the change 
in their heads.  Delegating all that up front work to a non-core just 
seems counter to the entire process of code reviews. Better would be to
reduce the # of reviews in the queue (what is proposed by this change)
or trust new reviewers "faster".  I'm not sure how you do that - but
this second model is what you're proposing.


I think one thing that would be helpful is to point out somehow in the 
workflow that two core reviewers are involved in the review so core 
reviewers don't have to sift through 10 pages of reviews to find new work.


Regards,
-steve

As I said elsewhere, I dislike the slots proposal because it sends
the developers the message that the price to pay for contributing to
Nova is increasing. Again, prioritizing does not by itself increase
your velocity; those are 2 distinct subjects.


-Sylvain



-Sean




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-08 Thread Dan Smith
>> The last few days have been interesting as I watch FFEs come through.
>> People post explaining their feature, its importance, and the risk
>> associated with it. Three cores sign on for review. All of the ones
>> I've looked at have received active review since being posted. Would
>> it be bonkers to declare nova to be in "permanent feature freeze"? If
>> we could maintain the level of focus we see now, then we'd be getting
>> heaps more done than before.
> 
> Agreed. Honestly, this has been a really nice flow. I'd love to figure
> out what part of this focus is capturable for normal cadence. This
> realistically is what I was hoping slots would provide, because I feel
> like we actually move really fast when we call out 5-10 things to go
> look at this week.

The funny thing is, last week I was thinking how similar FF is to what
slots/runways would likely provide. That is, intense directed focus on a
single thing by a group of people until it's merged (or fails). Context
is kept between iterations because everyone is on board for quick
iterations with minimal distraction between them. It *does* work during
FF, as we've seen in the past -- I'd expect we have nearly 100% merge
rate of FFEs. How we arrive at a thing getting focus is different in
slots/runways, but I feel the result could be the same.

Splitting out the virt drivers is an easy way to make the life of a core
much easier, but I think the negative impacts are severe and potentially
irreversible, so I'd rather make sure we're totally out of options
before we exercise it.

--Dan





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread James Bottomley
On Fri, 2014-09-05 at 14:14 +0200, Thierry Carrez wrote:
> Daniel P. Berrange wrote:
> > For a long time I've used the LKML 'subsystem maintainers' model as the
> > reference point for ideas. In a more LKML like model, each virt team
> > (or other subsystem team) would have their own separate GIT repo with
> > a complete Nova codebase, where they did their day to day code submissions,
> > reviews and merges. Periodically the primary subsystem maintainer would
> > submit a large pull / merge request to the overall Nova maintainer.
> > The $1,000,000 question in such a model is what kind of code review
> > happens during the big pull requests to integrate subsystem trees. 
> 
> Please note that the Kernel subsystem model is actually a trust tree
> based on 20 years of trust building. OpenStack is only 4 years old, so
> it's difficult to apply the same model as-is.

That's true but not entirely accurate.  The kernel maintainership is a
trust tree, but not every person in that tree has been in the position
for 20 years.  We have one or two who have (Dave Miller, net maintainer,
for instance), but we have some newcomers: Sarah Sharp has only been on
USB3.0 for a year.  People pass in and out of the maintainer tree all
the time.

In many ways, the OpenStack core model is also a trust tree (you elect
people to the core and support their nominations because you trust them
to do the required job).  It's not a 1 for 1 conversion, but it should
be possible to derive the trust you need from the model you already
have, should you wish to make OpenStack function more like the Linux
Kernel.

Essentially Daniel's proposal boils down to making the trust boundaries
align with separated community interests to get more scaling in the
model.  This is very similar to the way the kernel operates: most
maintainers only have expertise in their own areas.  We have a few
people with broad reach, like Andrew and Linus, but by and large most
people settle down in a much smaller area.  However, you don't have to
follow the kernel model to get this to happen, you just have to identify
the natural interest boundaries of the contributors and align around
them (provided they have enough mass to form their own community).

James





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Nathanael Burton
Daniel,

Thanks for the well thought out and thorough proposal to help Nova.

As an OpenStack operator/developer since Cactus time, it has definitely
gotten harder and harder to get fixes in Nova for small bugs that we find
running at scale with production systems. This forces us to maintain more
and more custom patches in-house (or for longer periods of time).  The huge
amount of time necessary to shepherd patches through review discourages
additional devs from contributing patches because of the amount of time
investment required.

I believe whatever we can do to improve the ability to fix technical debt
within Nova and both keep and grow the non-core contributors of Nova would
be greatly beneficial.

Thanks!

Nate


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread James Bottomley

On Fri, 2014-09-05 at 08:02 -0400, Sean Dague wrote:
> On 09/05/2014 07:40 AM, Daniel P. Berrange wrote:
> > On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote:
> >> On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
> >>> A handy example of this I can think of is the currently granted FFE for
> >>> serial consoles - consider how much of the code went into the common
> >>> part vs. the libvirt specific part, I would say the ratio is very close
> >>> to 1 if not even in favour of the common part (current 4 outstanding
> >>> patches are all for core, and out of the 5 merged - only one of them was
> >>> purely libvirt specific, assuming virt/ will live in nova-common).
> >>>
> >>> Joe asked a similar question elsewhere on the thread.
> >>>
> >>> Once again - I am not against doing it - what I am saying is that we
> >>> need to look into this closer as it may not be as big of a win from the
> >>> number of changes needed per feature as we may think.
> >>>
> >>> Just some things to think about with regards to the whole idea, by no
> >>> means exhaustive.
> >>
> >> So maybe the better question is: what are the top sources of technical
> >> debt in Nova that we need to address? And if we did, everyone would be
> >> more sane, and feel less burnt.
> >>
> >> Maybe the drivers are the worst debt, and jettisoning them makes them
> >> someone else's problem, so that helps some. I'm not entirely convinced
> >> right now.
> >>
> >> I think Cells represents a lot of debt right now. It doesn't fully work
> >> with the rest of Nova, and produces a ton of extra code paths special
> >> cased for the cells path.
> >>
> >> The Scheduler has a ton of debt as has been pointed out by the efforts
> >> in and around Gantt. The focus has been on the split, but realistically
> >> I'm with Jay in that we should focus on the debt, and exposing a REST
> >> interface in Nova.
> >>
> >> What about the Nova objects transition? That continues to be slow
> >> because it's basically Dan (with a few other helpers from time to time).
> >> Would it be helpful if we did an all hands on deck transition of the
> >> rest of Nova for K1 and just get it done? Would be nice to have the bulk
> >> of Nova core working on one thing like this and actually be in shared
> >> context with everyone else for a while.
> > 
> > I think the idea that we can tell everyone in Nova what they should
> > focus on for a cycle, or more generally, is doomed to failure. This
> > isn't a closed source company controlled project where you can dictate
> > what everyone's priority must be. We must accept that we rely on all our
> > contributors' good will in voluntarily giving their time & resource to
> > the project, to scratch whatever itch they have in the project. We have
> > to encourage them to want to work on nova and demonstrate that we value
> > whatever form of contribution they choose to make. If we have technical
> > debt that we think is important to address we need to illustrate /
> > show people why they should care about helping. If they none the less
> > decide that work isn't for them, we can't just cast them aside and/or
> > ignore their contributions, while we get on with other things. This
> > is why I think it is important that we split up nova to allow each
> > area to self-organize around what they consider to be priorities in
> > their area of interest / motivation. Not enabling that is going to
> > continue to kill our community
> 
> I'm getting tired of the refrain that because we are an Open Source
> project declaring priorities is pointless, because it's not. I would say
> it's actually the exception that a developer wakes up in the morning and
> says "I completely disregard what anyone else thinks is important in
> this project, this is what I'm going to do today". Because if that's how
> they felt they wouldn't choose to be part of a community, they would
> just go do their own thing. Lone wolves by definition don't form
> communities.

Actually, I don't think this analysis is accurate.  Some people are
simply interested in small aspects of a project.  It's the "scratch your
own itch" part of open source.  The thing which makes itch scratchers
not lone wolves is the desire to go the extra mile to make what they've
done useful to the community.  If they never do this, they likely have a
forked repo with only their changes (and are the epitome of a lone
wolf).  If you scratch your own itch and make the effort to get it
upstream, you're assisting the community (even if that's the only piece
of code you do) and that assistance makes you (at least for a time) part
of the community.

A community doesn't necessarily require continuity from all its
elements.  It requires continuity from some (the core, if you will), but
it also allows for contributions from people who only have one or two
things they need doing.  For OpenStack to convert its users into its
contributors, it is going to have to embrace this, because they likely
only need a couple of things fixi

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 03:01 PM, Russell Bryant wrote:

On 09/05/2014 10:06 AM, Jay Pipes wrote:

On 09/05/2014 06:29 AM, John Garbutt wrote:

Scheduler: I think we need to split out the scheduler with a similar
level of urgency. We keep blocking features on the split, because we
know we don't have the review bandwidth to deal with them. Right now I
am talking about a compute related scheduler in the compute program,
that might evolve to worry about other services at a later date.


-1

Without first cleaning up the interfaces around resource tracking, claim
creation and processing, and the communication interfaces between the
nova-conductor, nova-scheduler, and nova-compute.

I see no urgency at all in splitting out the scheduler. The cleanup of
the interfaces around the resource tracker and scheduler has great
priority, though, IMO.


I'd just reframe things ... I'd like the work you're referring to here
be treated as an obvious key pre-requisite to a split, and this cleanup
is what should be treated with urgency by those with a vested interest
in getting more autonomy around scheduler development.


Sure, that's a perfectly gentle way of putting it :)

Thanks!
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Russell Bryant
On 09/05/2014 10:06 AM, Jay Pipes wrote:
> On 09/05/2014 06:29 AM, John Garbutt wrote:
>> Scheduler: I think we need to split out the scheduler with a similar
>> level of urgency. We keep blocking features on the split, because we
>> know we don't have the review bandwidth to deal with them. Right now I
>> am talking about a compute related scheduler in the compute program,
>> that might evolve to worry about other services at a later date.
> 
> -1
> 
> Without first cleaning up the interfaces around resource tracking, claim
> creation and processing, and the communication interfaces between the
> nova-conductor, nova-scheduler, and nova-compute.
> 
> I see no urgency at all in splitting out the scheduler. The cleanup of
> the interfaces around the resource tracker and scheduler has great
> priority, though, IMO.

I'd just reframe things ... I'd like the work you're referring to here
be treated as an obvious key pre-requisite to a split, and this cleanup
is what should be treated with urgency by those with a vested interest
in getting more autonomy around scheduler development.

-- 
Russell Bryant



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Chris Friesen

On 09/05/2014 03:52 AM, Daniel P. Berrange wrote:



So my biggest fear with a model where each team had their own full
Nova tree and did large pull requests, is that we'd suffer major
pain during the merging of large pull requests, especially if any
of the merges touched common code. It could make the pull requests
take a really long time to get accepted into the primary repo.

By contrast with split out git repos per virt driver code, we will
only ever have 1 stage of code review for each patch. Changes to
common code would go straight to main nova common repo and so get
reviewed by the experts there without delay, avoiding the 2nd stage
of review from merge requests.


Why treat things differently?  It seems to me that even in the first 
scenario you could still send common code changes straight to the main 
nova repo.  Then the pulls from the virt repo would literally only touch 
the virt code in the common repo.
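
The point above can be pictured as a check that a pull from a virt team's
tree stays inside that driver's subtree, so it carries no common-code
changes that would need a second round of review. This is only a sketch:
the helper name touches_only_subtree is an assumption made up for
illustration and is not part of Nova or its tooling.

```python
import posixpath


def touches_only_subtree(changed_paths, subtree):
    """Return True if every changed path lies under `subtree`.

    Hypothetical helper: `changed_paths` is a list of repo-relative paths
    (for instance the output of `git diff --name-only`), and `subtree` is
    the directory a virt team owns.
    """
    prefix = posixpath.normpath(subtree) + "/"
    return all(
        posixpath.normpath(path).startswith(prefix) for path in changed_paths
    )


# A pull that only touches the driver's own directory passes the check ...
driver_only = ["nova/virt/libvirt/driver.py", "nova/virt/libvirt/config.py"]
# ... while one that also edits common code would go to the main repo first.
mixed = ["nova/virt/libvirt/driver.py", "nova/compute/manager.py"]

print(touches_only_subtree(driver_only, "nova/virt/libvirt"))
print(touches_only_subtree(mixed, "nova/virt/libvirt"))
```

Under this assumption, a pull that fails the check would be split: the
common-code part goes straight to the main repo, the rest stays in the
virt tree.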


Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Dugger, Donald D
Well, I and I believe a few others feel a slightly higher sense of urgency 
about splitting out the scheduler but I don't want to hijack this thread for 
that debate.  Fair warning, I intend to start a new thread where we can talk 
specifically about the scheduler split, I'm afraid we're in the situation where 
we're all in agreement but everyone has a different view of what that agreement 
is.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jay Pipes [mailto:jaypi...@gmail.com] 
Sent: Friday, September 5, 2014 8:07 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out 
virt drivers

On 09/05/2014 06:29 AM, John Garbutt wrote:
> Scheduler: I think we need to split out the scheduler with a similar 
> level of urgency. We keep blocking features on the split, because we 
> know we don't have the review bandwidth to deal with them. Right now I 
> am talking about a compute related scheduler in the compute program, 
> that might evolve to worry about other services at a later date.

-1

Without first cleaning up the interfaces around resource tracking, claim 
creation and processing, and the communication interfaces between the 
nova-conductor, nova-scheduler, and nova-compute.

I see no urgency at all in splitting out the scheduler. The cleanup of the 
interfaces around the resource tracker and scheduler has great priority, 
though, IMO.

Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 10:25:09AM -0500, Kevin L. Mitchell wrote:
> On Fri, 2014-09-05 at 10:26 +0100, Daniel P. Berrange wrote:
> > > 2. Removal of drivers other than the reference implementation for each
> > > project could be the healthiest option
> > > a. Requires transparent, public, automated 3rd party CI
> > > b. Requires a TRUE plugin architecture and mentality
> > > c. Requires a stable and well defined API
> > 
> > As mentioned in the original mail I don't want to see a situation where
> > we end up with some drivers in tree and others out of tree as it sets up
> > bad dynamics within the project. Those out of tree will always have the
> > impression of being second class citizens and thus there will be constant
> > pressure to accept drivers back into tree. The so called 'reference'
> > driver that stayed in tree would also continue to be penalized in the
> > way it is today, and so its development would be disadvantaged compared
> > to the out of tree drivers.
> 
> I have one quibble with the notion of "not even one" driver in core: I
> think it is probably useful to include a dummy, do-nothing driver that
> can be used for in-tree functional tests and as an example to point
> those interested in writing a driver.  Then, the "second-class citizen"
> is the one actually in the tree :)  Beyond that, I agree with this
> proposal: it has never made sense to me that *all* drivers live in the
> tree, and it actually offends my sense of organization to have the tree
> so cluttered; we split functions when they get too big, we split modules
> when they get too big, and we create subdirectories when packages get
> too big, so why not split repos when they get too big?

Oh sure, having a "fake virt" driver in tree is fine and indeed desirable
for the reasons you mention. I was exclusively thinking about the real
virt drivers in my earlier statement.
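
For illustration only, a dummy, do-nothing driver of the kind discussed
above might look roughly like the sketch below. The VirtDriver base class
and its three methods are assumptions made up for this example; they are a
pared-down stand-in and not the actual virt/driver.py interface.

```python
import abc


class VirtDriver(abc.ABC):
    """Hypothetical, pared-down stand-in for a virt driver interface."""

    @abc.abstractmethod
    def spawn(self, instance_id):
        """Start an instance."""

    @abc.abstractmethod
    def destroy(self, instance_id):
        """Remove an instance."""

    @abc.abstractmethod
    def list_instances(self):
        """Return the ids of known instances."""


class NoopDriver(VirtDriver):
    """Do-nothing driver: tracks instance ids in memory, touches no hypervisor."""

    def __init__(self):
        self._instances = set()

    def spawn(self, instance_id):
        self._instances.add(instance_id)

    def destroy(self, instance_id):
        # Destroying an unknown instance is treated as a no-op.
        self._instances.discard(instance_id)

    def list_instances(self):
        return sorted(self._instances)
```

A driver like this gives in-tree functional tests something to run against
without a hypervisor, and doubles as a template for people writing a real
driver.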

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Lucas Alvares Gomes
> I look at what we do with Ironic testing currently as a guide here.
> We have a tempest job that runs against Nova, that validates changes
> to nova don't break the separate Ironic git repo. So my thought
> is that all our current tempest jobs would simply work in that
> way. IOW changes to so called "nova common" would run jobs that
> validate the change against all the virt driver git repos. I think
> this kind of setup is pretty much mandatory for split repos to be
> viable, because I don't want to see us lose testing coverage in
> this proposed change.

Thanks Daniel for raising this problem.

Yeah I think that what we did with Ironic while the driver is* out of
the Nova tree serves as a good example. I also think that having
drivers out of the tree is possible, making the tests run against the
"nova-common" and assert things didn't break is no problem. But as you
described before the process of code submission was quite painful and
required a lot of effort and coordination from the Ironic and Nova
teams, we would need to improve that.
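
The gating arrangement discussed above could be sketched roughly as
follows. The repo names and the "tempest-" job naming are invented for the
example (the thread does not settle on any), and the rule that a driver
change only triggers its own job is likewise an assumption.

```python
# Hypothetical repo names: the thread does not settle on any.
COMMON_REPO = "nova-common"
DRIVER_REPOS = ["nova-libvirt", "nova-xenapi", "nova-vmwareapi", "nova-hyperv"]


def gate_jobs(changed_repo):
    """Return the (hypothetical) tempest job names a change should trigger."""
    if changed_repo == COMMON_REPO:
        # A common-code change is validated against every driver repo.
        return ["tempest-" + repo for repo in DRIVER_REPOS]
    if changed_repo in DRIVER_REPOS:
        # Assumption: a driver change only needs its own job.
        return ["tempest-" + changed_repo]
    raise ValueError("unknown repo: %s" % changed_repo)
```

The asymmetry is the point: the common repo pays for the full matrix so
that no driver loses test coverage after the split.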

Another problem we will have in splitting the drivers out is the
classic limitation of launchpad blueprints: we can't track tasks
across multiple projects. (This will change once Storyboard is
completed I guess).

But that's all a long-term solution. In the short term I don't
see any real solution yet. Asking companies/projects
that have a driver in Nova to help with reviews is not so bad IMO. I've
started reviewing code in Nova today and will continue doing that,
maybe aiming for core so that we can speed up the future reviews to
the Ironic driver.

Now, let me throw a crazy idea into the mix (it might be stupid, but):

Maybe Nova is doing much more than it should. Deprecating the
baremetal and network parts and splitting the scheduler out of the
project helps a lot. But what if other parts were split out as well,
like managing flavors, creating the instances etc.? Then Nova can
be the thing that knows how to talk to/manage hypervisors only and won't
have to deal with crazy cases like Ironic, where we try to make real
machines look & feel like VMs to fit into Nova, because that's
painful and I think we are going to have many limitations if we
continue to do that (I believe the same may happen with the Docker
driver).

So if we have another project on top of Nova, Ironic and
$CONTAINER_PROJECT_NAME** that abstracts all the rest and only talks to
Nova when a VM is going to be deployed or Ironic when a baremetal
machine is going to be deployed, etc... Maybe then Nova will be
considerably smaller and can keep all drivers in tree (hypervisor
drivers only, no Docker or Ironic).

* was tempted to write 'was' there :)
** A new project that will know how to handle the containers case.



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Kevin L. Mitchell
On Fri, 2014-09-05 at 10:26 +0100, Daniel P. Berrange wrote:
> > 2. Removal of drivers other than the reference implementation for each
> > project could be the healthiest option
> > a. Requires transparent, public, automated 3rd party CI
> > b. Requires a TRUE plugin architecture and mentality
> > c. Requires a stable and well defined API
> 
> As mentioned in the original mail I don't want to see a situation where
> we end up with some drivers in tree and others out of tree as it sets up
> bad dynamics within the project. Those out of tree will always have the
> impression of being second class citizens and thus there will be constant
> pressure to accept drivers back into tree. The so called 'reference'
> driver that stayed in tree would also continue to be penalized in the
> way it is today, and so its development would be disadvantaged compared
> to the out of tree drivers.

I have one quibble with the notion of "not even one" driver in core: I
think it is probably useful to include a dummy, do-nothing driver that
can be used for in-tree functional tests and as an example to point
those interested in writing a driver.  Then, the "second-class citizen"
is the one actually in the tree :)  Beyond that, I agree with this
proposal: it has never made sense to me that *all* drivers live in the
tree, and it actually offends my sense of organization to have the tree
so cluttered; we split functions when they get too big, we split modules
when they get too big, and we create subdirectories when packages get
too big, so why not split repos when they get too big?
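
To make the "dummy, do-nothing driver" idea concrete, here is a minimal
sketch of what such a driver could look like. The ComputeDriver interface
below is a hypothetical, heavily reduced stand-in written for this
example; it is not the actual nova/virt/driver.py API, and the method
names are assumptions.

```python
import abc
from typing import List, Set


class ComputeDriver(abc.ABC):
    """Hypothetical, reduced stand-in for a virt driver interface."""

    @abc.abstractmethod
    def spawn(self, instance_id: str) -> None:
        """Create an instance."""

    @abc.abstractmethod
    def destroy(self, instance_id: str) -> None:
        """Remove an instance."""

    @abc.abstractmethod
    def list_instances(self) -> List[str]:
        """Return the IDs of all known instances."""


class NoopDriver(ComputeDriver):
    """Do-nothing driver: tracks instance IDs in memory, touches no hypervisor."""

    def __init__(self) -> None:
        self._instances: Set[str] = set()

    def spawn(self, instance_id: str) -> None:
        self._instances.add(instance_id)

    def destroy(self, instance_id: str) -> None:
        # discard() keeps destroy idempotent for unknown IDs.
        self._instances.discard(instance_id)

    def list_instances(self) -> List[str]:
        return sorted(self._instances)


if __name__ == "__main__":
    driver = NoopDriver()
    driver.spawn("vm-1")
    driver.spawn("vm-2")
    driver.destroy("vm-1")
    print(driver.list_instances())
```

A driver like this gives in-tree functional tests something to run
against, and doubles as a template for anyone writing a real driver.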
-- 
Kevin L. Mitchell 
Rackspace




Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 06:29 AM, John Garbutt wrote:

Scheduler: I think we need to split out the scheduler with a similar
level of urgency. We keep blocking features on the split, because we
know we don't have the review bandwidth to deal with them. Right now I
am talking about a compute related scheduler in the compute program,
that might evolve to worry about other services at a later date.


-1 to doing that without first cleaning up the interfaces around resource
tracking, claim creation and processing, and the communication interfaces
between nova-conductor, nova-scheduler, and nova-compute.


I see no urgency at all in splitting out the scheduler. The cleanup of 
the interfaces around the resource tracker and scheduler has great 
priority, though, IMO.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Eric Windisch
>
>
>  - Each virt driver project gets its own core team and is responsible
>for dealing with review, merge & release of their codebase.
>
> Note, I really do mean *all* virt drivers should be separate. I do
> not want to see some virt drivers split out and others remain in tree
> because I feel that signifies that the out of tree ones are second
> class citizens.


+1. I made this same proposal to Michael during the mid-cycle. However, I
haven't wanted to conflate this issue with bringing Docker back into Nova.
For the Docker driver in particular, I feel that being able to stay out of
tree and having our own core team would be beneficial, but  I wouldn't want
to do this unless it applied equally to all drivers.

-- 
Regards,
Eric Windisch


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 15:11, Jay Pipes wrote:

On 09/05/2014 08:58 AM, Sylvain Bauza wrote:

On 05/09/2014 14:48, Jay Pipes wrote:

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

On 05/09/2014 01:26, Jay Pipes wrote:

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.
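
As an illustration of what "opaque dicts must be replaced with objects +
attributes" means in practice, here is a small sketch. The field names
(hostname, free_ram_mb, vcpus) and the dict keys are assumptions made up
for this example, not the actual Nova data structures.

```python
from dataclasses import dataclass
from typing import Any, Dict


def describe_host_dict(stats: Dict[str, Any]) -> str:
    """Before: an opaque dict. Keys and value types are only discoverable
    by reading every caller, and a typo surfaces as a KeyError at runtime."""
    return "%s: %d MB free" % (stats["hypervisor_hostname"], stats["free_ram_mb"])


@dataclass(frozen=True)
class HostStats:
    """After: the contract between driver and caller is written down once."""
    hostname: str
    free_ram_mb: int
    vcpus: int


def describe_host(stats: HostStats) -> str:
    return f"{stats.hostname}: {stats.free_ram_mb} MB free"


if __name__ == "__main__":
    print(describe_host_dict({"hypervisor_hostname": "node1", "free_ram_mb": 2048}))
    print(describe_host(HostStats(hostname="node1", free_ram_mb=2048, vcpus=8)))
```

With the object form, a missing or misspelled field fails when the object
is constructed, on the driver's side of the boundary, rather than deep
inside the code that consumes it.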



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent
ones due to my house move (finally done, yay!). I believe we are
mostly aligned on the plan of record, but I see no urgency in
splitting out the scheduler. I only see urgency on cleaning up the
interfaces. But, that said, let's not highjack Dan's thread here too
much. We can discuss on IRC. I was only saying that Don's comment that
splitting the scheduler out would help solve the bandwidth issues
should be predicated on the same contingency that Dan placed on
splitting out the virt drivers: that the internal interfaces be
cleaned up, documented and stabilized.



So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call "half-cores")
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation
proposals. It's just really a technical problem to solve w.r.t. Gerrit
permissions.



Well, that just requires new Gerrit groups and a new label (like
Subteam-Approved) so that members of this group could just
+Subteam-Approved if they're OK (here I imagine 2 people from the group
labelling it)


And what about code that crosses module boundaries? Would we need a 
LibvirtSubteamApproved, SchedulerSubteamApproved, etc?




Luckily not. I think we only need one more label (we only have 3 now:
Verified, Code-Review, Approved).


The key thing here is having a searchable label that cores can consume
because they know that this label is worth their interest. If something
crosses modules, then that's probably something a core should help with.


For example, if I'm an API halfcore, I can subteam-approve all the
changes related to the API itself (which encourages small and readable
patches, btw.), but I pass my turn if I'm looking at something I don't
know well enough (or I provide a +1).


The main idea is to encourage reviewing, because the step is not as high
as if I wanted to be core. On the other hand, if a halfcore becomes
trustworthy enough (because they also provide good +1s in other areas
and are involved enough in the release process), then they are a good
candidate for becoming core.



As you identified, most of the proposal is based on a gentle-person
agreement because Gerrit is not flexible enough for that (although
since 2.8, you can search all patches related to a path, like
file:^nova/scheduler/*)
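
To sketch how the "crosses module boundaries" question could be answered
mechanically, the snippet below maps the files a change touches to
subteams and flags anything that spans more than one area. The
subteam names and path patterns are purely hypothetical; nothing here is
an agreed Nova policy or a Gerrit feature.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical mapping of subteams to path patterns, for illustration only.
SUBTEAM_PATTERNS: Dict[str, str] = {
    "scheduler": r"^nova/scheduler/",
    "libvirt": r"^nova/virt/libvirt/",
    "api": r"^nova/api/",
}


def subteams_for(paths: Iterable[str]) -> Set[str]:
    """Return the subteams whose pattern matches any changed path.

    Paths that match no pattern are attributed to "core".
    """
    teams: Set[str] = set()
    for path in paths:
        matched = [name for name, pattern in SUBTEAM_PATTERNS.items()
                   if re.search(pattern, path)]
        teams.update(matched or ["core"])
    return teams


def needs_core_attention(paths: Iterable[str]) -> bool:
    """A change touching several areas, or unowned code, goes straight to cores."""
    teams = subteams_for(paths)
    return len(teams) > 1 or "core" in teams


if __name__ == "__main__":
    print(needs_core_attention(["nova/scheduler/filters/ram_filter.py"]))
    print(needs_core_attention(["nova/scheduler/driver.py",
                                "nova/virt/libvirt/driver.py"]))
```

This only helps reviewers sort work; as discussed above, who actually
votes would still rest on the gentle-person agreement.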


-Sylvain

Of course, all the groups could have permissions to label any file of
Nova, but here we can just define a gentleman's agreement, like we do
for having two +2s before approving.


Yes, it would be a gentle-person's agreement. :) Gerrit cannot enforce 
this kind of policy, that's what I was getting at.



That would say that cores could just search using Gerrit with
'label:Subteam-Approved>=1'


Interesting, yes, that would be useful.

-jay


-Sylvain


Best,
-jay


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Day, Phil


> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: 05 September 2014 11:49
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out
> virt drivers
> 
> On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
> >
> >
> > Ahem, IIRC, there is a third proposal for Kilo :
> >  - create subteam half-cores responsible for reviewing a patch's
> > iterations and sending approval requests to cores once they consider
> > the patch stable enough for it.
> >
> > As I explained, it would free up reviewing time for cores without
> > losing control over what is being merged.
> 
> I don't really understand how the half core idea works outside of a math
> equation, because the point of core is to have trust over the judgement of
> your fellow core members so that they can land code when you aren't
> looking. I'm not sure how I manage to build up half trust in someone any
> quicker.
> 
>   -Sean
> 
You seem to be looking at a model, Sean, where trust is purely binary - you’re 
either trusted to know about all of Nova or not trusted at all.  

What Sylvain is proposing (I think) is something more akin to having folks that 
are trusted in some areas of the system and/or trusted to be right enough of 
the time that their reviewing takes a significant part of the burden off 
the core reviewers. That kind of incremental development of trust feels like 
a fairly natural model to me. It's somewhere between the full divide and rule 
approach of splitting out various components (which doesn't feel like a short 
term solution) and the blanket approach of adding more cores.

Making it easier to incrementally grant trust, and having the processes and 
will to remove it if it's seen to be misused, feels to me like it has to be part 
of the solution to breaking out of the "we need more people we trust, but we 
don’t feel comfortable trusting more than N people at any one time" trap.  Sometimes 
you have to give people a chance in small, well defined and controlled steps.

Phil



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 08:58 AM, Sylvain Bauza wrote:

On 05/09/2014 14:48, Jay Pipes wrote:

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

On 05/09/2014 01:26, Jay Pipes wrote:

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent
ones due to my house move (finally done, yay!). I believe we are
mostly aligned on the plan of record, but I see no urgency in
splitting out the scheduler. I only see urgency on cleaning up the
interfaces. But, that said, let's not highjack Dan's thread here too
much. We can discuss on IRC. I was only saying that Don's comment that
splitting the scheduler out would help solve the bandwidth issues
should be predicated on the same contingency that Dan placed on
splitting out the virt drivers: that the internal interfaces be
cleaned up, documented and stabilized.




So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call "half-cores")
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation
proposals. It's just really a technical problem to solve w.r.t. Gerrit
permissions.



Well, that just requires new Gerrit groups and a new label (like
Subteam-Approved) so that members of this group could just
+Subteam-Approved if they're OK (here I imagine 2 people from the group
labelling it)


And what about code that crosses module boundaries? Would we need a 
LibvirtSubteamApproved, SchedulerSubteamApproved, etc?



Of course, all the groups could have permissions to label any file of
Nova, but here we can just define a gentleman's agreement, like we do
for having two +2s before approving.


Yes, it would be a gentle-person's agreement. :) Gerrit cannot enforce 
this kind of policy, that's what I was getting at.



That would say that cores could just search using Gerrit with
'label:Subteam-Approved>=1'


Interesting, yes, that would be useful.

-jay


-Sylvain


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 12:48, Sean Dague wrote:

On 09/05/2014 03:02 AM, Sylvain Bauza wrote:

On 05/09/2014 01:22, Michael Still wrote:

On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange
 wrote:

[Heavy snipping because of length]


The radical (?) solution to the nova core team bottleneck is thus to
follow this lead and split the nova virt drivers out into separate
projects and delegate their maintenance to new dedicated teams.

   - Nova becomes the home for the public APIs, RPC system, database
 persistence and the glue that ties all this together with the
 virt driver API.

   - Each virt driver project gets its own core team and is responsible
 for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

   - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
   - splitting all the virt drivers out of the nova tree

Ahem, IIRC, there is a third proposal for Kilo:
  - create subteam half-cores responsible for reviewing a patch's
iterations and sending approval requests to cores once they consider
the patch stable enough for it.

As I explained, it would free up reviewing time for cores without
losing control over what is being merged.

I don't really understand how the half core idea works outside of a math
equation, because the point of core is to have trust over the
judgement of your fellow core members so that they can land code when
you aren't looking. I'm not sure how I manage to build up half trust in
someone any quicker.


Well, this thread is becoming huge, so it's becoming hard to follow all
the discussion, but I explained the idea elsewhere. Let me just restate
it here too:
The idea is *not* to have halfcores land patches. The core team will
still be fully responsible for approving patches. The main problem in
Nova is that cores spend lots of time because they review each
iteration of a patch, and also have to judge whether a patch is good or
not.


That's really time consuming and, most of the time, quite frustrating,
as it requires following the patch's whole life, so there is a high
risk that a core's attention gets distracted over the life of the
patch.


Here, the idea is to dramatically reduce this time by having teams
dedicated to specific areas (as is already the case anyway for the vast
majority of reviewers) who could take the time on their own to review
all the iterations. Of course, that doesn't mean cores would lose the
possibility to follow a specific patch and bypass the halfcores; it's
just there to help them if they're overwhelmed.


About the question of trusting cores or halfcores, I can just say that
the Nova team needs to either grow or be divided anyway, so the
delegation of trust has to be real anyway.


This whole process is IMHO very encouraging for newcomers because it
creates dedicated teams that could help them improve their changes,
rather than waiting 2 months to get a -1 and a frank reply.



As I said elsewhere, I dislike the slots proposal because it sends
developers the message that the price to pay for contributing to
Nova is increasing. Again, prioritizing doesn't by itself increase your
velocity; those are 2 distinct subjects.


-Sylvain



-Sean






Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 14:48, Jay Pipes wrote:

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

On 05/09/2014 01:26, Jay Pipes wrote:

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent 
ones due to my house move (finally done, yay!). I believe we are 
mostly aligned on the plan of record, but I see no urgency in 
splitting out the scheduler. I only see urgency on cleaning up the 
interfaces. But, that said, let's not highjack Dan's thread here too 
much. We can discuss on IRC. I was only saying that Don's comment that 
splitting the scheduler out would help solve the bandwidth issues 
should be predicated on the same contingency that Dan placed on 
splitting out the virt drivers: that the internal interfaces be 
cleaned up, documented and stabilized.





So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call "half-cores")
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation 
proposals. It's just really a technical problem to solve w.r.t. Gerrit 
permissions.




Well, that just requires new Gerrit groups and a new label (like 
Subteam-Approved) so that members of this group could just 
+Subteam-Approved if they're OK (here I imagine 2 people from the group 
labelling it)


Of course, all the groups could have permissions to label any file of 
Nova, but here we can just define a gentleman's agreement, like we do 
for having two +2s before approving.


That would say that cores could just search using Gerrit with 
'label:Subteam-Approved>=1'
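
For what it's worth, here is a rough sketch of how a core could build
that search programmatically. It only assembles a Gerrit query string and
the matching REST URL (GET /changes/?q=...); it performs no network
call. The Subteam-Approved label is the hypothetical one proposed above,
and the base URL is a placeholder, not a real server.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(label: str = "Subteam-Approved",
                min_votes: int = 1,
                path_regex: Optional[str] = None,
                project: str = "openstack/nova") -> str:
    """Assemble a Gerrit search string from standard search operators."""
    terms = [
        "status:open",
        f"project:{project}",
        f"label:{label}>={min_votes}",  # hypothetical label from this thread
    ]
    if path_regex:
        # file: takes a regular expression when it starts with '^'.
        terms.append(f"file:{path_regex}")
    return " ".join(terms)


def changes_url(base_url: str, query: str) -> str:
    """Return the REST URL that would list changes matching the query."""
    return f"{base_url.rstrip('/')}/changes/?{urlencode({'q': query})}"


if __name__ == "__main__":
    query = build_query(path_regex="^nova/scheduler/.*")
    print(query)
    # Placeholder host; substitute the real Gerrit server.
    print(changes_url("https://review.example.org", query))
```

The same query string can be pasted into the Gerrit web search box, so
cores would not need any tooling beyond a saved search.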


-Sylvain


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

On 05/09/2014 01:26, Jay Pipes wrote:

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent 
ones due to my house move (finally done, yay!). I believe we are mostly 
aligned on the plan of record, but I see no urgency in splitting out the 
scheduler. I only see urgency on cleaning up the interfaces. But, that 
said, let's not highjack Dan's thread here too much. We can discuss on 
IRC. I was only saying that Don's comment that splitting the scheduler 
out would help solve the bandwidth issues should be predicated on the 
same contingency that Dan placed on splitting out the virt drivers: that 
the internal interfaces be cleaned up, documented and stabilized.





So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call "half-cores")
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation 
proposals. It's just really a technical problem to solve w.r.t. Gerrit 
permissions.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Thierry Carrez
Daniel P. Berrange wrote:
> For a long time I've used the LKML 'subsystem maintainers' model as the
> reference point for ideas. In a more LKML like model, each virt team
> (or other subsystem team) would have their own separate GIT repo with
> a complete Nova codebase, where they did their day to day code submissions,
> reviews and merges. Periodically the primary subsystem maintainer would
> submit a large pull / merge request to the overall Nova maintainer.
> The $1,000,000 question in such a model is what kind of code review
> happens during the big pull requests to integrate subsystem trees. 

Please note that the Kernel subsystem model is actually a trust tree
based on 20 years of trust building. OpenStack is only 4 years old, so
it's difficult to apply the same model as-is.

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:49:04AM -0400, Sean Dague wrote:
> On 09/05/2014 07:26 AM, Daniel P. Berrange wrote:
> > On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
> >> On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
> >>> On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
>  On Thu, 4 Sep 2014 11:24:29 +0100
>  "Daniel P. Berrange"  wrote:
> >
> >  - A fairly significant amount of nova code would need to be
> >considered semi-stable API. Certainly everything under nova/virt
> >and any object which is passed in/out of the virt driver API.
> >Changes to such APIs would have to be done in a backwards
> >compatible manner, since it is no longer possible to lock-step
> >change all the virt driver impls. In some ways I think this would
> >be a good thing as it will encourage people to put more thought
> >into the long term maintainability of nova internal code instead
> >of relying on being able to rip it apart later, at will.
> >
> >  - The nova/virt/driver.py class would need to be much better
> >specified. All parameters / return values which are opaque dicts
> >must be replaced with objects + attributes. Completion of the
> >objectification work is mandatory, so there is cleaner separation
> >between virt driver impls & the rest of Nova.
> 
>  I think for this to work well with multiple repositories and drivers
>  having different priorities over implementing changes in the API it
>  would not just need to be semi-stable, but stable with versioning built
>  in from the start to allow for backwards incompatible changes. And
>  the interface would have to be very well documented including things
>  such as what exceptions are allowed to be raised through the API.
>  Hopefully this would be enforced through code as well. But as long as
>  driver maintainers are willing to commit to this extra overhead I can
>  see it working. 
> >>>
> >>> With our primary REST or RPC APIs we're under quite strict rules about
> >>> what we can & can't change - almost impossible to remove an existing
> >>> API from the REST API for example. With the internal virt driver API
> >>> we would probably have a little more freedom. For example, I think
> >>> if we found an existing virt driver API that was insufficient for a
> >>> new bit of work, we could add a new API in parallel with it, give the
> >>> virt drivers 1 dev cycle to convert, and then permanently delete the
> >>> original virt driver API. So a combination of that kind of API
> >>> replacement,  versioning for some data structures/objects, and use of
> >>> the capabilities flags would probably be sufficient. That's what I mean
> >>> by semi-stable here - no need to maintain existing virt driver APIs
> >>> indefinitely - we can remove & replace them in reasonably short time
> >>> scales as long as we avoid any lock-step updates.
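
The "add a new API in parallel, give drivers one cycle to convert, then
delete the old one" approach described above could look roughly like the
sketch below. Everything in it is hypothetical: the class, the
attach_volume / attach_volume_v2 method names and the supports_attach_v2
capability flag are invented for illustration and are not part of the
real virt driver API.

```python
import warnings
from typing import Dict


class ComputeDriver:
    """Hypothetical, trimmed-down virt driver base class."""

    # Capability flags let the caller discover which drivers have converted.
    capabilities: Dict[str, bool] = {"supports_attach_v2": False}

    def attach_volume(self, instance_id: str, volume_id: str) -> str:
        """Old API, kept for one cycle while drivers convert."""
        raise NotImplementedError

    def attach_volume_v2(self, instance_id: str, volume: Dict[str, str]) -> str:
        """New API, added in parallel with the old one."""
        raise NotImplementedError


class LegacyDriver(ComputeDriver):
    """Out-of-tree driver that has not converted yet."""

    def attach_volume(self, instance_id, volume_id):
        return f"v1:{instance_id}:{volume_id}"


class ConvertedDriver(ComputeDriver):
    """Driver that already implements the replacement API."""

    capabilities = {"supports_attach_v2": True}

    def attach_volume_v2(self, instance_id, volume):
        return f"v2:{instance_id}:{volume['id']}:{volume['mode']}"


def attach(driver: ComputeDriver, instance_id: str, volume: Dict[str, str]) -> str:
    """Caller side: prefer the new API, fall back to the old one for a cycle."""
    if driver.capabilities.get("supports_attach_v2", False):
        return driver.attach_volume_v2(instance_id, volume)
    warnings.warn("attach_volume is deprecated; convert to attach_volume_v2",
                  DeprecationWarning, stacklevel=2)
    return driver.attach_volume(instance_id, volume["id"])


if __name__ == "__main__":
    vol = {"id": "vol-1", "mode": "rw"}
    print(attach(ConvertedDriver(), "i-1", vol))
    print(attach(LegacyDriver(), "i-1", vol))
```

Once every driver reports the capability, the fallback branch and the old
method can be deleted, with no lock-step change across repositories.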
> >>
> >> I have spent a lot of time over the last year working on things that
> >> require coordinated code lands between projects; it's much more
> >> friction than you give it credit for.
> >>
> >> Every added git tree adds a non linear cost to mental overhead, and a
> >> non linear integration cost. Realistically the reason the gate is in the
> >> state it is has a ton to do with the fact that it's integrating 40 git
> >> trees. Because virt drivers run in the process space of Nova Compute,
> >> they can pretty much do whatever, and the impacts are going to be
> >> somewhat hard to figure out.
> >>
> >> Also, if spinning these out seems like the right idea, I think nova-core
> >> needs to retain core rights over the drivers as well, because there does
> >> need to be veto authority on some of the worst craziness.
> > 
> > If they want to do crazy stuff, let them live or die with the
> > consequences.
> > 
> >> If the VMWare team stopped trying to build a distributed lock manager
> >> inside their compute driver, or the Hyperv team didn't wait until J2 to
> >> start pushing patches, I think there would be more trust in some of
> >> these teams. But, I am seriously concerned in both those cases, and the
> >> slow review there is a function of a historic lack of trust in judgment.
> >> I also personally went on a moratorium a year ago in reviewing either
> >> driver because entities at both places were complaining to my
> >> management chain through back channels that I was -1ing their code...
> > 
> > I venture to suggest that the reason we care so much about those kind
> > of things is precisely because of our policy of pulling them in the
> > tree. Having them in tree means their quality (or not) reflects directly
> > on the project as a whole. Separate them from Nova as a whole and give
> > them control of their own destiny and they can deal with the consequences
> > of their actions and people can judge the results for themselves.
> > 
> > We don't have the ti

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 07:40 AM, Daniel P. Berrange wrote:
> On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote:
>> On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
>>> A handy example of this I can think of is the currently granted FFE for
>>> serial consoles - consider how much of the code went into the common
>>> part vs. the libvirt specific part, I would say the ratio is very close
>>> to 1 if not even in favour of the common part (current 4 outstanding
>>> patches are all for core, and out of the 5 merged - only one of them was
>>> purely libvirt specific, assuming virt/ will live in nova-common).
>>>
>>> Joe asked a similar question elsewhere on the thread.
>>>
>>> Once again - I am not against doing it - what I am saying is that we
>>> need to look into this closer as it may not be as big of a win from the
>>> number of changes needed per feature as we may think.
>>>
>>> Just some things to think about with regards to the whole idea, by no
>>> means exhaustive.
>>
>> So maybe the better question is: what are the top sources of technical
>> debt in Nova that we need to address? And if we did, everyone would be
>> more sane, and feel less burnt.
>>
>> Maybe the drivers are the worst debt, and jettisoning them makes them
>> someone else's problem, so that helps some. I'm not entirely convinced
>> right now.
>>
>> I think Cells represents a lot of debt right now. It doesn't fully work
>> with the rest of Nova, and produces a ton of extra code paths special
>> cased for the cells path.
>>
>> The Scheduler has a ton of debt as has been pointed out by the efforts
>> in and around Gantt. The focus has been on the split, but realistically
>> I'm with Jay in that we should focus on the debt, and exposing a REST
>> interface in Nova.
>>
>> What about the Nova objects transition? That continues to be slow
>> because it's basically Dan (with a few other helpers from time to time).
>> Would it be helpful if we did an all hands on deck transition of the
>> rest of Nova for K1 and just get it done? Would be nice to have the bulk
>> of Nova core working on one thing like this and actually be in shared
>> context with everyone else for a while.
> 
> I think the idea that we can tell everyone in Nova what they should
> focus on for a cycle, or more generally, is doomed to failure. This
> isn't a closed source, company-controlled project where you can dictate
> what everyone's priority must be. We must accept that we rely on all our
> contributors' good will in voluntarily giving their time & resources to
> the project, to scratch whatever itch they have in the project. We have
> to encourage them to want to work on nova and demonstrate that we value
> whatever form of contribution they choose to make. If we have technical
> debt that we think is important to address we need to illustrate /
> show people why they should care about helping. If they nonetheless
> decide that work isn't for them, we can't just cast them aside and/or
> ignore their contributions, while we get on with other things. This
> is why I think it is important that we split up nova to allow each
> area to self-organize around what they consider to be priorities in
> their area of interest / motivation. Not enabling that is going to
> continue to kill our community.

I'm getting tired of the refrain that because we are an Open Source
project declaring priorities is pointless, because it's not. I would say
it's actually the exception that a developer wakes up in the morning and
says "I completely disregard what anyone else thinks is important in
this project, this is what I'm going to do today". Because if that's how
they felt they wouldn't choose to be part of a community, they would
just go do their own thing. Lone wolves by definition don't form
communities.

And the FFE process is a firm demonstration that when we pick a small
number of things to look at, they move a lot more quickly.

People are always free to work on whatever they want. But providing some
focus on debt cleanup, FFE++ effectively, would be really nice.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Chris Dent

On Fri, 5 Sep 2014, Daniel P. Berrange wrote:


> I venture to suggest that the reason we care so much about those kind
> of things is precisely because of our policy of pulling them in the
> tree. Having them in tree means their quality (or not) reflects directly
> on the project as a whole. Separate them from Nova as a whole and give
> them control of their own destiny and they can deal with the consequences
> of their actions and people can judge the results for themselves.


Apart from any of the other issues present in this thread (and not
commenting on them in this message), I think this paragraph (above)
represents an unfortunately narrow view about how perceptions of the
quality of OpenStack work. People who are invested in using OpenStack in
some fashion and are not in the development priesthood see OpenStack.
They don't see individual teams making virt drivers.

It may be (I don't know) that having more granularity in projects will
allow different teams to engage at different rates and thus get stuff
done, but I do not think it will do much with regard to external
perceptions of quality. That's going to take a much different kind of
work and attention.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 07:26 AM, Daniel P. Berrange wrote:
> On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
>> On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
>>> On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
>>>> On Thu, 4 Sep 2014 11:24:29 +0100
>>>> "Daniel P. Berrange"  wrote:
>>>>>
>>>>>  - A fairly significant amount of nova code would need to be
>>>>>    considered semi-stable API. Certainly everything under nova/virt
>>>>>    and any object which is passed in/out of the virt driver API.
>>>>>    Changes to such APIs would have to be done in a backwards
>>>>>    compatible manner, since it is no longer possible to lock-step
>>>>>    change all the virt driver impls. In some ways I think this would
>>>>>    be a good thing as it will encourage people to put more thought
>>>>>    into the long term maintainability of nova internal code instead
>>>>>    of relying on being able to rip it apart later, at will.
>>>>>
>>>>>  - The nova/virt/driver.py class would need to be much better
>>>>>    specified. All parameters / return values which are opaque dicts
>>>>>    must be replaced with objects + attributes. Completion of the
>>>>>    objectification work is mandatory, so there is cleaner separation
>>>>>    between virt driver impls & the rest of Nova.
>>>>
>>>> I think for this to work well with multiple repositories and drivers
>>>> having different priorities over implementing changes in the API it
>>>> would not just need to be semi-stable, but stable with versioning built
>>>> in from the start to allow for backwards incompatible changes. And
>>>> the interface would have to be very well documented including things
>>>> such as what exceptions are allowed to be raised through the API.
>>>> Hopefully this would be enforced through code as well. But as long as
>>>> driver maintainers are willing to commit to this extra overhead I can
>>>> see it working.
>>>
>>> With our primary REST or RPC APIs we're under quite strict rules about
>>> what we can & can't change - almost impossible to remove an existing
>>> API from the REST API for example. With the internal virt driver API
>>> we would probably have a little more freedom. For example, I think
>>> if we found an existing virt driver API that was insufficient for a
>>> new bit of work, we could add a new API in parallel with it, give the
>>> virt drivers 1 dev cycle to convert, and then permanently delete the
>>> original virt driver API. So a combination of that kind of API
>>> replacement,  versioning for some data structures/objects, and use of
>>> the capabilities flags would probably be sufficient. That's what I mean
>>> by semi-stable here - no need to maintain existing virt driver APIs
>>> indefinitely - we can remove & replace them in reasonably short time
>>> scales as long as we avoid any lock-step updates.
>>
>> I have spent a lot of time over the last year working on things that
>> require coordinated code lands between projects; it's much more
>> friction than you give it credit for.
>>
>> Every added git tree adds a non linear cost to mental overhead, and a
>> non linear integration cost. Realistically the reason the gate is in the
>> state it is has a ton to do with the fact that it's integrating 40 git
>> trees. Because virt drivers run in the process space of Nova Compute,
>> they can pretty much do whatever, and the impacts are going to be
>> somewhat hard to figure out.
>>
>> Also, if spinning these out seems like the right idea, I think nova-core
>> needs to retain core rights over the drivers as well. Because there does
>> need to be veto authority on some of the worst craziness.
> 
> If they want to do crazy stuff, let them live or die with the
> consequences.
> 
>> If the VMware team stopped trying to build a distributed lock manager
>> inside their compute driver, or the Hyper-V team didn't wait until J2 to
>> start pushing patches, I think there would be more trust in some of
>> these teams. But, I am seriously concerned in both those cases, and the
>> slow review there is a function of a historic lack of trust in judgment.
>> I also personally went on a moratorium a year ago in reviewing either
>> driver because entities at both places were complaining to my
>> management chain through back channels that I was -1ing their code...
> 
> I venture to suggest that the reason we care so much about those kind
> of things is precisely because of our policy of pulling them in the
> tree. Having them in tree means their quality (or not) reflects directly
> on the project as a whole. Separate them from Nova as a whole and give
> them control of their own destiny and they can deal with the consequences
> of their actions and people can judge the results for themselves.
> 
> We don't have the time or resources to continue baby-sitting them
> ourselves - attempting to do so has just resulted in a scenario where
> they end up getting largely ignored as you admit here. This ultimately
> makes their quality even worse, 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote:
> On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
> > A handy example of this I can think of is the currently granted FFE for
> > serial consoles - consider how much of the code went into the common
> > part vs. the libvirt specific part, I would say the ratio is very close
> > to 1 if not even in favour of the common part (current 4 outstanding
> > patches are all for core, and out of the 5 merged - only one of them was
> > purely libvirt specific, assuming virt/ will live in nova-common).
> > 
> > Joe asked a similar question elsewhere on the thread.
> > 
> > Once again - I am not against doing it - what I am saying is that we
> > need to look into this closer as it may not be as big of a win from the
> > number of changes needed per feature as we may think.
> > 
> > Just some things to think about with regards to the whole idea, by no
> > means exhaustive.
> 
> So maybe the better question is: what are the top sources of technical
> debt in Nova that we need to address? And if we did, everyone would be
> more sane, and feel less burnt.
> 
> Maybe the drivers are the worst debt, and jettisoning them makes them
> someone else's problem, so that helps some. I'm not entirely convinced
> right now.
> 
> I think Cells represents a lot of debt right now. It doesn't fully work
> with the rest of Nova, and produces a ton of extra code paths special
> cased for the cells path.
> 
> The Scheduler has a ton of debt as has been pointed out by the efforts
> in and around Gantt. The focus has been on the split, but realistically
> I'm with Jay in that we should focus on the debt, and exposing a REST
> interface in Nova.
> 
> What about the Nova objects transition? That continues to be slow
> because it's basically Dan (with a few other helpers from time to time).
> Would it be helpful if we did an all hands on deck transition of the
> rest of Nova for K1 and just get it done? Would be nice to have the bulk
> of Nova core working on one thing like this and actually be in shared
> context with everyone else for a while.

I think the idea that we can tell everyone in Nova what they should
focus on for a cycle, or more generally, is doomed to failure. This
isn't a closed source, company-controlled project where you can dictate
what everyone's priority must be. We must accept that we rely on all our
contributors' good will in voluntarily giving their time & resources to
the project, to scratch whatever itch they have in the project. We have
to encourage them to want to work on nova and demonstrate that we value
whatever form of contribution they choose to make. If we have technical
debt that we think is important to address we need to illustrate /
show people why they should care about helping. If they nonetheless
decide that work isn't for them, we can't just cast them aside and/or
ignore their contributions, while we get on with other things. This
is why I think it is important that we split up nova to allow each
area to self-organize around what they consider to be priorities in
their area of interest / motivation. Not enabling that is going to
continue to kill our community.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
> On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
> > On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
> >> On Thu, 4 Sep 2014 11:24:29 +0100
> >> "Daniel P. Berrange"  wrote:
> >>>
> >>>  - A fairly significant amount of nova code would need to be
> >>>considered semi-stable API. Certainly everything under nova/virt
> >>>and any object which is passed in/out of the virt driver API.
> >>>Changes to such APIs would have to be done in a backwards
> >>>compatible manner, since it is no longer possible to lock-step
> >>>change all the virt driver impls. In some ways I think this would
> >>>be a good thing as it will encourage people to put more thought
> >>>into the long term maintainability of nova internal code instead
> >>>of relying on being able to rip it apart later, at will.
> >>>
> >>>  - The nova/virt/driver.py class would need to be much better
> >>>specified. All parameters / return values which are opaque dicts
> >>>must be replaced with objects + attributes. Completion of the
> >>>objectification work is mandatory, so there is cleaner separation
> >>>between virt driver impls & the rest of Nova.
> >>
> >> I think for this to work well with multiple repositories and drivers
> >> having different priorities over implementing changes in the API it
> >> would not just need to be semi-stable, but stable with versioning built
> >> in from the start to allow for backwards incompatible changes. And
> >> the interface would have to be very well documented including things
> >> such as what exceptions are allowed to be raised through the API.
> >> Hopefully this would be enforced through code as well. But as long as
> >> driver maintainers are willing to commit to this extra overhead I can
> >> see it working. 
> > 
> > With our primary REST or RPC APIs we're under quite strict rules about
> > what we can & can't change - almost impossible to remove an existing
> > API from the REST API for example. With the internal virt driver API
> > we would probably have a little more freedom. For example, I think
> > if we found an existing virt driver API that was insufficient for a
> > new bit of work, we could add a new API in parallel with it, give the
> > virt drivers 1 dev cycle to convert, and then permanently delete the
> > original virt driver API. So a combination of that kind of API
> > replacement,  versioning for some data structures/objects, and use of
> > the capabilities flags would probably be sufficient. That's what I mean
> > by semi-stable here - no need to maintain existing virt driver APIs
> > indefinitely - we can remove & replace them in reasonably short time
> > scales as long as we avoid any lock-step updates.
> 
> I have spent a lot of time over the last year working on things that
> require coordinated code lands between projects; it's much more
> friction than you give it credit for.
> 
> Every added git tree adds a non linear cost to mental overhead, and a
> non linear integration cost. Realistically the reason the gate is in the
> state it is has a ton to do with the fact that it's integrating 40 git
> trees. Because virt drivers run in the process space of Nova Compute,
> they can pretty much do whatever, and the impacts are going to be
> somewhat hard to figure out.
> 
> Also, if spinning these out seems like the right idea, I think nova-core
> needs to retain core rights over the drivers as well. Because there does
> need to be veto authority on some of the worst craziness.

If they want to do crazy stuff, let them live or die with the
consequences.

> If the VMware team stopped trying to build a distributed lock manager
> inside their compute driver, or the Hyper-V team didn't wait until J2 to
> start pushing patches, I think there would be more trust in some of
> these teams. But, I am seriously concerned in both those cases, and the
> slow review there is a function of a historic lack of trust in judgment.
> I also personally went on a moratorium a year ago in reviewing either
> driver because entities at both places were complaining to my
> management chain through back channels that I was -1ing their code...

I venture to suggest that the reason we care so much about those kind
of things is precisely because of our policy of pulling them in the
tree. Having them in tree means their quality (or not) reflects directly
on the project as a whole. Separate them from Nova as a whole and give
them control of their own destiny and they can deal with the consequences
of their actions and people can judge the results for themselves.

We don't have the time or resources to continue baby-sitting them
ourselves - attempting to do so has just resulted in a scenario where
they end up getting largely ignored as you admit here. This ultimately
makes their quality even worse, because the lack of reviewer availability
means they stand little chance of pushing through the work to 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
> On 09/04/2014 12:24 PM, Daniel P. Berrange wrote:
>> Position statement
>> ==================
>>
>> Over the past year I've increasingly come to the conclusion that
>> Nova is heading for (or probably already at) a major crisis. If
>> steps are not taken to avert this, the project is likely to lose
>> a non-trivial amount of talent, both regular code contributors and
>> core team members. That includes myself. This is not good for
>> Nova's long term health and so should be of concern to anyone
>> involved in Nova and OpenStack.
>>
>> For those who don't want to read the whole mail, the executive
>> summary is that the nova-core team is an unfixable bottleneck
>> in our development process with our current project structure.
>> The only way I see to remove the bottleneck is to split the virt
>> drivers out of tree and let them all have their own core teams
>> in their area of code, leaving current nova core to focus on
>> all the common code outside the virt driver impls. I, now, none
>> the less urge people to read the whole mail.
>>
>>
>> Background information
>> ======================
>>
>> I see many factors coming together to form the crisis
>>
>>  - Burn out of core team members from over work 
>>  - Difficulty bringing new talent into the core team
>>  - Long delay in getting code reviewed & merged
>>  - Marginalization of code areas which aren't popular
>>  - Increasing size of nova code through new drivers
>>  - Exclusion of developers without corporate backing
>>
>> Each item on their own may not seem too bad, but combined they
>> add up to a big problem.
>>
> 
> As many others - I cannot +1 this enough. Some technical comments below
> that we may want to consider before, but to sum them up - this will be a
> TON OF WORK! we better make sure we really want to do this before.
> 
> However - please don't read this as FUD, maybe rather pointing out that
> devil is in the details, and maybe getting ahead of myself with too deep
> of a dive.
> 
>>
>>  - A fairly significant amount of nova code would need to be
>>considered semi-stable API. Certainly everything under nova/virt
>>and any object which is passed in/out of the virt driver API.
>>Changes to such APIs would have to be done in a backwards
>>compatible manner, since it is no longer possible to lock-step
>>change all the virt driver impls. In some ways I think this would
>>be a good thing as it will encourage people to put more thought
>>into the long term maintainability of nova internal code instead
>>of relying on being able to rip it apart later, at will.
>>
> 
> I think we should not underestimate how big of a job this will be. We
> have been treating that API as internal for a long time and a lot of
> abstractions are just broken and need to be redesigned and then
> refactored. A lot of the stuff is implementation specific (live
> migrations is a good example of this). What makes it more difficult is
> that we need to get this as right as possible before we do the split.
> 
> Now I am not saying this cannot be done or that we shouldn't do it,
> however I _am_ saying that we should not take lightly how much work
> there will be and how fiddly the work itself is.
> 
> On top of that - there are some other serious issues with nova common
> code that we need to take care of ASAP, and this will definitely
> increase the churn and make that more difficult. We should take this
> into account and make sure we are focusing efforts on the right things.
> Making sure we do is the biggest challenge nova core faces in addition
> to all the others mentioned above.
> 
>>  - The nova/virt/driver.py class would need to be much better
>>specified. All parameters / return values which are opaque dicts
>>must be replaced with objects + attributes. Completion of the
>>objectification work is mandatory, so there is cleaner separation
>>between virt driver impls & the rest of Nova.
>>
> 
> Not only that - currently nova-objects do their versioning magic only
> over RPC, while they would have to do it over library boundaries. This
> in itself will require work, and is likely going to influence how we
> stabilize the API.
> 
> However - splitting out the scheduler is likely to require objects to be
> able to do similar things, and there are other things that we may want
> to do (e.g. using properly versioned data for the extensible resources)
> that will benefit from this.
> 
>>  - If changes are required to common code, the virt driver developer
>>would first have to get the necessary pieces merged into Nova
>>common. Then the follow up virt driver specific changes could be
>>proposed to their repo. This implies that some changes to virt
>>drivers will still contend for resource in the common nova repo 
>>and team. This contention should be lower than it is today though
>>since the current nova core team should have less code to look 
>>   

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 12:40:59PM +0200, Nikola Đipanov wrote:
> On 09/04/2014 12:24 PM, Daniel P. Berrange wrote:
> >  - A fairly significant amount of nova code would need to be
> >considered semi-stable API. Certainly everything under nova/virt
> >and any object which is passed in/out of the virt driver API.
> >Changes to such APIs would have to be done in a backwards
> >compatible manner, since it is no longer possible to lock-step
> >change all the virt driver impls. In some ways I think this would
> >be a good thing as it will encourage people to put more thought
> >into the long term maintainability of nova internal code instead
> >of relying on being able to rip it apart later, at will.
> > 
> 
> I think we should not underestimate how big of a job this will be. We
> have been treating that API as internal for a long time and a lot of
> abstractions are just broken and need to be redesigned and then
> refactored. A lot of the stuff is implementation specific (live
> migrations is a good example of this). What makes it more difficult is
> that we need to get this as right as possible before we do the split.
> 
> > Now I am not saying this cannot be done or that we shouldn't do it,
> however I _am_ saying that we should not take lightly how much work
> there will be and how fiddly the work itself is.
> 
> On top of that - there are some other serious issues with nova common
> code that we need to take care of ASAP, and this will definitely
> increase the churn and make that more difficult. We should take this
> into account and make sure we are focusing efforts on the right things.
> Making sure we do is the biggest challenge nova core faces in addition
> to all the others mentioned above.
> 
> >  - The nova/virt/driver.py class would need to be much better
> >specified. All parameters / return values which are opaque dicts
> >must be replaced with objects + attributes. Completion of the
> >objectification work is mandatory, so there is cleaner separation
> >between virt driver impls & the rest of Nova.
> > 
> 
> Not only that - currently nova-objects do their versioning magic only
> over RPC, while they would have to do it over library boundaries. This
> in itself will require work, and is likely going to influence how we
> stabilize the API.
> 
> However - splitting out the scheduler is likely to require objects to be
> able to do similar things, and there are other things that we may want
> to do (e.g. using properly versioned data for the extensible resources)
> that will benefit from this.

Looking at what we did for the NUMA work, the objects we have returned
from the nova/virt/driver.py APIs (as defined in hardware.py) are
separate from the versioned objects we use for persisting the data in
the database (as defined in nova/objects/numa_topology.py). So in this
case the nova-objects versioning problem doesn't leak into the virt
drivers. If solving the versioning problem over library boundaries
isn't workable, then perhaps the separation of objects is what we should
look at, i.e. the versioned objects would be purely an internal thing for
nova common to deal with, and the objects to be consumed by the virt
drivers would be defined by the virt driver API itself.
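
As a rough illustration of that separation (a minimal sketch only; the
class names VirtNUMACell and StoredNUMACell and their fields are
hypothetical and are not taken from hardware.py or
nova/objects/numa_topology.py), the driver-facing object can stay a
plain value type while the version tag lives solely on the object that
nova common persists:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class VirtNUMACell:
    """Plain, unversioned object exposed by the virt driver API (hypothetical)."""
    id: int
    cpuset: FrozenSet[int]
    memory_mb: int


class StoredNUMACell:
    """Hypothetical versioned object, private to nova common, used for persistence."""

    VERSION = "1.0"

    def __init__(self, id: int, cpuset: FrozenSet[int], memory_mb: int) -> None:
        self.id = id
        self.cpuset = cpuset
        self.memory_mb = memory_mb

    def to_primitive(self) -> dict:
        # The version tag travels with the stored data, never with the driver object.
        return {
            "version": self.VERSION,
            "data": {
                "id": self.id,
                "cpuset": sorted(self.cpuset),
                "memory_mb": self.memory_mb,
            },
        }

    @classmethod
    def from_primitive(cls, primitive: dict) -> "StoredNUMACell":
        # Reject data written by a version this code does not understand.
        if primitive["version"] != cls.VERSION:
            raise ValueError("unsupported version: %s" % primitive["version"])
        data = primitive["data"]
        return cls(data["id"], frozenset(data["cpuset"]), data["memory_mb"])

    @classmethod
    def from_virt(cls, cell: VirtNUMACell) -> "StoredNUMACell":
        # Conversion happens in nova common, so drivers never see versioning.
        return cls(cell.id, cell.cpuset, cell.memory_mb)

    def to_virt(self) -> VirtNUMACell:
        return VirtNUMACell(self.id, self.cpuset, self.memory_mb)
```

The conversion helpers sit on the nova-common side, so a driver living in
another repo would only ever import the plain object.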

> >  - If changes are required to common code, the virt driver developer
> > >would first have to get the necessary pieces merged into Nova
> >common. Then the follow up virt driver specific changes could be
> >proposed to their repo. This implies that some changes to virt
> >drivers will still contend for resource in the common nova repo 
> >and team. This contention should be lower than it is today though
> >since the current nova core team should have less code to look 
> >after per-person on aggregate.
> > 
> 
> A handy example of this I can think of is the currently granted FFE for
> serial consoles - consider how much of the code went into the common
> part vs. the libvirt specific part, I would say the ratio is very close
> to 1 if not even in favour of the common part (current 4 outstanding
> patches are all for core, and out of the 5 merged - only one of them was
> purely libvirt specific, assuming virt/ will live in nova-common).
> 
> Joe asked a similar question elsewhere on the thread.

In terms of patches merged to Nova, 1385 merged in 6 months, of which
437 (roughly 32%) touched /virt/ files. This obviously doesn't distinguish
between virt driver changes that were 100% isolated inside the virt
driver and changes that touch multiple code areas.
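
To make that distinction measurable, one could classify each commit by
the files it touches. The sketch below is only an illustration: the
function names are made up, and it assumes the per-commit file lists
have already been collected (for example from the output of
"git log --no-merges --name-only"). It counts commits that touch any
virt path separately from those confined entirely to virt paths:

```python
from typing import Iterable, List, Tuple


def in_virt(path: str) -> bool:
    """True if 'virt' is one of the directory components of the path."""
    return "virt" in path.split("/")[:-1]


def virt_share(commits: Iterable[List[str]]) -> Tuple[int, int, int]:
    """Return (total commits, commits touching virt, commits isolated to virt)."""
    total = touched = isolated = 0
    for paths in commits:
        total += 1
        flags = [in_virt(p) for p in paths]
        if any(flags):
            touched += 1
            # A commit is isolated only if every changed file is under virt.
            if all(flags):
                isolated += 1
    return total, touched, isolated
```

The gap between the second and third numbers is the set of changes that
would still need a patch to the common repo after a split.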

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
> On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
>> On Thu, 4 Sep 2014 11:24:29 +0100
>> "Daniel P. Berrange"  wrote:
>>>
>>>  - A fairly significant amount of nova code would need to be
>>>considered semi-stable API. Certainly everything under nova/virt
>>>and any object which is passed in/out of the virt driver API.
>>>Changes to such APIs would have to be done in a backwards
>>>compatible manner, since it is no longer possible to lock-step
>>>change all the virt driver impls. In some ways I think this would
>>>be a good thing as it will encourage people to put more thought
>>>into the long term maintainability of nova internal code instead
>>>of relying on being able to rip it apart later, at will.
>>>
>>>  - The nova/virt/driver.py class would need to be much better
>>>specified. All parameters / return values which are opaque dicts
>>>must be replaced with objects + attributes. Completion of the
>>>objectification work is mandatory, so there is cleaner separation
>>>between virt driver impls & the rest of Nova.
>>
>> I think for this to work well with multiple repositories and drivers
>> having different priorities over implementing changes in the API it
>> would not just need to be semi-stable, but stable with versioning built
>> in from the start to allow for backwards incompatible changes. And
>> the interface would have to be very well documented including things
>> such as what exceptions are allowed to be raised through the API.
>> Hopefully this would be enforced through code as well. But as long as
>> driver maintainers are willing to commit to this extra overhead I can
>> see it working. 
> 
> With our primary REST or RPC APIs we're under quite strict rules about
> what we can & can't change - almost impossible to remove an existing
> API from the REST API for example. With the internal virt driver API
> we would probably have a little more freedom. For example, I think
> if we found an existing virt driver API that was insufficient for a
> new bit of work, we could add a new API in parallel with it, give the
> virt drivers 1 dev cycle to convert, and then permanently delete the
> original virt driver API. So a combination of that kind of API
> replacement,  versioning for some data structures/objects, and use of
> the capabilities flags would probably be sufficient. That's what I mean
> by semi-stable here - no need to maintain existing virt driver APIs
> indefinitely - we can remove & replace them in reasonably short time
> scales as long as we avoid any lock-step updates.
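
For readers trying to picture the replacement scheme described in the
quoted paragraph, here is a minimal sketch. Everything in it is
hypothetical (the ComputeDriver below is a stand-in and not the real
nova.virt.driver class, and get_info_v2 and the supports_get_info_v2
flag are invented names). It shows a new method added in parallel with
the old one, with a capability flag telling common code which to call
during the one-cycle conversion window:

```python
import warnings
from dataclasses import dataclass


@dataclass
class InstanceInfo:
    """Object with attributes that replaces the opaque dict (hypothetical)."""
    state: str
    memory_mb: int


class ComputeDriver:
    """Stand-in base class for illustration, not the real Nova one."""

    # Capability flags let common code ask what a driver supports.
    capabilities = {"supports_get_info_v2": False}

    def get_info(self, instance_id):
        """Old API returning an opaque dict; to be deleted after one cycle."""
        raise NotImplementedError()

    def get_info_v2(self, instance_id):
        """New API added in parallel; returns an InstanceInfo."""
        raise NotImplementedError()


class LegacyDriver(ComputeDriver):
    """Out-of-tree driver that has not converted yet."""

    def get_info(self, instance_id):
        return {"state": "running", "memory_mb": 512}


class ConvertedDriver(ComputeDriver):
    """Driver that already implements the new API."""

    capabilities = {"supports_get_info_v2": True}

    def get_info_v2(self, instance_id):
        return InstanceInfo(state="running", memory_mb=512)


def fetch_info(driver, instance_id):
    """Common-code caller that works with both kinds of driver."""
    if driver.capabilities.get("supports_get_info_v2"):
        return driver.get_info_v2(instance_id)
    # Unconverted drivers keep working, but are told the clock is running.
    warnings.warn(
        "get_info() is deprecated; implement get_info_v2()",
        DeprecationWarning,
        stacklevel=2,
    )
    raw = driver.get_info(instance_id)
    return InstanceInfo(state=raw["state"], memory_mb=raw["memory_mb"])
```

Once the conversion cycle is over, the fallback branch and get_info()
can be deleted together, which is the "remove & replace" step without
any lock-step change across repos.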

I have spent a lot of time over the last year working on things that
require coordinated code lands between projects; it's much more
friction than you give it credit for.

Every added git tree adds a non linear cost to mental overhead, and a
non linear integration cost. Realistically the reason the gate is in the
state it is has a ton to do with the fact that it's integrating 40 git
trees. Because virt drivers run in the process space of Nova Compute,
they can pretty much do whatever, and the impacts are going to be
somewhat hard to figure out.

Also, if spinning these out seems like the right idea, I think nova-core
needs to retain core rights over the drivers as well. Because there does
need to be veto authority on some of the worst craziness.

If the VMware team stopped trying to build a distributed lock manager
inside their compute driver, or the Hyper-V team didn't wait until J2 to
start pushing patches, I think there would be more trust in some of
these teams. But, I am seriously concerned in both those cases, and the
slow review there is a function of a historic lack of trust in judgment.
I also personally went on a moratorium a year ago in reviewing either
driver because entities at both places were complaining to my
management chain through back channels that I was -1ing their code...
when I was one of the few people actually trying to provide constructive
feedback (basically only Russell and I were reviewing that code in
Grizzly, everyone else was ignoring it). Things may have changed since
then, at least I see a ton of good work from tjones in making Nova
overall better, but that was a pretty bitter pill. (Sorry for the
tangent, but honestly if we are going to fix what's broken we probably
have to expose all related brokens.)


If the concern is that we are keeping out too many contributors by the
CI requirements: let's let Class C back in tree. I believe in the
FreeBSD case you were one of the original opponents of a top level
driver, and that they should go through libvirt instead. But I'm cool
with them just showing up as a Class C.

But I honestly don't think the virt driver split is going to make any of
this easier, when you account for the additional overhead it's going to
create, and the work required to get there.

-Sean

-- 
Sean Dague
http://dague.net


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 11:29:43AM +0100, John Garbutt wrote:
> On 4 September 2014 23:48, Russell Bryant  wrote:
> > On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
> > If we ignored gerrit for a moment, is rapid increase in splitting out
> > components the ideal workflow?  Would we be better off finding a way to
> > finally just implement a model more like the Linux kernel with
> > sub-system maintainers and pull requests to a top-level tree?  Maybe.
> > I'm not convinced that split of repos is obviously better.
> 
> I was thinking along similar lines.
> 
> Regardless of that, we should try this for Kilo.
> 
> If it feels like we are getting too much driver divergence, and
> tempest is not keeping everyone in line, the community is fragmenting
> and no one is working on the core of nova, then we might have to think
> about an alternative plan for L, including bringing the drivers back
> in tree.
> 
> At least the separate repos will help us firm up the interfaces, which
> I think is a good thing.
> 
> I worry about what it means to test a feature in "nova common, nova
> api, or nova core" or whatever we call it, if there are no virt
> drivers in tree. To some extent we might want to improve the fake virt
> driver for some in-tree functional tests anyways. But that's a separate
> discussion.

I look at what we currently do with Ironic testing as a guide here.
We have a tempest job that runs against Nova and validates that changes
to nova don't break the separate Ironic git repo. So my thought
is that all our current tempest jobs would simply work in that
way. IOW changes to so called "nova common" would run jobs that
validate the change against all the virt driver git repos. I think
this kind of setup is pretty much mandatory for split repos to be
viable, because I don't want to see us lose testing coverage in
this proposed change.

Having a decent in-tree fake virt driver would none the less be
a nice idea, because it would allow for more complete functional
testing isolated from the risks of bugs in the virt drivers
themselves.
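
As a rough illustration of the fake driver idea, here is a minimal
sketch in Python. The class name, method names and the reboot_all()
helper are hypothetical and are not the actual nova/virt/driver.py
interface; the point is only that common code can be exercised against
an in-memory driver with no hypervisor involved.

```python
class FakeDriver:
    """Hypothetical in-memory virt driver for functional tests."""

    def __init__(self):
        # Maps instance name -> power state; no real hypervisor is used.
        self.instances = {}

    def spawn(self, name):
        if name in self.instances:
            raise ValueError("instance %s already exists" % name)
        self.instances[name] = "running"

    def power_off(self, name):
        self._require(name)
        self.instances[name] = "shutdown"

    def power_on(self, name):
        self._require(name)
        self.instances[name] = "running"

    def destroy(self, name):
        # Destroying a missing instance is treated as a no-op.
        self.instances.pop(name, None)

    def list_instances(self):
        return sorted(self.instances)

    def _require(self, name):
        if name not in self.instances:
            raise KeyError(name)


def reboot_all(driver):
    """Stand-in for common code that only talks to the driver API."""
    for name in driver.list_instances():
        driver.power_off(name)
        driver.power_on(name)
    return driver.list_instances()
```

A functional test would construct FakeDriver(), spawn a few instances
and then assert on the state that common code such as reboot_all()
leaves behind.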

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/04/2014 07:22 PM, Michael Still wrote:
> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange  
> wrote:
> 
> [Heavy snipping because of length]
> 
>> The radical (?) solution to the nova core team bottleneck is thus to
>> follow this lead and split the nova virt drivers out into separate
>> projects and delegate their maintenance to new dedicated teams.
>>
>>  - Nova becomes the home for the public APIs, RPC system, database
>>    persistence and the glue that ties all this together with the
>>    virt driver API.
>>
>>  - Each virt driver project gets its own core team and is responsible
>>    for dealing with review, merge & release of their codebase.
> 
> I think this is the crux of the matter. We're not doing a great job of
> landing code at the moment, because we can't keep up with the review
> workload.
> 
> So far we've had two proposals mooted:
> 
>  - slots / runways, where we try to rate limit the number of things
> we're trying to review at once to maintain focus
>  - splitting all the virt drivers out of the nova tree
> 
> Splitting the drivers out of the nova tree does come at a cost -- we'd
> need to stabilise and probably version the hypervisor driver
> interface, and that will encourage more "out of tree" drivers, which
> are things we haven't historically wanted to do. If we did this split,
> I think we need to acknowledge that we are changing policy there. It
> also means that nova-core wouldn't be the ones holding the quality bar
> for hypervisor drivers any more, I guess this would open the door for
> drivers to more actively compete on the quality of their
> implementations, which might be a good thing.
> 
> Both of these have interesting aspects, and I agree we need to do
> _something_. I do wonder if there is a hybrid approach as well though.
> For example, could we implement some sort of more formal lieutenant
> system for drivers? We've talked about it in the past but never been
> able to express how it would work in practise.
> 
> The last few days have been interesting as I watch FFEs come through.
> People post explaining their feature, its importance, and the risk
> associated with it. Three cores sign on for review. All of the ones
> I've looked at have received active review since being posted. Would
> it be bonkers to declare nova to be in "permanent feature freeze"? If
> we could maintain the level of focus we see now, then we'd be getting
> heaps more done than before.

Agreed. Honestly, this has been a really nice flow. I'd love to figure
out what part of this focus is capturable for normal cadence. This
realistically is what I was hoping slots would provide, because I feel
like we actually move really fast when we call out 5-10 things to go
look at this week.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Nikola Đipanov
On 09/05/2014 01:26 AM, Jay Pipes wrote:
> On 09/04/2014 10:33 AM, Dugger, Donald D wrote:
>> Basically +1 with what Daniel is saying (note that, as mentioned, a
>> side effect of our effort to split out the scheduler will help but
>> not solve this problem).
> 
> The difference between Dan's proposal and the Gantt split is that Dan's
> proposal features quite prominently the following:
> 
> == begin ==
> 
>  - The nova/virt/driver.py class would need to be much better
>    specified. All parameters / return values which are opaque dicts
>    must be replaced with objects + attributes. Completion of the
>    objectification work is mandatory, so there is cleaner separation
>    between virt driver impls & the rest of Nova.
> 
> == end ==
> 
> In other words, Dan's proposal above is EXACTLY what I've been saying
> needs to be done to the interfaces between nova-conductor, nova-compute,
> and nova-scheduler *before* any split of the scheduler code is even
> remotely feasible.
> 
> Splitting the scheduler out before this is done would actually not "help
> but not solve this problem" -- it would instead further the problem, IMO.
> 

I don't think it's news to anyone that I strongly agree with the above
but let me restate that once more:

+1000

Not only that - but we need to make sure the APIs are *good and sane*
too. This is where the real meat of these types of problems is really.

If you need an example of why this is so crazy important - take a look
at Cinder that did get split out, and all the grief that came from
the API being half baked ([1], [2], but there are plenty more examples).

Actually - as I write this I think of Ironic and can't help but think
that the API is _so freakin' important_ that you actually might be
better off writing the whole thing from scratch just to get the API right.

[1] https://review.openstack.org/#/c/87546/
[2] https://bugs.launchpad.net/tempest/+bug/1302774

N.



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread John Garbutt
On 5 September 2014 00:26, Jay Pipes  wrote:
> On 09/04/2014 10:33 AM, Dugger, Donald D wrote:
>>
>> Basically +1 with what Daniel is saying (note that, as mentioned, a
>> side effect of our effort to split out the scheduler will help but
>> not solve this problem).
>
>
> The difference between Dan's proposal and the Gantt split is that Dan's
> proposal features quite prominently the following:
>
> == begin ==
>
>  - The nova/virt/driver.py class would need to be much better
>    specified. All parameters / return values which are opaque dicts
>    must be replaced with objects + attributes. Completion of the
>    objectification work is mandatory, so there is cleaner separation
>    between virt driver impls & the rest of Nova.
>
> == end ==
>
> In other words, Dan's proposal above is EXACTLY what I've been saying needs
> to be done to the interfaces between nova-conductor, nova-compute, and
> nova-scheduler *before* any split of the scheduler code is even remotely
> feasible.
>
> Splitting the scheduler out before this is done would actually not "help but
> not solve this problem" -- it would instead further the problem, IMO.

Given any changes we make to the scheduler interface need to be
backwards compatible, I am not totally convinced being in a separate
repo makes things a whole lot worse, vs the review bottlenecks we
have. Anyways, I certainly agree that work needs to be done ASAP, and
if we can make that a priority in Nova, it would be much quicker and
easier to do while still inside Nova.

We have similar issues with glance, cinder and neutron right now that
need fixing soon too. I know we have patches up for some improvements
in that area, but it certainly feels like we need to do better there.

The virt driver is a step ahead of the scheduler because we know what
interface we are talking about, and we already have most of a
versioning plan in place.

I think the key work we have with the scheduler is to actually draw
out the interface (in code), so we agree what interface we need to
firm up and version. I think we are starting to get agreement on that
now, which is great.

I still think the scheduler split is as urgent as the virt split, but
the virt split is much closer to being possible right now.

At this point, it feels like all of kilo-1 gets dedicated to splitting
out these interfaces, and completing objects. But let's see what the
summit brings.

Thanks,
John



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
> 
> Le 05/09/2014 01:22, Michael Still a écrit :
>> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange
>>  wrote:
>>
>> [Heavy snipping because of length]
>>
>>> The radical (?) solution to the nova core team bottleneck is thus to
>>> follow this lead and split the nova virt drivers out into separate
>>> projects and delegate their maintenance to new dedicated teams.
>>>
>>>   - Nova becomes the home for the public APIs, RPC system, database
>>>     persistence and the glue that ties all this together with the
>>>     virt driver API.
>>>
>>>   - Each virt driver project gets its own core team and is responsible
>>>     for dealing with review, merge & release of their codebase.
>> I think this is the crux of the matter. We're not doing a great job of
>> landing code at the moment, because we can't keep up with the review
>> workload.
>>
>> So far we've had two proposals mooted:
>>
>>   - slots / runways, where we try to rate limit the number of things
>> we're trying to review at once to maintain focus
>>   - splitting all the virt drivers out of the nova tree
> 
> Ahem, IIRC, there is a third proposal for Kilo:
>  - create subteam half-cores responsible for reviewing patch
> iterations, who send approval requests to cores once they consider
> the patch stable enough.
>
> As I explained, it would free up reviewing time for cores
> without losing control over what is being merged.

I don't really understand how the half core idea works outside of a math
equation, because the point of core is to have trust in the
judgement of your fellow core members so that they can land code when
you aren't looking. I'm not sure how I manage to build up half trust in
someone any quicker.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 06:22:18PM -0500, Michael Still wrote:
> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange  
> wrote:
> 
> [Heavy snipping because of length]
> 
> > The radical (?) solution to the nova core team bottleneck is thus to
> > follow this lead and split the nova virt drivers out into separate
> > projects and delegate their maintenance to new dedicated teams.
> >
> >  - Nova becomes the home for the public APIs, RPC system, database
> >    persistence and the glue that ties all this together with the
> >    virt driver API.
> >
> >  - Each virt driver project gets its own core team and is responsible
> >    for dealing with review, merge & release of their codebase.
> 
> I think this is the crux of the matter. We're not doing a great job of
> landing code at the moment, because we can't keep up with the review
> workload.
> 
> So far we've had two proposals mooted:
> 
>  - slots / runways, where we try to rate limit the number of things
> we're trying to review at once to maintain focus

FWIW, I'm not really seeing that as a long term solution. In its
essence it is just a more effective way for us to say 'no' to our
potential contributors. While it could no doubt relieve pressure
on the core team by reducing the flow of the pipe, I don't think
it is helpful for our contributors overall.

>  - splitting all the virt drivers out of the nova tree
> 
> Splitting the drivers out of the nova tree does come at a cost -- we'd
> need to stabilise and probably version the hypervisor driver
> interface, and that will encourage more "out of tree" drivers, which
> are things we haven't historically wanted to do. If we did this split,
> I think we need to acknowledge that we are changing policy there. It
> also means that nova-core wouldn't be the ones holding the quality bar
> for hypervisor drivers any more, I guess this would open the door for
> drivers to more actively compete on the quality of their
> implementations, which might be a good thing.

There are already a number of drivers out of tree such as Docker,
Ironic (though soon to be in tree), and IIUC there's something IBM
have done for Power hypervisor, and work Oracle have done for the
Solaris virt/container technologies. Probably the distinction I'd
make is around things that are actively part of the OpenStack
community (eg on our gerrit infrastructure and/or stackforge, etc),
vs things that are developed in complete isolation from the OpenStack
community.

I'm unclear what the state of play is wrt discussions on OpenStack
technology compatibility certification & trademark usage, but perhaps
that is a partial counterweight to your concern ? I'd certainly like
to see a focus on out of tree drivers remaining a strong part of the
openstack community, and not go off into their own completely isolated
world outside the community.

But yes, I am clearly proposing a change to our integration policy here
and so we need to carefully consider what that means and take
any necessary steps to mitigate risks.

In some respects I think the split repos could allow us to raise the
bar in terms of quality. For example, with a single repo, I don't
see it ever being practical to make VMware/HyperV/XenAPI CI systems
gating on changes, because it would push up the level of pain from
false job failures in the gate even further than today. With a separate
repo each virt driver would only need to run jobs directly related to
them, so the VMware CI could easily be made gating on the VMware driver
git repo.

On testing in general, I think we need to look at the granularity
at which we run tests, in order to let us scale up the number of tests
we run. For example, it is suggested that each feature like disk
encryption, disk discard support, each vif driver, and so on, each
requires a new tempest job with appropriate settings. If we look at
the number of possible tunable knobs like that, it easily implies 100's
more tempest jobs with varying configs. I don't think it is practical
to consider doing that with our setup today. With separate virt driver
repos we'd have more headroom to add a larger number of jobs since
the volume of changes being tested overall would be smaller.
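
To put rough numbers on the job growth described above, here is a small
Python sketch. The knob names and value lists are made up purely for
illustration and are not real tempest or nova settings. It compares one
job per non-default setting (linear growth) with a full matrix of every
combination (multiplicative growth).

```python
from itertools import product

# Hypothetical tunables; names and values are illustrative only.
# The first value in each list is treated as the default.
KNOBS = {
    "disk_encryption": ["off", "luks"],
    "disk_discard": ["ignore", "unmap"],
    "vif_driver": ["bridge", "ovs", "macvtap"],
    "image_backend": ["raw", "qcow2", "lvm", "rbd"],
}


def per_feature_jobs(knobs):
    """One baseline job plus one job per non-default value of each knob."""
    names = sorted(knobs)
    baseline = {name: knobs[name][0] for name in names}
    jobs = [baseline]
    for name in names:
        for value in knobs[name][1:]:
            # Copy the baseline and flip exactly one knob.
            jobs.append(dict(baseline, **{name: value}))
    return jobs


def full_matrix_jobs(knobs):
    """One job for every combination of knob values."""
    names = sorted(knobs)
    combos = product(*(knobs[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


if __name__ == "__main__":
    print("per-feature jobs:", len(per_feature_jobs(KNOBS)))
    print("full matrix jobs:", len(full_matrix_jobs(KNOBS)))
```

With only these four made-up knobs the per-feature approach needs 8
jobs and the full matrix 48, so a realistic set of tunables reaches
hundreds of jobs quickly under either scheme.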

> Both of these have interesting aspects, and I agree we need to do
> _something_. I do wonder if there is a hybrid approach as well though.
> For example, could we implement some sort of more formal lieutenant
> system for drivers? We've talked about it in the past but never been
> able to express how it would work in practise.

Gerrit makes it hard to express that formally due to the lack of
path based permissioning. If we do go for the virt driver split,
it would none the less be useful if we trialled a lieutenant or
sub-team model during Kilo, as a way to prepare for an eventual
driver split in L. So this is worth talking about regardless
I reckon.

I still think on balance a virt driver split is beneficial since
it brings benefits beyond just the review team.

> The last few days have be

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Nikola Đipanov
On 09/04/2014 12:24 PM, Daniel P. Berrange wrote:
> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I, now, none
> the less urge people to read the whole mail.
> 
> 
> Background information
> ======================
> 
> I see many factors coming together to form the crisis
> 
>  - Burn out of core team members from over work 
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
> 
> Each item on their own may not seem too bad, but combined they
> add up to a big problem.
> 

Like many others - I cannot +1 this enough. Some technical comments below
that we may want to consider first, but to sum them up - this will be a
TON OF WORK! We had better make sure we really want to do this beforehand.

However - please don't read this as FUD, but rather as pointing out that
the devil is in the details, and maybe getting ahead of myself with too
deep of a dive.

> 
>  - A fairly significant amount of nova code would need to be
>    considered semi-stable API. Certainly everything under nova/virt
>    and any object which is passed in/out of the virt driver API.
>    Changes to such APIs would have to be done in a backwards
>    compatible manner, since it is no longer possible to lock-step
>    change all the virt driver impls. In some ways I think this would
>    be a good thing as it will encourage people to put more thought
>    into the long term maintainability of nova internal code instead
>    of relying on being able to rip it apart later, at will.
> 

I think we should not underestimate how big of a job this will be. We
have been treating that API as internal for a long time and a lot of
abstractions are just broken and need to be redesigned and then
refactored. A lot of the stuff is implementation specific (live
migrations is a good example of this). What makes it more difficult is
that we need to get this as right as possible before we do the split.

Now I am not saying this cannot be done or that we shouldn't do it,
however I _am_ saying that we should not take lightly how much work
there will be and how fiddly the work itself is.

On top of that - there are some other serious issues with nova common
code that we need to take care of ASAP, and this will definitely
increase the churn and make that more difficult. We should take this
into account and make sure we are focusing efforts on the right things.
Making sure we do is the biggest challenge nova core faces in addition
to all the others mentioned above.

>  - The nova/virt/driver.py class would need to be much better
>    specified. All parameters / return values which are opaque dicts
>    must be replaced with objects + attributes. Completion of the
>    objectification work is mandatory, so there is cleaner separation
>    between virt driver impls & the rest of Nova.
> 

Not only that - currently nova-objects do their versioning magic only
over RPC, while they would have to do it over library boundaries. This
in itself will require work, and is likely going to influence how we
stabilize the API.

However - splitting out the scheduler is likely to require objects to be
able to do similar things, and there are other things that we may want
to do (e.g. using properly versioned data for the extensible resources)
that will benefit from this.
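
To make the idea of versioning across a library boundary concrete, here
is a minimal Python sketch. InstanceInfo, its fields and negotiate()
are hypothetical and are not the real nova objects code; the sketch
only shows an object with declared attributes replacing an opaque dict,
and being downgraded for a consumer that knows an older version.

```python
class InstanceInfo:
    """Hypothetical versioned object replacing an opaque dict."""

    VERSION = (1, 1)
    # Field name -> (major, minor) version in which it was introduced.
    FIELDS = {
        "uuid": (1, 0),
        "memory_mb": (1, 0),
        "numa_nodes": (1, 1),
    }

    def __init__(self, **values):
        unknown = set(values) - set(self.FIELDS)
        if unknown:
            # Unlike a dict, a typo in a field name fails loudly.
            raise AttributeError("unknown fields: %s" % sorted(unknown))
        for name in self.FIELDS:
            setattr(self, name, values.get(name))

    def to_primitive(self, target_version=None):
        """Serialize, dropping fields newer than the target version."""
        target = target_version or self.VERSION
        data = {
            name: getattr(self, name)
            for name, since in self.FIELDS.items()
            if since <= target
        }
        return {"version": "%d.%d" % target, "data": data}


def negotiate(producer_version, consumer_version):
    """Pick the newest version that both sides of the boundary understand."""
    if producer_version[0] != consumer_version[0]:
        raise ValueError("incompatible major versions")
    return min(producer_version, consumer_version)
```

For example, a driver built against version 1.0 would receive
info.to_primitive(negotiate(InstanceInfo.VERSION, (1, 0))), which
leaves out the numa_nodes field it has never heard of.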

>  - If changes are required to common code, the virt driver developer
>    would first have to get the necessary pieces merged into Nova
>    common. Then the follow up virt driver specific changes could be
>    proposed to their repo. This implies that some changes to virt
>    drivers will still contend for resource in the common nova repo
>    and team. This contention should be lower than it is today though
>    since the current nova core team should have less code to look
>    after per-person on aggregate.
> 

A handy example of this I can think of is the currently granted FFE for
serial consoles - consider how much of the code went into the common

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread John Garbutt
On 4 September 2014 23:48, Russell Bryant  wrote:
> On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
>> Position statement
>> ==================
>>
>> Over the past year I've increasingly come to the conclusion that
>> Nova is heading for (or probably already at) a major crisis. If
>> steps are not taken to avert this, the project is likely to lose
>> a non-trivial amount of talent, both regular code contributors and
>> core team members. That includes myself. This is not good for
>> Nova's long term health and so should be of concern to anyone
>> involved in Nova and OpenStack.
>>
>> For those who don't want to read the whole mail, the executive
>> summary is that the nova-core team is an unfixable bottleneck
>> in our development process with our current project structure.
>> The only way I see to remove the bottleneck is to split the virt
>> drivers out of tree and let them all have their own core teams
>> in their area of code, leaving current nova core to focus on
>> all the common code outside the virt driver impls. I, now, none
>> the less urge people to read the whole mail.
>
> Fantastic write-up.  I can't +1 enough the problem statement, which I
> think you've done a nice job of framing.  We've taken steps to try to
> improve this, but none of them have been big enough.  I feel we've
> reached a tipping point.  I think many others do too, and several
> proposals being discussed all seem rooted in this same core issue.

+1

I totally agree we need to split Nova up further, there just didn't
seem to be the support for this before now.

Not yet sure the virt drivers are the best split, but we already have
sub-teams ready to take them on, so it will probably work for that
reason.

> If we ignored gerrit for a moment, is rapid increase in splitting out
> components the ideal workflow?  Would we be better off finding a way to
> finally just implement a model more like the Linux kernel with
> sub-system maintainers and pull requests to a top-level tree?  Maybe.
> I'm not convinced that split of repos is obviously better.

I was thinking along similar lines.

Regardless of that, we should try this for Kilo.

If it feels like we are getting too much driver divergence, and
tempest is not keeping everyone in line, the community is fragmenting
and no one is working on the core of nova, then we might have to think
about an alternative plan for L, including bringing the drivers back
in tree.

At least the separate repos will help us firm up the interfaces, which
I think is a good thing.

I worry about what it means to test a feature in "nova common, nova
api, or nova core" or whatever we call it, if there are no virt
drivers in tree. To some extent we might want to improve the fake virt
driver for some in-tree functional tests anyways. But that's a separate
discussion.

> I don't think we can afford to wait much longer without drastic change,
> so let's make it happen.

+1

But I do think we should try and go further...

Scheduler: I think we need to split out the scheduler with a similar
level of urgency. We keep blocking features on the split, because we
know we don't have the review bandwidth to deal with them. Right now I
am talking about a compute related scheduler in the compute program,
that might evolve to worry about other services at a later date.

Nova-network: Maybe there isn't a big enough community to support this
right now, but we need to actually delete this, or pull it out of
nova-core.

API: I suspect we might want to also look at splitting out the API
from Nova common too. This one is slightly more drastic, and needs
more pre-split work (and is very related to making cells a first class
concept), but I am still battling with that inside my head.

Oslo: I suspect we may need to do something around the virt utilities,
so they are easy to share, but there are probably other opportunities
too.

Thanks,
John



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
> On Thu, 4 Sep 2014 11:24:29 +0100
> "Daniel P. Berrange"  wrote:
> > 
> >  - A fairly significant amount of nova code would need to be
> >    considered semi-stable API. Certainly everything under nova/virt
> >    and any object which is passed in/out of the virt driver API.
> >    Changes to such APIs would have to be done in a backwards
> >    compatible manner, since it is no longer possible to lock-step
> >    change all the virt driver impls. In some ways I think this would
> >    be a good thing as it will encourage people to put more thought
> >    into the long term maintainability of nova internal code instead
> >    of relying on being able to rip it apart later, at will.
> >
> >  - The nova/virt/driver.py class would need to be much better
> >    specified. All parameters / return values which are opaque dicts
> >    must be replaced with objects + attributes. Completion of the
> >    objectification work is mandatory, so there is cleaner separation
> >    between virt driver impls & the rest of Nova.
> 
> I think for this to work well with multiple repositories and drivers
> having different priorities over implementing changes in the API it
> would not just need to be semi-stable, but stable with versioning built
> in from the start to allow for backwards incompatible changes. And
> the interface would have to be very well documented including things
> such as what exceptions are allowed to be raised through the API.
> Hopefully this would be enforced through code as well. But as long as
> driver maintainers are willing to commit to this extra overhead I can
> see it working. 

With our primary REST or RPC APIs we're under quite strict rules about
what we can & can't change - almost impossible to remove an existing
API from the REST API for example. With the internal virt driver API
we would probably have a little more freedom. For example, I think
if we found an existing virt driver API that was insufficient for a
new bit of work, we could add a new API in parallel with it, give the
virt drivers 1 dev cycle to convert, and then permanently delete the
original virt driver API. So a combination of that kind of API
replacement, versioning for some data structures/objects, and use of
the capabilities flags would probably be sufficient. That's what I mean
by semi-stable here - no need to maintain existing virt driver APIs
indefinitely - we can remove & replace them in reasonably short time
scales as long as we avoid any lock-step updates.
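
A minimal Python sketch of that add-in-parallel pattern follows. The
class names, the attach_volume / attach_volume_v2 methods and the
supports_attach_v2 flag are all hypothetical, invented for illustration
rather than taken from nova/virt/driver.py. Common code probes a
capability flag, so converted and unconverted drivers both keep working
during the transition cycle.

```python
import logging

LOG = logging.getLogger(__name__)


class ComputeDriver:
    """Hypothetical base class standing in for the virt driver API."""

    # Capability flags let callers probe for optional features instead
    # of requiring every driver to be updated in lock-step.
    capabilities = {"supports_attach_v2": False}

    def attach_volume(self, instance, volume):
        """Original API, to be deleted one dev cycle after v2 lands."""
        raise NotImplementedError()

    def attach_volume_v2(self, instance, volume, encryption=None):
        """Replacement API, added in parallel with the original."""
        raise NotImplementedError()


class UnconvertedDriver(ComputeDriver):
    def attach_volume(self, instance, volume):
        return "old:%s:%s" % (instance, volume)


class ConvertedDriver(ComputeDriver):
    capabilities = {"supports_attach_v2": True}

    def attach_volume_v2(self, instance, volume, encryption=None):
        return "v2:%s:%s:%s" % (instance, volume, encryption)


def attach(driver, instance, volume, encryption=None):
    """Common code: prefer the new API, fall back to the old one."""
    if driver.capabilities.get("supports_attach_v2"):
        return driver.attach_volume_v2(instance, volume, encryption=encryption)
    LOG.warning("%s still uses the deprecated attach_volume API",
                type(driver).__name__)
    return driver.attach_volume(instance, volume)
```

Once every driver advertises the new capability, the fallback branch
and the original method can be deleted without any coordinated commit
across repos.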

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Christopher Yeoh
On Thu, 4 Sep 2014 12:57:57 -0700
Joe Gordon  wrote:

> 
> Overall I do think we need to re-think how the review burden is
> distributed. That being said, this is a nice proposal but I am not
> sure if it moves the review burden around enough or is the right
> approach. Do you have any rough numbers on what percent of the review
> burden goes to virt drivers today (however you want to define that
> statement, number of merged patches, man hours, lines of code, number
> of reviews etc.)? If for example today the nova review team spends
> 10% of their review time on virt drivers then I don't think this
> proposal will have a significant impact on the review backlog (for
> nova-common).

Even if it doesn't have a huge impact on the review backlog for
nova-common (I think it should at least help a bit) it does have the
potential to make life much easier for the virt driver developers. 

I think my main concern is around testing - as soon as we have multiple
repositories involved I think debugging of test failures
(especially races) tends to get more complicated and we have fewer
people who are familiar enough with the two code bases. 

Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Christopher Yeoh
On Thu, 4 Sep 2014 11:24:29 +0100
"Daniel P. Berrange"  wrote:
> 
>  - A fairly significant amount of nova code would need to be
>    considered semi-stable API. Certainly everything under nova/virt
>    and any object which is passed in/out of the virt driver API.
>    Changes to such APIs would have to be done in a backwards
>    compatible manner, since it is no longer possible to lock-step
>    change all the virt driver impls. In some ways I think this would
>    be a good thing as it will encourage people to put more thought
>    into the long term maintainability of nova internal code instead
>    of relying on being able to rip it apart later, at will.
> 
>  - The nova/virt/driver.py class would need to be much better
>    specified. All parameters / return values which are opaque dicts
>    must be replaced with objects + attributes. Completion of the
>    objectification work is mandatory, so there is cleaner separation
>    between virt driver impls & the rest of Nova.

I think for this to work well with multiple repositories and drivers
having different priorities over implementing changes in the API it
would not just need to be semi-stable, but stable with versioning built
in from the start to allow for backwards incompatible changes. And
the interface would have to be very well documented including things
such as what exceptions are allowed to be raised through the API.
Hopefully this would be enforced through code as well. But as long as
driver maintainers are willing to commit to this extra overhead I can
see it working. 
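
As a rough sketch of what "objects + attributes" with versioning built
in could look like, the Python below replaces an opaque dict with a
small versioned object and a single documented exception. The names
DiskInfo and IncompatibleVersion, the fields, and the version numbers
are all assumptions made up for this example; this is not Nova's actual
object model.

```python
class IncompatibleVersion(Exception):
    """The one exception this hypothetical interface documents."""


class DiskInfo(object):
    """Hypothetical versioned replacement for an opaque disk dict."""

    # Minor bumps add optional fields; major bumps break compatibility.
    VERSION = "1.1"

    def __init__(self, path, size_gb, bus="virtio"):
        self.path = path
        self.size_gb = size_gb
        self.bus = bus  # assumed to have been added in version 1.1

    def to_primitive(self):
        """Serialize with an explicit version tag."""
        return {
            "version": self.VERSION,
            "data": {"path": self.path,
                     "size_gb": self.size_gb,
                     "bus": self.bus},
        }

    @classmethod
    def from_primitive(cls, primitive):
        """Accept any payload with the same major version."""
        major = primitive["version"].split(".")[0]
        if major != cls.VERSION.split(".")[0]:
            raise IncompatibleVersion(
                "got %s, expected %s.x" % (primitive["version"], major))
        data = primitive["data"]
        # Older 1.0 senders did not know about 'bus'; use the default.
        return cls(data["path"], data["size_gb"], data.get("bus", "virtio"))
```

A driver built against 1.0 can still talk to code that understands 1.1,
while a 2.0 payload fails loudly with a documented exception rather
than a KeyError somewhere deep in the driver.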

Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 06:48:33PM -0400, Russell Bryant wrote:
> On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
> > Position statement
> > ==================
> > 
> > Over the past year I've increasingly come to the conclusion that
> > Nova is heading for (or probably already at) a major crisis. If
> > steps are not taken to avert this, the project is likely to lose
> > a non-trivial amount of talent, both regular code contributors and
> > core team members. That includes myself. This is not good for
> > Nova's long term health and so should be of concern to anyone
> > involved in Nova and OpenStack.
> > 
> > For those who don't want to read the whole mail, the executive
> > summary is that the nova-core team is an unfixable bottleneck
> > in our development process with our current project structure.
> > The only way I see to remove the bottleneck is to split the virt
> > drivers out of tree and let them all have their own core teams
> > in their area of code, leaving current nova core to focus on
> > all the common code outside the virt driver impls. I nonetheless
> > urge people to read the whole mail.
> 
> Fantastic write-up.  I can't +1 enough the problem statement, which I
> think you've done a nice job of framing.  We've taken steps to try to
> improve this, but none of them have been big enough.  I feel we've
> reached a tipping point.  I think many others do too, and several
> proposals being discussed all seem rooted in this same core issue.
> 
> When it comes to the proposed solution, I'm +1 on that too, but part of
> that is that it's hard for me to ignore the limitations placed on us by
> our current review infrastructure (gerrit).
> 
> If we ignored gerrit for a moment, is rapid increase in splitting out
> components the ideal workflow?  Would we be better off finding a way to
> finally just implement a model more like the Linux kernel with
> sub-system maintainers and pull requests to a top-level tree?  Maybe.
> I'm not convinced that split of repos is obviously better.
> 
> You make some good arguments for why splitting has other benefits.

For a long time I've used the LKML 'subsystem maintainers' model as the
reference point for ideas. In a more LKML-like model, each virt team
(or other subsystem team) would have their own separate GIT repo with
a complete Nova codebase, where they did their day-to-day code submissions,
reviews and merges. Periodically the primary subsystem maintainer would
submit a large pull / merge request to the overall Nova maintainer.
The $1,000,000 question in such a model is what kind of code review
happens during the big pull requests to integrate subsystem trees. 

The closest example I can see is what's happening with the Ironic
driver merge reviews. I'm personally finding review of that to be
quite a burdensome activity, because all comments on the merge
review then get fed back to the original maintainers who do a new
round of patch + review in the Ironic tree and then we get a new version
submitted back to the nova tree for merge. Rinse, repeat.

So my biggest fear with a model where each team had their own full
Nova tree and did large pull requests, is that we'd suffer major
pain during the merging of large pull requests, especially if any
of the merges touched common code. It could make the pull requests
take a really long time to get accepted into the primary repo.

By contrast, with split-out git repos per virt driver, we will
only ever have 1 stage of code review for each patch. Changes to
common code would go straight to the main nova common repo and so get
reviewed by the experts there without delay, avoiding the 2nd stage
of review from merge requests.

The more I think about this, the more attracted I am to the idea
that separate repos will facilitate us doing more targeted testing
and allow 3rd party CI to become gating over their respective virt
driver codebases.

Finally the LKML model would still leave some drivers at a disadvantage
for development, if they're not able to meet the standards we require
in terms of CI testing, to be accepted into the primary repo.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 12:57:57PM -0700, Joe Gordon wrote:
> On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange 
> wrote:
> > Proposal / solution
> > ===================
> >
> > In the past Nova has spun out its volume layer to form the cinder
> > project. The Neutron project started as an attempt to solve the
> > networking space, and ultimately replace the nova-network. It
> > is likely that the scheduler will be spun out to a separate project.
> >
> > Now Neutron itself has grown so large and successful that it is
> > considering going one step further and spinning its actual drivers
> > out of tree into standalone add-on projects [4]. I've heard on the
> > grapevine that Ironic is considering similar steps for hardware
> > drivers.
> >
> > The radical (?) solution to the nova core team bottleneck is thus to
> > follow this lead and split the nova virt drivers out into separate
> > projects and delegate their maintenance to new dedicated teams.
> >
> >  - Nova becomes the home for the public APIs, RPC system, database
> >    persistence and the glue that ties all this together with the
> >    virt driver API.
> >
> >  - Each virt driver project gets its own core team and is responsible
> >    for dealing with review, merge & release of their codebase.
> >
> 
> Overall I do think we need to re-think how the review burden is
> distributed. That being said, this is a nice proposal but I am not sure if
> it moves the review burden around enough or is the right approach. Do you
> have any rough numbers on what percent of the review burden goes to virt
> drivers today (however you want to define that statement, number of merged
> patches, man hours, lines of code, number of reviews, etc.). If for example
> today the nova review team spends 10% of their review time on virt drivers
> then I don't think this proposal will have a significant impact on the
> review backlog (for nova-common).

I'm a little wary of doing too many stats on things like reviews and
patches, because I fear it does not capture the full picture. Specifically
we're turning away contributors before they ever get to the point of
submitting reviews / patches, by rejecting their blueprints/specs.
Also the difficulty of getting stuff reviewed is discouraging people
from even considering doing a lot of work in the first place - if I had had
the confidence in getting it reviewed & merged I would easily have submitted
twice as much code to libvirt this cycle, but as it was I didn't even
start work on most things I would have liked to.

That said though, in the past 6 months we had 1385 changes merged.
Of those, 437 touched at least one file in the /virt/ directory
which is approximately 30%.
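
For reference, the ratio behind that figure can be worked out as below.
The two counts are the ones quoted above; the variable names are only
illustrative.

```python
# Counts quoted in the mail for the past 6 months.
merged_total = 1385   # all merged changes
merged_virt = 437     # changes touching at least one file under /virt/

virt_share = merged_virt / merged_total * 100
common_only = merged_total - merged_virt

print("virt share: %.1f%%" % virt_share)
print("changes not touching /virt/: %d" % common_only)
```

This gives a share of a little under 32%, in line with the rough 30%
estimate, and leaves 948 merged changes that did not touch /virt/ at
all.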

I agree though, this proposal will not have a dramatic effect on
the review backlog for the nova common code. It would probably be
a small (but noticeable) improvement - most of the benefit would
fall on the virt drivers I expect. If we can make Nova a more
productive & enjoyable place to contribute though, this should
ultimately feed through into more people being involved in general
and thus more resource available to nova common too.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 10:44:17PM -0600, John Griffith wrote:
> Just some thoughts and observations I've had regarding this topic in Cinder
> the past couple of years.  I realize this is a Nova thread so hopefully
> some of this can be applied in a more general context.
> 
> TLDR:
> 1. I think moving drivers into their own repo is just shoveling the pile to
> make a new pile (not really solving anything)

I'm not familiar with Cinder, but for Nova it would certainly have clear
benefits and not merely be shoveling the pile. Specifically it would

 - Easily let us double the number of "core" reviewers on aggregate

 - Reduce the bar for getting into a driver core team thus increasing
   the talent pool we can promote from.

 - Work accepted in a release for one driver would not reduce the
   bandwidth for another driver to accept work, since their review
   teams are separate

 - We can have more targeted testing, which will reduce the amount
   of bogus gate failures people get when submitting reviews and
   allow every driver to have gating CI jobs without impacting the
   other drivers
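
The last point can be pictured with a small Python sketch. It assumes
one repo per driver, each with its own list of gating jobs; the repo
names and job names below are invented for the example and do not
correspond to any real OpenStack CI configuration.

```python
# Hypothetical mapping of repositories to the jobs that gate them.
GATE_JOBS = {
    "nova": ["unit", "tempest-full"],
    "nova-virt-libvirt": ["unit", "tempest-libvirt"],
    "nova-virt-xenapi": ["unit", "xenserver-ci"],
    "nova-virt-hyperv": ["unit", "hyperv-ci"],
}


def repos_blocked_by(failing_job):
    """Return the repos whose gate stalls when the given job is broken."""
    return sorted(repo for repo, jobs in GATE_JOBS.items()
                  if failing_job in jobs)
```

With this layout a bogus failure in the (hypothetical) "xenserver-ci"
job only blocks the xenapi repo, whereas in a single tree with shared
gating every driver's patches would be held up by it.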

> 2. Removal of drivers other than the reference implementation for each
> project could be the healthiest option
> a. Requires transparent, public, automated 3'rd party CI
> b. Requires a TRUE plugin architecture and mentality
> c. Requires a stable and well defined API

As mentioned in the original mail I don't want to see a situation where
we end up with some drivers in tree and others out of tree as it sets up
bad dynamics within the project. Those out of tree will always have the
impression of being second class citizens and thus there will be constant
pressure to accept drivers back into tree. The so called 'reference'
driver that stayed in tree would also continue to be penalized in the
way it is today, and so its development would be disadvantaged compared
to the out of tree drivers.

> 3. While I'm still sort of a fan of the removal of drivers, I do think
> Cinder is "making it work", there have been missteps and yes it's a pain
> sometimes but it's working "ok" and we've got plans to try and improve
> 
> 4. Adding restrictions like drivers only in first milestone and more
> intense scrutinization of features will go a long way to help resolve the
> issues we do have currently

Not in nova at least. We have a fundamental bottleneck in nova and
simply re-arranging review priorities in this kind of way will never
fix it. We've tried many different approaches to prioritization of
work and the only result is that we've got more aggressive at saying
no to contributors. This is directly resulting in the crisis we have
today.

> I've spent a fair amount of time thinking about the explosive number of
> drivers being added to Cinder over the past year or so.  I've been a pretty
> vocal proponent of the idea of removing all drivers except the LVM
> reference implementation from Cinder.  I'd rather see Vendors drivers
> maintained in their own Github Repo and truly follow a "plugin" model.
>  This of course means that Cinder has to be truly designed and maintained
> with a real plugin architecture kept in mind in every aspect of development
> (experience proves this harder to do than it sounds).  I think with
> stable and well defined interfaces as well as 3rd party CI this is
> actually a reasonable approach and could be effective.  I do not see how
> creating a separate repo and in essence yet another set of OpenStack
> Projects really helps with the problem.  The fact is that the biggest issue
> most people see with driver contributions is those that are made by
> organizations that work on their driver only and don't contribute back to
> the core project (whether that be in the form of reviews of core
> contributions).  I'm not sure I understand why that would be any different
> by just putting the code in a separate bucket.  In other words, getting a
> solid and consistent team working on that "project" seems like you've just
> kicked the can down the road so you don't have to deal with it.

Fundamentally people contributing to a project are doing so voluntarily
to scratch their own itch. The project leadership can help identify areas
that need work and encourage people to take up the challenge, but you
cannot force people to do the work. We've done many things in nova that
are basically inflicting a form of punishment on contributors if they
don't work on things we tell them to work on. This is not having a positive
effect, on the contrary it is resulting in a lot of demotivated and pissed off
contributors who are ultimately leaving the project.

I agree that splitting the virt drivers out into their own repositories is
not going to hugely help get more people to work on Nova core - that was
not the primary intention. The big focus is on unblocking development of
the virt drivers so that their contributors actually feel their efforts
are valued by the project. If we make the project a more attracti

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 02:56:04PM -0500, Kyle Mestery wrote:
> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange  
> wrote:
> > Proposal / solution
> > ===================
> >
> > In the past Nova has spun out its volume layer to form the cinder
> > project. The Neutron project started as an attempt to solve the
> > networking space, and ultimately replace the nova-network. It
> > is likely that the scheduler will be spun out to a separate project.
> >
> > Now Neutron itself has grown so large and successful that it is
> > considering going one step further and spinning its actual drivers
> > out of tree into standalone add-on projects [4]. I've heard on the
> > grapevine that Ironic is considering similar steps for hardware
> > drivers.
> >
> I just wanted to note that this is a huge problem in Neutron, and it
> gets worse with each release as we add on more drivers and plugins
> which carry a maintenance cost without gaining any new reviewers from
> the companies who have the drivers. The rough plan I have for Neutron
> involves moving all non-Open Source drivers out of tree into a
> separate git repository. Your message has made me think that perhaps
> we in Neutron should go one step further and even remove the Open
> Source drivers, leaving the in-tree implementation as the only one
> there. Where we move these is the main issue. Given we have 20+
> drivers/plugins now, one git repository per driver/plugin won't scale,
> as we add 3-5 each cycle. So perhaps a single repository is the best
> idea here, with shared reviews from vendors across each other's code.

While I'll make no secret of my dislike for closed source software,
my feeling is that OpenStack as a project is explicitly welcoming
closed source software & vendors, not least by virtue of using a
more permissive Apache license instead of a strong copyleft license
like GPL. So given the project's stance, I'd not be in favour of
discriminating against drivers for closed source software.

In actual fact though, the premise of my proposal is the idea that
moving a driver out of tree will actually help its development by
giving its team much greater freedom & responsibility. So by only
moving out non-open source drivers, we'd arguably be putting the
in-tree open source drivers at a disadvantage! I'm also very much
drawn to the idea that having separate repos will let us do more
targeted setup of CI test jobs, so each test job is actually
directly relevant to the code being tested.

I can see your concern about the number of drivers you have in
Neutron and the frequency with which more are added. We don't
have anywhere near this number in Nova and are not likely to
ever grow that much. If you did have 30 separate drivers and
thus 30 separate GIT repos though, the question to consider is
who is ultimately responsible for reviewing those drivers. If
each of those 30 drivers had their own self-organized team of
people the burden of 30 repos is not as bad as it seems, since
any one person would probably only be concerned with a couple
of git repos.  If you still see the single neutron core team
being responsible for each of those repos, then I can see that
having 30 repos would be a big burden. I don't think there is
a single right answer here for all OpenStack projects. It is
entirely conceivable that it might be best for Neutron to have
a single repo for a set of drivers, while being best for Nova
to have a separate repo for each driver.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 01:22, Michael Still wrote:

> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange  wrote:
>
> [Heavy snipping because of length]
>
> > The radical (?) solution to the nova core team bottleneck is thus to
> > follow this lead and split the nova virt drivers out into separate
> > projects and delegate their maintenance to new dedicated teams.
> >
> >  - Nova becomes the home for the public APIs, RPC system, database
> >    persistence and the glue that ties all this together with the
> >    virt driver API.
> >
> >  - Each virt driver project gets its own core team and is responsible
> >    for dealing with review, merge & release of their codebase.
>
> I think this is the crux of the matter. We're not doing a great job of
> landing code at the moment, because we can't keep up with the review
> workload.
>
> So far we've had two proposals mooted:
>
>   - slots / runways, where we try to rate limit the number of things
>     we're trying to review at once to maintain focus
>   - splitting all the virt drivers out of the nova tree


Ahem, IIRC, there is a third proposal for Kilo:
 - create subteam "half-cores" responsible for reviewing patch
iterations, who send approval requests to cores once they consider the
patch stable enough.


As I explained, it would free up reviewing time for cores
without losing control over what is being merged.


-Sylvain


> Splitting the drivers out of the nova tree does come at a cost -- we'd
> need to stabilise and probably version the hypervisor driver
> interface, and that will encourage more "out of tree" drivers, which
> are things we haven't historically wanted to do. If we did this split,
> I think we need to acknowledge that we are changing policy there. It
> also means that nova-core wouldn't be the ones holding the quality bar
> for hypervisor drivers any more, I guess this would open the door for
> drivers to more actively compete on the quality of their
> implementations, which might be a good thing.
>
> Both of these have interesting aspects, and I agree we need to do
> _something_. I do wonder if there is a hybrid approach as well though.
> For example, could we implement some sort of more formal lieutenant
> system for drivers? We've talked about it in the past but never been
> able to express how it would work in practise.
>
> The last few days have been interesting as I watch FFEs come through.
> People post explaining their feature, its importance, and the risk
> associated with it. Three cores sign on for review. All of the ones
> I've looked at have received active review since being posted. Would
> it be bonkers to declare nova to be in "permanent feature freeze"? If
> we could maintain the level of focus we see now, then we'd be getting
> heaps more done than before.
>
> These issues should very definitely be on the agenda for the design
> summit, probably early in the week.
>
> Michael






Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 01:26, Jay Pipes wrote:

> On 09/04/2014 10:33 AM, Dugger, Donald D wrote:
>
> > Basically +1 with what Daniel is saying (note that, as mentioned, a
> > side effect of our effort to split out the scheduler will help but
> > not solve this problem).
>
> The difference between Dan's proposal and the Gantt split is that
> Dan's proposal features quite prominently the following:
>
> == begin ==
>
>  - The nova/virt/driver.py class would need to be much better
>    specified. All parameters / return values which are opaque dicts
>    must be replaced with objects + attributes. Completion of the
>    objectification work is mandatory, so there is cleaner separation
>    between virt driver impls & the rest of Nova.
>
> == end ==
>
> In other words, Dan's proposal above is EXACTLY what I've been saying
> needs to be done to the interfaces between nova-conductor,
> nova-compute, and nova-scheduler *before* any split of the scheduler
> code is even remotely feasible.
>
> Splitting the scheduler out before this is done would actually not
> "help but not solve this problem" -- it would instead further the
> problem, IMO.




Jay, we agreed on a plan to carry on; rest assured we're working on
it, see the Gantt meeting logs for what my vision is.

That said, I think this concern about clean interfaces also applies to this
thread: if we want to spin the virt drivers out of the Nova git repo,
that does require a cleanup of the interfaces, in particular in the
compute manager and the resource tracker, where a lot of bits are still
strongly tied and not versioned (thanks to JSON dicts).

So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, i.e. subteam delegation (to what I call "half-cores")
for iterations and only approvals for cores.


-Sylvain



> Best,
> -jay



[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread John Griffith
On Thu, Sep 4, 2014 at 4:32 PM, Jay Pipes  wrote:

>
>
> On 09/04/2014 12:11 PM, Duncan Thomas wrote:
>
>> I think that having a shared review team across all of the drivers
>> has definite benefits in terms of coherency and consistency - it is
>> very easy for experts on one technology to become tunnel-visioned on
>> some points and miss the wider, cross project picture. A common
>> drivers team is likely to have a broad enough range of opinions to
>> keep things healthy, compared to one repo (and team) per driver, and
>> also they are able to speak collectively to the core nova team, which
>> helps set priorities there when they need to be influenced on behalf
>> of the drivers team.
>>
>
> In theory, the above sounds good. In practice, it doesn't happen. The code
> in the virt drivers is horribly inconsistent, duplicative and yet slightly
> and pointlessly different, and uses paradigms that make sense for the one
> platform but don't necessarily make sense for another platform.
>
> The testing/CI benefits that Dan highlighted -- in terms of patches to
> non-related virt drivers not interfering with the stability and progress of
> a patch to another virt driver -- is the #1 critical benefit to Dan's
> proposal, and doing a single virt drivers core team and repo totally throws
> that benefit away.
>
> Best,
> -jay
>

Just some thoughts and observations I've had regarding this topic in Cinder
the past couple of years.  I realize this is a Nova thread so hopefully
some of this can be applied in a more general context.

TLDR:
1. I think moving drivers into their own repo is just shoveling the pile to
make a new pile (not really solving anything)

2. Removal of drivers other than the reference implementation for each
project could be the healthiest option
a. Requires transparent, public, automated 3'rd party CI
b. Requires a TRUE plugin architecture and mentality
c. Requires a stable and well defined API

3. While I'm still sort of a fan of the removal of drivers, I do think
Cinder is "making it work", there have been missteps and yes it's a pain
sometimes but it's working "ok" and we've got plans to try and improve

4. Adding restrictions like drivers only in first milestone and more
intense scrutinization of features will go a long way to help resolve the
issues we do have currently

Now the long-winded version with a little more detail and context:





I've spent a fair amount of time thinking about the explosive number of
drivers being added to Cinder over the past year or so.  I've been a pretty
vocal proponent of the idea of removing all drivers except the LVM
reference implementation from Cinder.  I'd rather see Vendors' drivers
maintained in their own Github Repo and truly follow a "plugin" model.
 This of course means that Cinder has to be truly designed and maintained
with a real plugin architecture kept in mind in every aspect of development
(experience proves this harder to do than it sounds).  I think with
stable and well defined interfaces as well as 3rd party CI this is
actually a reasonable approach and could be effective.  I do not see how
creating a separate repo and in essence yet another set of OpenStack
Projects really helps with the problem.  The fact is that the biggest issue
most people see with driver contributions is those that are made by
organizations that work on their driver only and don't contribute back to
the core project (whether that be in the form of reviews of core
contributions).  I'm not sure I understand why that would be any different
by just putting the code in a separate bucket.  In other words, getting a
solid and consistent team working on that "project" seems like you've just
kicked the can down the road so you don't have to deal with it.

Any time I've mentioned the removal approach the response is typically that
there's no quality control, or that Vendors won't be as willing to invest
in OpenStack because they can focus on their own interests and get by with
that.  The quality control one was a tough one to counter, but now that
we're moving towards things like 3'rd party CI I'm not sure that's quite as
significant as it was a year ago.  I'd still like to see a public record of
testing in the form of CI, NOT just Vendor-A submitting something that
says.. "yeah, I'm awesome".  I suspect that OpenStack adopters would look
at things like public CI postings to determine what's worth pursuing and
what's not.

The other concern I had in the past was "we'd lose valuable contributors".
 There are vendors that are directly responsible for providing us with some
great contributors in the Core of the Cinder project.  They do a great job
of balancing the tactical and strategic interests, and the concern is that
if the driv

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Russell Bryant


- Original Message -
> On 09/04/2014 11:32 AM, Vladik Romanovsky wrote:
> > +1
> >
> > I very much agree with Dan's proposal.
> >
> > I am concerned about difficulties we will face with merging
> > patches that spread across various regions: manager, conductor,
> > scheduler, etc.
> > However, I think this is a small price to pay for having more focused
> > teams.
> >
> > IMO, we will still have to pay it the moment the scheduler separates.
> 
> There will be more pain the moment the scheduler separates, IMO,
> especially with its current design and interfaces.

I absolutely agree that the scheduler split is a non-starter without 
stabilizing all of the relevant interfaces.  I hope there's not much debate on 
that high level point.  Of course, identifying exactly what those interfaces 
should be is a bit more complicated, but I hope the focus can stay there.

-- 
Russell Bryant



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes



On 09/04/2014 12:11 PM, Duncan Thomas wrote:

> I think that having a shared review team across all of the drivers
> has definite benefits in terms of coherency and consistency - it is
> very easy for experts on one technology to become tunnel-visioned on
> some points and miss the wider, cross project picture. A common
> drivers team is likely to have a broad enough range of opinions to
> keep things healthy, compared to one repo (and team) per driver, and
> also they are able to speak collectively to the core nova team, which
> helps set priorities there when they need to be influenced on behalf
> of the drivers team.


In theory, the above sounds good. In practice, it doesn't happen. The 
code in the virt drivers is horribly inconsistent, duplicative and yet 
slightly and pointlessly different, and uses paradigms that make sense 
for the one platform but don't necessarily make sense for another platform.


The testing/CI benefit that Dan highlighted -- patches to unrelated
virt drivers not interfering with the stability and progress of a
patch to another virt driver -- is the #1 critical benefit of Dan's
proposal, and doing a single virt drivers core team and repo totally
throws that benefit away.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

> Basically +1 with what Daniel is saying (note that, as mentioned, a
> side effect of our effort to split out the scheduler will help but
> not solve this problem).


The difference between Dan's proposal and the Gantt split is that Dan's 
proposal features quite prominently the following:


== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying 
needs to be done to the interfaces between nova-conductor, nova-compute, 
and nova-scheduler *before* any split of the scheduler code is even 
remotely feasible.
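
As a rough illustration of the "objects + attributes" point quoted above,
here is a minimal Python sketch. The names DiskInfo, describe_disk and the
fields dev / bus / size_gb are hypothetical and are not taken from
nova/virt/driver.py; the sketch only contrasts an opaque dict with an object
whose attributes form the contract.

```python
from dataclasses import dataclass


def describe_disk_from_dict(disk_info):
    # Opaque dict: the expected keys are known only by convention, so a
    # misspelled or missing key fails late, wherever the value is first used.
    return "%s on %s" % (disk_info["dev"], disk_info["bus"])


@dataclass(frozen=True)
class DiskInfo:
    # Hypothetical object: the declared fields are the documented contract.
    dev: str
    bus: str
    size_gb: int = 0


def describe_disk(disk_info: DiskInfo) -> str:
    # Attribute access: a missing required field is rejected when the object
    # is constructed, not deep inside the driver.
    return f"{disk_info.dev} on {disk_info.bus}"
```

With the object form, a caller that forgets a required field gets a TypeError
at construction time, which is the kind of cleaner separation between the
driver and its callers that the quoted text asks for.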


Splitting the scheduler out before this is done would actually not "help 
but not solve this problem" -- it would instead further the problem, IMO.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Michael Still
On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange  wrote:

[Heavy snipping because of length]

> The radical (?) solution to the nova core team bottleneck is thus to
> follow this lead and split the nova virt drivers out into separate
> projects and delegate their maintenance to new dedicated teams.
>
>  - Nova becomes the home for the public APIs, RPC system, database
>    persistence and the glue that ties all this together with the
>    virt driver API.
>
>  - Each virt driver project gets its own core team and is responsible
>    for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

 - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
 - splitting all the virt drivers out of the nova tree

Splitting the drivers out of the nova tree does come at a cost -- we'd
need to stabilise and probably version the hypervisor driver
interface, and that will encourage more "out of tree" drivers, which
is something we haven't historically wanted to do. If we did this split,
I think we need to acknowledge that we are changing policy there. It
also means that nova-core wouldn't be the ones holding the quality bar
for hypervisor drivers any more. I guess this would open the door for
drivers to more actively compete on the quality of their
implementations, which might be a good thing.
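
To make "version the hypervisor driver interface" a little more concrete,
here is a minimal Python sketch of a compatibility check. Everything in it is
an assumption for illustration: Nova has no NOVA_DRIVER_API_VERSION constant
or is_compatible() function that I know of, and the (major, minor) convention
is just one common way to do this.

```python
# Hypothetical version of the driver interface exposed by Nova: (major, minor).
NOVA_DRIVER_API_VERSION = (2, 3)


def is_compatible(required, provided=NOVA_DRIVER_API_VERSION):
    """Return True if a driver built against `required` can run on `provided`.

    Assumed convention: a major bump breaks compatibility, while a minor bump
    only adds to the interface.
    """
    req_major, req_minor = required
    prov_major, prov_minor = provided
    return req_major == prov_major and req_minor <= prov_minor
```

An out-of-tree driver could run a check like this at load time and refuse to
start with a clear message, instead of failing somewhere inside a changed
method.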

Both of these have interesting aspects, and I agree we need to do
_something_. I do wonder if there is a hybrid approach as well though.
For example, could we implement some sort of more formal lieutenant
system for drivers? We've talked about it in the past but never been
able to express how it would work in practise.

The last few days have been interesting as I watch FFEs come through.
People post explaining their feature, its importance, and the risk
associated with it. Three cores sign on for review. All of the ones
I've looked at have received active review since being posted. Would
it be bonkers to declare nova to be in "permanent feature freeze"? If
we could maintain the level of focus we see now, then we'd be getting
heaps more done than before.

These issues should very definitely be on the agenda for the design
summit, probably early in the week.

Michael

-- 
Rackspace Australia



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes

On 09/04/2014 09:36 AM, Gary Kotton wrote:

> Hi,
> I do not think that Nova is in a death spiral. I just think that the
> current way of working at the moment is strangling the project. I do not
> understand why we need to split drivers out of the core project. Why not
> have the ability to provide 'core review' status to people for reviewing
> those parts of the code? We have enough talented people in OpenStack to be
> able to write a driver above gerrit to enable that.


Clearly you have never looked at the Gerrit source code.

:)

-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Russell Bryant
On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.

Fantastic write-up.  I can't +1 enough the problem statement, which I
think you've done a nice job of framing.  We've taken steps to try to
improve this, but none of them have been big enough.  I feel we've
reached a tipping point.  I think many others do too, and several
proposals being discussed all seem rooted in this same core issue.

When it comes to the proposed solution, I'm +1 on that too, but part of
that is that it's hard for me to ignore the limitations placed on us by
our current review infrastructure (gerrit).

If we ignored gerrit for a moment, is a rapid increase in splitting out
components the ideal workflow?  Would we be better off finding a way to
finally just implement a model more like the Linux kernel with
sub-system maintainers and pull requests to a top-level tree?  Maybe.
I'm not convinced that a split of repos is obviously better.

You make some good arguments for why splitting has other benefits.
Besides, even if we weren't going to split them and instead wanted to
have separate branches, we'd have to take interface stability much more
seriously.   I think the work immediately needed overlaps quite a bit.

In any case, let's not get completely side-tracked on the end game workflow.
I am completely on board with the idea that we have to move to a model
that involves more than one team and spreading out the responsibility
further than we have thus far.

I don't think we can afford to wait much longer without drastic change,
so let's make it happen.

-- 
Russell Bryant



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Joe Gordon
On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange 
wrote:

> Position statement
> ==================
>
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
>
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.
>
>
> Background information
> ======================
>
> I see many factors coming together to form the crisis:
>
>  - Burn out of core team members from overwork
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
>
> Each item on its own may not seem too bad, but combined they
> add up to a big problem.
>
> Core team burn out
> ------------------
>
> Having been involved in Nova for several dev cycles now, it is clear
> that the backlog of code up for review never goes away. Even
> intensive code review efforts at various points in the dev cycle
> make only a small impact on the backlog. This has a pretty
> significant impact on core team members, as their work is never
> done. At best, the dial is sometimes set to 10, instead of 11.
>
> Many people, myself included, have built tools to help deal with
> the reviews in a more efficient manner than plain gerrit allows
> for. These certainly help, but they can't ever solve the problem
> on their own - just make it slightly more bearable. And this is
> not even considering that core team members might have useful
> contributions to make in ways beyond just code review. Ultimately
> the workload is just too high to sustain the levels of review
> required, so core team members will eventually burn out (as they
> have done many times already).
>
> Even if one person attempts to take the initiative to heavily
> invest in review of certain features it is often to no avail.
> Unless a second dedicated core reviewer can be found to 'tag
> team' it is hard for one person to make a difference. The end
> result is that a patch is +2d and then sits idle for weeks or
> more until a merge conflict requires it to be reposted at which
> point even that one +2 is lost. This is a pretty demotivating
> outcome for both reviewers & the patch contributor.
>
>
> New core team talent
> --------------------
>
> It can't escape attention that the Nova core team does not grow
> in size very often. When Nova was younger and its code base was
> smaller, it was easier for contributors to get onto core because
> the base level of knowledge required was that much smaller. To
> get onto core today requires a major investment in learning Nova
> over a year or more. Even people who potentially have the latent
> skills may not have the time available to invest in learning the
> entirety of Nova.
>
> With the number of reviews proposed to Nova, the core team should
> probably be at least double its current size[1]. There is plenty of
> expertise in the project as a whole but it is typically focused
> into specific areas of the codebase. There is nowhere we can find
> 20 more people with broad knowledge of the codebase who could be
> promoted even over the next year, let alone today. This is ignoring
> that many existing members of core are relatively inactive due to
> burnout and so need replacing. That means we really need another
> 25-30 people for core. That's not going to happen.
>
>
> Code review delays
> ------------------
>
> The obvious result of having too much work for too few reviewers
> is that code contributors face major delays in getting their work
> reviewed and merged. From personal experience, during Juno, I've
> probably spent 1 week in aggregate on actual code development vs
> 8 weeks waiting on code review. You have to constantly be on
> alert for review comments because unless you can respond quickly
> (and repost) while you still have the attention of the reviewer,
> they may not look again for days/weeks.
>
> The length of time to get work merged serves as a demotivator to
> actually do work in the first place. I've personally avoided doing
> a lot of code refactoring & cleanup work that would improve the
> maintainability of th

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Kyle Mestery
On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange  wrote:
> Position statement
> ==================
>
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
>
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.
>
As others have said, thanks for writing this up Daniel.

>
> Background information
> ======================
>
> I see many factors coming together to form the crisis:
>
>  - Burn out of core team members from overwork
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
>
> Each item on its own may not seem too bad, but combined they
> add up to a big problem.
>
> Core team burn out
> ------------------
>
> Having been involved in Nova for several dev cycles now, it is clear
> that the backlog of code up for review never goes away. Even
> intensive code review efforts at various points in the dev cycle
> make only a small impact on the backlog. This has a pretty
> significant impact on core team members, as their work is never
> done. At best, the dial is sometimes set to 10, instead of 11.
>
> Many people, myself included, have built tools to help deal with
> the reviews in a more efficient manner than plain gerrit allows
> for. These certainly help, but they can't ever solve the problem
> on their own - just make it slightly more bearable. And this is
> not even considering that core team members might have useful
> contributions to make in ways beyond just code review. Ultimately
> the workload is just too high to sustain the levels of review
> required, so core team members will eventually burn out (as they
> have done many times already).
>
> Even if one person attempts to take the initiative to heavily
> invest in review of certain features it is often to no avail.
> Unless a second dedicated core reviewer can be found to 'tag
> team' it is hard for one person to make a difference. The end
> result is that a patch is +2d and then sits idle for weeks or
> more until a merge conflict requires it to be reposted at which
> point even that one +2 is lost. This is a pretty demotivating
> outcome for both reviewers & the patch contributor.
>
>
> New core team talent
> --------------------
>
> It can't escape attention that the Nova core team does not grow
> in size very often. When Nova was younger and its code base was
> smaller, it was easier for contributors to get onto core because
> the base level of knowledge required was that much smaller. To
> get onto core today requires a major investment in learning Nova
> over a year or more. Even people who potentially have the latent
> skills may not have the time available to invest in learning the
> entirety of Nova.
>
> With the number of reviews proposed to Nova, the core team should
> probably be at least double its current size[1]. There is plenty of
> expertise in the project as a whole but it is typically focused
> into specific areas of the codebase. There is nowhere we can find
> 20 more people with broad knowledge of the codebase who could be
> promoted even over the next year, let alone today. This is ignoring
> that many existing members of core are relatively inactive due to
> burnout and so need replacing. That means we really need another
> 25-30 people for core. That's not going to happen.
>
>
> Code review delays
> ------------------
>
> The obvious result of having too much work for too few reviewers
> is that code contributors face major delays in getting their work
> reviewed and merged. From personal experience, during Juno, I've
> probably spent 1 week in aggregate on actual code development vs
> 8 weeks waiting on code review. You have to constantly be on
> alert for review comments because unless you can respond quickly
> (and repost) while you still have the attention of the reviewer,
> they may not look again for days/weeks.
>
> The length of time to get work merged serves as a demotivator to
> actually do work in the first place. I've personally avoided doing
> a lot of code refactoring & cle

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Alessandro Pilotti
Hi all,

This is an issue that has been discussed quite a few times. As I feared, the
bottleneck effect is getting worse with each release.

Nova simply grew too much and even though features like networking and block
storage have been spun off at some point in time, it still lacks the cohesion
necessary for a successful long term lifecycle, or in other terms, it’s just
too big to be properly maintained by a handful of amazing and overworked people.

Compute drivers are easy to identify as decoupled sub-projects and are among
those which suffer most from the lack of an independent development
process.

Nova is a mature project (at least relative to the OpenStack context) and as
such new features and bug fixes need to go through a very thorough screening and
review before being approved and merged, which does not work well with
sub-projects that need to grow faster, especially when introduced later in the
lifecycle (e.g. the current Hyper-V driver introduced in Folsom) or when being
pushed by more aggressive market requirements.

Just as an example, only 3 out of 8 Hyper-V blueprint specs have been approved
and implemented in Juno; the rest will simply get bumped to Kilo, which means
that new additional specs will need to be bumped to L and so on, introducing
further delays. We ended up prioritizing feature parity blueprints, delaying
almost everything else.

The time bug fixes take to land in stable releases is another issue for the user
base, since merging in master takes a long time and backporting requires another
long review process, e.g. more than four months in some cases [1].
As a result we ended up releasing the fixes in a project fork that became our de
facto stable release in place of upstream, while waiting for upstream merge.

We never experienced similar issues in smaller projects like Neutron, Cinder,
Ceilometer or Horizon where we are involved as well, which can be a practical
example of the potential benefits of splitting Nova.

OpenStack has a clear process for incubation, letting new projects grow as fast
as they need during their youth and integrating them into core only when a
mature stage is reached [2]. Unfortunately this process applies to projects, but
not to subprojects (Hyper-V and VMware drivers in particular, but not only)
resulting in a way slower development pace compared to what a project led by an
independent team could have allowed. On the other hand, Docker is an example of
a driver going the StackForge way, but its ultimate potential inclusion in Nova
will just increase the current pain points.

From a Hyper-V team perspective, in the late Havana cycle the same reasons
highlighted in this thread almost led us to ask for removal of the driver from
Nova in order to improve our development process, even at the cost of the
subsequent fall from (core) grace and StackForge incubation Purgatory period, so
I’m definitely happy that the conversation has been resumed with a bigger
consensus.

The main factor that blocked the Hyper-V driver’s exit from Nova was the
introduction of the Hyper-V CI during the same cycle. Regressions are a very
sensitive topic when you run OpenStack components on an operating system which
is not Linux and the CI helped a lot in blocking or discovering issues in a
timely fashion. Besides that, the size of the Hyper-V team increased considerably
during Icehouse and Juno [3], so the Hyper-V CI became a mandatory and almost
irreplaceable tool in our review process, leading us to reach an excellent level
of stability of the driver on every supported version of Hyper-V (and
progressive CI voting stability as well, but that’s another topic [4]).

This means that if we reach a point in which we agree to spin off the drivers into
separate core projects, we need to consider how driver related CIs will still be
included in the Nova review process, possibly with voting rights when the
individual CI stability allows it. Having each third party CI vote only on
its spin-off driver project is not an option IMO, as it won’t catch regressions
introduced in Nova that affect the drivers, including race conditions [5].

An interesting area of discussion is who is going to be part of the initial core
teams for each new subproject. I truly appreciated the experience and help of
the Nova core guys, so in order to allow a smoother transition I’d suggest that
each new project (e.g. nova-compute-hyperv, nova-compute-vmware, etc) have
an initial core team consisting of one or two members of the current Nova
sub-team and one Nova core, with ideally each patch reviewed by both the domain
experts and the Nova core. The team could then go on its way by voting in its own
members as any other OpenStack project does.

Another point of discussion is the stabilization and documentation of the driver
interface. There are simply too many areas where the behavior between drivers
differs, and looking at some other driver’s behavior was in too many cases the
only source of document

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes

On 09/04/2014 11:32 AM, Vladik Romanovsky wrote:

> +1
>
> I very much agree with Dan's proposal.
>
> I am concerned about difficulties we will face with merging
> patches that spread across various regions: manager, conductor,
> scheduler, etc.
> However, I think this is a small price to pay for having more
> focused teams.
>
> IMO, we will still have to pay it the moment the scheduler separates.


There will be more pain the moment the scheduler separates, IMO, 
especially with its current design and interfaces.


-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 05:11:22PM +0100, Duncan Thomas wrote:
> On 4 September 2014 16:00, Solly Ross  wrote:
> >> My only question is about the need to separate out each virt driver
> >> into a separate project, wouldn't you accomplish a lot of the benefit
> >> by creating a single virt project that includes all of the drivers?
> >
> > I don't think there's particularly a *point* to having all drivers in one 
> > repo.  Part of code review is looking for code "gotchas", but part of code 
> > review is looking for subtle issues that are caused by the very nature of 
> > the driver.  A HyperV "core" reviewing a libvirt change should certainly be 
> > able to provide the former, but most likely cannot provide the latter to a 
> > sufficient degree (if he or she can, then he or she should be a libvirt 
> > "core" as well).
> 
> I think that having a shared review team across all of the drivers has
> definite benefits in terms of coherency and consistency - it is very
> easy for experts on one technology to become tunnel-visioned on some
> points and miss the wider, cross project picture. A common drivers
> team is likely to have a broad enough range of opinions to keep things
> healthy, compared to one repo (and team) per driver, and also they are
> able to speak collectively to the core nova team, which helps set
> priorities there when they need to be influenced on behalf of the
> drivers team.

If people are interested in reviewing all the driver code there's nothing
preventing them doing that. It is easy to set up gerrit to notify you on
changes across many drivers if you have that desire, or to write scripts
to query gerrit too. Realistically though, even today most people working
on a virt driver totally ignore the other virt drivers and so separating
them isn't going to make things significantly worse in that regard.
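
As a minimal sketch of the "write scripts to query gerrit" idea, the Python
below builds a Gerrit change query covering several driver directories and
decodes the JSON that the REST API returns. The base URL, the function names
and the openstack/nova project name are assumptions for illustration; the
status:, project: and file: search operators, the /changes/ endpoint and the
")]}'" prefix on REST responses are standard Gerrit. No network call is made
here, so fetching the URL is left to whichever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode

# Gerrit prepends this line to JSON responses to defeat XSSI attacks.
GERRIT_MAGIC_PREFIX = ")]}'"


def build_query(drivers, project="openstack/nova"):
    # One file: regex per driver directory, OR-ed together.
    paths = " OR ".join(f"file:^nova/virt/{name}/.*" for name in drivers)
    return f"status:open project:{project} ({paths})"


def build_url(base_url, query, limit=25):
    # base_url is a placeholder for whatever Gerrit server you use.
    params = urlencode({"q": query, "n": limit})
    return f"{base_url.rstrip('/')}/changes/?{params}"


def parse_changes(body):
    # Strip the XSSI prefix, if present, before decoding the JSON list.
    if body.startswith(GERRIT_MAGIC_PREFIX):
        body = body[len(GERRIT_MAGIC_PREFIX):]
    return json.loads(body)
```

For example, build_query(["libvirt", "vmwareapi"]) gives a single query that
watches both driver directories, so a cross-driver reviewer would not lose
visibility just because the code is reviewed by separate teams.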

> TLDR: I don't think there's particularly a point to splitting out the
> drivers into individual repos, and much to be gained from keeping them
> all in one (but still breaking them out of nova)

There are significant benefits in the way we can test and gate changes
by having separate repos. It also ensures that the workload for changes
for one driver doesn't impact the workload of changes for another
driver, which is a very real problem today. It also ensures that any
new drivers can start off on a level playing field wrt existing drivers
and not have to jump over a huge initial bar to get into the official
repo. So there is a great deal of benefit to having one repo per
driver.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Sylvain Bauza


On 04/09/2014 17:57, Daniel P. Berrange wrote:

> On Thu, Sep 04, 2014 at 03:49:26PM +, Dugger, Donald D wrote:
> > Actually, I think Sylvain's point is even stronger as I don't think
> > splitting the virt drivers out of Nova is a complete fix.  It may
> > solve the review latency for the virt driver area but, unless virt
> > drivers are the bulk of Nova patches, the Nova core team will still
> > be swamped with review requests.  Some solution, maybe half-cores,
> > will still be needed for Nova long term.
>
> Absolutely, nova core will still have an awful lot of work to do
> and will need to have fresh blood. The split will free up some %
> of existing cores' time though as there's certainly plenty of virt
> driver only patches going through merge that are taking up
> non-negligible review time. E.g. I've done loads of review on vmware
> only code which I'd be relieved of with vmware maintainers able
> to form their own review core for their driver. There is also the
> fact that people are holding back on even submitting code for
> many drivers because they know it'll never get reviewed. So the
> proportion of virt driver only code is likely to be higher than
> what we currently see on review today.



I totally understand your point and I agree with it. I'm just thinking
that for Kilo and Lxxx, we also need to experiment with some half-core
teams in order to free up your review duty, at least until the virt code
is split out correctly.


On a side note, given that I'm a non-core (so feel free to discard my
advice), I don't think the runway/slot proposal for Kilo will increase
the reviewing bandwidth, as it will just create another layer of
prioritization without addressing the velocity. In other words, creating
a Scrum sprint with 2 people and doing planning poker doesn't mean you
can deliver two months' worth of man-days of work.


-Sylvain


> Regards,
> Daniel





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Duncan Thomas
On 4 September 2014 16:00, Solly Ross  wrote:
>> My only question is about the need to separate out each virt driver
>> into a separate project, wouldn't you accomplish a lot of the benefit
>> by creating a single virt project that includes all of the drivers?
>
> I don't think there's particularly a *point* to having all drivers in one 
> repo.  Part of code review is looking for code "gotchas", but part of code 
> review is looking for subtle issues that are caused by the very nature of the 
> driver.  A HyperV "core" reviewing a libvirt change should certainly be able 
> to provide the former, but most likely cannot provide the latter to a 
> sufficient degree (if he or she can, then he or she should be a libvirt 
> "core" as well).

I think that having a shared review team across all of the drivers has
definite benefits in terms of coherency and consistency - it is very
easy for experts on one technology to become tunnel-visioned on some
points and miss the wider, cross project picture. A common drivers
team is likely to have a broad enough range of opinions to keep things
healthy, compared to one repo (and team) per driver, and also they are
able to speak collectively to the core nova team, which helps set
priorities there when they need to be influenced on behalf of the
drivers team.

TLDR: I don't think there's particularly a point to splitting out the
drivers into individual repos, and much to be gained from keeping them
all in one (but still breaking them out of nova)



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 03:49:26PM +, Dugger, Donald D wrote:
> Actually, I think Sylvain's point is even stronger as I don't think
> splitting the virt drivers out of Nova is a complete fix.  It may
> solve the review latency for the virt driver area but, unless virt
> drivers are the bulk of Nova patches, the Nova core team will still
> be swamped with review requests.  Some solution, maybe half-cores,
> will still be needed for Nova long term.

Absolutely, nova core will still have an awful lot of work to do
and will need to have fresh blood. The split will free up some %
of existing cores' time though as there's certainly plenty of virt
driver only patches going through merge that are taking up
non-negligible review time. E.g. I've done loads of review on vmware
only code which I'd be relieved of with vmware maintainers able
to form their own review core for their driver. There is also the
fact that people are holding back on even submitting code for
many drivers because they know it'll never get reviewed. So the
proportion of virt driver only code is likely to be higher than
what we currently see on review today.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Dugger, Donald D
Actually, I think Sylvain's point is even stronger as I don't think splitting 
the virt drivers out of Nova is a complete fix.  It may solve the review 
latency for the virt driver area but, unless virt drivers are the bulk of Nova 
patches, the Nova core team will still be swamped with review requests.  Some 
solution, maybe half-cores, will still be needed for Nova long term.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Sylvain Bauza [mailto:sba...@redhat.com] 
Sent: Thursday, September 4, 2014 9:19 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out 
virt drivers


On 04/09/2014 17:00, Solly Ross wrote:
>> My only question is about the need to separate out each virt driver 
>> into a separate project, wouldn't you accomplish a lot of the benefit by 
>> creating a single virt project that includes all of the drivers?
> I don't think there's particularly a *point* to having all drivers in one 
> repo.  Part of code review is looking for code "gotchas", but part of code 
> review is looking for subtle issues that are caused by the very nature of the 
> driver.  A HyperV "core" reviewing a libvirt change should certainly be able 
> to provide the former, but most likely cannot provide the latter to a 
> sufficient degree (if he or she can, then he or she should be a libvirt 
> "core" as well).
>
> A strong +1 to Dan's proposal.  I think this would also make it easier for 
> non-core reviewers to get started reviewing, without having a specialized 
> tool setup.

As I said previously, I'm also giving a +1 to this proposal. That said, as I
think it will take at least one iteration to get this done (look at the
scheduler split and how long we've been working on it), I also think we need a
short-term solution like the one proposed by Thierry, ie. what I call
"half-cores" - people who help review a code area, so that cores can spend
their time just approving instead of following each iteration.

-Sylvain


> Best Regards,
> Solly Ross
>
> P.S.
>> This is a crisis. A large crisis. In fact, if you got a moment, it's 
>> a twelve-storey crisis with a magnificent entrance hall, carpeting 
>> throughout, 24-hour portage, and an enormous sign on the roof, saying 
>> 'This Is a Large Crisis'. A large crisis requires a large plan.
> Ha!
>
> ----- Original Message -----
>> From: "Donald D Dugger" 
>> To: "Daniel P. Berrange" , "OpenStack Development 
>> Mailing List (not for usage questions)"
>> 
>> Sent: Thursday, September 4, 2014 10:33:27 AM
>> Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting 
>> out virt drivers
>>
>> Basically +1 with what Daniel is saying (note that, as mentioned, a 
>> side effect of our effort to split out the scheduler will help but 
>> not solve this problem).
>>
>> My only question is about the need to separate out each virt driver 
>> into a separate project, wouldn't you accomplish a lot of the benefit 
>> by creating a single virt project that includes all of the drivers?  
>> I wouldn't necessarily expect a VMware guy to understand the 
>> specifics of the HyperV implementation but both people should 
>> understand what a virt driver does, how it interfaces to Nova and 
>> they should be able to intelligently review each other's code.
>>
>> --
>> Don Dugger
>> "Censeo Toto nos in Kansa esse decisse." - D. Gale
>> Ph: 303/443-3786
>>
>> -----Original Message-----
>> From: Daniel P. Berrange [mailto:berra...@redhat.com]
>> Sent: Thursday, September 4, 2014 4:24 AM
>> To: OpenStack Development
>> Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting
>> out virt drivers
>>
>> Position statement
>> ==================
>>
>> Over the past year I've increasingly come to the conclusion that Nova
>> is heading for (or probably already at) a major crisis. If steps are
>> not taken to avert this, the project is likely to lose a non-trivial
>> amount of talent, both regular code contributors and core team 
>> members. That includes myself. This is not good for Nova's long term 
>> health and so should be of concern to anyone involved in Nova and OpenStack.
>>
>> For those who don't want to read the whole mail, the executive 
>> summary is that the nova-core team is an unfixable bottleneck in our 
>> development process with our current proje

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 01:36:04PM +, Gary Kotton wrote:
> Hi,
> I do not think that Nova is in a death spiral. I just think that the
> current way of working at the moment is strangling the project. I do not
> understand why we need to split drivers out of the core project. Why not
> have the ability to provide 'core review' status to people for reviewing
> those parts of the code? We have enough talented people in OpenStack to be
> able to write a driver above gerrit to enable that.

The consensus view at the summit was that, having tried & failed at getting
useful changes into gerrit, it is not a viable option unless we undertake a
permanent fork of the code base. There didn't seem to be any appetite for
maintaining & developing a large Java app ourselves. So people were looking
to start writing a replacement for gerrit from scratch (albeit reusing the
database schema).

Even if we did have such fine grained permissioning in gerrit or another
review tool, I'd still suggest a split because this is about more than just
the review team size. There are a number of other compelling benefits to
having fully separate drivers I've mentioned in the original thread & other
replies here.

> Fragmenting the project will be very unhealthy.

On the contrary, I think it will re-invigorate the project. The other
historical cases where OpenStack projects have split out code have
resulted in a pretty significant benefit for all involved. The testing
frameworks we have will help ensure that the virt drivers continue to
provide consistent semantics, just as they do today, and any eventual
OpenStack trademark certifications would reinforce that. Improving
the specification of the virt driver interface by introducing more
objects and killing undocumented dict usage will also further help
in keeping virt drivers aligned.
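
To illustrate the "objects instead of undocumented dicts" point, here is a
minimal Python sketch. The names (DiskInfo, describe_disk_dict,
describe_disk_obj) are hypothetical and not taken from the Nova code, and
it assumes a Python with dataclasses available. It only shows why a declared
object catches interface drift that a free-form dict lets through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskInfo:
    """Hypothetical typed replacement for an undocumented dict."""
    path: str
    size_gb: int
    bus: str = "virtio"


def describe_disk_dict(disk):
    # Dict style: nothing documents which keys a driver may rely on,
    # and a misspelt or missing key is only noticed when it is read.
    return "%s (%d GB, %s)" % (
        disk["path"], disk["size_gb"], disk.get("bus", "virtio"))


def describe_disk_obj(disk: DiskInfo) -> str:
    # Object style: the fields are declared once, so every driver sees
    # the same contract and unknown fields are rejected at construction.
    return "%s (%d GB, %s)" % (disk.path, disk.size_gb, disk.bus)


if __name__ == "__main__":
    print(describe_disk_obj(DiskInfo(path="/dev/vda", size_gb=20)))
    try:
        DiskInfo(path="/dev/vda", size_gb=20, sizee=1)  # typo in field name
    except TypeError as exc:
        print("rejected:", exc)
```

With the dict version a typo such as "sizee" would pass silently until some
driver tried to read the right key; with the object version it fails as
soon as the object is built.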

> On 9/4/14, 3:59 PM, "Thierry Carrez"  wrote:
> 
> >Like I mentioned before, I think the only way out of the Nova death
> >spiral is to split code and give control over it to smaller dedicated
> >review teams. This is one way to do it. Thanks Dan for pulling this
> >together :)
> >
> >A couple comments inline:
> >
> >Daniel P. Berrange wrote:
> >> [...]
> >> This is a crisis. A large crisis. In fact, if you got a moment, it's
> >> a twelve-storey crisis with a magnificent entrance hall, carpeting
> >> throughout, 24-hour portage, and an enormous sign on the roof,
> >> saying 'This Is a Large Crisis'. A large crisis requires a large
> >> plan.
> >> [...]
> >
> >I totally agree. We need a plan now, because we can't go through another
> >cycle without a solution in sight.
> >
> >> [...]
> >> This has quite a few implications for the way development would
> >> operate.
> >> 
> >>  - The Nova core team at least, would be voluntarily giving up a big
> >>    amount of responsibility over the evolution of virt drivers. Due
> >>    to human nature, people are not good at giving up power, so this
> >>    may be painful to swallow. Realistically current nova core are
> >>    not experts in most of the virt drivers to start with, and more
> >>    importantly we clearly do not have sufficient time to do a good job
> >>    of review with everything submitted. Much of the current need
> >>    for core review of virt drivers is to prevent the mis-use of a
> >>    poorly defined virt driver API...which can be mitigated - See
> >>    later point(s)
> >>
> >>  - Nova core would/should not have automatic +2 over the virt driver
> >>    repositories since it is unreasonable to assume they have the
> >>    suitable domain knowledge for all virt drivers out there. People
> >>    would of course be able to be members of multiple core teams. For
> >>    example John G would naturally be nova-core and nova-xen-core. I
> >>    would aim for nova-core and nova-libvirt-core, and so on. I do not
> >>    want any +2 responsibility over VMWare/HyperV/Docker drivers since
> >>    they're not my area of expertise - I only look at them today because
> >>    they have no other nova-core representation.
> >>
> >>  - Not sure if it implies the Nova PTL would be solely focused on
> >>    Nova common. eg would there continue to be one PTL over all virt
> >>    driver implementation projects, or would each project have its
> >>    own PTL. Maybe this is irrelevant if a Czars approach is chosen
> >>    by virt driver projects for their work. I'd be inclined to say
> >>    that a single PTL should stay as a figurehead to represent all
> >>    the virt driver projects, acting as a point of contact to ensure
> >>    we keep communication / co-operation between the drivers in sync.
> >> [...]
> >
> >At this point it may look like our current structure (programs, one PTL,
> >single core teams...) prevents us from implementing that solution. I
> >just want to say that in OpenStack, organizational structure reflects
> >how we work, not the other way around. If we need to reorganize
> >"official" project structure to work in smarter and long-term hea

[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Vladik Romanovsky
+1 

I very much agree with Dan's proposal.

I am concerned about the difficulties we will face with merging
patches that spread across various areas: manager, conductor, scheduler,
etc.
However, I think this is a small price to pay for having more focused teams.

IMO, we will still have to pay it the moment the scheduler is split out.

Regards,
Vladik



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 10:18:04AM -0500, Matt Riedemann wrote:
> 
> >>
> >>  - Changes submitted to nova common code would trigger running of CI
> >>    tests against the external virt drivers. Each virt driver core team
> >>    would decide whether they want their driver to be tested upon Nova
> >>    common changes. Expect that all would choose to be included to the
> >>    same extent that they are today. So level of validation of nova code
> >>    would remain at least at current level. I don't want to reduce the
> >>    amount of code testing here since that's contrary to the direction
> >>    we're taking wrt testing.
> >>
> >>  - Changes submitted to virt drivers would trigger running CI tests
> >>    that are applicable. eg changes to libvirt driver repo would not
> >>    involve running database migration tests, since all database code
> >>    is isolated in nova. libvirt changes would not trigger vmware,
> >>    xenserver, ironic, etc CI systems. Virt driver changes should
> >>    see fewer false positives in the tests as a result, and those
> >>    that do occur should be more explicitly related to the code being
> >>    proposed. eg a change to vmware is not going to trigger a tempest
> >>    run that uses libvirt, so non-deterministic failures in libvirt
> >>    will no longer plague vmware developers' reviews. This would also
> >>    make it possible for VMWare CI to be made gating for changes to
> >>    the VMWare virt driver repository, without negatively impacting
> >>    other virt drivers. So this change should increase testing quality
> >>    for non-libvirt virt drivers and reduce pain of false failures
> >>    for everyone.

[snip]

> Even if we split the virt drivers out, libvirt would still be the default in
> the Tempest gate runs right?

Yes, what I'm calling the nova common repository would still need to
have a tempest job that was gating on at least one virt driver as a
sanity check. As mentioned above, I'd pretty much expect that all
current tempest jobs for nova common code would continue unchanged.
IOW, a libvirt job would still be gating, and there'd still be a
number of 3rd party CIs for other virt drivers non-gating too.

The only change in testing jobs would be wrt the new git repos for
the individual virt drivers. Those would only run jobs directly
related to the code in those repos. ie vmware is tested by a vmware CI
job and libvirt is tested by a libvirt CI job.
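
A rough sketch of that job split, in Python, purely as illustration. The
repo names, job names and the JOBS_BY_REPO table are all made up for the
example; this is not how the actual CI configuration is expressed. It
assumes nova common keeps one gating libvirt job plus non-gating 3rd party
jobs, and each driver repo only runs (and gates on) its own job.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    gating: bool


# Hypothetical repo and job names, for illustration only.
JOBS_BY_REPO = {
    # nova common keeps a gating libvirt job as the sanity check,
    # plus non-gating 3rd party CI for the other drivers.
    "nova": [
        Job("tempest-libvirt", gating=True),
        Job("vmware-ci", gating=False),
        Job("xenserver-ci", gating=False),
    ],
    # Each driver repo only runs the jobs for its own code.
    "nova-virt-libvirt": [Job("tempest-libvirt", gating=True)],
    "nova-virt-vmware": [Job("vmware-ci", gating=True)],
}


def jobs_for_change(repo: str) -> List[Job]:
    """Return the CI jobs to run for a change proposed to `repo`."""
    return list(JOBS_BY_REPO.get(repo, []))


def gating_jobs(repo: str) -> List[str]:
    """Names of the jobs that must pass before a change can merge."""
    return [job.name for job in jobs_for_change(repo) if job.gating]


if __name__ == "__main__":
    for repo in sorted(JOBS_BY_REPO):
        print(repo, "->", gating_jobs(repo))
```

The point of the table is that a vmware change never sees a libvirt job, so
a flaky libvirt run cannot block it, while nova common changes are still
checked against every driver that opted in.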

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Sylvain Bauza


On 04/09/2014 17:00, Solly Ross wrote:

>> My only question is about the need to separate out each virt driver into a
>> separate project, wouldn't you accomplish a lot of the benefit by creating
>> a single virt project that includes all of the drivers?
>
> I don't think there's particularly a *point* to having all drivers in one
> repo.  Part of code review is looking for code "gotchas", but part of code
> review is looking for subtle issues that are caused by the very nature of
> the driver.  A HyperV "core" reviewing a libvirt change should certainly be
> able to provide the former, but most likely cannot provide the latter to a
> sufficient degree (if he or she can, then he or she should be a libvirt
> "core" as well).
>
> A strong +1 to Dan's proposal.  I think this would also make it easier for
> non-core reviewers to get started reviewing, without having a specialized
> tool setup.


As I said previously, I'm also giving a +1 to this proposal. That said,
as I think it will take at least one iteration to get this done
(look at the scheduler split and how long we've been working on it), I
also think we need a short-term solution like the one proposed by
Thierry, ie. what I call "half-cores" - people who help review a
code area, so that cores can spend their time just approving instead of
following each iteration.


-Sylvain



Best Regards,
Solly Ross

P.S.

This is a crisis. A large crisis. In fact, if you got a moment, it's
a twelve-storey crisis with a magnificent entrance hall, carpeting
throughout, 24-hour portage, and an enormous sign on the roof,
saying 'This Is a Large Crisis'. A large crisis requires a large
plan.

Ha!

----- Original Message -----

From: "Donald D Dugger" 
To: "Daniel P. Berrange" , "OpenStack Development Mailing List 
(not for usage questions)"

Sent: Thursday, September 4, 2014 10:33:27 AM
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out   
virt drivers

Basically +1 with what Daniel is saying (note that, as mentioned, a side
effect of our effort to split out the scheduler will help but not solve this
problem).

My only question is about the need to separate out each virt driver into a
separate project, wouldn't you accomplish a lot of the benefit by creating a
single virt project that includes all of the drivers?  I wouldn't
necessarily expect a VMware guy to understand the specifics of the HyperV
implementation but both people should understand what a virt driver does,
how it interfaces to Nova and they should be able to intelligently review
each other's code.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: Thursday, September 4, 2014 4:24 AM
To: OpenStack Development
Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out
virt drivers

Position statement
==================

Over the past year I've increasingly come to the conclusion that Nova is
heading for (or probably already at) a major crisis. If steps are not taken
to avert this, the project is likely to lose a non-trivial amount of
talent, both regular code contributors and core team members. That includes
myself. This is not good for Nova's long term health and so should be of
concern to anyone involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is
that the nova-core team is an unfixable bottleneck in our development
process with our current project structure.
The only way I see to remove the bottleneck is to split the virt drivers out
of tree and let them all have their own core teams in their area of code,
leaving current nova core to focus on all the common code outside the virt
driver impls. I, now, none the less urge people to read the whole mail.


Background information
======================

I see many factors coming together to form the crisis

  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing

Each item on their own may not seem too bad, but combined they add up to a
big problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear that the
backlog of code up for review never goes away. Even intensive code review
efforts at various points in the dev cycle makes only a small impact on the
backlog. This has a pretty significant impact on core team members, as their
work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the re

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Matt Riedemann



On 9/4/2014 9:57 AM, Daniel P. Berrange wrote:

On Thu, Sep 04, 2014 at 02:33:27PM +, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned,
a side effect of our effort to split out the scheduler will help
but not solve this problem).


Thanks for taking the time to read & give feedback


My only question is about the need to separate out each virt driver
into a separate project, wouldn't you accomplish a lot of the
benefit by creating a single virt project that includes all of the
drivers?  I wouldn't necessarily expect a VMware guy to understand
the specifics of the HyperV implementation but both people should
understand what a virt driver does, how it interfaces to Nova and
they should be able to intelligently review each other's code.


A single repo for virt drivers would have all the same costs of
separating from nova common, but with fewer of the benefits of
separate repos per driver. IOW, if we're going to split the
virt drivers out from the nova common, then we should go all
the way.

I think the separate driver repos is fairly compelling for a
number of reasons besides just core team size. As mentioned
elsewhere it allows better targeting of CI test jobs. ie a
VMware CI job can be easily made gating for only VMware code
changes. So VMWare CI instability won't affect libvirt code
submissions, and libvirt CI instability won't affect VMware
code submissions. Separate repos means that people starting
off a new driver (like Ironic or Docker) would not have to
immediately meet the same very high quality & testing bar
that existing drivers do. They can evolve at their own pace
and not have to then undergo the disruption of jumping from
their initial repo to the 'official' repo.  Finally, I would
like each drivers team to be isolated from each other in terms
of code review capacity planning as far as practical - ie the
libvirt team should be able to accept as many libvirt features
as they can handle without being concerned that they'll reduce
what vmware is able to accept (though changes involving the
nova common code would obviously still contend).



Position statement
==================

Over the past year I've increasingly come to the conclusion that Nova is
heading for (or probably already at) a major crisis. If steps are not taken to
avert this, the project is likely to lose a non-trivial amount of talent, both
regular code contributors and core team members. That includes myself. This is 
not good for Nova's long term health and so should be of concern to anyone 
involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is that 
the nova-core team is an unfixable bottleneck in our development process with 
our current project structure.
The only way I see to remove the bottleneck is to split the virt drivers out of 
tree and let them all have their own core teams in their area of code, leaving 
current nova core to focus on all the common code outside the virt driver 
impls. I, now, none the less urge people to read the whole mail.


Background information
======================

I see many factors coming together to form the crisis

  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing

Each item on their own may not seem too bad, but combined they add up to a big 
problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear that the 
backlog of code up for review never goes away. Even intensive code review 
efforts at various points in the dev cycle makes only a small impact on the 
backlog. This has a pretty significant impact on core team members, as their 
work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the reviews in 
a more efficient manner than plain gerrit allows for. These certainly help, but 
they can't ever solve the problem on their own - just make it slightly more 
bearable. And this is not even considering that core team members might have 
useful contributions to make in ways beyond just code review. Ultimately the 
workload is just too high to sustain the levels of review required, so core 
team members will eventually burn out (as they have done many times already).

Even if one person attempts to take the initiative to heavily invest in review 
of certain features it is often to no avail.
Unless a second dedicated core reviewer can be found to 'tag team' it is hard for 
one person to make a difference. The end result is that a patch is +2d and then 
sits idle for weeks or more until a merge conflict requires it to be reposted at 
which point even that one +2 is lost. This is a pretty

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Sylvain Bauza


On 04/09/2014 15:36, Gary Kotton wrote:

Hi,
I do not think that Nova is in a death spiral. I just think that the
current way of working at the moment is strangling the project. I do not
understand why we need to split drivers out of the core project. Why not
have the ability to provide 'core review' status to people for reviewing
those parts of the code? We have enough talented people in OpenStack to be
able to write a driver above gerrit to enable that.
Fragmenting the project will be very unhealthy.
For what it is worth having a release date at the end of a vacation is
really bad. Look at the numbers:
http://stackalytics.com/report/contribution/nova-group/30
Thanks
Gary


From my perspective, the raw number of reviews should not be the only
metric for deciding whether someone would make a good core. Indeed, it is
quite easy to provide comments on cosmetic issues, but if you look at why
patches get a -1 from a core, it is mostly because of a more important
design issue or because they run counter to another ongoing effort.



Also, I note that Stackalytics metrics are *really* different from those of
other tools like
http://russellbryant.net/openstack-stats/nova-reviewers-30.txt


As a non-core person, I can just say that a core must at least be
present during Nova meetings and voice their opinions, provide some help
with the gate status, look at bugs, give feedback to newcomers etc. and
not just click on -1 or +1



Here, the problem is that the core team is not scalable: I don't want
to give examples from governments, but just adding more people is often
not the solution. Instead, delegating to subteams seems a possible
intermediate solution, as it would let the core team only approve,
leaving the subteam's half-cores to review the iterations until they
consider the patch good enough to be merged.


Of course, nova cores could still bypass half-cores as they have
knowledge of the whole of Nova, or they could reject what the half-cores
agreed on, but that would free up a lot of time for cores without giving
them more bureaucracy.



I really like Dan's proposal of splitting code into different repos with
separate teams and a single PTL (that's exactly the difference
between a Program and a Project) but as it requires some prework, I'm
just thinking of allocating half-cores as a short-term solution until all
the bits are sorted out.


And yes, there is urgency, I also felt the pain.

-Sylvain



On 9/4/14, 3:59 PM, "Thierry Carrez"  wrote:


Like I mentioned before, I think the only way out of the Nova death
spiral is to split code and give control over it to smaller dedicated
review teams. This is one way to do it. Thanks Dan for pulling this
together :)

A couple comments inline:

Daniel P. Berrange wrote:

[...]
This is a crisis. A large crisis. In fact, if you got a moment, it's
a twelve-storey crisis with a magnificent entrance hall, carpeting
throughout, 24-hour portage, and an enormous sign on the roof,
saying 'This Is a Large Crisis'. A large crisis requires a large
plan.
[...]

I totally agree. We need a plan now, because we can't go through another
cycle without a solution in sight.


[...]
This has quite a few implications for the way development would
operate.

  - The Nova core team at least, would be voluntarily giving up a big
    amount of responsibility over the evolution of virt drivers. Due
    to human nature, people are not good at giving up power, so this
    may be painful to swallow. Realistically current nova core are
    not experts in most of the virt drivers to start with, and more
    importantly we clearly do not have sufficient time to do a good job
    of review with everything submitted. Much of the current need
    for core review of virt drivers is to prevent the mis-use of a
    poorly defined virt driver API...which can be mitigated - See
    later point(s)

  - Nova core would/should not have automatic +2 over the virt driver
    repositories since it is unreasonable to assume they have the
    suitable domain knowledge for all virt drivers out there. People
    would of course be able to be members of multiple core teams. For
    example John G would naturally be nova-core and nova-xen-core. I
    would aim for nova-core and nova-libvirt-core, and so on. I do not
    want any +2 responsibility over VMWare/HyperV/Docker drivers since
    they're not my area of expertise - I only look at them today because
    they have no other nova-core representation.

  - Not sure if it implies the Nova PTL would be solely focused on
    Nova common. eg would there continue to be one PTL over all virt
    driver implementation projects, or would each project have its
    own PTL. Maybe this is irrelevant if a Czars approach is chosen
    by virt driver projects for their work. I'd be inclined to say
    that a single PTL should stay as a figurehead to represent all
    the virt driver projects, acting as a point of contact to ensure
  

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Solly Ross
> My only question is about the need to separate out each virt driver into a 
> separate project, wouldn't you 
> accomplish a lot of the benefit by creating a single virt project that 
> includes all of the drivers?

I don't think there's particularly a *point* to having all drivers in one repo. 
 Part of code review is looking for code "gotchas", but part of code review is 
looking for subtle issues that are caused by the very nature of the driver.  A 
HyperV "core" reviewing a libvirt change should certainly be able to provide 
the former, but most likely cannot provide the latter to a sufficient degree 
(if he or she can, then he or she should be a libvirt "core" as well).

A strong +1 to Dan's proposal.  I think this would also make it easier for 
non-core reviewers to get started reviewing, without having a specialized tool 
setup.

Best Regards,
Solly Ross

P.S. 
>This is a crisis. A large crisis. In fact, if you got a moment, it's
> a twelve-storey crisis with a magnificent entrance hall, carpeting
> throughout, 24-hour portage, and an enormous sign on the roof,
> saying 'This Is a Large Crisis'. A large crisis requires a large
> plan.

Ha!

----- Original Message -----
> From: "Donald D Dugger" 
> To: "Daniel P. Berrange" , "OpenStack Development 
> Mailing List (not for usage questions)"
> 
> Sent: Thursday, September 4, 2014 10:33:27 AM
> Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out 
> virt drivers
> 
> Basically +1 with what Daniel is saying (note that, as mentioned, a side
> effect of our effort to split out the scheduler will help but not solve this
> problem).
> 
> My only question is about the need to separate out each virt driver into a
> separate project, wouldn't you accomplish a lot of the benefit by creating a
> single virt project that includes all of the drivers?  I wouldn't
> necessarily expect a VMware guy to understand the specifics of the HyperV
> implementation but both people should understand what a virt driver does,
> how it interfaces to Nova and they should be able to intelligently review
> each other's code.
> 
> --
> Don Dugger
> "Censeo Toto nos in Kansa esse decisse." - D. Gale
> Ph: 303/443-3786
> 
> -----Original Message-----
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Thursday, September 4, 2014 4:24 AM
> To: OpenStack Development
> Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out
> virt drivers
> 
> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that Nova is
> heading for (or probably already at) a major crisis. If steps are not taken
> to avert this, the project is likely to lose a non-trivial amount of
> talent, both regular code contributors and core team members. That includes
> myself. This is not good for Nova's long term health and so should be of
> concern to anyone involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive summary is
> that the nova-core team is an unfixable bottleneck in our development
> process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt drivers out
> of tree and let them all have their own core teams in their area of code,
> leaving current nova core to focus on all the common code outside the virt
> driver impls. I, now, none the less urge people to read the whole mail.
> 
> 
> Background information
> ======================
> 
> I see many factors coming together to form the crisis
> 
>  - Burn out of core team members from over work
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
> 
> Each item on its own may not seem too bad, but combined they add up to a
> big problem.
> 
> Core team burn out
> ------------------
> 
> Having been involved in Nova for several dev cycles now, it is clear that the
> backlog of code up for review never goes away. Even intensive code review
> efforts at various points in the dev cycle make only a small impact on the
> backlog. This has a pretty significant impact on core team members, as their
> work is never done. At best, the dial is sometimes set to 10, instead of 11.
> 
> Many people, myself included, have built tools to help deal with the reviews
> in a more efficient manner than plain gerrit allows for. These certainly
> help, but they can

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 02:33:27PM +, Dugger, Donald D wrote:
> Basically +1 with what Daniel is saying (note that, as mentioned,
> a side effect of our effort to split out the scheduler will help
> but not solve this problem).

Thanks for taking the time to read & give feedback.

> My only question is about the need to separate out each virt driver
> into a separate project, wouldn't you accomplish a lot of the
> benefit by creating a single virt project that includes all of the
> drivers?  I wouldn't necessarily expect a VMware guy to understand
> the specifics of the HyperV implementation but both people should
> understand what a virt driver does, how it interfaces to Nova and
> they should be able to intelligently review each other's code.

A single repo for virt drivers would have all the same costs of
separating from nova common, but with fewer of the benefits of
separate repos per driver. IOW, if we're going to split the
virt drivers out from the nova common, then we should go all
the way.

I think separate driver repos are fairly compelling for a
number of reasons besides just core team size. As mentioned
elsewhere, it allows better targeting of CI test jobs, i.e. a
VMware CI job can easily be made gating for only VMware code
changes. So VMware CI instability won't affect libvirt code
submissions, and libvirt CI instability won't affect VMware
code submissions. Separate repos mean that people starting
off a new driver (like Ironic or Docker) would not have to
immediately meet the same very high quality & testing bar
that existing drivers do. They can evolve at their own pace
and not have to then undergo the disruption of jumping from
their initial repo to the 'official' repo.  Finally, I would
like each driver's team to be isolated from the others in terms
of code review capacity planning as far as practical, i.e. the
libvirt team should be able to accept as many libvirt features
as they can handle without being concerned that they'll reduce
what VMware is able to accept (though changes involving the
nova common code would obviously still contend).
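
To make the CI targeting point concrete, here is a minimal Python
sketch of the routing rule that per-driver repos would give more or
less for free. It is only an illustration under stated assumptions:
the job names and the gating_jobs() helper are hypothetical and do
not correspond to real OpenStack CI jobs or tooling.

```python
# Hypothetical mapping from driver source directories to the CI job that
# exercises them. The job names are made up for illustration only.
DRIVER_CI_JOBS = {
    "nova/virt/libvirt/": "libvirt-tempest",
    "nova/virt/vmwareapi/": "vmware-ci",
    "nova/virt/xenapi/": "xen-ci",
    "nova/virt/hyperv/": "hyperv-ci",
}


def gating_jobs(changed_paths):
    """Return the set of CI jobs that should gate a change.

    A change confined to driver directories is gated only by the jobs
    of the drivers it touches. A change touching any other path (common
    code) is gated by every job, since any driver might be affected.
    """
    jobs = set()
    for path in changed_paths:
        for prefix, job in DRIVER_CI_JOBS.items():
            if path.startswith(prefix):
                jobs.add(job)
                break
        else:
            # No driver prefix matched: this is common code, so every
            # driver job has to pass.
            return set(DRIVER_CI_JOBS.values())
    return jobs
```

With one repo per driver this lookup disappears entirely, because the
repo a change is submitted to already determines which job gates it.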


> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that Nova is 
> heading for (or probably already at) a major crisis. If steps are not taken 
> to avert this, the project is likely to lose a non-trivial amount of talent,
> both regular code contributors and core team members. That includes myself. 
> This is not good for Nova's long term health and so should be of concern to 
> anyone involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive summary is 
> that the nova-core team is an unfixable bottleneck in our development process 
> with our current project structure.
> The only way I see to remove the bottleneck is to split the virt drivers out 
> of tree and let them all have their own core teams in their area of code, 
> leaving current nova core to focus on all the common code outside the virt 
> driver impls. I nonetheless urge people to read the whole mail.
> 
> 
> Background information
> ======================
> 
> I see many factors coming together to form the crisis
> 
>  - Burn out of core team members from over work
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
> 
> Each item on its own may not seem too bad, but combined they add up to a
> big problem.
> 
> Core team burn out
> ------------------
> 
> Having been involved in Nova for several dev cycles now, it is clear that the 
> backlog of code up for review never goes away. Even intensive code review 
> efforts at various points in the dev cycle make only a small impact on the
> backlog. This has a pretty significant impact on core team members, as their 
> work is never done. At best, the dial is sometimes set to 10, instead of 11.
> 
> Many people, myself included, have built tools to help deal with the reviews 
> in a more efficient manner than plain gerrit allows for. These certainly 
> help, but they can't ever solve the problem on their own - just make it 
> slightly more bearable. And this is not even considering that core team 
> members might have useful contributions to make in ways beyond just code 
> review. Ultimately the workload is just too high to sustain the levels of 
> review required, so core team members will eventually burn out (as they have 
> done many times already).
> 
> Even if one person attempts to take the initiative to heavily invest in 
> review of certain features it is often to no avail.
> Unless a second dedicated core reviewer can be found to 'tag team' it is hard 
> for one person to make a difference. The end result is that a patch is +2d 
> and then sits idle for weeks or more until a merge conflict requires it 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Dugger, Donald D
Basically +1 with what Daniel is saying (note that, as mentioned, a side effect 
of our effort to split out the scheduler will help but not solve this problem).

My only question is about the need to separate out each virt driver into a 
separate project, wouldn't you accomplish a lot of the benefit by creating a 
single virt project that includes all of the drivers?  I wouldn't necessarily 
expect a VMware guy to understand the specifics of the HyperV implementation 
but both people should understand what a virt driver does, how it interfaces to 
Nova and they should be able to intelligently review each other's code.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Daniel P. Berrange [mailto:berra...@redhat.com] 
Sent: Thursday, September 4, 2014 4:24 AM
To: OpenStack Development
Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt 
drivers

Position statement
==================

Over the past year I've increasingly come to the conclusion that Nova is 
heading for (or probably already at) a major crisis. If steps are not taken to 
avert this, the project is likely to lose a non-trivial amount of talent, both
regular code contributors and core team members. That includes myself. This is 
not good for Nova's long term health and so should be of concern to anyone 
involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is that 
the nova-core team is an unfixable bottleneck in our development process with 
our current project structure.
The only way I see to remove the bottleneck is to split the virt drivers out of 
tree and let them all have their own core teams in their area of code, leaving 
current nova core to focus on all the common code outside the virt driver 
impls. I nonetheless urge people to read the whole mail.


Background information
======================

I see many factors coming together to form the crisis

 - Burn out of core team members from over work
 - Difficulty bringing new talent into the core team
 - Long delay in getting code reviewed & merged
 - Marginalization of code areas which aren't popular
 - Increasing size of nova code through new drivers
 - Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they add up to a big
problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear that the 
backlog of code up for review never goes away. Even intensive code review 
efforts at various points in the dev cycle make only a small impact on the
backlog. This has a pretty significant impact on core team members, as their 
work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the reviews in 
a more efficient manner than plain gerrit allows for. These certainly help, but 
they can't ever solve the problem on their own - just make it slightly more 
bearable. And this is not even considering that core team members might have 
useful contributions to make in ways beyond just code review. Ultimately the 
workload is just too high to sustain the levels of review required, so core 
team members will eventually burn out (as they have done many times already).

Even if one person attempts to take the initiative to heavily invest in review 
of certain features it is often to no avail.
Unless a second dedicated core reviewer can be found to 'tag team' it is hard 
for one person to make a difference. The end result is that a patch is +2d and 
then sits idle for weeks or more until a merge conflict requires it to be 
reposted at which point even that one +2 is lost. This is a pretty demotivating 
outcome for both reviewers & the patch contributor.


New core team talent
--------------------

It can't escape attention that the Nova core team does not grow in size very 
often. When Nova was younger and its code base was smaller, it was easier for 
contributors to get onto core because the base level of knowledge required was 
that much smaller. To get onto core today requires a major investment in 
learning Nova over a year or more. Even people who potentially have the latent 
skills may not have the time available to invest in learning the entirety of Nova.

With the number of reviews proposed to Nova, the core team should probably be 
at least double its current size[1]. There is plenty of expertise in the
project as a whole but it is typically focused into specific areas of the 
codebase. There is nowhere we can find
20 more people with broad knowledge of the codebase who could be promoted even 
over the next year, let alone today. This is ignoring that many existing 
members of core are relatively inactive due to burnout and so need replacing. 
That means we really need anothe

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Gary Kotton
Hi,
I do not think that Nova is in a death spiral. I just think that the
current way of working at the moment is strangling the project. I do not
understand why we need to split drivers out of the core project. Why not
have the ability to provide 'core review' status to people for reviewing
those parts of the code? We have enough talented people in OpenStack to be
able to write a driver above gerrit to enable that.
Fragmenting the project will be very unhealthy.
For what it is worth, having a release date at the end of a vacation is
really bad. Look at the numbers:
http://stackalytics.com/report/contribution/nova-group/30
Thanks
Gary
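
The path-scoped 'core review' idea above could be sketched roughly as
follows. This is only a Python illustration of the rule such a layer
over gerrit would enforce; the reviewer names, the APPROVERS table and
the may_approve() helper are all hypothetical and are not part of
gerrit or any existing OpenStack tool.

```python
from fnmatch import fnmatchcase

# Hypothetical table: reviewer -> path patterns whose changes they may
# approve. "carol" stands in for a full nova-core member.
APPROVERS = {
    "alice": ["nova/virt/libvirt/*"],
    "bob": ["nova/virt/vmwareapi/*"],
    "carol": ["*"],
}


def may_approve(reviewer, changed_paths):
    """Return True if the reviewer's approval should count for a change.

    The approval counts only when every changed path matches at least
    one of the reviewer's patterns, so a driver-only core cannot
    approve a patch that also touches common code.
    """
    patterns = APPROVERS.get(reviewer, [])
    return bool(changed_paths) and all(
        any(fnmatchcase(path, pattern) for pattern in patterns)
        for path in changed_paths
    )
```

For example, "alice" could approve a change confined to
nova/virt/libvirt/, but not one that also edits
nova/compute/manager.py.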

On 9/4/14, 3:59 PM, "Thierry Carrez"  wrote:

>Like I mentioned before, I think the only way out of the Nova death
>spiral is to split code and give control over it to smaller dedicated
>review teams. This is one way to do it. Thanks Dan for pulling this
>together :)
>
>A couple comments inline:
>
>Daniel P. Berrange wrote:
>> [...]
>> This is a crisis. A large crisis. In fact, if you got a moment, it's
>> a twelve-storey crisis with a magnificent entrance hall, carpeting
>> throughout, 24-hour portage, and an enormous sign on the roof,
>> saying 'This Is a Large Crisis'. A large crisis requires a large
>> plan.
>> [...]
>
>I totally agree. We need a plan now, because we can't go through another
>cycle without a solution in sight.
>
>> [...]
>> This has quite a few implications for the way development would
>> operate.
>> 
>>  - The Nova core team at least, would be voluntarily giving up a big
>>    amount of responsibility over the evolution of virt drivers. Due
>>    to human nature, people are not good at giving up power, so this
>>    may be painful to swallow. Realistically current nova core are
>>    not experts in most of the virt drivers to start with, and more
>>    importantly we clearly do not have sufficient time to do a good job
>>    of review with everything submitted. Much of the current need
>>    for core review of virt drivers is to prevent the mis-use of a
>>    poorly defined virt driver API...which can be mitigated - See
>>    later point(s)
>> 
>>  - Nova core would/should not have automatic +2 over the virt driver
>>    repositories since it is unreasonable to assume they have the
>>    suitable domain knowledge for all virt drivers out there. People
>>    would of course be able to be members of multiple core teams. For
>>    example John G would naturally be nova-core and nova-xen-core. I
>>    would aim for nova-core and nova-libvirt-core, and so on. I do not
>>    want any +2 responsibility over VMware/HyperV/Docker drivers since
>>    they're not my area of expertise - I only look at them today because
>>    they have no other nova-core representation.
>> 
>>  - Not sure if it implies the Nova PTL would be solely focused on
>>    Nova common. eg would there continue to be one PTL over all virt
>>    driver implementation projects, or would each project have its
>>    own PTL. Maybe this is irrelevant if a Czars approach is chosen
>>    by virt driver projects for their work. I'd be inclined to say
>>    that a single PTL should stay as a figurehead to represent all
>>    the virt driver projects, acting as a point of contact to ensure
>>    we keep communication / co-operation between the drivers in sync.
>> [...]
>
>At this point it may look like our current structure (programs, one PTL,
>single core teams...) prevents us from implementing that solution. I
>just want to say that in OpenStack, organizational structure reflects
>how we work, not the other way around. If we need to reorganize
>"official" project structure to work in smarter and long-term healthy
>ways, that's a really small price to pay.
>
>-- 
>Thierry Carrez (ttx)
>
>___
>OpenStack-dev mailing list
>OpenStack-dev@lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Thierry Carrez
Like I mentioned before, I think the only way out of the Nova death
spiral is to split code and give control over it to smaller dedicated
review teams. This is one way to do it. Thanks Dan for pulling this
together :)

A couple comments inline:

Daniel P. Berrange wrote:
> [...]
> This is a crisis. A large crisis. In fact, if you got a moment, it's
> a twelve-storey crisis with a magnificent entrance hall, carpeting
> throughout, 24-hour portage, and an enormous sign on the roof,
> saying 'This Is a Large Crisis'. A large crisis requires a large
> plan.
> [...]

I totally agree. We need a plan now, because we can't go through another
cycle without a solution in sight.

> [...]
> This has quite a few implications for the way development would
> operate.
> 
>  - The Nova core team at least, would be voluntarily giving up a big
>    amount of responsibility over the evolution of virt drivers. Due
>    to human nature, people are not good at giving up power, so this
>    may be painful to swallow. Realistically current nova core are
>    not experts in most of the virt drivers to start with, and more
>    importantly we clearly do not have sufficient time to do a good job
>    of review with everything submitted. Much of the current need
>    for core review of virt drivers is to prevent the mis-use of a
>    poorly defined virt driver API...which can be mitigated - See
>    later point(s)
> 
>  - Nova core would/should not have automatic +2 over the virt driver
>    repositories since it is unreasonable to assume they have the
>    suitable domain knowledge for all virt drivers out there. People
>    would of course be able to be members of multiple core teams. For
>    example John G would naturally be nova-core and nova-xen-core. I
>    would aim for nova-core and nova-libvirt-core, and so on. I do not
>    want any +2 responsibility over VMware/HyperV/Docker drivers since
>    they're not my area of expertise - I only look at them today because
>    they have no other nova-core representation.
> 
>  - Not sure if it implies the Nova PTL would be solely focused on
>    Nova common. eg would there continue to be one PTL over all virt
>    driver implementation projects, or would each project have its
>    own PTL. Maybe this is irrelevant if a Czars approach is chosen
>    by virt driver projects for their work. I'd be inclined to say
>    that a single PTL should stay as a figurehead to represent all
>    the virt driver projects, acting as a point of contact to ensure
>    we keep communication / co-operation between the drivers in sync.
> [...]

At this point it may look like our current structure (programs, one PTL,
single core teams...) prevents us from implementing that solution. I
just want to say that in OpenStack, organizational structure reflects
how we work, not the other way around. If we need to reorganize
"official" project structure to work in smarter and long-term healthy
ways, that's a really small price to pay.

-- 
Thierry Carrez (ttx)


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 12:14:39PM +, Day, Phil wrote:
> Hi Daniel,
> 
> Thanks for putting together such a thoughtful piece - I probably need to
> re-read it a few times to take in everything you're saying, but a couple
> of thoughts that did occur to me:
> 
> - I can see how this could help where a change is fully contained within
> a virt driver, but I wonder how many of those there really are?  Of the
> things that I've seen go through recently nearly all also seem to touch the
> compute manager in some way, and a lot (like the NUMA changes) also have
> impacts into the scheduler. Isn't it going to make it harder to get
> any of those changes in if they have to be co-ordinated across two or
> more repos?

Actually, in my experience of reviewing code this past cycle or two
I see a fairly significant portion of code that is entirely within
the scope of a virt driver. I'm also seeing that people are refraining
from actually doing changes to the virt drivers because of the burden
of getting code past review, so what we see today is probably not even
representative of the potential.

There are certainly some high profile exceptions such as the NUMA
work, or the new serial console work where you're going to cross the
repos. In such work we already try to break patches into isolated
pieces, so the stuff touching common code is a separate commit from
the stuff touching virt code. This is general good practice to be
encouraging. So, yes, it would need coordination across the repos
to get the full work submitted, but I don't think that burden is
unduly large compared to current practice. We do in fact already
see this need for co-ordination in other ways. For example, API
changes have parts that affect python-novaclient, and perhaps
horizon too. Storage & network changes often cross Neutron /
Cinder and Nova. If we can reduce the burden on nova-core, the
stuff going into the common codebase should stand more chance of
getting review too.

So overall yes, this is a valid point, but I'm not particularly
concerned about the negative impacts of it, because we're already
dealing with them today to a large extent.

> - I think you hit the nail on the head in terms of the scope of
> Nova and how few people probably really understand all of it,
> but given the amount of trust that goes with being a core wouldn't
> it also be possible to make people cores on the understanding that
> they will only approve code in the areas they are expert in?
> It kind of feels that this happens to a large extent already,
> for example I don't see Chris or Ken'ichi taking on work outside
> of the API layer. It kind of feels as if given a small amount
> of trust we could have additional core reviewers focused on
> specific parts of the system without having to split up the
> code base if that's where the problem is.

Yes, you are right that it happens to some extent, but I think it
is quite a big jump to effectively scale up that amount of
trust to a team that realistically would need to be 40+ people in
size.

Also this isn't solely about review bandwidth. One of the things
I raised was that there are certain standards required for
being part of nova, such as CI testing. If you can't meet them
you're forced into a sub-optimal development practice compared
to the rest of nova, where you are out of tree and subject to being
broken by Nova changes at any time, which is what Docker and
Ironic have been facing.  Separate repos will also facilitate
more targeted application of our testing resources, so vmware
repo changes wouldn't need to suffer false failures from libvirt
tempest jobs, and similarly vmware CI could be made gating for
vmware without causing libvirt code to suffer instability.

> > -----Original Message-----
> > From: Daniel P. Berrange [mailto:berra...@redhat.com]
> > Sent: 04 September 2014 11:24
> > To: OpenStack Development
> > Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out
> > virt drivers
> > 
> > Position statement
> > ==================
> > 
> > Over the past year I've increasingly come to the conclusion that Nova is
> > heading for (or probably already at) a major crisis. If steps are not taken
> > to avert this, the project is likely to lose a non-trivial amount of talent,
> > both regular code contributors and core team members. That includes myself.
> > This is not good for Nova's long term health and so should be of concern to
> > anyone involved in Nova and OpenStack.
> > 
> > For those who don't want to read the whole mail, the executive summary is
> > that the nova-core team is an unfixable bottleneck in our development
> > process with our curre

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Day, Phil
Hi Daniel,

Thanks for putting together such a thoughtful piece - I probably need to
re-read it a few times to take in everything you're saying, but a couple of
thoughts that did occur to me:

- I can see how this could help where a change is fully contained within a virt
driver, but I wonder how many of those there really are?  Of the things that
I've seen go through recently nearly all also seem to touch the compute manager
in some way, and a lot (like the NUMA changes) also have impacts into the
scheduler. Isn't it going to make it harder to get any of those changes in
if they have to be co-ordinated across two or more repos?

- I think you hit the nail on the head in terms of the scope of Nova and how
few people probably really understand all of it, but given the amount of trust
that goes with being a core wouldn't it also be possible to make people cores on
the understanding that they will only approve code in the areas they are expert
in?  It kind of feels that this happens to a large extent already, for
example I don't see Chris or Ken'ichi taking on work outside of the API layer.
It kind of feels as if given a small amount of trust we could have
additional core reviewers focused on specific parts of the system without
having to split up the code base if that's where the problem is.

Phil




> -----Original Message-----
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: 04 September 2014 11:24
> To: OpenStack Development
> Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt
> drivers
> 
> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that Nova is
> heading for (or probably already at) a major crisis. If steps are not taken to
> avert this, the project is likely to lose a non-trivial amount of talent,
> both regular code contributors and core team members. That includes myself.
> This is not good for Nova's long term health and so should be of concern to
> anyone involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive summary is
> that the nova-core team is an unfixable bottleneck in our development
> process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt drivers out
> of tree and let them all have their own core teams in their area of code,
> leaving current nova core to focus on all the common code outside the virt
> driver impls. I nonetheless urge people to read the whole mail.
> 
> 
> Background information
> ======================
> 
> I see many factors coming together to form the crisis
> 
>  - Burn out of core team members from over work
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
> 
> Each item on its own may not seem too bad, but combined they add up to
> a big problem.
> 
> Core team burn out
> ------------------
> 
> Having been involved in Nova for several dev cycles now, it is clear that the
> backlog of code up for review never goes away. Even intensive code review
> efforts at various points in the dev cycle make only a small impact on the
> backlog. This has a pretty significant impact on core team members, as their
> work is never done. At best, the dial is sometimes set to 10, instead of 11.
> 
> Many people, myself included, have built tools to help deal with the reviews
> in a more efficient manner than plain gerrit allows for. These certainly help,
> but they can't ever solve the problem on their own - just make it slightly
> more bearable. And this is not even considering that core team members
> might have useful contributions to make in ways beyond just code review.
> Ultimately the workload is just too high to sustain the levels of review
> required, so core team members will eventually burn out (as they have done
> many times already).
> 
> Even if one person attempts to take the initiative to heavily invest in review
> of certain features it is often to no avail.
> Unless a second dedicated core reviewer can be found to 'tag team' it is hard
> for one person to make a difference. The end result is that a patch is +2d and
> then sits idle for weeks or more until a merge conflict requires it to be
> reposted at which point even that one +2 is lost. This is a pretty
> demotivating outcome for both reviewers & the patch contributor.
> 
> 
> New core team talent
> --------------------
> 
> It can't escape atte

[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
Position statement
==================

Over the past year I've increasingly come to the conclusion that
Nova is heading for (or probably already at) a major crisis. If
steps are not taken to avert this, the project is likely to lose
a non-trivial amount of talent, both regular code contributors and
core team members. That includes myself. This is not good for
Nova's long term health and so should be of concern to anyone
involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive
summary is that the nova-core team is an unfixable bottleneck
in our development process with our current project structure.
The only way I see to remove the bottleneck is to split the virt
drivers out of tree and let them all have their own core teams
in their area of code, leaving current nova core to focus on
all the common code outside the virt driver impls. I nonetheless
urge people to read the whole mail.


Background information
======================

I see many factors coming together to form the crisis

 - Burn out of core team members from over work 
 - Difficulty bringing new talent into the core team
 - Long delay in getting code reviewed & merged
 - Marginalization of code areas which aren't popular
 - Increasing size of nova code through new drivers
 - Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they
add up to a big problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear
that the backlog of code up for review never goes away. Even
intensive code review efforts at various points in the dev cycle
make only a small impact on the backlog. This has a pretty
significant impact on core team members, as their work is never
done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with
the reviews in a more efficient manner than plain gerrit allows
for. These certainly help, but they can't ever solve the problem
on their own - just make it slightly more bearable. And this is
not even considering that core team members might have useful
contributions to make in ways beyond just code review. Ultimately
the workload is just too high to sustain the levels of review
required, so core team members will eventually burn out (as they
have done many times already).

Even if one person attempts to take the initiative to heavily
invest in review of certain features it is often to no avail.
Unless a second dedicated core reviewer can be found to 'tag
team' it is hard for one person to make a difference. The end
result is that a patch is +2d and then sits idle for weeks or
more until a merge conflict requires it to be reposted at which
point even that one +2 is lost. This is a pretty demotivating
outcome for both reviewers & the patch contributor.


New core team talent
--------------------

It can't escape attention that the Nova core team does not grow
in size very often. When Nova was younger and its code base was
smaller, it was easier for contributors to get onto core because
the base level of knowledge required was that much smaller. To
get onto core today requires a major investment in learning Nova
over a year or more. Even people who potentially have the latent
skills may not have the time available to invest in learning the
entirety of Nova.

With the number of reviews proposed to Nova, the core team should
probably be at least double its current size[1]. There is plenty of
expertise in the project as a whole but it is typically focused
into specific areas of the codebase. There is nowhere we can find
20 more people with broad knowledge of the codebase who could be
promoted even over the next year, let alone today. This is ignoring
that many existing members of core are relatively inactive due to
burnout and so need replacing. That means we really need another
25-30 people for core. That's not going to happen.


Code review delays
------------------

The obvious result of having too much work for too few reviewers
is that code contributors face major delays in getting their work
reviewed and merged. From personal experience, during Juno, I've
probably spent 1 week in aggregate on actual code development vs
8 weeks on waiting on code review. You have to constantly be on
alert for review comments because unless you can respond quickly
(and repost) while you still have the attention of the reviewer,
they may not look again for days/weeks.

The length of time to get work merged serves as a demotivator to
actually do work in the first place. I've personally avoided doing
a lot of code refactoring & cleanup work that would improve the
maintainability of the libvirt driver in the long term, because
I can't face the battle to get it reviewed & merged. Other people
have told me much the same. It is not uncommon to see changes that
have been pending for 2 dev cycles, not because the code was bad
but becau