Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Mon, Sep 15, 2014 at 11:00:15AM +0200, Thierry Carrez wrote: > Chris Friesen wrote: > > On 09/12/2014 04:59 PM, Joe Gordon wrote: > >> [...] > >> Can't you replace the word 'libvirt code' with 'nova code' and this > >> would still be true? Do you think landing virt driver code is harder > >> then landing non virt driver code? If so do you have any numbers to back > >> this up? > >> > >> If the issue here is 'landing code in nova is too painful', then we > >> should discuss solving that more generalized issue first, and maybe we > >> conclude that pulling out the virt drivers gets us the most bang for our > >> buck. But unless we have that more general discussion, saying the right > >> fix for that is to spend a large amount of time working specifically on > >> virt driver related issues seems premature. > > > > I agree that this is a nova issue in general, though I suspect that the > > virt drivers have quite separate developer communities so maybe they > > feel the pain more clearly. But I think the solution is the same in > > both cases: > > > > 1) Allow people to be responsible for a subset of the nova code > > (scheduler, virt, conductor, compute, or even just a single driver). > > They would have significant responsibility for that area of the code. > > This would serve several purposes--people with deep domain-specific > > knowledge would be able to review code that touches that domain, and it > > would free up the nova core team to look at the higher-level picture. > > For changes that cross domains, the people from the relevant domains > > would need to be involved. > > > > 2) Modify the gate tests such that changes that are wholly contained > > within a single area of code are not blocked by gate-blocking-bugs in > > unrelated areas of the code. > > I agree... 
> Landing code in Nova is generally too painful, but the pain
> is most apparent in areas which require specific domain expertise (like
> a virt driver, where not so many -core are familiar enough with the
> domain to review, while the code proposer generally is).

Yes, all of Nova is suffering from the pain of merge. I am specifically
attacking only the virt drivers in my proposal because I think that has
the greatest likelihood of making a noticeable improvement to the project.
Their teams are already fairly separated from the rest of Nova because
of the domain expertise, and the code is also probably the most well
isolated and logically makes sense as a plugin architecture. We'd be
hard pressed to split off other chunks of Nova beyond the scheduler that
we're already talking about.

> IMHO, like I said before, the solution to making Nova (or any other
> project, actually) more fluid is to create separate and smaller areas of
> expertise, and allow new people to step up and own things. Splitting
> virt drivers (once the driver interface is cleaned up) is just one way
> of doing it -- that just seems like a natural separation line to use if
> we do split. But that would just be a first step: as more internal
> interfaces are cleaned up we could (and should) split more. Smaller
> groups responsible for smaller areas of code is the way to go.

And the history of OpenStack projects splitting off shows this can be very
successful too.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
Chris Friesen wrote: > On 09/12/2014 04:59 PM, Joe Gordon wrote: >> [...] >> Can't you replace the word 'libvirt code' with 'nova code' and this >> would still be true? Do you think landing virt driver code is harder >> then landing non virt driver code? If so do you have any numbers to back >> this up? >> >> If the issue here is 'landing code in nova is too painful', then we >> should discuss solving that more generalized issue first, and maybe we >> conclude that pulling out the virt drivers gets us the most bang for our >> buck. But unless we have that more general discussion, saying the right >> fix for that is to spend a large amount of time working specifically on >> virt driver related issues seems premature. > > I agree that this is a nova issue in general, though I suspect that the > virt drivers have quite separate developer communities so maybe they > feel the pain more clearly. But I think the solution is the same in > both cases: > > 1) Allow people to be responsible for a subset of the nova code > (scheduler, virt, conductor, compute, or even just a single driver). > They would have significant responsibility for that area of the code. > This would serve several purposes--people with deep domain-specific > knowledge would be able to review code that touches that domain, and it > would free up the nova core team to look at the higher-level picture. > For changes that cross domains, the people from the relevant domains > would need to be involved. > > 2) Modify the gate tests such that changes that are wholly contained > within a single area of code are not blocked by gate-blocking-bugs in > unrelated areas of the code. I agree... Landing code in Nova is generally too painful, but the pain is most apparent in areas which require specific domain expertise (like a virt driver, where not so many -core are familiar enough with the domain to review, while the code proposer generally is). 
IMHO, like I said before, the solution to making Nova (or any other
project, actually) more fluid is to create separate and smaller areas of
expertise, and allow new people to step up and own things. Splitting
virt drivers (once the driver interface is cleaned up) is just one way
of doing it -- that just seems like a natural separation line to use if
we do split. But that would just be a first step: as more internal
interfaces are cleaned up we could (and should) split more. Smaller
groups responsible for smaller areas of code is the way to go.

--
Thierry Carrez (ttx)
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/12/2014 04:59 PM, Joe Gordon wrote: On Thu, Sep 11, 2014 at 2:18 AM, Daniel P. Berrange mailto:berra...@redhat.com>> wrote: FYI, for Juno at least I really don't consider that even the libvirt driver got acceptable review times in any sense. The pain of waiting for reviews in libvirt code I've submitted this cycle is what prompted me to start this thread. All the virt drivers are suffering way more than they should be, but those without core team representation suffer Can't you replace the word 'libvirt code' with 'nova code' and this would still be true? Do you think landing virt driver code is harder then landing non virt driver code? If so do you have any numbers to back this up? If the issue here is 'landing code in nova is too painful', then we should discuss solving that more generalized issue first, and maybe we conclude that pulling out the virt drivers gets us the most bang for our buck. But unless we have that more general discussion, saying the right fix for that is to spend a large amount of time working specifically on virt driver related issues seems premature. I agree that this is a nova issue in general, though I suspect that the virt drivers have quite separate developer communities so maybe they feel the pain more clearly. But I think the solution is the same in both cases: 1) Allow people to be responsible for a subset of the nova code (scheduler, virt, conductor, compute, or even just a single driver). They would have significant responsibility for that area of the code. This would serve several purposes--people with deep domain-specific knowledge would be able to review code that touches that domain, and it would free up the nova core team to look at the higher-level picture. For changes that cross domains, the people from the relevant domains would need to be involved. 2) Modify the gate tests such that changes that are wholly contained within a single area of code are not blocked by gate-blocking-bugs in unrelated areas of the code. 
Chris
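Point 2 above (changes wholly contained in one area should not be blocked by gate bugs elsewhere) can be sketched roughly as follows. This is only an illustration: the `AREAS` mapping, the function names, and the bug identifiers are hypothetical and do not describe how the OpenStack gate is actually configured.

```python
# Hypothetical mapping from areas of the nova tree to path prefixes.
AREAS = {
    "scheduler": ("nova/scheduler/",),
    "conductor": ("nova/conductor/",),
    "compute": ("nova/compute/",),
    "virt": ("nova/virt/",),
}


def areas_touched(changed_files):
    """Return the set of areas a change touches ('other' if unmapped)."""
    touched = set()
    for path in changed_files:
        for area, prefixes in AREAS.items():
            if path.startswith(prefixes):
                touched.add(area)
                break
        else:
            # No known area matched this path.
            touched.add("other")
    return touched


def blocking_bugs_for(changed_files, open_gate_bugs):
    """open_gate_bugs maps a bug id to the area it affects.

    A change confined to one known area is only blocked by gate bugs
    in that area; anything else is blocked by every open gate bug.
    """
    touched = areas_touched(changed_files)
    if len(touched) != 1 or "other" in touched:
        return set(open_gate_bugs)
    (area,) = touched
    return {bug for bug, bug_area in open_gate_bugs.items() if bug_area == area}
```

The sketch only decides which known gate bugs are relevant to a change; whether the test jobs themselves can be partitioned this cleanly is the open question.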
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 11, 2014 at 2:18 AM, Daniel P. Berrange wrote:
> On Thu, Sep 11, 2014 at 09:23:34AM +1000, Michael Still wrote:
> > On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes wrote:
> >
> > > a) Sorting out the common code is already accounted for in Dan B's original
> > > proposal -- it's a prerequisite for the split.
> >
> > It's a big prerequisite though. I think we're talking about a release
> > worth of work to get that right. I don't object to us doing that work,
> > but I think we need to be honest about how long it's going to take. It
> > will also make the core of nova less agile, as we'll find it hard to
> > change the hypervisor driver interface over time. Do we really think
> > it's ready to be stable?
>
> Yes, in my proposal I explicitly said we'd need to have Kilo
> for all the prep work to clean up the virt API, before only
> doing the split in Lx.
>
> The actual nova/virt/driver.py has been more stable over the
> past few releases than I thought it would be. In terms of APIs
> we've not really modified existing APIs, mostly added new ones.
> Where we did modify existing APIs, we could have easily taken
> the approach of adding a new API in parallel and deprecating
> the old entry point to maintain compat.
>
> The big change which isn't visible directly is the conversion
> of internal nova code to use objects. Finishing this conversion
> is clearly a pre-requisite to any such split, since we'd need
> to make sure all data passed into the nova virt APIs as parameters
> is stable & well defined.
>
> > As an alternative approach...
> >
> > What if we pushed most of the code for a driver into a library?
> > Imagine a library which controls the low level operations of a
> > hypervisor -- create a vm, attach a NIC, etc. Then the driver would
> > become a shim around that which was relatively thin, but owned the
> > interface into the nova core.
> > The driver handles the nova specific
> > things like knowing how to create a config drive, or how to
> > orchestrate with cinder, but hands over all the hypervisor operations
> > to the library. If we found a bug in the library we just pin our
> > dependency on the version we know works whilst we fix things.
> >
> > In fact, the driver inside nova could be a relatively generic "library
> > driver", and we could have multiple implementations of the library,
> > one for each hypervisor.
>
> I don't think that particularly solves the problem, particularly
> the ones you are most concerned about above of API stability. The
> naive impl of any "library" for the virt driver would pretty much
> mirror the nova virt API. The virt driver impls would thus have to
> do the job of taking the Nova objects passed in as parameters and
> turning them into something "stable" to pass to the library. Except
> now instead of us only having to figure out a stable API in one
> place, every single driver has to reinvent the wheel defining their
> own stable interface & objects. I'd also be concerned that ongoing
> work on drivers is still going to require a lot of patches to Nova
> to update the shims all the time, so we're still going to contend
> on resource fairly highly.
>
> > > b) The conflict Dan is speaking of is around the current situation where we
> > > have a limited core review team bandwidth and we have to pick and choose
> > > which virt driver-specific features we will review. This leads to bad
> > > feelings and conflict.
> >
> > The way this worked in the past is we had cores who were subject
> > matter experts in various parts of the code -- there is a clear set of
> > cores who "get" xen or libvirt for example and I feel like those
> > drivers get reasonable review times. What's happened though is that
> > we've added a bunch of drivers without adding subject matter experts
> > to core to cover those drivers.
> > Those newer drivers therefore have a
> > harder time getting things reviewed and approved.
>
> FYI, for Juno at least I really don't consider that even the libvirt
> driver got acceptable review times in any sense. The pain of waiting
> for reviews in libvirt code I've submitted this cycle is what prompted
> me to start this thread. All the virt drivers are suffering way more
> than they should be, but those without core team representation suffer

Can't you replace the word 'libvirt code' with 'nova code' and this
would still be true? Do you think landing virt driver code is harder
than landing non virt driver code? If so, do you have any numbers to back
this up?

If the issue here is 'landing code in nova is too painful', then we
should discuss solving that more generalized issue first, and maybe we
conclude that pulling out the virt drivers gets us the most bang for our
buck. But unless we have that more general discussion, saying the right
fix for that is to spend a large amount of time working specifically on
virt driver related issues seems premature.

> to an even greater degree. And this is ignoring the point Jay & I
> were making about h
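The approach quoted earlier in this message, "adding a new API in parallel and deprecating the old entry point to maintain compat", might look something like the sketch below. The class and method names are invented for illustration and are not taken from nova/virt/driver.py; only the standard library `warnings` module is real.

```python
import warnings


class ComputeDriverSketch:
    """Hypothetical stand-in for a virt driver base class."""

    def attach_nic_v2(self, instance, nic, *, hotplug=True):
        # New entry point, added alongside the old one.
        return {"instance": instance, "nic": nic, "hotplug": hotplug}

    def attach_nic(self, instance, nic):
        # Old entry point kept for compatibility: warn, then delegate.
        warnings.warn(
            "attach_nic is deprecated; use attach_nic_v2",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.attach_nic_v2(instance, nic)
```

Out-of-tree drivers calling the old method keep working for a deprecation period while the warning tells their maintainers what to migrate to.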
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 10 September 2014 22:23, Russell Bryant wrote:
> On 09/10/2014 10:35 PM, Armando M. wrote:
>> Hi,
>>
>> I devoured this thread, so much it was interesting and full of
>> insights. It's not news that we've been pondering this in the
>> Neutron project for the past and existing cycle or so.
>>
>> Likely, this effort is going to take more than two cycles, and would
>> require a very focused team of people working closely together to
>> address this (most likely the core team members plus a few other folks
>> interested).
>>
>> One question I was unable to get a clear answer was: what happens to
>> existing/new bug fixes and features? Would the codebase go in lockdown
>> mode, i.e. not accepting anything else that isn't specifically
>> targeting this objective? Just using NFV as an example, I can't
>> imagine having changes supporting NFV still being reviewed and merged
>> while this process takes place...it would be like shooting at a moving
>> target! If we did go into lockdown mode, what happens to all the
>> corporate-backed agendas that aim at delivering new value to
>> OpenStack?
>
> Yes, I imagine a temporary slow-down on new feature development makes
> sense. However, I don't think it has to be across the board. Things
> should be considered case by case, like usual.

Aren't we trying to move away from the 'usual'? Considering things on a
case by case basis still requires review cycles, etc. Keeping the status
quo would mean prolonging the exact pain we're trying to address.

> For example, a feature that requires invasive changes to the virt driver
> interface might have a harder time during this transition, but a more
> straightforward feature isolated to the internals of a driver might be
> fine to let through. Like anything else, we have to weigh cost/benefit.
>
>> Should we relax what goes into the stable branches, i.e.
>> considering
>> having a Juno on steroids six months from now that includes some of
>> the features/fixes that didn't land in time before this process kicks
>> off?
>
> No ... maybe I misunderstand the suggestion, but I definitely would not
> be in favor of a Juno branch with features that haven't landed in master.

I was thinking of the bold move of having Kilo (and beyond) development
solely focused on this transition. Until this is complete, nothing would
be merged that does not directly pertain to this objective. At the same
time, we'd still want pending features/fixes (and possibly new features)
to land somewhere stable-ish. I fear that doing so in master, while stuff
is churned up and moved out into external repos, will make this whole
task harder than it already is.

Thanks,
Armando

> --
> Russell Bryant
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/11/2014 11:14 AM, Gary Kotton wrote:
> On 9/11/14, 4:30 PM, "Sean Dague" wrote:
>> On 09/11/2014 09:09 AM, Gary Kotton wrote:
>>> On 9/11/14, 2:55 PM, "Thierry Carrez" wrote:
>>>> Sean Dague wrote:
>>>>> [...]
>>>>> Why don't we start with "let's clean up the virt interface and make it
>>>>> more sane", as I don't think there is any disagreement there. If it's
>>>>> going to take a cycle, it's going to take a cycle anyway (it will
>>>>> probably take 2 cycles, realistically, we always underestimate these
>>>>> things, remember when no-db-compute was going to be 1 cycle?). I don't
>>>>> see the need to actually decide here and now that the split is clearly
>>>>> at least 7 - 12 months away. A lot happens in the intervening time.
>>>>
>>>> Yes, that sounds like the logical next step. We can't split drivers
>>>> without first doing that anyway. I still think "people need smaller
>>>> areas of work", as Vish eloquently put it. I still hope that refactoring
>>>> our test architecture will let us reach the same level of quality with
>>>> only a fraction of the tests being run at the gate, which should address
>>>> most of the harm you see in adding additional repositories. But I agree
>>>> there is little point in discussing splitting virt drivers (or anything
>>>> else, really) until the internal interface below that potential split is
>>>> fully cleaned up and it becomes an option.
>>>
>>> How about we start to try and patch gerrit to provide +2 permissions for
>>> people who can be assigned 'driver core' status. This is something that is
>>> relevant to Nova and Neutron and I guess Cinder too.
>>
>> If you think that's the right solution, I'd say go and investigate it
>> with folks that understand enough gerrit internals to be able to figure
>> out how hard it would be. Start a conversation in #openstack-infra to
>> explore it.
>>
>> My expectation is that there is more complexity there than you give it
>> credit for.
>> That being said one of the biggest limitations we've had on
>> gerrit changes is we've effectively only got one community member, Kai,
>> who does any of that. If other people, or teams, were willing to dig in
>> and own things like this, that might be really helpful.
>
> What about what Radoslav suggested? Having a background task running -
> that can set a flag indicating that the code has been approved by the
> driver ‘maintainers’. This can be something that driver CI should run -
> that is, driver code can only be approved if it has X +1’s from the driver
> maintainers and a +1 from the driver CI.

There is a ton of complexity and open questions with that approach as
well, largely, again, because people are designing systems based on
gerrit from the hip without actually understanding gerrit.

If someone wants to devote time to that kind of system and architecture,
they should engage the infra team to understand what can and can't be
done here. And take that on as a Kilo cycle goal. It would be useful,
but there is no 'simply' about it.

-Sean

--
Sean Dague
http://dague.net
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, 2014-09-11 at 16:20 +0100, Duncan Thomas wrote:
> On 11 September 2014 15:35, James Bottomley wrote:
>
> > OK, so look at a concrete example: in 2002, the Linux kernel went with
> > bitkeeper precisely because we'd reached the scaling limit of a single
> > integration point, so we took the kernel from a single contributing team
> > to a bunch of them. This was expanded with git in 2005 and leads to the
> > hundreds of contributing teams we have today.
>
> One thing the kernel has that OpenStack doesn't, that alters the way
> this model plays out, is a couple of very strong, forthright and frank
> personalities at the top who are pretty well respected. Both Andrew
> and Linus (and others) regularly if not frequently rip into ideas
> quite scathingly, even after they have passed other barriers and
> gauntlets and just say no to things. OpenStack has nothing of this
> sort, and there is no evidence that e.g. the TC can, should or desires
> to fill this role.

Linus is the court of last appeal. It's already a team negotiation
failure if stuff bubbles up to him. The somewhat abrasive response
you'll get if you're being stupid acts as a strong downward incentive on
the teams to sort out their own API squabbles *before* they get this
type of visibility.

The whole point of open source is aligning the structures with the
desire to fix it yourself. In an ideal world, everything would get
sorted at the local level and nothing would bubble up. Of course, the
world isn't ideal, so you need some court of last appeal, but it doesn't
have to be an individual ... it just has to be something that's
daunting, to encourage local settlement, and decisive.

Every process has to have something like this anyway. If there's no
process way of sorting out intractable disputes, they go on for ever and
damage the project.

James
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 11 September 2014 15:35, James Bottomley wrote:
> OK, so look at a concrete example: in 2002, the Linux kernel went with
> bitkeeper precisely because we'd reached the scaling limit of a single
> integration point, so we took the kernel from a single contributing team
> to a bunch of them. This was expanded with git in 2005 and leads to the
> hundreds of contributing teams we have today.

One thing the kernel has that OpenStack doesn't, that alters the way
this model plays out, is a couple of very strong, forthright and frank
personalities at the top who are pretty well respected. Both Andrew
and Linus (and others) regularly if not frequently rip into ideas
quite scathingly, even after they have passed other barriers and
gauntlets and just say no to things. OpenStack has nothing of this
sort, and there is no evidence that e.g. the TC can, should or desires
to fill this role.
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 9/11/14, 4:30 PM, "Sean Dague" wrote:
>On 09/11/2014 09:09 AM, Gary Kotton wrote:
>> On 9/11/14, 2:55 PM, "Thierry Carrez" wrote:
>>> Sean Dague wrote:
>>>> [...]
>>>> Why don't we start with "let's clean up the virt interface and make it
>>>> more sane", as I don't think there is any disagreement there. If it's
>>>> going to take a cycle, it's going to take a cycle anyway (it will
>>>> probably take 2 cycles, realistically, we always underestimate these
>>>> things, remember when no-db-compute was going to be 1 cycle?). I don't
>>>> see the need to actually decide here and now that the split is clearly
>>>> at least 7 - 12 months away. A lot happens in the intervening time.
>>>
>>> Yes, that sounds like the logical next step. We can't split drivers
>>> without first doing that anyway. I still think "people need smaller
>>> areas of work", as Vish eloquently put it. I still hope that refactoring
>>> our test architecture will let us reach the same level of quality with
>>> only a fraction of the tests being run at the gate, which should address
>>> most of the harm you see in adding additional repositories. But I agree
>>> there is little point in discussing splitting virt drivers (or anything
>>> else, really) until the internal interface below that potential split is
>>> fully cleaned up and it becomes an option.
>>
>> How about we start to try and patch gerrit to provide +2 permissions for
>> people who can be assigned 'driver core' status. This is something that is
>> relevant to Nova and Neutron and I guess Cinder too.
>
>If you think that's the right solution, I'd say go and investigate it
>with folks that understand enough gerrit internals to be able to figure
>out how hard it would be. Start a conversation in #openstack-infra to
>explore it.
>
>My expectation is that there is more complexity there than you give it
>credit for.
>That being said one of the biggest limitations we've had on
>gerrit changes is we've effectively only got one community member, Kai,
>who does any of that. If other people, or teams, were willing to dig in
>and own things like this, that might be really helpful.

What about what Radoslav suggested? Having a background task running -
that can set a flag indicating that the code has been approved by the
driver ‘maintainers’. This can be something that driver CI should run -
that is, driver code can only be approved if it has X +1’s from the driver
maintainers and a +1 from the driver CI.

>
> -Sean
>
>--
>Sean Dague
>http://dague.net
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, 2014-09-11 at 07:36 -0400, Sean Dague wrote: > >>> b) The conflict Dan is speaking of is around the current situation where > >>> we > >>> have a limited core review team bandwidth and we have to pick and choose > >>> which virt driver-specific features we will review. This leads to bad > >>> feelings and conflict. > >> > >> The way this worked in the past is we had cores who were subject > >> matter experts in various parts of the code -- there is a clear set of > >> cores who "get" xen or libivrt for example and I feel like those > >> drivers get reasonable review times. What's happened though is that > >> we've added a bunch of drivers without adding subject matter experts > >> to core to cover those drivers. Those newer drivers therefore have a > >> harder time getting things reviewed and approved. > > > > FYI, for Juno at least I really don't consider that even the libvirt > > driver got acceptable review times in any sense. The pain of waiting > > for reviews in libvirt code I've submitted this cycle is what prompted > > me to start this thread. All the virt drivers are suffering way more > > than they should be, but those without core team representation suffer > > to an even greater degree. And this is ignoring the point Jay & I > > were making about how the use of a single team means that there is > > always contention for feature approval, so much work gets cut right > > at the start even if maintainers of that area felt it was valuable > > and worth taking. > > I continue to not understand how N non overlapping teams makes this any > better. You have to pay the integration cost somewhere. Right now we're > trying to pay it 1 patch at a time. This model means the integration > units get much bigger, and with less common ground. 
OK, so look at a concrete example: in 2002, the Linux kernel went with bitkeeper precisely because we'd reached the scaling limit of a single integration point, so we took the kernel from a single contributing team to a bunch of them. This was expanded with git in 2005 and leads to the hundreds of contributing teams we have today. The reason this scales nicely is precisely because the integration costs are lower. However, there are a couple of principles that really assist us getting there. The first is internal API management: an Internal API is a contract between two teams (may be more, but usually two). If someone wants to change this API they have to negotiate between the two (or more) teams. This naturally means that only the affected components review this API change, but *only* they need to review it, so it doesn't bubble up to the whole kernel community. The second is automation: linux-next and the zero day test programme build and smoke test an integration of all our development trees. If one team does something that impacts another in their development tree, this system gives us immediate warning. Basically we run continuous integration, so when Linus does his actual integration pull, everything goes smoothly (that's how we integrate all the 300 or so trees for a kernel release in about ten days). We also now have a lot of review automation (checkpatch.pl for instance), but that's independent of the number of teams In this model the scaling comes from the local reviews and integration. The more teams the greater the scaling. The factor which obstructs scaling is the internal API ... it usually doesn't make sense to separate a component where there's no API between the two pieces ... however, if you think there should be, separating and telling the teams to figure it out is a great way to generate the API. 
The point here is that since an API is a contract, forcing people to negotiate and abide by the contract tends to make them think much more carefully about it. Internal API moves from being a global issue to being a local one. By the way, the extra link work is actually time well spent because it means the link APIs are negotiated by teams with use cases not just designed by abstract architecture. The greater the link pain the greater the indication that there's an API problem and the greater the pressure on the teams either end to fix it. Once the link pain is minimised, the API is likely a good one. > Look at how much active work in crossing core teams we've had to do to > make any real progress on the neutron replacing nova-network front. And > how slow that process is. I think you'll see that hugely show up here. Well, as I said, separating the components leads to API negotiation between the teams Because of the API negotiation, taking one thing and making it two does cause more work, and it's visible work because the two new teams get to do the API negotiation which didn't exist before. The trick to getting the model to scale is the network effect. The scaling comes by splitting out into high numbers of teams (say N) the added work comes in the links (the API contracts) between the N teams. If the network is star shaped (ev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
Rados,

Personally, I'd want a human to do the +W. Also, the criteria would
include a 3), which is the CI for the driver if applicable.

On Thu, Sep 11, 2014 at 9:53 AM, Radoslav Gerganov wrote:
> On 09/11/2014 04:30 PM, Sean Dague wrote:
>> On 09/11/2014 09:09 AM, Gary Kotton wrote:
>>> On 9/11/14, 2:55 PM, "Thierry Carrez" wrote:
>>>> Sean Dague wrote:
>>>>> [...]
>>>>> Why don't we start with "let's clean up the virt interface and make it
>>>>> more sane", as I don't think there is any disagreement there. If it's
>>>>> going to take a cycle, it's going to take a cycle anyway (it will
>>>>> probably take 2 cycles, realistically, we always underestimate these
>>>>> things, remember when no-db-compute was going to be 1 cycle?). I don't
>>>>> see the need to actually decide here and now that the split is clearly
>>>>> at least 7 - 12 months away. A lot happens in the intervening time.
>>>>
>>>> Yes, that sounds like the logical next step. We can't split drivers
>>>> without first doing that anyway. I still think "people need smaller
>>>> areas of work", as Vish eloquently put it. I still hope that refactoring
>>>> our test architecture will let us reach the same level of quality with
>>>> only a fraction of the tests being run at the gate, which should address
>>>> most of the harm you see in adding additional repositories. But I agree
>>>> there is little point in discussing splitting virt drivers (or anything
>>>> else, really) until the internal interface below that potential split is
>>>> fully cleaned up and it becomes an option.
>>>
>>> How about we start to try and patch gerrit to provide +2 permissions for
>>> people who can be assigned 'driver core' status. This is something that is
>>> relevant to Nova and Neutron and I guess Cinder too.
>>
>> If you think that's the right solution, I'd say go and investigate it
>> with folks that understand enough gerrit internals to be able to figure
>> out how hard it would be. Start a conversation in #openstack-infra to
>> explore it.
>> >> My expectation is that there is more complexity there than you give it >> credit for. That being said one of the biggest limitations we've had on >> gerrit changes is we've effectively only got one community member, Kai, >> who does any of that. If other people, or teams, were willing to dig in >> and own things like this, that might be really helpful. > > > I don't think we need to modify gerrit to support this functionality. We can > simply have a gerrit job (similar to the existing CI jobs) which is run on > every patch set and checks if: > 1) the changes are only under /nova/virt/XYZ and /nova/tests/virt/XYZ > 2) it has two +1 from maintainers of driver XYZ > > if the above conditions are met, the job will post W+1 for this patchset. > Does that make sense? > > > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Davanum Srinivas :: http://davanum.wordpress.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 11 September 2014 12:36, Sean Dague wrote: > I continue to not understand how N non overlapping teams makes this any > better. You have to pay the integration cost somewhere. Right now we're > trying to pay it 1 patch at a time. This model means the integration > units get much bigger, and with less common ground. > > Look at how much active work in crossing core teams we've had to do to > make any real progress on the neutron replacing nova-network front. And > how slow that process is. I think you'll see that hugely show up here. Cinder has also suffered extreme latency trying to make changes to the nova<->cinder interface, to a sufficient degree that work is under consideration to move the interface to give cinder more control over parts of it. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/11/2014 04:30 PM, Sean Dague wrote:
On 09/11/2014 09:09 AM, Gary Kotton wrote:
On 9/11/14, 2:55 PM, "Thierry Carrez" wrote:
Sean Dague wrote:
[...]
Why don't we start with "let's clean up the virt interface and make it more sane", as I don't think there is any disagreement there. If it's going to take a cycle, it's going to take a cycle anyway (it will probably take 2 cycles, realistically, we always underestimate these things, remember when no-db-compute was going to be 1 cycle?). I don't see the need to actually decide here and now that the split is clearly at least 7 - 12 months away. A lot happens in the intervening time.

Yes, that sounds like the logical next step. We can't split drivers without first doing that anyway. I still think "people need smaller areas of work", as Vish eloquently put it. I still hope that refactoring our test architecture will let us reach the same level of quality with only a fraction of the tests being run at the gate, which should address most of the harm you see in adding additional repositories. But I agree there is little point in discussing splitting virt drivers (or anything else, really) until the internal interface below that potential split is fully cleaned up and it becomes an option.

How about we start to try and patch gerrit to provide +2 permissions for people who can be assigned 'driver core' status. This is something that is relevant to Nova and Neutron and I guess Cinder too.

If you think that's the right solution, I'd say go and investigate it with folks that understand enough gerrit internals to be able to figure out how hard it would be. Start a conversation in #openstack-infra to explore it. My expectation is that there is more complexity there than you give it credit for. That being said one of the biggest limitations we've had on gerrit changes is we've effectively only got one community member, Kai, who does any of that.
If other people, or teams, were willing to dig in and own things like this, that might be really helpful.

I don't think we need to modify gerrit to support this functionality. We can simply have a gerrit job (similar to the existing CI jobs) which is run on every patch set and checks if:
1) the changes are only under /nova/virt/XYZ and /nova/tests/virt/XYZ
2) it has two +1 from maintainers of driver XYZ

if the above conditions are met, the job will post W+1 for this patchset. Does that make sense?

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
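The two conditions in the proposal above could be sketched roughly as follows. This is only an illustration of the decision rule, not an actual Gerrit or Zuul job: the `Vote` type, the function names, and the idea of passing in a set of maintainer names are all hypothetical, and paths are assumed to be repository-relative (no leading slash), as git normally reports them.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Vote:
    """A single Code-Review vote (hypothetical shape, for illustration)."""
    reviewer: str
    value: int  # +1, -1, ...


def driver_paths(driver: str) -> Tuple[str, str]:
    # Directory prefixes from the proposal, with XYZ replaced by the driver name.
    return (f"nova/virt/{driver}/", f"nova/tests/virt/{driver}/")


def should_post_workflow_vote(
    changed_files: Iterable[str],
    votes: Iterable[Vote],
    driver: str,
    maintainers: Set[str],
) -> bool:
    """Return True when both conditions from the proposal hold."""
    files = list(changed_files)
    prefixes = driver_paths(driver)

    # Condition 1: every changed file lives under the driver's directories.
    if not files or not all(path.startswith(prefixes) for path in files):
        return False

    # Condition 2: at least two distinct driver maintainers voted +1.
    approving = {
        vote.reviewer
        for vote in votes
        if vote.value == 1 and vote.reviewer in maintainers
    }
    return len(approving) >= 2
```

A real job would still need to fetch the file list and votes from Gerrit and post the result back; those parts are deliberately left out here.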
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/11/2014 09:09 AM, Gary Kotton wrote: > > > On 9/11/14, 2:55 PM, "Thierry Carrez" wrote: > >> Sean Dague wrote: >>> [...] >>> Why don't we start with "let's clean up the virt interface and make it >>> more sane", as I don't think there is any disagreement there. If it's >>> going to take a cycle, it's going to take a cycle anyway (it will >>> probably take 2 cycles, realistically, we always underestimate these >>> things, remember when no-db-compute was going to be 1 cycle?). I don't >>> see the need to actually decide here and now that the split is clearly >>> at least 7 - 12 months away. A lot happens in the intervening time. >> >> Yes, that sounds like the logical next step. We can't split drivers >> without first doing that anyway. I still think "people need smaller >> areas of work", as Vish eloquently put it. I still hope that refactoring >> our test architecture will let us reach the same level of quality with >> only a fraction of the tests being run at the gate, which should address >> most of the harm you see in adding additional repositories. But I agree >> there is little point in discussing splitting virt drivers (or anything >> else, really) until the internal interface below that potential split is >> fully cleaned up and it becomes an option. > > How about we start to try and patch gerrit to provide +2 permissions for > people > who can be assigned 'driver core' status. This is something that is > relevant to Nova and Neutron and I guess Cinder too.

If you think that's the right solution, I'd say go and investigate it with folks that understand enough gerrit internals to be able to figure out how hard it would be. Start a conversation in #openstack-infra to explore it.

My expectation is that there is more complexity there than you give it credit for. That being said one of the biggest limitations we've had on gerrit changes is we've effectively only got one community member, Kai, who does any of that.
If other people, or teams, were willing to dig in and own things like this, that might be really helpful. -Sean -- Sean Dague http://dague.net
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 07:23 PM, Michael Still wrote: On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes wrote: a) Sorting out the common code is already accounted for in Dan B's original proposal -- it's a prerequisite for the split. Its a big prerequisite though. I think we're talking about a release worth of work to get that right. I don't object to us doing that work, but I think we need to be honest about how long its going to take. It will also make the core of nova less agile, as we'll find it hard to change the hypervisor driver interface over time. Do we really think its ready to be stable? I don't. For a long time now I've wanted to split the gigantic spawn() method in the virt api into more discrete steps. I think there's some opportunity for doing some steps in parallel and the potential to have failures reported earlier and handled better. But I've been sitting on it because I wanted to use 'tasks' as a way to address the parallelization and that work hasn't happened yet. But this work would be introducing new calls which would be used based on some sort of capability query to the driver, so I don't think this work is necessarily hindered by stabilizing the interface. I also think the migration/resize methods could use some analysis before making a determination that they are what we want in a stable interface. As an alternative approach... What if we pushed most of the code for a driver into a library? Imagine a library which controls the low level operations of a hypervisor -- create a vm, attach a NIC, etc. Then the driver would become a shim around that which was relatively thin, but owned the interface into the nova core. The driver handles the nova specific things like knowing how to create a config drive, or how to orchestrate with cinder, but hands over all the hypervisor operations to the library. If we found a bug in the library we just pin our dependancy on the version we know works whilst we fix things. 
In fact, the driver inside nova could be a relatively generic "library driver", and we could have multiple implementations of the library, one for each hypervisor. This would make testing nova easier too, because we know how to mock libraries already. Now, that's kind of what we have in the hypervisor driver API now. What I'm proposing is that the point where we break out of the nova code base should be closer to the hypervisor than what that API presents. b) The conflict Dan is speaking of is around the current situation where we have a limited core review team bandwidth and we have to pick and choose which virt driver-specific features we will review. This leads to bad feelings and conflict. The way this worked in the past is we had cores who were subject matter experts in various parts of the code -- there is a clear set of cores who "get" xen or libivrt for example and I feel like those drivers get reasonable review times. What's happened though is that we've added a bunch of drivers without adding subject matter experts to core to cover those drivers. Those newer drivers therefore have a harder time getting things reviewed and approved. That said, a heap of cores have spent time reviewing vmware driver code this release, so its obviously not as simple as I describe above. c) It's the impact to the CI and testing load that I see being the biggest benefit to the split-out driver repos. Patches proposed to the XenAPI driver shouldn't have the Hyper-V CI tests run against the patch. Likewise, running libvirt unit tests in the VMWare driver repo doesn't make a whole lot of sense, and all of these tests add a not-insignificant load to the overall upstream and external CI systems. The long wait time for tests to come back means contributors get frustrated, since many reviewers tend to wait until Jenkins returns some result before they review. All of this leads to increased conflict that would be somewhat ameliorated by having separate code repos for the virt drivers. 
It is already possible to filter CI runs to specific paths in the code. We just didn't choose to do that for policy reasons. We could change that right now with a trivial tweak to each CI system's zuul config. Michael
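To make the path-filtering point concrete, here is a small sketch of the routing logic in plain Python. It is not Zuul configuration syntax; the CI system names and the pattern table are made up for illustration, and the rule that shared (non-driver) code triggers every CI system is an assumption about what a sensible policy might be.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from a third-party CI system to the paths it cares about.
CI_PATTERNS: Dict[str, List[str]] = {
    "xenapi-ci": ["nova/virt/xenapi/*"],
    "hyperv-ci": ["nova/virt/hyperv/*"],
    "vmware-ci": ["nova/virt/vmwareapi/*"],
}


def ci_systems_to_run(changed_files: Iterable[str]) -> Set[str]:
    """Pick the CI systems that should test a patch touching these files."""
    selected: Set[str] = set()
    for path in changed_files:
        owners = {
            name
            for name, patterns in CI_PATTERNS.items()
            if any(fnmatchcase(path, pattern) for pattern in patterns)
        }
        if not owners:
            # The file belongs to no driver, so treat it as shared code
            # and ask every CI system to run (assumed policy).
            return set(CI_PATTERNS)
        selected |= owners
    return selected
```

With this kind of rule, a patch that only touches one driver's directory would be tested by that driver's CI alone, while anything touching common code still gets the full set.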
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 9/11/14, 2:55 PM, "Thierry Carrez" wrote: >Sean Dague wrote: >> [...] >> Why don't we start with "let's clean up the virt interface and make it >> more sane", as I don't think there is any disagreement there. If it's >> going to take a cycle, it's going to take a cycle anyway (it will >> probably take 2 cycles, realistically, we always underestimate these >> things, remember when no-db-compute was going to be 1 cycle?). I don't >> see the need to actually decide here and now that the split is clearly >> at least 7 - 12 months away. A lot happens in the intervening time. > >Yes, that sounds like the logical next step. We can't split drivers >without first doing that anyway. I still think "people need smaller >areas of work", as Vish eloquently put it. I still hope that refactoring >our test architecture will let us reach the same level of quality with >only a fraction of the tests being run at the gate, which should address >most of the harm you see in adding additional repositories. But I agree >there is little point in discussing splitting virt drivers (or anything >else, really) until the internal interface below that potential split is >fully cleaned up and it becomes an option.

How about we start to try and patch gerrit to provide +2 permissions for people who can be assigned 'driver core' status. This is something that is relevant to Nova and Neutron and I guess Cinder too.

Thanks
Gary

> >-- >Thierry Carrez (ttx) > >___ >OpenStack-dev mailing list >OpenStack-dev@lists.openstack.org >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
Sean Dague wrote: > [...] > Why don't we start with "let's clean up the virt interface and make it > more sane", as I don't think there is any disagreement there. If it's > going to take a cycle, it's going to take a cycle anyway (it will > probably take 2 cycles, realistically, we always underestimate these > things, remember when no-db-compute was going to be 1 cycle?). I don't > see the need to actually decide here and now that the split is clearly > at least 7 - 12 months away. A lot happens in the intervening time. Yes, that sounds like the logical next step. We can't split drivers without first doing that anyway. I still think "people need smaller areas of work", as Vish eloquently put it. I still hope that refactoring our test architecture will let us reach the same level of quality with only a fraction of the tests being run at the gate, which should address most of the harm you see in adding additional repositories. But I agree there is little point in discussing splitting virt drivers (or anything else, really) until the internal interface below that potential split is fully cleaned up and it becomes an option. -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/11/2014 05:18 AM, Daniel P. Berrange wrote: > On Thu, Sep 11, 2014 at 09:23:34AM +1000, Michael Still wrote: >> On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes wrote: >> >>> a) Sorting out the common code is already accounted for in Dan B's original >>> proposal -- it's a prerequisite for the split. >> >> Its a big prerequisite though. I think we're talking about a release >> worth of work to get that right. I don't object to us doing that work, >> but I think we need to be honest about how long its going to take. It >> will also make the core of nova less agile, as we'll find it hard to >> change the hypervisor driver interface over time. Do we really think >> its ready to be stable? > > Yes, in my proposal I explicitly said we'd need to have Kilo > for all the prep work to clean up the virt API, before only > doing the split in Lx. > > The actual nova/virt/driver.py has been more stable over the > past few releases than I thought it would be. In terms of APIs > we're not really modified existing APIs, mostly added new ones. > Where we did modify existing APIs, we could have easily taken > the approach of adding a new API in parallel and deprecating > the old entry point to maintain compat. > > The big change which isn't visible directly is the conversion > of internal nova code to use objects. Finishing this conversion > is clearly a pre-requisite to any such split, since we'd need > to make sure all data passed into the nova virt APIs as parameters > is stable & well defined. > >> As an alternative approach... >> >> What if we pushed most of the code for a driver into a library? >> Imagine a library which controls the low level operations of a >> hypervisor -- create a vm, attach a NIC, etc. Then the driver would >> become a shim around that which was relatively thin, but owned the >> interface into the nova core. 
The driver handles the nova specific >> things like knowing how to create a config drive, or how to >> orchestrate with cinder, but hands over all the hypervisor operations >> to the library. If we found a bug in the library we just pin our >> dependancy on the version we know works whilst we fix things. >> >> In fact, the driver inside nova could be a relatively generic "library >> driver", and we could have multiple implementations of the library, >> one for each hypervisor. > > I don't think that particularly solves the problem, particularly > the ones you are most concerned about above of API stability. The > naive impl of any "library" for the virt driver would pretty much > mirror the nova virt API. The virt driver impls would thus have to > do the job of taking the Nova objects passed in as parameters and > turning them into something "stable" to pass to the library. Except > now instead of us only having to figure out a stable API in one > place, every single driver has to reinvent the wheel defining their > own stable interface & objects. I'd also be concerned that ongoing > work on drivers is still going to require alot of patches to Nova > to update the shims all the time, so we're still going to contend > on resource fairly highly. > >>> b) The conflict Dan is speaking of is around the current situation where we >>> have a limited core review team bandwidth and we have to pick and choose >>> which virt driver-specific features we will review. This leads to bad >>> feelings and conflict. >> >> The way this worked in the past is we had cores who were subject >> matter experts in various parts of the code -- there is a clear set of >> cores who "get" xen or libivrt for example and I feel like those >> drivers get reasonable review times. What's happened though is that >> we've added a bunch of drivers without adding subject matter experts >> to core to cover those drivers. 
Those newer drivers therefore have a >> harder time getting things reviewed and approved. > > FYI, for Juno at least I really don't consider that even the libvirt > driver got acceptable review times in any sense. The pain of waiting > for reviews in libvirt code I've submitted this cycle is what prompted > me to start this thread. All the virt drivers are suffering way more > than they should be, but those without core team representation suffer > to an even greater degree. And this is ignoring the point Jay & I > were making about how the use of a single team means that there is > always contention for feature approval, so much work gets cut right > at the start even if maintainers of that area felt it was valuable > and worth taking. I continue to not understand how N non overlapping teams makes this any better. You have to pay the integration cost somewhere. Right now we're trying to pay it 1 patch at a time. This model means the integration units get much bigger, and with less common ground. Look at how much active work in crossing core teams we've had to do to make any real progress on the neutron replacing nova-network front. And how slow that process is. I think you'll see that hugely show u
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 06:11 PM, Jay Pipes wrote: > > > On 09/10/2014 05:55 PM, Chris Friesen wrote: >> On 09/10/2014 02:44 AM, Daniel P. Berrange wrote: >>> On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote: >> I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities? >>> >>> I think that setting up subteams is neccessary to stop us imploding but >>> I don't think it is enough. As long as we have one repo we're forever >>> going to have conflict & contention in deciding which features to >>> accept, >>> which is a big factor in problems today. >> >> If each hypervisor team mostly only modifies their own code, why would >> there be conflict? >> >> As I see it, the only causes for conflict would be in the shared code, >> and you'd still need to sort out the issues with the shared code even if >> you split out the individual drivers into separate repos. > > a) Sorting out the common code is already accounted for in Dan B's > original proposal -- it's a prerequisite for the split. > > b) The conflict Dan is speaking of is around the current situation where > we have a limited core review team bandwidth and we have to pick and > choose which virt driver-specific features we will review. This leads to > bad feelings and conflict. > > c) It's the impact to the CI and testing load that I see being the > biggest benefit to the split-out driver repos. Patches proposed to the > XenAPI driver shouldn't have the Hyper-V CI tests run against the patch. > Likewise, running libvirt unit tests in the VMWare driver repo doesn't > make a whole lot of sense, and all of these tests add a > not-insignificant load to the overall upstream and external CI systems. 
> The long wait time for tests to come back means contributors get > frustrated, since many reviewers tend to wait until Jenkins returns some > result before they review. All of this leads to increased conflict that > would be somewhat ameliorated by having separate code repos for the virt > drivers.

So I haven't done the math recently, what do you expect the time savings to be here? Because unit tests aren't run by 3rd party today.

On my fancy desktop (test time including testr overhead):
* tox -epy27: 330s
* tox -epy27 libvirt: 18s
* tox -epy27 vmware: 9s
* tox -epy27 xen: 18s
* tox -epy27 hyperv: 13s

The testr overhead is about 8s for discovery (yes, I do realize that's probably more than it should be, that's a different story), so we'd be looking at a reduction of about 10% of the total run time of unit tests if we don't have the virt drivers in tree. That's not very much.

The only reason we're asking 3rd party CI folks to test everything... is policy. I don't think it's a big deal to only require them to test changes that hit their driver. Just decide that.

The conflict isn't going to go away, it's going to now exist on integration, where there isn't a single core team to work through it in a holistic way. This is a hugely more painful place to pay it.

...

Right now the top of the gate is 26 hrs. One of the reasons that it continues to grow and get worse over time is related to the total # of git trees that we have to integrate that don't have common core teams across them that understand their interactions correctly. I firmly believe that anything that creates more git trees that we have to integrate after the fact makes that worse. I believe the 10+ oslo lib trees have made this worse. I believe continuing to add new integrated projects has made this worse. And I believe that a virt driver split by any project will make it worse, if we expect to test that code upstream.

The docker driver in stackforge has been a success in merging docker code.
It's been much less of a success in terms of making it easy for anyone to run it, use it, or for us to get on common ground for a containers service moving forward. The bulk of the folks that would be on the driver teams don't really look at failures that are exposed in the gate. So I have a lot of trepidation about the claims that this will make integration better by folks that don't spend a lot of time looking and helping on our current integration.

...

That being said, I'm entirely pro cleaning up the virt interfaces as a matter of paying down debt. That was a blueprint when I first joined the project, one that died on the vine somewhere. I think more common infrastructure for virt drivers would be a good thing, and make the code a lot more understandable. And as it's the prereq for any of this, let's do it.

Why don't we start with "let's clean up the virt interface and make it more sane", as I don't think there is any disagreement there. If it's
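The unit-test timing estimate earlier in this message can be reproduced with a few lines of arithmetic. The numbers are the ones quoted in the message; the assumption made here is that each per-driver figure includes the roughly 8s of testr discovery overhead, so only the remainder counts as time actually saved.

```python
# Timings in seconds, as quoted in the message.
FULL_RUN = 330
TESTR_OVERHEAD = 8
DRIVER_RUNS = {"libvirt": 18, "vmware": 9, "xen": 18, "hyperv": 13}


def driver_test_seconds() -> dict:
    """Per-driver test time with the discovery overhead subtracted."""
    return {name: total - TESTR_OVERHEAD for name, total in DRIVER_RUNS.items()}


def saving_fraction() -> float:
    """Fraction of the full unit-test run spent on virt driver tests."""
    return sum(driver_test_seconds().values()) / FULL_RUN


if __name__ == "__main__":
    saved = sum(driver_test_seconds().values())
    print(f"driver tests: {saved}s of {FULL_RUN}s ({saving_fraction():.1%})")
```

Under that assumption the driver tests account for 26s of the 330s run, a little under 8%, which is in the same range as the "about 10%" figure given above.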
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Wed, Sep 10, 2014 at 07:35:05PM -0700, Armando M. wrote: > Hi, > > I devoured this thread, so much it was interesting and full of > insights. It's not news that we've been pondering about this in the > Neutron project for the past and existing cycle or so. > > Likely, this effort is going to take more than two cycles, and would > require a very focused team of people working closely together to > address this (most likely the core team members plus a few other folks > interested). > > One question I was unable to get a clear answer was: what happens to > existing/new bug fixes and features? Would the codebase go in lockdown > mode, i.e. not accepting anything else that isn't specifically > targeting this objective? Just using NFV as an example, I can't > imagine having changes supporting NFV still being reviewed and merged > while this process takes place...it would be like shooting at a moving > target! If we did go into lockdown mode, what happens to all the > corporate-backed agendas that aim at delivering new value to > OpenStack?

I don't think it is credible to say we'd go into lockdown refusing all other feature proposals, precisely for the kind of reasons you mention. We have to recognise that people will want to continue to contribute stuff and that's fine in general. The primary impact will be around prioritization of work, e.g. in the event of contention for attention / approval, work on refactoring would be given priority over other feature work. I'd expect that we'd still accept a reasonable amount of other feature work, because the mythical man-month paradigm means you can't put every contributor & reviewer to work on the same refactoring problem at once.
Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 11, 2014 at 09:23:34AM +1000, Michael Still wrote: > On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes wrote: > > > a) Sorting out the common code is already accounted for in Dan B's original > > proposal -- it's a prerequisite for the split. > > Its a big prerequisite though. I think we're talking about a release > worth of work to get that right. I don't object to us doing that work, > but I think we need to be honest about how long its going to take. It > will also make the core of nova less agile, as we'll find it hard to > change the hypervisor driver interface over time. Do we really think > its ready to be stable? Yes, in my proposal I explicitly said we'd need to have Kilo for all the prep work to clean up the virt API, before only doing the split in Lx. The actual nova/virt/driver.py has been more stable over the past few releases than I thought it would be. In terms of APIs we're not really modified existing APIs, mostly added new ones. Where we did modify existing APIs, we could have easily taken the approach of adding a new API in parallel and deprecating the old entry point to maintain compat. The big change which isn't visible directly is the conversion of internal nova code to use objects. Finishing this conversion is clearly a pre-requisite to any such split, since we'd need to make sure all data passed into the nova virt APIs as parameters is stable & well defined. > As an alternative approach... > > What if we pushed most of the code for a driver into a library? > Imagine a library which controls the low level operations of a > hypervisor -- create a vm, attach a NIC, etc. Then the driver would > become a shim around that which was relatively thin, but owned the > interface into the nova core. The driver handles the nova specific > things like knowing how to create a config drive, or how to > orchestrate with cinder, but hands over all the hypervisor operations > to the library. 
If we found a bug in the library we just pin our > dependancy on the version we know works whilst we fix things. > > In fact, the driver inside nova could be a relatively generic "library > driver", and we could have multiple implementations of the library, > one for each hypervisor. I don't think that particularly solves the problem, particularly the ones you are most concerned about above of API stability. The naive impl of any "library" for the virt driver would pretty much mirror the nova virt API. The virt driver impls would thus have to do the job of taking the Nova objects passed in as parameters and turning them into something "stable" to pass to the library. Except now instead of us only having to figure out a stable API in one place, every single driver has to reinvent the wheel defining their own stable interface & objects. I'd also be concerned that ongoing work on drivers is still going to require alot of patches to Nova to update the shims all the time, so we're still going to contend on resource fairly highly. > > b) The conflict Dan is speaking of is around the current situation where we > > have a limited core review team bandwidth and we have to pick and choose > > which virt driver-specific features we will review. This leads to bad > > feelings and conflict. > > The way this worked in the past is we had cores who were subject > matter experts in various parts of the code -- there is a clear set of > cores who "get" xen or libivrt for example and I feel like those > drivers get reasonable review times. What's happened though is that > we've added a bunch of drivers without adding subject matter experts > to core to cover those drivers. Those newer drivers therefore have a > harder time getting things reviewed and approved. FYI, for Juno at least I really don't consider that even the libvirt driver got acceptable review times in any sense. 
The pain of waiting for reviews in libvirt code I've submitted this cycle is what prompted me to start this thread. All the virt drivers are suffering way more than they should be, but those without core team representation suffer to an even greater degree. And this is ignoring the point Jay & I were making about how the use of a single team means that there is always contention for feature approval, so much work gets cut right at the start even if maintainers of that area felt it was valuable and worth taking. > > c) It's the impact to the CI and testing load that I see being the biggest > > benefit to the split-out driver repos. Patches proposed to the XenAPI driver > > shouldn't have the Hyper-V CI tests run against the patch. Likewise, running > > libvirt unit tests in the VMWare driver repo doesn't make a whole lot of > > sense, and all of these tests add a not-insignificant load to the overall > > upstream and external CI systems. The long wait time for tests to come back > > means contributors get frustrated, since many reviewers tend to wait until > > Jenkins returns some result before they review. All of this leads to > > incre
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 10:35 PM, Armando M. wrote:
> Hi,
>
> I devoured this thread; it was so interesting and full of insights. It's not news that we've been pondering this in the Neutron project for the past and current cycle or so.
>
> Likely, this effort is going to take more than two cycles, and would require a very focused team of people working closely together to address this (most likely the core team members plus a few other folks interested).
>
> One question I was unable to get a clear answer to was: what happens to existing/new bug fixes and features? Would the codebase go into lockdown mode, i.e. not accepting anything else that isn't specifically targeting this objective? Just using NFV as an example, I can't imagine having changes supporting NFV still being reviewed and merged while this process takes place...it would be like shooting at a moving target! If we did go into lockdown mode, what happens to all the corporate-backed agendas that aim at delivering new value to OpenStack?

Yes, I imagine a temporary slow-down on new feature development makes sense. However, I don't think it has to be across the board. Things should be considered case by case, like usual. For example, a feature that requires invasive changes to the virt driver interface might have a harder time during this transition, but a more straightforward feature isolated to the internals of a driver might be fine to let through. Like anything else, we have to weigh cost/benefit.

> Should we relax what goes into the stable branches, i.e. considering having a Juno on steroids six months from now that includes some of the features/fixes that didn't land in time before this process kicks off?

No ... maybe I misunderstand the suggestion, but I definitely would not be in favor of a Juno branch with features that haven't landed in master.
-- Russell Bryant
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
Hi,

I devoured this thread; it was so interesting and full of insights. It's not news that we've been pondering this in the Neutron project for the past and current cycle or so.

Likely, this effort is going to take more than two cycles, and would require a very focused team of people working closely together to address this (most likely the core team members plus a few other folks interested).

One question I was unable to get a clear answer to was: what happens to existing/new bug fixes and features? Would the codebase go into lockdown mode, i.e. not accepting anything else that isn't specifically targeting this objective? Just using NFV as an example, I can't imagine having changes supporting NFV still being reviewed and merged while this process takes place...it would be like shooting at a moving target! If we did go into lockdown mode, what happens to all the corporate-backed agendas that aim at delivering new value to OpenStack?

Should we relax what goes into the stable branches, i.e. considering having a Juno on steroids six months from now that includes some of the features/fixes that didn't land in time before this process kicks off?

I like the end goal of having a leaner Nova (or Neutron for that matter), it's the transition that scares me a bit!

Armando
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes wrote:
> a) Sorting out the common code is already accounted for in Dan B's original proposal -- it's a prerequisite for the split.

It's a big prerequisite though. I think we're talking about a release worth of work to get that right. I don't object to us doing that work, but I think we need to be honest about how long it's going to take. It will also make the core of nova less agile, as we'll find it hard to change the hypervisor driver interface over time. Do we really think it's ready to be stable?

As an alternative approach... What if we pushed most of the code for a driver into a library? Imagine a library which controls the low level operations of a hypervisor -- create a vm, attach a NIC, etc. Then the driver would become a shim around that which was relatively thin, but owned the interface into the nova core. The driver handles the nova specific things like knowing how to create a config drive, or how to orchestrate with cinder, but hands over all the hypervisor operations to the library. If we found a bug in the library we just pin our dependency on the version we know works whilst we fix things.

In fact, the driver inside nova could be a relatively generic "library driver", and we could have multiple implementations of the library, one for each hypervisor. This would make testing nova easier too, because we know how to mock libraries already.

Now, that's kind of what we have in the hypervisor driver API now. What I'm proposing is that the point where we break out of the nova code base should be closer to the hypervisor than what that API presents.

> b) The conflict Dan is speaking of is around the current situation where we have a limited core review team bandwidth and we have to pick and choose which virt driver-specific features we will review. This leads to bad feelings and conflict.
The way this worked in the past is we had cores who were subject matter experts in various parts of the code -- there is a clear set of cores who "get" xen or libvirt for example and I feel like those drivers get reasonable review times. What's happened though is that we've added a bunch of drivers without adding subject matter experts to core to cover those drivers. Those newer drivers therefore have a harder time getting things reviewed and approved.

That said, a heap of cores have spent time reviewing vmware driver code this release, so it's obviously not as simple as I describe above.

> c) It's the impact to the CI and testing load that I see being the biggest benefit to the split-out driver repos. Patches proposed to the XenAPI driver shouldn't have the Hyper-V CI tests run against the patch. Likewise, running libvirt unit tests in the VMWare driver repo doesn't make a whole lot of sense, and all of these tests add a not-insignificant load to the overall upstream and external CI systems. The long wait time for tests to come back means contributors get frustrated, since many reviewers tend to wait until Jenkins returns some result before they review. All of this leads to increased conflict that would be somewhat ameliorated by having separate code repos for the virt drivers.

It is already possible to filter CI runs to specific paths in the code. We just didn't choose to do that for policy reasons. We could change that right now with a trivial tweak to each CI system's zuul config.

Michael
--
Rackspace Australia
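
For readers following the "library driver" idea in the message above, here is a minimal sketch of what such a shim might look like. It is an illustration only: every name in it (HypervisorLibrary, FakeLibrary, LibraryDriver, spawn, create_vm, attach_nic) is hypothetical and does not correspond to any real nova or libvirt API, and the instance and network arguments are plain dicts rather than Nova objects.

```python
"""Sketch of a thin nova-side driver delegating to a hypervisor library.

All class and method names here are hypothetical illustrations.
"""
import abc


class HypervisorLibrary(abc.ABC):
    """Low-level hypervisor operations, living outside the nova tree."""

    @abc.abstractmethod
    def create_vm(self, name, vcpus, memory_mb):
        """Create a VM and return an opaque identifier."""

    @abc.abstractmethod
    def attach_nic(self, vm_id, mac_address):
        """Attach a NIC to an existing VM."""


class FakeLibrary(HypervisorLibrary):
    """In-memory implementation, the kind of stand-in a test could use."""

    def __init__(self):
        self.vms = {}

    def create_vm(self, name, vcpus, memory_mb):
        vm_id = "vm-%d" % (len(self.vms) + 1)
        self.vms[vm_id] = {"name": name, "vcpus": vcpus,
                           "memory_mb": memory_mb, "nics": []}
        return vm_id

    def attach_nic(self, vm_id, mac_address):
        self.vms[vm_id]["nics"].append(mac_address)


class LibraryDriver:
    """Generic shim: owns the nova-facing interface, delegates the rest."""

    def __init__(self, library):
        self.library = library

    def spawn(self, instance, network_info):
        # Nova-specific concerns (config drive, cinder orchestration)
        # would be handled here, around the library calls.
        vm_id = self.library.create_vm(
            instance["name"], instance["vcpus"], instance["memory_mb"])
        for vif in network_info:
            self.library.attach_nic(vm_id, vif["address"])
        return vm_id


if __name__ == "__main__":
    driver = LibraryDriver(FakeLibrary())
    print(driver.spawn({"name": "demo", "vcpus": 1, "memory_mb": 512},
                       [{"address": "fa:16:3e:00:00:01"}]))
```

The sketch also shows the concern raised in the reply to this proposal: the shim still has to translate whatever nova passes in into something the library accepts, so each library ends up defining its own stable interface.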
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 04:11 PM, Jay Pipes wrote:
> On 09/10/2014 05:55 PM, Chris Friesen wrote:
> > If each hypervisor team mostly only modifies their own code, why would there be conflict? As I see it, the only causes for conflict would be in the shared code, and you'd still need to sort out the issues with the shared code even if you split out the individual drivers into separate repos.
>
> a) Sorting out the common code is already accounted for in Dan B's original proposal -- it's a prerequisite for the split.

Fair enough.

> b) The conflict Dan is speaking of is around the current situation where we have a limited core review team bandwidth and we have to pick and choose which virt driver-specific features we will review. This leads to bad feelings and conflict.

Why does the core review team need to review virt driver-specific stuff? If we're looking at making subteams responsible for the libvirt code then it really doesn't matter where the code resides as long as everyone knows who owns it.

> c) It's the impact to the CI and testing load that I see being the biggest benefit to the split-out driver repos. Patches proposed to the XenAPI driver shouldn't have the Hyper-V CI tests run against the patch. Likewise, running libvirt unit tests in the VMWare driver repo doesn't make a whole lot of sense, and all of these tests add a not-insignificant load to the overall upstream and external CI systems. The long wait time for tests to come back means contributors get frustrated, since many reviewers tend to wait until Jenkins returns some result before they review. All of this leads to increased conflict that would be somewhat ameliorated by having separate code repos for the virt drivers.

Has anyone considered making the CI tools smarter? Maybe have a way to determine which tests to run based on the code being modified?
If someone makes a change in nova/virt/libvirt there's a limited set of tests that make sense to run...there's no need to run xen/hyperv/vmware CI tests against it, for example. Similarly, there's no need to run all the nova-scheduler, neutron, server groups, etc. tests.

That way we could give a subteam real responsibility for a specific area of the code, and submissions to that area of the code would not be gated by bugs in unrelated areas of the code.

Chris
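
The path-based test selection suggested in the message above could be sketched roughly as follows. This is only an illustration of the idea, not how any OpenStack CI system actually works: the job names and the prefix-to-job mapping are invented for the example, and a change touching anything outside a known driver tree is assumed to need every job.

```python
"""Sketch: choose CI jobs from the paths a change touches.

Job names and the prefix mapping are made up for illustration.
"""

# Hypothetical mapping from source-tree prefix to the jobs that cover it.
JOBS_BY_PREFIX = {
    "nova/virt/libvirt/": {"libvirt-unit", "libvirt-tempest"},
    "nova/virt/xenapi/": {"xenapi-unit", "xenserver-ci"},
    "nova/virt/hyperv/": {"hyperv-unit", "hyperv-ci"},
    "nova/virt/vmwareapi/": {"vmware-unit", "vmware-ci"},
}

# Jobs that always apply to shared code, plus every driver-specific job.
ALL_JOBS = set().union(*JOBS_BY_PREFIX.values()) | {"nova-unit", "full-tempest"}


def select_jobs(changed_files):
    """Return the set of job names to run for a list of changed paths."""
    selected = set()
    for path in changed_files:
        for prefix, jobs in JOBS_BY_PREFIX.items():
            if path.startswith(prefix):
                selected |= jobs
                break
        else:
            # The path is in shared code, so fall back to running everything.
            return set(ALL_JOBS)
    return selected


if __name__ == "__main__":
    print(sorted(select_jobs(["nova/virt/libvirt/driver.py"])))
    print(sorted(select_jobs(["nova/compute/manager.py"])))
```

The fallback to the full job set is the conservative choice: a change is only exempted from unrelated driver jobs when every file it touches sits inside a single known driver tree or a combination of them.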
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 05:55 PM, Chris Friesen wrote:
> On 09/10/2014 02:44 AM, Daniel P. Berrange wrote:
> > On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote:
> > > I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?
> >
> > I think that setting up subteams is necessary to stop us imploding but I don't think it is enough. As long as we have one repo we're forever going to have conflict & contention in deciding which features to accept, which is a big factor in problems today.
>
> If each hypervisor team mostly only modifies their own code, why would there be conflict? As I see it, the only causes for conflict would be in the shared code, and you'd still need to sort out the issues with the shared code even if you split out the individual drivers into separate repos.

a) Sorting out the common code is already accounted for in Dan B's original proposal -- it's a prerequisite for the split.

b) The conflict Dan is speaking of is around the current situation where we have a limited core review team bandwidth and we have to pick and choose which virt driver-specific features we will review. This leads to bad feelings and conflict.

c) It's the impact to the CI and testing load that I see being the biggest benefit to the split-out driver repos. Patches proposed to the XenAPI driver shouldn't have the Hyper-V CI tests run against the patch. Likewise, running libvirt unit tests in the VMWare driver repo doesn't make a whole lot of sense, and all of these tests add a not-insignificant load to the overall upstream and external CI systems.
The long wait time for tests to come back means contributors get frustrated, since many reviewers tend to wait until Jenkins returns some result before they review. All of this leads to increased conflict that would be somewhat ameliorated by having separate code repos for the virt drivers.

Best,
jay
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 02:44 AM, Daniel P. Berrange wrote:
> On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote:
> > I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?
>
> I think that setting up subteams is necessary to stop us imploding but I don't think it is enough. As long as we have one repo we're forever going to have conflict & contention in deciding which features to accept, which is a big factor in problems today.

If each hypervisor team mostly only modifies their own code, why would there be conflict? As I see it, the only causes for conflict would be in the shared code, and you'd still need to sort out the issues with the shared code even if you split out the individual drivers into separate repos.

Chris
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
-----Original Message-----
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: Wednesday, September 10, 2014 1:45 AM

On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote:
> > To me, this means you don't really want a sin bin where you dump drivers and tell them not to come out until they're fit to be reviewed by the core; You want a trusted driver community which does its own reviews and means the core doesn't have to review them.
>
> I think we're going somewhere here, based on your comment and others': we may achieve some result if we empower a new set of people to manage drivers, keeping them in the same repositories where they are now. This new set of people may not be the current core reviewers but others with different skillsets and more capable of understanding the driver's ecosystem, needs, motivations, etc.
>
> I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?

I think that setting up subteams is necessary to stop us imploding but I don't think it is enough. As long as we have one repo we're forever going to have conflict & contention in deciding which features to accept, which is a big factor in problems today. I favour the strong split of the drivers into separate repositories to remove the contention between the teams as much as is practical.

[Rocky Grober] +100

There is a huge benefit to getting the drivers into separate repositories. Once the APIs/interfaces in Nova are clean enough to support the move, they will stay cleaner than if the drivers are in the same repository. And the subteams will ensure that the drivers are to their level of quality.
The CI system will be easier to manage with third-party CIs for each of the drivers. And to get changes into Nova Core, the subteams will need to cooperate, as any core change that affects one driver will most likely affect others, so it will be in the subteams' best interests to keep the driver/core APIs clean and free of special cases.

From Kyle's later mail:
> I think that is absolutely the case: sub-team leaders need to be vetted based on their upstream communication skills. I also think what we're looking at in Neutron is giving sub-teams a shelf-life, and spinning them down rather than letting them live long-term, lose focus, and wander aimlessly.

This is also a very important point that I'd like to expand on. The subteams really should form a "drivers" team composed of each subteam's PTL. This drivers team would be the interface to Nova Core and would need those upstream communication skills. This team could also be the place Nova Core/Driver API changes get discussed and finalized from the drivers' perspective. Maybe the Drivers PTL team should even start with electing a Nova Core from its PTLs as the Drivers team lead. This team would also be the perfect place for Nova PTL and team to work with Drivers teams to collaborate on specs and issues.

Unlike in Neutron, the subteams wouldn't roll back into the Nova core, as their charter/purpose will continue to develop as hypervisors, containers, bare metal and other new virtual control planes develop. Getting these teams right will mean more agility, higher quality and better consistency within the Nova ecosystem. The drivers team should become strong partners with Nova core in allowing Nova to innovate more quickly while addressing technical debt to increase quality around the Nova/drivers interactions.
--Rocky
[/Rocky Grober]

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Wed, Sep 10, 2014 at 12:34 PM, Stefano Maffulli wrote:
> On 09/10/2014 02:27 AM, Sylvain Bauza wrote:
>> Well, both proposals can be done: we can create subteams and the Subteam-Approval Gerrit label right now before Kilo, and we could split the virt repos later once the interfaces and prereqs are done.
>
> That's what I mean in fact: creating sub-teams is fairly easy to do, we can start very soon. Splitting the repos will require the cleanup in internal interfaces, documentation and the other things that (while important and needed anyway) will require probably one cycle, if not more.
>
> On 09/10/2014 09:07 AM, Kyle Mestery wrote:
>> I would be cautious around sub-teams. Our experience in Neutron has been that we do a very good job of setting up sub-teams, but a terrible job at deciding when they should be spun-down and folded back in. Because in a lot of cases, a sub-team's existence should be for a short period of time. The other problem is that sub-teams can tend to wander away from the broader team, making it harder for their work to be integrated back into the whole. All of this is to say that sub-teams require coordination and lots of communication, and should be carefully watched, groomed, and culled when necessary.
>
> This is great feedback. Maybe picking the leaders of the sub-teams based on their communication skills and their understanding of the bigger picture would help? How would you do things differently, based on your experience?

I think that is absolutely the case: sub-team leaders need to be vetted based on their upstream communication skills. I also think what we're looking at in Neutron is giving sub-teams a shelf-life, and spinning them down rather than letting them live long-term, lose focus, and wander aimlessly.
Thanks,
Kyle

> /stef
>
> --
> Ask and answer questions on https://ask.openstack.org
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/10/2014 02:27 AM, Sylvain Bauza wrote:
> Well, both proposals can be done: we can create subteams and the Subteam-Approval Gerrit label right now before Kilo, and we could split the virt repos later once the interfaces and prereqs are done.

That's what I mean in fact: creating sub-teams is fairly easy to do, we can start very soon. Splitting the repos will require the cleanup in internal interfaces, documentation and the other things that (while important and needed anyway) will require probably one cycle, if not more.

On 09/10/2014 09:07 AM, Kyle Mestery wrote:
> I would be cautious around sub-teams. Our experience in Neutron has been that we do a very good job of setting up sub-teams, but a terrible job at deciding when they should be spun-down and folded back in. Because in a lot of cases, a sub-team's existence should be for a short period of time. The other problem is that sub-teams can tend to wander away from the broader team, making it harder for their work to be integrated back into the whole. All of this is to say that sub-teams require coordination and lots of communication, and should be carefully watched, groomed, and culled when necessary.

This is great feedback. Maybe picking the leaders of the sub-teams based on their communication skills and their understanding of the bigger picture would help? How would you do things differently, based on your experience?

/stef

--
Ask and answer questions on https://ask.openstack.org
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Wed, Sep 10, 2014 at 3:44 AM, Daniel P. Berrange wrote:
> On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote:
>> > To me, this means you don't really want a sin bin where you dump drivers and tell them not to come out until they're fit to be reviewed by the core; You want a trusted driver community which does its own reviews and means the core doesn't have to review them.
>>
>> I think we're going somewhere here, based on your comment and others': we may achieve some result if we empower a new set of people to manage drivers, keeping them in the same repositories where they are now. This new set of people may not be the current core reviewers but others with different skillsets and more capable of understanding the driver's ecosystem, needs, motivations, etc.
>>
>> I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?
>
> I think that setting up subteams is necessary to stop us imploding but I don't think it is enough. As long as we have one repo we're forever going to have conflict & contention in deciding which features to accept, which is a big factor in problems today. I favour the strong split of the drivers into separate repositories to remove the contention between the teams as much as is practical.

I would be cautious around sub-teams. Our experience in Neutron has been that we do a very good job of setting up sub-teams, but a terrible job at deciding when they should be spun-down and folded back in. Because in a lot of cases, a sub-team's existence should be for a short period of time.
The other problem is that sub-teams can tend to wander away from the broader team, making it harder for their work to be integrated back into the whole. All of this is to say that sub-teams require coordination and lots of communication, and should be carefully watched, groomed, and culled when necessary.

Thanks,
Kyle

> Regards,
> Daniel
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 10/09/2014 10:44, Daniel P. Berrange wrote:
> On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote:
> > > To me, this means you don't really want a sin bin where you dump drivers and tell them not to come out until they're fit to be reviewed by the core; You want a trusted driver community which does its own reviews and means the core doesn't have to review them.
> >
> > I think we're going somewhere here, based on your comment and others': we may achieve some result if we empower a new set of people to manage drivers, keeping them in the same repositories where they are now. This new set of people may not be the current core reviewers but others with different skillsets and more capable of understanding the driver's ecosystem, needs, motivations, etc.
> >
> > I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?
>
> I think that setting up subteams is necessary to stop us imploding but I don't think it is enough. As long as we have one repo we're forever going to have conflict & contention in deciding which features to accept, which is a big factor in problems today. I favour the strong split of the drivers into separate repositories to remove the contention between the teams as much as is practical.
>
> Regards,
> Daniel

Well, both proposals can be done: we can create subteams and the Subteam-Approval Gerrit label right now before Kilo, and we could split the virt repos later once the interfaces and prereqs are done.

Having subteams would be even better for the virt split, as you could find which halfcores (that's what I call the subteams' people) are good candidates for becoming virt cores once the repository is set up.
-Sylvain
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote:
> > To me, this means you don't really want a sin bin where you dump drivers and tell them not to come out until they're fit to be reviewed by the core; You want a trusted driver community which does its own reviews and means the core doesn't have to review them.
>
> I think we're going somewhere here, based on your comment and others': we may achieve some result if we empower a new set of people to manage drivers, keeping them in the same repositories where they are now. This new set of people may not be the current core reviewers but others with different skillsets and more capable of understanding the driver's ecosystem, needs, motivations, etc.
>
> I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?

I think that setting up subteams is necessary to stop us imploding but I don't think it is enough. As long as we have one repo we're forever going to have conflict & contention in deciding which features to accept, which is a big factor in problems today. I favour the strong split of the drivers into separate repositories to remove the contention between the teams as much as is practical.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/09/2014 06:55 AM, James Bottomley wrote:
> CLAs are a well known and documented barrier to casual contributions

I'm not convinced about this statement, at all. And since I think it's secondary to what we're discussing, I'll leave it as is and go on.

> I've done both ... I do prefer the patch workflow to the gerrit one, [...]

Do you consider yourself a 'committed' developer or a casual one? Because ultimately I think this is what it comes down to: a developer who has a commitment to get a patch landed in tree has a different motivation and set of incentives that make climbing the learning curve more appealing. A casual contributor is a different persona.

> Bad code is a bit of a pejorative term.

I used the wrong term, I apologize if I offended someone: it wasn't my intention.

> However, I can sympathize with the view: In the Linux Kernel, drivers are often the biggest source of coding style and maintenance issues. I maintain a driver subsystem and I would have to admit that a lot of the code that goes into those drivers wouldn't be of sufficient quality to be admitted to the core kernel without a lot more clean up and flow changes.

Thanks for saying this a lot more nicely than my rough expression.

> To me, this means you don't really want a sin bin where you dump drivers and tell them not to come out until they're fit to be reviewed by the core; You want a trusted driver community which does its own reviews and means the core doesn't have to review them.

I think we're going somewhere here, based on your comment and others': we may achieve some result if we empower a new set of people to manage drivers, keeping them in the same repositories where they are now. This new set of people may not be the current core reviewers but others with different skillsets and more capable of understanding the driver's ecosystem, needs, motivations, etc.
I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to implement it. Maybe it's time to think about an implementation. We have been thinking about mentors https://wiki.openstack.org/wiki/Mentors, maybe that's a way to go? Sub-team with +1.5 scoring capabilities?

/stef

--
Ask and answer questions on https://ask.openstack.org
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Mon, 2014-09-08 at 17:20 -0700, Stefano Maffulli wrote:
> On 09/05/2014 07:07 PM, James Bottomley wrote:
> > Actually, I don't think this analysis is accurate. Some people are simply interested in small aspects of a project. It's the "scratch your own itch" part of open source. The thing which makes itch scratchers not lone wolves is the desire to go the extra mile to make what they've done useful to the community. If they never do this, they likely have a forked repo with only their changes (and are the epitome of a lone wolf). If you scratch your own itch and make the effort to get it upstream, you're assisting the community (even if that's the only piece of code you do) and that assistance makes you (at least for a time) part of the community.
>
> I'm starting to think that the processes we have implemented are slowing down (if not preventing) "scratch your own itch" contributions. The CLA has been identified as the cause for this but after carefully looking at our development processes and the documentation, I think that's only one part of the problem (and maybe not even as big as initially thought).

CLAs are a well known and documented barrier to casual contributions (just look at all the project harmony discussion), they affect one-offs disproportionately since they require an investment of effort to understand and legal resources are often unavailable to individuals. The key problem for individuals in the US is usually: do I or my employer own my contribution? Because that makes a huge difference to the process for signing.

> The gerrit workflow for example is something that requires quite an investment in time and energy and casual developers (think operators fixing small bugs in code, or documentation) have little incentive to go through the learning curve.

I've done both ...
I do prefer the patch workflow to the gerrit one, but I think that's just because the former is what I used for ten years and I'm very comfortable with it. The good thing about the patch workflow is that the initial barrier is very low. However, the later barriers can be as high or higher. > To go back in topic, to the proposal to split drivers out of tree, I > think we may want to evaluate other, simpler, paths before we embark in > a huge task which is already quite clear will require more cross-project > coordination. > > From conversations with PTLs and core reviewers I get the impression > that lots of drivers contributions come with bad code. Bad code is a bit of a pejorative term. However, I can sympathize with the view: In the Linux Kernel, drivers are often the biggest source of coding style and maintenance issues. I maintain a driver subsystem and I would have to admit that a lot of the code that goes into those drivers wouldn't be of sufficient quality to be admitted to the core kernel without a lot more clean up and flow changes. However, is this bad code? It mostly works, so it does the job it's designed for. Usually the company producing the device is the one maintaining the driver so as long as they have the maintenance burden and do their job there's no real harm. It's a balance, and sometimes I get it wrong, but I do know from bitter effort that there's a limit to what you can get busy developers to do in the driver space. > These require a > lot of time and reviewers energy to be cleaned up, causing burn out and > bad feelings on all sides. What if we establish a new 'place' of some > sort where we can send people to improve their code (or dump it without > interfering with core?) Somewhere there may be a workflow > "go-improve-over-there" where a Community Manager (or mentors or some > other program we may invent) takes over and does what core reviewers > have been trying to do 'on the side'? 
The advantage is that this way we > don't have to change radically how current teams operate, we may be able > to start this immediately with Kilo. Thoughts? I think it's a question of communities, like Daniel said. In the kernel, the driver reviewers are a different community from the core kernel code reviewers. Most core reviewers would probably fry their own eyeballs before they'd review device driver code. So the solution is not to make them; instead we set up a review community of people who understand driver code and make allowances for some of its eccentricities. At the end of the day, bad code is measured by defect count which impacts usability for drivers and the reputation of that driver is what suffers. I'm sure in OpenStack, driver reputation is an easy way to encourage better drivers ... after all hypervisors are pretty fungible: if the Bar hypervisor driver is awful, you can use the Foo hypervisor instead. People who want you to switch to the Baz hypervisor would need to make sure you have a pretty awesome experience when you take it for a spin, so they're most naturally inclined to spend the time writing good code. To me,
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/09/14 01:20, Stefano Maffulli wrote: > From conversations with PTLs and core reviewers I get the impression > that lots of drivers contributions come with bad code. These require a > lot of time and reviewers energy to be cleaned up, causing burn out and > bad feelings on all sides. What if we establish a new 'place' of some > sort where we can send people to improve their code (or dump it without > interfering with core?) Somewhere there may be a workflow > "go-improve-over-there" where a Community Manager (or mentors or some > other program we may invent) takes over and does what core reviewers > have been trying to do 'on the side'? The advantage is that this way we > don't have to change radically how current teams operate, we may be able > to start this immediately with Kilo. Thoughts? I can't speak for other areas of the codebase, but certainly in the VMware driver the technical debt has been allowed to accrue in the past precisely because the review process itself is so tortuously slow. This results in a death spiral of code quality, and ironically the review process has been the cause, not the solution. In Juno we have put our major focus on refactor work, which has meant essentially no feature work for an entire cycle. This is painful, but unfortunately necessary with the current process. As an exercise, look at what has been merged in the VMware driver during Juno. Consider how many developer weeks that should reasonably have taken. Then consider how many developer weeks it actually took. Is the current process conducive to productivity? The answer is clearly and emphatically no. Is it worth it in its current form? Obviously not. So, should driver contributors be forced to play in the sandpit before mixing with the big boys? If a tortuously slow review process is a primary cause of technical debt, will adding more steps to it improve the situation? I hope the answer is obvious. 
And I'll be honest, I found the suggestion more than a little patronising. Matt -- Matthew Booth Red Hat Engineering, Virtualisation Team Phone: +442070094448 (UK) GPG ID: D33C3490 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Mon, Sep 08, 2014 at 05:20:54PM -0700, Stefano Maffulli wrote: > On 09/05/2014 07:07 PM, James Bottomley wrote: > > Actually, I don't think this analysis is accurate. Some people are > > simply interested in small aspects of a project. It's the "scratch your > > own itch" part of open source. The thing which makes itch scratchers > > not lone wolfs is the desire to go the extra mile to make what they've > > done useful to the community. If they never do this, they likely have a > > forked repo with only their changes (and are the epitome of a lone > > wolf). If you scratch your own itch and make the effort to get it > > upstream, you're assisting the community (even if that's the only piece > > of code you do) and that assistance makes you (at least for a time) part > > of the community. [snip] > From conversations with PTLs and core reviewers I get the impression > that lots of drivers contributions come with bad code. These require a > lot of time and reviewers energy to be cleaned up, causing burn out and > bad feelings on all sides. What if we establish a new 'place' of some > sort where we can send people to improve their code (or dump it without > interfering with core?) Somewhere there may be a workflow > "go-improve-over-there" where a Community Manager (or mentors or some > other program we may invent) takes over and does what core reviewers > have been trying to do 'on the side'? The advantage is that this way we > don't have to change radically how current teams operate, we may be able > to start this immediately with Kilo. Thoughts? I don't really agree with the suggestion that contributions to drivers are largely "bad code". Sure there are some contributors who are worse than others, and when reviewing I've seen that pretty much anywhere in the code tree. That's just life when you have a project that welcomes contributions from anyone. 
I wouldn't want to send new contributors to a different place to "improve their code" as it would be yet another thing for them to go through before getting their code accepted and I don't think it'd really make a significant difference to our workload overall. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
A somewhat self-serving way to start to solve this is to make training and mentoring the first steps to getting involved with contributions. We would continue to gradually give more responsibilities as the experience and skills increase. We do this last part already, but are missing the support for the mentoring and training. I think landing this mentoring responsibility into the ambassador program makes some sense. This doesn't directly solve the subject of this thread. But it does start the process of giving help to those that are trying to learn inline while the cores are trying to land quality code. What do you think? ~sean > On Sep 8, 2014, at 5:20 PM, Stefano Maffulli wrote: > >> On 09/05/2014 07:07 PM, James Bottomley wrote: >> Actually, I don't think this analysis is accurate. Some people are >> simply interested in small aspects of a project. It's the "scratch your >> own itch" part of open source. The thing which makes itch scratchers >> not lone wolfs is the desire to go the extra mile to make what they've >> done useful to the community. If they never do this, they likely have a >> forked repo with only their changes (and are the epitome of a lone >> wolf). If you scratch your own itch and make the effort to get it >> upstream, you're assisting the community (even if that's the only piece >> of code you do) and that assistance makes you (at least for a time) part >> of the community. > > I'm starting to think that the processes we have implemented are slowing > down (if not preventing) "scratch your own itch" contributions. The CLA > has been identified as the cause for this but after carefully looking at > our development processes and the documentation, I think that's only one > part of the problem (and maybe not even as big as initially thought). 
> > The gerrit workflow for example is something that requires quite an > investment in time and energy and casual developers (think operators > fixing small bugs in code, or documentation) have little incentive to go > through the learning curve. > > To go back in topic, to the proposal to split drivers out of tree, I > think we may want to evaluate other, simpler, paths before we embark in > a huge task which is already quite clear will require more cross-project > coordination. > > From conversations with PTLs and core reviewers I get the impression > that lots of drivers contributions come with bad code. These require a > lot of time and reviewers energy to be cleaned up, causing burn out and > bad feelings on all sides. What if we establish a new 'place' of some > sort where we can send people to improve their code (or dump it without > interfering with core?) Somewhere there may be a workflow > "go-improve-over-there" where a Community Manager (or mentors or some > other program we may invent) takes over and does what core reviewers > have been trying to do 'on the side'? The advantage is that this way we > don't have to change radically how current teams operate, we may be able > to start this immediately with Kilo. Thoughts? > > /stef
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 07:07 PM, James Bottomley wrote: > Actually, I don't think this analysis is accurate. Some people are > simply interested in small aspects of a project. It's the "scratch your > own itch" part of open source. The thing which makes itch scratchers > not lone wolfs is the desire to go the extra mile to make what they've > done useful to the community. If they never do this, they likely have a > forked repo with only their changes (and are the epitome of a lone > wolf). If you scratch your own itch and make the effort to get it > upstream, you're assisting the community (even if that's the only piece > of code you do) and that assistance makes you (at least for a time) part > of the community. I'm starting to think that the processes we have implemented are slowing down (if not preventing) "scratch your own itch" contributions. The CLA has been identified as the cause for this but after carefully looking at our development processes and the documentation, I think that's only one part of the problem (and maybe not even as big as initially thought). The gerrit workflow for example is something that requires quite an investment in time and energy and casual developers (think operators fixing small bugs in code, or documentation) have little incentive to go through the learning curve. To go back on topic, to the proposal to split drivers out of tree, I think we may want to evaluate other, simpler, paths before we embark on a huge task which it is already quite clear will require more cross-project coordination. From conversations with PTLs and core reviewers I get the impression that lots of driver contributions come with bad code. These require a lot of time and reviewers' energy to be cleaned up, causing burn out and bad feelings on all sides. What if we establish a new 'place' of some sort where we can send people to improve their code (or dump it without interfering with core?) 
Somewhere there may be a workflow "go-improve-over-there" where a Community Manager (or mentors or some other program we may invent) takes over and does what core reviewers have been trying to do 'on the side'? The advantage is that this way we don't have to change radically how current teams operate, we may be able to start this immediately with Kilo. Thoughts? /stef
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 03:54:28PM -0700, Stefano Maffulli wrote: > Thanks Daniel for taking the time to write such deep message. Obviously > you have thought about this issue for a long time and your opinion comes > from deep personal understanding. I'm adding tags for neutron and > cinder, as I know they're having similar conversations. > > I don't have a strong opinion on the solution you and Kyle seem to be > leaning toward, I just have a couple of comments/warnings below > > On 09/04/2014 03:24 AM, Daniel P. Berrange wrote: > > saying 'This Is a Large Crisis'. A large crisis requires a large > > plan. > > Not necessarily, quite the contrary indeed. To address and solve big > problems, experience and management literature suggest it's a lot better > to make *one* small change, measure its effect and make one more change, > measure its effect, and on and on until perfection. The discussion > triggered by TripleO about 'what to measure' goes in the right > direction.[1] FWIW, don't read too much into that particular paragraph/sentence - it is a humorous joke/quote from a British TV comedy :-) > Your proposal seem to require a long term investment before its effects > can be visible, although some of the things necessary for the split will > be needed anyway. Do you think there are small changes with high impact > that we can refine in Paris and put in place for Juno? If we wanted to do a short term improvement, we'd probably have to look at relaxing the way we apply our current 2 x +2 == +A policy in some way. E.g. we'd have to look at perhaps identifying core virt driver team members, and then treating their +1s as equivalent to a +2 if given on a virt-driver only change, and so setting +A after only getting one +2. > The other comment I have is about the risks of splitting teams and > create new ones whose only expertise is their company's domain. 
I'm > concerned of the bad side effect of having teams in Nova Program with > very limited or no incentive at all to participate in nova-common > project since all they care about will be their little (proprietary) > hypervisor or network driver. I fear we may end up with nova-common > owned by a handful of people from a couple of companies, limping along, > while nova-drivers devs throw stones or ignore. > Maybe this worst case scenario of disenfranchised membership is not as > bad as I think it would be, I'm voicing my concern also to gauge this > risk better. What are your thoughts on this specific risk? How can we > mitigate it? One of the important things I think we need to do is firm up the nova internal virt driver API to make it more well specified, as a way to prevent some of the sloppy bad practice all the virt drivers engage in today. I still see a fairly reasonable number of feature requests that will involve nova common code, so even with a virt driver split, the virt driver teams are forced to engage with the nova common code to get some non-trivial portion of their work done. So if virt driver teams don't help out with nova common code work, they're going to find life hard for themselves when they do have features that involve nova common. In many ways I think we are already suffering quite a lot from the problem you describe today. A large portion of the people contributing to all the virt drivers only really focus their attention on their own area of interest, ignoring nova common. I cannot entirely blame them for that because learning more of nova is a significant investment of effort. This is one of the reasons we struggle to identify enough people with broad enough knowledge to promote to nova core. I think I can also see parallels in the relationship between the major projects (nova, neutron, cinder, etc) and the oslo project. It is hard to get the downstream consumer projects to take an active interest in work on oslo itself. 
This was probably worse when oslo first started out, but it is now a more established team. I accept that splitting the drivers out from nova common will probably reinforce the separation of work to some degree. The biggest benefits will come to the virt driver teams themselves by unblocking them from all competing for the same finite core reviewer resource. The remaining nova core team will probably gain a little bit more time (perhaps 10-15%) by not having to pay attention to the virt driver code changes directly but overall it wouldn't be a dramatic improvement there. The overall reduction in repo size might help new contributors get up the on-ramp to being part of the team, since smaller codebases are easier to learn in general. Overall I don't have a knockout answer to your concern though, other than to say we're already facing that problem to quite a large extent and modularization as a general concept has proved quite successful for the growth of openstack projects that have split out from nova in the past. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.co
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
Thanks Daniel for taking the time to write such a deep message. Obviously you have thought about this issue for a long time and your opinion comes from deep personal understanding. I'm adding tags for neutron and cinder, as I know they're having similar conversations. I don't have a strong opinion on the solution you and Kyle seem to be leaning toward, I just have a couple of comments/warnings below On 09/04/2014 03:24 AM, Daniel P. Berrange wrote: > saying 'This Is a Large Crisis'. A large crisis requires a large > plan. Not necessarily, quite the contrary indeed. To address and solve big problems, experience and management literature suggest it's a lot better to make *one* small change, measure its effect and make one more change, measure its effect, and on and on until perfection. The discussion triggered by TripleO about 'what to measure' goes in the right direction.[1] Your proposal seems to require a long term investment before its effects can be visible, although some of the things necessary for the split will be needed anyway. Do you think there are small changes with high impact that we can refine in Paris and put in place for Juno? The other comment I have is about the risks of splitting teams and creating new ones whose only expertise is their company's domain. I'm concerned about the bad side effect of having teams in Nova Program with very limited or no incentive at all to participate in nova-common project since all they care about will be their little (proprietary) hypervisor or network driver. I fear we may end up with nova-common owned by a handful of people from a couple of companies, limping along, while nova-drivers devs throw stones or ignore it. Maybe this worst case scenario of disenfranchised membership is not as bad as I think it would be, I'm voicing my concern also to gauge this risk better. What are your thoughts on this specific risk? How can we mitigate it? 
/stef [1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/044689.html