Re: [openstack-dev] Thoughts on OpenStack Layers and a Big Tent model

Zane Bitter Wed, 24 Sep 2014 10:57:43 -0700

On 18/09/14 14:53, Monty Taylor wrote:

Hey all,


I've recently been thinking a lot about Sean's Layers stuff. So I wrote
a blog post which Jim Blair and Devananda were kind enough to help me edit.

http://inaugust.com/post/108


Thanks Monty, I think there are some very interesting ideas in here.

I'm particularly glad to see the 'big tent' camp reasserting itself, because I have no sympathy with anyone who wants to join the OpenStack community and then bolt the door behind them. Anyone who contributes to a project that is related to OpenStack's goals, is willing to do things the OpenStack way, and submits itself to the scrutiny of the TC deserves to be treated as a member of our community with voting rights, entry to the Design Summit and so on.

I'm curious how you're suggesting we decide which projects satisfy those criteria though. Up until now, we've done it through the incubation process (or technically, the new program approval process... but in practice we've never added a project that was targeted for eventual inclusion in the integrated release to a program without incubating it). Would the TC continue to judge whether a project is doing things the OpenStack way prior to inclusion, or would we let projects self-certify? What does it mean for a project to submit itself to TC scrutiny if it knows that realistically the TC will never have time to actually scrutinise it? Or are you not suggesting a change to the current incubation process, just a willingness to incubate multiple projects in the same problem space?

I feel like I need to play devil's advocate here, because overall I'm just not sure I understand the purpose of arbitrarily - and it *is* arbitrary - declaring "Layer #1" to be anything required to run Wordpress. To anyone whose goal is not to run Wordpress, how is that relevant?


Speaking of arbitrary, I had to laugh a little at this bit:

Also, please someone notice that the above is too many steps and should be:

openstack boot gentoo on-a 2G-VM with-a publicIP with-a 10G-volume call-it blog.inaugust.com

That's kinda sorta exactly what Heat does ;) Minus the part about assuming there is only one kind of application, obviously.

I think there are a number of unjustified assumptions behind this arrangement of things. I'm going to list some here, but I don't want anyone to interpret this as a personal criticism of Monty. The point is that we all suffer from biases - not for any questionable reasons but purely as a result of our own experiences, who we spend our time talking to and what we spend our time thinking about - and therefore we should all be extremely circumspect about trying to bake our own mental models of what OpenStack should be into the organisational structure of the project itself.


* Assumption #1: The purpose of OpenStack is to provide a Compute cloud

This assumption is front-and-centre throughout everything Monty wrote. Yet this wasn't how the OpenStack project started. In fact there are now at least three services - Swift, Nova, Zaqar - that could each make sense as the core of a standalone product.

Yes, it's true that Nova effectively depends on Glance and Neutron (and everything depends on Keystone). We should definitely document that somewhere. But why does it make Nova special?


* Assumption #2: Yawnoc's Law

Don't bother Googling that, I just made it up. It's the reverse of Conway's Law:


  Infra engineers who design governance structures for OpenStack are
  constrained to produce designs that are copies of the structure of
  Tempest.

I just don't understand why that needs to be the case. Currently, for understandable historic reasons, every project gates against every other project. That makes no sense any more, completely independently of the project governance structure. We should just change it! There is no organisational obstacle to changing how gating works.

Even this proposal doesn't entirely make sense on this front - e.g. Designate requires only Neutron and Keystone... why should Nova, Glance and every other project in "Layer 1" gate against it, and vice-versa?

I suggested in another thread[1] a model where each project would publish a set of tests, each project would decide which sets of tests to pull in and gate on, and Tempest would just be a shell for setting up the environment and running the selected tests. Maybe that idea is crazy or at least needs more work (it certainly met with only crickets and tumbleweeds on the mailing list), but implementing it wouldn't require TC intervention and certainly not by-laws changes. It just requires... implementing it.
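To make the shape of that model concrete, here is a minimal sketch, assuming each project publishes named test sets and separately lists the projects it chooses to gate on. The test-set names, the two dictionaries and `select_tests` are all invented for illustration; none of this is an existing Tempest interface.

```python
from typing import Dict, List

# Test sets each project publishes. All names are hypothetical.
PUBLISHED_TESTS: Dict[str, List[str]] = {
    "keystone": ["keystone.auth"],
    "glance": ["glance.images"],
    "neutron": ["neutron.ports"],
    "nova": ["nova.boot"],
    "designate": ["designate.zones"],
}

# Each project decides for itself whose test sets it pulls in and gates on.
# These choices are assumptions loosely based on the dependencies named above.
GATES_ON: Dict[str, List[str]] = {
    "designate": ["keystone", "neutron"],
    "nova": ["keystone", "glance", "neutron"],
}


def select_tests(project: str) -> List[str]:
    """Return the project's own test sets plus those it chose to gate on."""
    selected = list(PUBLISHED_TESTS.get(project, []))
    for other in GATES_ON.get(project, []):
        selected.extend(PUBLISHED_TESTS.get(other, []))
    return selected


if __name__ == "__main__":
    # The runner (the "shell") would set up the environment and run only these.
    print(select_tests("designate"))
```

In this sketch Designate's gate never runs Nova's tests, and nothing central had to decide that.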

Perhaps the idea here is that by designating "Layer 1" the TC is indicating to projects which other projects they should accept gate test jobs from (a function previously fulfilled by Incubation). I'd argue that this is a very bad way to do it, because (a) it says nothing to projects outside of "Layer 1" how they should decide, and (b) it jumps straight to the TC mandating the result without even letting the projects try to sort it out amongst themselves.

For example, I would actually prefer that Nova not gate against Heat because Nova is pretty unlikely to break us and the trade-off of putting us in a position to accidentally break them is not worth it. No edict from the TC required. On the other hand, I would push very strongly for all of the python-*client libraries to gate against both Heat and Horizon, because they can easily break us - and if they break us, they're probably breaking other users out there too, so I'm confident I could convince people that this would be mutually beneficial. (It could potentially even extend so far as running the unit tests of Heat and Horizon in the client gates, to avoid issues like [2].)

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/045446.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2014-September/046686.html


* Assumption #3: The world is static

This is a giant red flag:

  "the set of things in Layer #1 should never change -- unless we
   refactor something already in Layer #1 into a new project."

There is no greater act of hubris than to stick a stake in the ground and declare that "we will never know more than we do at this moment; we'll only get dumber from here, so we must precommit to all of our future decisions based on the information we have at present".

What if, for example, Nova wanted to add a dependency on Zaqar? They'd be prevented from doing so because Zaqar is not used by Wordpress. How is that relevant? A rigid ban on dependencies is a death knell for innovation.

Can you really never imagine a time where it might be better to run Wordpress on a container service rather than a full-fledged VM? I guess that's OK but only as long as it starts in Nova and then gets split out? Because... nova-core don't have enough to do?

And none of this is any help at all to projects outside of "Layer 1", because they get no guidance at all on what makes sense to depend on. This is already hurting with our current system (for example, Mistral is implementing a bunch of notification stuff that should properly be delegated to Zaqar, and in fact as of 6 months ago it was the centrepiece of the design), and the TC abdicating all interest in the subject will make it even worse.


* Assumption #4: The sky is falling

From reading openstack-dev, it's pretty clear that both the QA and Nova programs are facing a scaling crisis of sorts. It's easy to see why anybody deeply involved with either or both of those two would indeed think that radical change is required. I'm not sure, however, that the same sense of crisis pervades all of the other projects. We all have a lot of work to do, but I suspect that most projects would say that they are trucking along nicely. Meanwhile, the proposal is to change pretty much everything about how OpenStack is organised *except* QA and Nova (in fact, it creates incentives to stick even more stuff inside Nova), which remain sacrosanct. That doesn't seem like attacking the problem at its source.

So we've identified the minimum set of OpenStack services required to sensibly run Wordpress. Awesome! Somebody should totally write a blog post about that. But officially and permanently baking that in as the structure of the OpenStack project? I hate to use the c-word, but the bottom line is that "Layer 1" just resurrects Core with a pretext to finally kick Swift out. That seems particularly ironic, because I would pay good money to be a fly on the wall in a board meeting where anyone but Monty proposed such a thing in those terms, just to watch his reaction. Given that the TC informed the DefCore committee that it regarded everything that has graduated to the integrated release as the "designated sections" for DefCore purposes and told them to go do their own dirty work, you can bet your last dollar that this will be interpreted as a TC endorsement for permanently excluding Swift - and all the other non-"Layer 1" projects - from the designated sections. In fact, by removing only those tests from Tempest it's likely to have the side-effect of eliminating them from RefStack altogether.

Let's sum up, first by looking at a list of questions that developers, distributors, operators and users might ask about a project:


1) Are they "one of us"?
2) Should I gate against it?
3) Can I add a dependency on it?
4) Should this be widely distributed as part of OpenStack?
5) Can I use this knowing that the API will be somewhat stable?
6) Should this be used at scale in production?


Here's how the TC is answering those questions at the moment:

1) New program acceptance + incubation or adoption processes
2) Incubation process
3) Graduation process
4) Graduation process
5) Graduation process
6) You're on your own

Here are Monty's answers:

1) ???
2) No
3) No
4) You're on your own
5) You're on your own?
6) "CERN test"

Both of those feel unsatisfactory in different ways. Monty's suggestions seem like an overly radical change to me; I would like to try something a bit more incremental to give us the chance to see how the community adapts:


1) Incubation process (much lower bar)
2) Do your own cost/benefit analysis
3) Graduation process
4) Graduation process (maintain high bar, but less capricious)
5) Graduation process
6) TC/UC production-readiness review

Finally, since the motivation for change is that we think the current structure isn't scaling, let's examine the individual things that are currently pain points:


* Continuous Integration

We all agree that the gate doesn't scale. I submit that it doesn't scale because it tests every project against every other project, and that kicking projects out of the gate not only fails to solve the problem in the long term (since the projects that _are_ in will continue to grow), but also ignores the actual risks that the gate is meant to guard against in favour of an arbitrary designation.

We should scale the gate by only gating projects against other projects where the benefit in reduced risk outweighs the cost in increased risk of false negatives. For projects that don't depend on each other at all, the benefit is precisely zero (beyond the install-only gate suggested by Monty, which I support). We should apply the same cost-benefit calculation regardless of how involved the projects in question are with running Wordpress, and we should let projects themselves decide what to gate against in the first instance, with the TC only stepping in in the event that consensus can't be reached by other means.
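As a rough sketch of that cost-benefit rule, assuming the benefit and cost could each be boiled down to a single number (they can't really; every score below is invented purely to show the decision, and `should_gate` is a hypothetical helper):

```python
from typing import Dict, Tuple


def should_gate(risk_reduction: float, false_failure_cost: float) -> bool:
    """Gate only when the reduced risk outweighs the cost of spurious failures."""
    return risk_reduction > false_failure_cost


# (gated project, project whose tests it would run) ->
# (risk_reduction, false_failure_cost). All scores are made-up illustrations.
CANDIDATES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("nova", "heat"): (0.1, 0.5),               # Nova is unlikely to break Heat
    ("python-novaclient", "heat"): (0.8, 0.3),  # clients can easily break Heat
    ("swift", "designate"): (0.0, 0.2),         # assumed independent: zero benefit
}

decisions = {
    pair: should_gate(*scores) for pair, scores in CANDIDATES.items()
}

if __name__ == "__main__":
    for (gated, tests_from), gate in decisions.items():
        verdict = "gate" if gate else "don't gate"
        print(f"{gated} against {tests_from}: {verdict}")
```

The inputs to the rule are dependency and breakage risk only; how close a project is to running Wordpress never enters into it.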


* Documentation

This is a tricky one, and not an area of OpenStack that I am an expert on. It does seem to me that the only real solution is to make projects more responsible for their own documentation. Arbitrarily splitting projects into a category where they're not responsible at all and a category where they're completely on their own doesn't seem like a good solution.


* Release Management

This is something we have not really even attempted to scale beyond Thierry. As a first step, there is no real organisational obstacle to having a different release manager for incubated projects than for integrated projects, it's more a matter of making it known to either the Foundation or the various companies who employ contributors that we need one. I don't want to make that process sound trivial, but I'm confident that the release management program could handle it, and I think we should at least give them a chance to try before pre-emptively kicking anything non-Wordpress-related out of the release forever.


* Technical Committee

It is inevitable that we will reach a point where the Technical Committee itself does not scale. I'm surprised, because I thought that was a ways off, but after watching the latest Zaqar fiasco I think we have to consider the possibility that we have reached that point already.

Perhaps we should consider having subcommittees, maybe based on the groupings identified by John (Dickinson), possibly comprised of the relevant PTLs plus a representative of the TC. These subcommittees would do the legwork of investigating new projects making their way through the incubation/graduation process and report summaries and recommendations to the TC.


cheers,
Zane.
