On 18/09/14 14:53, Monty Taylor wrote:
Hey all,

I've recently been thinking a lot about Sean's Layers stuff. So I wrote
a blog post which Jim Blair and Devananda were kind enough to help me edit.


Thanks Monty, I think there are some very interesting ideas in here.

I'm particularly glad to see the 'big tent' camp reasserting itself, because I have no sympathy with anyone who wants to join the OpenStack community and then bolt the door behind them. Anyone who contributes to a project that is related to OpenStack's goals, is willing to do things the OpenStack way, and submits itself to the scrutiny of the TC deserves to be treated as a member of our community with voting rights, entry to the Design Summit and so on.

I'm curious how you're suggesting we decide which projects satisfy those criteria though. Up until now, we've done it through the incubation process (or technically, the new program approval process... but in practice we've never added a project that was targeted for eventual inclusion in the integrated release to a program without incubating it). Would the TC continue to judge whether a project is doing things the OpenStack way prior to inclusion, or would we let projects self-certify? What does it mean for a project to submit itself to TC scrutiny if it knows that realistically the TC will never have time to actually scrutinise it? Or are you not suggesting a change to the current incubation process, just a willingness to incubate multiple projects in the same problem space?

I feel like I need to play devil's advocate here, because overall I'm just not sure I understand the purpose of arbitrarily - and it *is* arbitrary - declaring "Layer #1" to be anything required to run Wordpress. To anyone whose goal is not to run Wordpress, how is that relevant?

Speaking of arbitrary, I had to laugh a little at this bit:

Also, please someone notice that the above is too many steps and should be:

openstack boot gentoo on-a 2G-VM with-a publicIP with-a 10G-volume call-it blog.inaugust.com

That's kinda sorta exactly what Heat does ;) Minus the part about assuming there is only one kind of application, obviously.

I think there are a number of unjustified assumptions behind this arrangement of things. I'm going to list some here, but I don't want anyone to interpret this as a personal criticism of Monty. The point is that we all suffer from biases - not for any questionable reasons but purely as a result of our own experiences, who we spend our time talking to and what we spend our time thinking about - and therefore we should all be extremely circumspect about trying to bake our own mental models of what OpenStack should be into the organisational structure of the project itself.

* Assumption #1: The purpose of OpenStack is to provide a Compute cloud

This assumption is front-and-centre throughout everything Monty wrote. Yet this wasn't how the OpenStack project started. In fact there are now at least three services - Swift, Nova, Zaqar - that could each make sense as the core of a standalone product.

Yes, it's true that Nova effectively depends on Glance and Neutron (and everything depends on Keystone). We should definitely document that somewhere. But why does it make Nova special?

* Assumption #2: Yawnoc's Law

Don't bother Googling that, I just made it up. It's the reverse of Conway's Law:

  Infra engineers who design governance structures for OpenStack are
  constrained to produce designs that are copies of the structure of

I just don't understand why that needs to be the case. Currently, for understandable historic reasons, every project gates against every other project. That makes no sense any more, completely independently of the project governance structure. We should just change it! There is no organisational obstacle to changing how gating works.

Even this proposal doesn't entirely make sense on this front - e.g. Designate requires only Neutron and Keystone... why should Nova, Glance and every other project in "Layer 1" gate against it, and vice-versa?

I suggested in another thread[1] a model where each project would publish a set of tests, each project would decide which sets of tests to pull in and gate on, and Tempest would just be a shell for setting up the environment and running the selected tests. Maybe that idea is crazy or at least needs more work (it certainly met with only crickets and tumbleweeds on the mailing list), but implementing it wouldn't require TC intervention and certainly not by-laws changes. It just requires... implementing it.
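To make that model concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the dictionaries, the test names and the select_tests helper are invented for illustration and are not part of Tempest or any existing gate tooling. It only shows the shape of the idea, where projects publish named sets of tests and each project picks the sets it wants to gate on.

```python
# Hypothetical sketch: none of these names are real Tempest or gate APIs.

# Each project publishes named sets of tests that others may pull in.
PUBLISHED_TESTS = {
    "nova": {"api": ["test_boot_server", "test_delete_server"]},
    "heat": {"api": ["test_create_stack"], "clients": ["test_heat_client_compat"]},
    "horizon": {"clients": ["test_horizon_client_compat"]},
}

# Each project decides for itself which published sets it gates on.
# These particular choices are made up for the example.
GATES_ON = {
    "nova": [("nova", "api")],
    "heat": [("heat", "api"), ("nova", "api")],
    "python-novaclient": [("nova", "api"), ("heat", "clients"), ("horizon", "clients")],
}


def select_tests(project):
    """Return the tests that a change to `project` would run in its gate."""
    selected = []
    for publisher, test_set in GATES_ON.get(project, []):
        selected.extend(PUBLISHED_TESTS[publisher][test_set])
    return selected
```

In this sketch the test runner is reduced to looking up the selection; setting up the environment and actually running the tests would remain the job of the shell.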

Perhaps the idea here is that by designating "Layer 1" the TC is indicating to projects which other projects they should accept gate test jobs from (a function previously fulfilled by Incubation). I'd argue that this is a very bad way to do it, because (a) it tells projects outside of "Layer 1" nothing about how they should decide, and (b) it jumps straight to the TC mandating the result without even letting the projects try to sort it out amongst themselves.

For example, I would actually prefer that Nova not gate against Heat because Nova is pretty unlikely to break us and the trade-off of putting us in a position to accidentally break them is not worth it. No edict from the TC required. On the other hand, I would push very strongly for all of the python-*client libraries to gate against both Heat and Horizon, because they can easily break us - and if they break us, they're probably breaking other users out there too, so I'm confident I could convince people that this would be mutually beneficial. (It could potentially even extend so far as running the unit tests of Heat and Horizon in the client gates, to avoid issues like [2].)

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/045446.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2014-September/046686.html

* Assumption #3: The world is static

This is a giant red flag:

  "the set of things in Layer #1 should never change -- unless we
   refactor something already in Layer #1 into a new project."

There is no greater act of hubris than to stick a stake in the ground and declare that "we will never know more than we do at this moment; we'll only get dumber from here, so we must precommit to all of our future decisions based on the information we have at present".

What if, for example, Nova wanted to add a dependency on Zaqar? They'd be prevented from doing so because Zaqar is not used by Wordpress. How is that relevant? A rigid ban on dependencies is a death knell for innovation.

Can you really never imagine a time where it might be better to run Wordpress on a container service rather than a full-fledged VM? I guess that's OK but only as long as it starts in Nova and then gets split out? Because... nova-core don't have enough to do?

And none of this is any help at all to projects outside of "Layer 1", because they get no guidance at all on what makes sense to depend on. This is already hurting us under our current system (for example, Mistral is implementing a bunch of notification stuff that should properly be delegated to Zaqar, and in fact as of 6 months ago it was the centrepiece of the design), and the TC abdicating all interest in the subject will make it even worse.

* Assumption #4: The sky is falling

From reading openstack-dev, it's pretty clear that both the QA and Nova programs are facing a scaling crisis of sorts. It's easy to see why anybody deeply involved with either or both of those two would indeed think that radical change is required. I'm not sure, however, that the same sense of crisis pervades all of the other projects. We all have a lot of work to do, but I suspect that most projects would say that they are trucking along nicely. Meanwhile, the proposal is to change pretty much everything about how OpenStack is organised *except* QA and Nova (in fact, it creates incentives to stick even more stuff inside Nova), which remain sacrosanct. That doesn't seem like attacking the problem at its source.

So we've identified the minimum set of OpenStack services required to sensibly run Wordpress. Awesome! Somebody should totally write a blog post about that. But officially and permanently baking that in as the structure of the OpenStack project? I hate to use the c-word, but the bottom line is that "Layer 1" just resurrects Core with a pretext to finally kick Swift out. That seems particularly ironic, because I would pay good money to be a fly on the wall in a board meeting where anyone but Monty proposed such a thing in those terms, just to watch his reaction. Given that the TC informed the DefCore committee that it regarded everything that has graduated to the integrated release as the "designated sections" for DefCore purposes and told them to go do their own dirty work, you can bet your last dollar that this will be interpreted as a TC endorsement for permanently excluding Swift - and all the other non-"Layer 1" projects - from the designated sections. In fact, by removing only those tests from Tempest it's likely to have the side-effect of eliminating them from RefStack altogether.

Let's sum up, first by looking at a list of questions that developers, distributors, operators and users might ask about a project:

1) Are they "one of us"?
2) Should I gate against it?
3) Can I add a dependency on it?
4) Should this be widely distributed as part of OpenStack?
5) Can I use this knowing that the API will be somewhat stable?
6) Should this be used at scale in production?

Here's how the TC is answering those questions at the moment:

1) New program acceptance + incubation or adoption processes
2) Incubation process
3) Graduation process
4) Graduation process
5) Graduation process
6) You're on your own

Here are Monty's answers:

1) ???
2) No
3) No
4) You're on your own
5) You're on your own?
6) "CERN test"

Both of those feel unsatisfactory in different ways. Monty's suggestions seem like an overly radical change to me; I would like to try something a bit more incremental to give us the chance to see how the community adapts:

1) Incubation process (much lower bar)
2) Do your own cost/benefit analysis
3) Graduation process
4) Graduation process (maintain high bar, but less capricious)
5) Graduation process
6) TC/UC production-readiness review

Finally, since the motivation for change is that we think the current structure isn't scaling, let's examine the individual things that are currently pain points:

* Continuous Integration

We all agree that the gate doesn't scale. I submit that it doesn't scale because it tests every project against every other project, and that kicking projects out of the gate not only fails to solve the problem in the long term (since the projects that _are_ in will continue to grow), but also ignores the actual risks that the gate is meant to guard against in favour of an arbitrary designation.

We should scale the gate by only gating projects against other projects where the benefit in reduced risk outweighs the cost in increased risk of false negatives. For projects that don't depend on each other at all, the benefit is precisely zero (beyond the install-only gate suggested by Monty, which I support). We should apply the same cost-benefit calculation regardless of how involved the projects in question are with running Wordpress, and we should let projects themselves decide what to gate against in the first instance, with the TC only stepping in in the event that consensus can't be reached by other means.
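As a rough illustration of that calculation, the sketch below expresses the rule in Python. The function name, its parameters and the idea of putting both quantities on one numeric scale are assumptions made for the example; in practice each project would estimate these costs informally.

```python
def should_gate(depends_on, breakage_risk, false_failure_cost):
    """Decide whether one project should gate against another (hypothetical).

    depends_on: whether there is any dependency between the two projects.
    breakage_risk: estimated cost avoided by catching the other project's
        breaking changes before they merge.
    false_failure_cost: estimated cost of extra spurious gate failures
        introduced by running the other project's tests.
    """
    # With no dependency at all, the benefit is precisely zero.
    benefit = breakage_risk if depends_on else 0.0
    return benefit > false_failure_cost
```

For example, should_gate(False, 0.9, 0.1) is False no matter how large the notional risk, because unrelated projects cannot break each other.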

* Documentation

This is a tricky one, and not an area of OpenStack that I am an expert on. It does seem to me that the only real solution is to make projects more responsible for their own documentation. Arbitrarily splitting projects into a category where they're not responsible at all and a category where they're completely on their own doesn't seem like a good solution.

* Release Management

This is something we have not really even attempted to scale beyond Thierry. As a first step, there is no real organisational obstacle to having a different release manager for incubated projects than for integrated projects; it's more a matter of making it known to either the Foundation or the various companies who employ contributors that we need one. I don't want to make that process sound trivial, but I'm confident that the release management program could handle it, and I think we should at least give them a chance to try before pre-emptively kicking anything non-Wordpress-related out of the release forever.

* Technical Committee

It is inevitable that we will reach a point where the Technical Committee itself does not scale. I'm surprised, because I thought that was a ways off, but after watching the latest Zaqar fiasco I think we have to consider the possibility that we have reached that point already.

Perhaps we should consider having subcommittees, maybe based on the groupings identified by John (Dickinson), possibly composed of the relevant PTLs plus a representative of the TC. These subcommittees would do the legwork of investigating new projects making their way through the incubation/graduation process and report summaries and recommendations to the TC.


OpenStack-dev mailing list
