On 04/09/14 08:14, Sean Dague wrote:

I've been one of the consistent voices concerned about a hard
requirement on adding NoSQL into the mix. So I'll explain that thinking
a bit more.

I feel like when the TC has made an integration decision previously, it
has been about evaluating the project applying for integration, and
whether it met some specific criteria it was told about some time in the
past. I think that's the wrong approach. It's a locally optimized
approach that fails to ask the more interesting question.

Is OpenStack better as a whole if this is a mandatory component of
OpenStack? Better being defined as technically better (more features,
less janky code work arounds, less unexpected behavior from the stack).
Better from the sense of easier or harder to run an actual cloud by our
Operators (taking into account what kinds of moving parts they are now
expected to manage). Better from the sense of a better user experience
in interacting with OpenStack as a whole. Better from the sense that the
OpenStack release will have fewer bugs, fewer unexpected cross-project
interactions, and a greater overall feel of consistency, so that
the OpenStack API feels like one thing.

https://dague.net/2014/08/26/openstack-as-layers/

I don't want to get off-topic here, but I want to state before this becomes the de-facto starting point for a layering discussion that I don't accept this model at all. It is not based on any analysis whatsoever but appears to be entirely arbitrary - a collection of individual prejudices arranged visually.

On a hopefully more constructive note, I believe there are at least two analyses that _would_ produce interesting data here:

1) Examine the dependencies, both hard and optional, between projects and enumerate the things you lose when ignoring each optional one.

2) Analyse projects based on the type of user consuming the service - e.g. Nova is mostly used (directly or indirectly via e.g. Heat and/or Horizon) by actual, corporeal persons, while Zaqar is used by both persons (to set up queues) and services (which actually send and receive messages) - of both OpenStack and applications. I believe, BTW, that this analysis will uncover a lot of missing features in Keystone[1].

What you can _not_ produce is a linear model of the different types of clouds for different use cases, because different organisations have wildly differing needs.

One of the interesting qualities of Layers 1 & 2 is they all follow an
AMQP + RDBMS pattern (excepting Swift). You can have a very effective
IaaS out of that stack. They are the things that you can provide pretty
solid integration testing on (and if you look at where everything stood
before the new TC mandates on testing / upgrade that was basically what
was getting integration tested). (Also note, I'll accept Barbican is
probably in the wrong layer, and should be a Layer 2 service.)

Swift is the current exception here, but one could argue, and people have[2], that Swift is also the only project that actually conforms to our stated design tenets for OpenStack. I'd struggle to tell the Zaqar folks they've done the Wrong Thing... especially when abandoning the RDBMS driver was done largely at the direction of the TC iirc.

Speaking of Swift, I would really love to see it investigated as a potential storage backend for Zaqar. If it proves to have the right guarantees (and durability is the crucial one, so it sounds promising) then that has the potential to smooth over a lot of the deployment problem.

While large shops can afford to have a dedicated team to figure out how
to make mongo or redis HA, provide monitoring, have a DR plan for when a
hurricane requires them to flip datacenters, that basically means
OpenStack heads further down the path of "only for the big folks". I
don't want OpenStack to be only for the big folks, I want OpenStack to
be for all sized folks. I really do want to have all the local small
colleges around here have OpenStack clouds, because it's something that
people believe they can do and manage. I know the people that work in
these places; they all come out to the LUG I run. We've talked about
this. OpenStack is basically seen as too complex for them to use as it
stands, and that pains me a ton.

This is a great point, and one that we definitely have to keep in mind.

It's also worth noting that small organisations also get the most benefit. Rather than having to stand up a cluster of reliable message brokers (large organisations are much more likely to need this kind of flexibility anyway) - potentially one cluster per application - they can have their IT department deploy e.g. a single Redis cluster and have messaging handled for every application in their cloud with all the benefits of multitenancy.

Part of the move to the cloud is inevitably going to mean organisational changes in a lot of places, where the operations experts will increasingly focus on maintaining the cloud itself, rather than the applications running in it. We need to be wary of producing a product with a major impedance mismatch to the organisations that will use it, but we should also remember that we are not doing this in a vacuum. Change is coming whether anyone likes it or not; the big question is if we'll get a foot in the door or if everything will switch over to proprietary clouds.

Vish brought up an interesting idea at the TC meeting a couple of weeks back, of having "components" that could be deployed by users instead of _needing_ operators to do it (though on bigger clouds they likely would).

To some extent this is already possible for things like Trove - for example, you can write a Heat template containing a Nova server running MySQL. On a small local cloud, you can pass an environment file that maps the OS::Trove::Instance resource type to this template, so that you get a MySQL server that you administer yourself. Then, when you move to a bigger cloud, you launch the same template without the environment mapping and automatically get the managed Trove service with no changes. (Murano developers will be showing up shortly to tell you that they can make it even easier.)

Unfortunately, this model doesn't work so well for something like Zaqar, which needs to scale at a very fine granularity. Maybe it could be done (Zaqar can run standalone, I believe) if you're willing to give up multitenancy and run one copy for a number of applications... but at that point it's easier to run it as part of the cloud. If we had a Docker driver in Nova - or, preferably, a Nova-like Container API - then I can imagine this concept having more legs. It would still be expensive in the messaging case because of the durability requirements, though. Something to think about.
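To make the Trove example above concrete, here's a rough sketch of the environment override I'm talking about. OS::Trove::Instance and the resource_registry mechanism are real; the file names (mysql_server.yaml, local-env.yaml, app.yaml) are invented for illustration:

    # Hedged sketch: build a Heat environment file that substitutes a
    # user-managed MySQL server template for the managed Trove resource.
    import yaml

    environment = {
        'resource_registry': {
            # On a small cloud with no Trove, map the Trove resource type
            # to a local template (hypothetical mysql_server.yaml) that
            # boots a plain Nova server running MySQL instead.
            'OS::Trove::Instance': 'mysql_server.yaml',
        }
    }

    with open('local-env.yaml', 'w') as f:
        yaml.safe_dump(environment, f, default_flow_style=False)

    # Launch with the override on the small cloud:
    #     heat stack-create -f app.yaml -e local-env.yaml mystack
    # On a bigger cloud, omit "-e local-env.yaml" and the same app.yaml
    # gets the managed Trove service, with no changes to the template.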

So I think Zaqar is good software, and a really useful part of our
ecosystem, but this added step-function burden of a 3rd class of support
software that has to be maintained... seems like it takes us further
away from OpenStack at a small scale. If we were thinking about Zaqar as
a thing that we could replace oslo.messaging with, that becomes
interesting in a different way, because instead of having 3 classes of
support software we could remain at 2, just taking a sideways shift on
one of them. But that's not actually the path we are on.

So, honestly, I'll probably remain -1 on the final integration vote, not
because Zaqar is bad, but because I'm feeling more firmly that for
OpenStack to not leave the small deployers behind we need to redefine
the tightly integrated piece of OpenStack to basically the Layer 1 & 2
parts of my diagram, and consider the rest of the layers exciting parts
of our ecosystem that more advanced users may choose to deploy to meet
their needs. Smaller tent, big ecosystem, easier on ramp.

Let's assume for a moment that I agree with the 'small tent' concept and don't find it in any way appalling.

I would argue that Marconi belongs very close to the centre of even the smallest of pup tents. Just behind Nova, but well ahead of, say, Neutron. Let's not forget that SQS actually pre-dates(!) EC2 by two years.

I'll give you an example of one use case we have for it in Heat. Ceilometer generates alarms that trigger autoscaling events in Heat. We could easily have Ceilometer simply call a Heat API endpoint with some data, but that's actually extremely limiting for the user. What if the user wants a particular alarm to cause a scaling event on the second Tuesday after a full moon and BTW Nagios is going bat****? We have some basic signal conditioning in Heat, but we don't want to turn it into a Turing-complete programming language or anything, and we don't want the user to have to give up on using data from Ceilometer altogether as soon as things get complicated. The best solution available for now, and the one that was implemented, is to make it a webhook - the user can either pass the webhook URL supplied by Heat to Ceilometer, or they can use it themselves and pass their own webhook to Ceilometer to do their own conditioning in between.
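Just to make the current workaround concrete, here's a hedged sketch (the endpoint, port and the conditioning rule are all invented for illustration) of the kind of intermediate service a user has to stand up today to do their own signal conditioning between Ceilometer and Heat:

    # Hedged sketch: a user-run webhook that receives Ceilometer alarm
    # POSTs, applies some arbitrary extra condition, and only then forwards
    # the signal to the pre-signed webhook URL that Heat supplied for the
    # scaling policy.
    import datetime

    import requests
    from flask import Flask

    app = Flask(__name__)

    # Placeholder for the pre-signed scaling webhook URL obtained from Heat.
    HEAT_SCALING_WEBHOOK = 'https://heat.example.com/v1/signal/...'

    def extra_condition():
        # Stand-in for whatever site-specific logic the user wants,
        # e.g. "only scale during business hours".
        return 9 <= datetime.datetime.utcnow().hour < 17

    @app.route('/alarm', methods=['POST'])
    def alarm():
        if extra_condition():
            # Forward the alarm to Heat to trigger the scaling action.
            requests.post(HEAT_SCALING_WEBHOOK)
        return '', 204

    if __name__ == '__main__':
        # This endpoint has to be reachable by Ceilometer, which in practice
        # usually means exposing it publicly - which is part of the problem.
        app.run(host='0.0.0.0', port=8080)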

I submit that this is a horrible solution.

It's horrible because it can potentially turn Ceilometer into an engine for launching DoS attacks at arbitrary servers. (Operators themselves are actually the most vulnerable, though, because it comes from inside their control plane network. They have to be aware not to trust outgoing connections from the machine running Ceilometer.) It's horrible because it requires users to make the endpoint for the signal conditioner public (effectively outsourcing security from the operator, who need only implement SSL and Keystone, to the user, who is much more likely to get it wrong). It's horrible because this operation should have the semantics of a queue, but when it fails the choice is between losing the alarm or effectively reimplementing Zaqar inside Ceilometer.

All of those problems, and more, would be solved by using Zaqar instead, but we couldn't use it at the time because it didn't exist yet. And this use case is just the beginning! I (genuinely) lost count somewhere in the double digits of the number of new Heat features facing the same dilemma - features that users and developers are chomping at the bit for. I announced elsewhere that we decided to just go ahead and implement them using Zaqar, because it's just too much work trying to hold developers and their webhook hacks at bay while we wait for Zaqar to graduate.
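For comparison, here's roughly what the same hand-off looks like with queue semantics. The python-zaqarclient calls below are written from memory, and the endpoint, queue name and handler are invented, so treat this as a sketch of the shape of the interaction rather than a definitive example:

    # Hedged sketch: alarms go onto a durable, multi-tenant queue instead of
    # being POSTed straight at a user-supplied URL.
    from zaqarclient.queues import client

    # Assumed local dev endpoint and default configuration for illustration.
    cli = client.Client('http://localhost:8888', version=1, conf={})
    queue = cli.queue('alarm-conditioning')

    # Producer side (the Ceilometer role): post the alarm as a message.
    queue.post({'body': {'alarm': 'cpu_high', 'group': 'asg-1'}, 'ttl': 3600})

    def handle_alarm(body):
        # Stand-in for whatever conditioning/forwarding the consumer does.
        print(body)

    # Consumer side (the user's conditioner, or Heat itself): claim messages
    # so that a crashed consumer doesn't lose them, and delete each one only
    # after it has actually been acted on.
    for message in queue.claim(ttl=300, grace=60):
        handle_alarm(message.body)
        message.delete()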

There's so much more than just Heat too. We have a "Dashboard" that isn't able to tell users about stuff that happens to their infrastructure for want of asynchronous notifications. That's nuts! This is a critical, fundamental part of a cloud that I would certainly arbitrarily place in layer 2, if not layer 1, of your diagram. Lots of core stuff needs to depend on it, and keeping it outside the tent makes about as much sense to me as keeping, say, Glance outside the tent.

Or, in other words, Zaqar will give us "more features, less janky code work arounds, [and] less unexpected behavior from the stack".

Finally, of course, this is without even considering the *main* use cases for Zaqar. As pointed out elsewhere in this thread, there is an entire class of applications - the most popular class of applications - where substantially every new one written will need something like this. So we have to balance the number of organisations who will be turned off having their own OpenStack because operating it is too complicated (a problem not created by Zaqar, as you note above) with the number who won't even consider it for lack of demand because all their users are comfortably locked in to AWS, which has had this functionality for nigh on 10 years now.

I realize that largely means Zaqar would be caught up in a definition
discussion outside of its control, and that's kind of unfortunate, as
Flavio and team have been doing a bang-up job of late. But we need to
stop considering "integration" as the end game of all interesting
software in the OpenStack ecosystem, and I think it's better to have
that conversation sooner rather than later.

I think it's clear that we need to have that conversation again, but in the specific case of Zaqar I believe it is such a fundamental building block that the question of how many building blocks are allowed inside the tent (sorry) is not relevant to the question at hand.

cheers,
Zane.

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-August/043871.html
[2] http://blog.linux2go.dk/2013/08/30/openstack-design-tenets-part-2/
