On 04/09/14 08:14, Sean Dague wrote:

I've been one of the consistent voices concerned about a hard
requirement on adding NoSQL into the mix. So I'll explain that thinking
a bit more.

I feel like when the TC has made an integration decision previously, it
has been about evaluating the project applying for integration, and
whether it met some specific criteria it was told about some time in the
past. I think that's the wrong approach. It's a locally optimized
approach that fails to ask the more interesting question.

Is OpenStack better as a whole if this is a mandatory component of
OpenStack? Better being defined as technically better (more features,
less janky code work arounds, less unexpected behavior from the stack).
Better from the sense of easier or harder to run an actual cloud by our
Operators (taking into account what kinds of moving parts they are now
expected to manage). Better from the sense of a better user experience
in interacting with OpenStack as a whole. Better from the sense that the
OpenStack release will have fewer bugs, fewer unexpected cross-project
interactions, and a greater overall feel of consistency, so that
the OpenStack API feels like one thing.

https://dague.net/2014/08/26/openstack-as-layers/

I don't want to get off-topic here, but I want to state before this becomes the de-facto starting point for a layering discussion that I don't accept this model at all. It is not based on any analysis whatsoever but appears to be entirely arbitrary - a collection of individual prejudices arranged visually.

On a hopefully more constructive note, I believe there are at least two analyses that _would_ produce interesting data here:

1) Examine the dependencies, both hard and optional, between projects and enumerate the things you lose when ignoring each optional one.

2) Analyse projects based on the type of user consuming the service - e.g. Nova is mostly used (directly or indirectly via e.g. Heat and/or Horizon) by actual, corporeal persons, while Zaqar is used by both persons (to set up queues) and services (which actually send and receive messages) - of both OpenStack and applications. I believe, BTW, that this analysis will uncover a lot of missing features in Keystone[1].

What you can _not_ produce is a linear model of the different types of clouds for different use cases, because different organisations have wildly differing needs.

One of the interesting qualities of Layers 1 & 2 is they all follow an
AMQP + RDBMS pattern (excepting Swift). You can have a very effective
IaaS out of that stack. They are the things that you can provide pretty
solid integration testing on (and if you look at where everything stood
before the new TC mandates on testing / upgrade that was basically what
was getting integration tested). (Also note, I'll accept Barbican is
probably in the wrong layer, and should be a Layer 2 service.)

Swift is the current exception here, but one could argue, and people have[2], that Swift is also the only project that actually conforms to our stated design tenets for OpenStack. I'd struggle to tell the Zaqar folks they've done the Wrong Thing... especially when abandoning the RDBMS driver was done largely at the direction of the TC iirc.

Speaking of Swift, I would really love to see it investigated as a potential storage backend for Zaqar. If it proves to have the right guarantees (and durability is the crucial one, so it sounds promising) then that has the potential to smooth over a lot of the deployment problem.

While large shops can afford to have a dedicated team to figure out how
to make mongo or redis HA, provide monitoring, have a DR plan for when a
hurricane requires them to flip datacenters, that basically means
OpenStack heads further down the path of "only for the big folks". I
don't want OpenStack to be only for the big folks, I want OpenStack to
be for all sized folks. I really do want to have all the local small
colleges around here have OpenStack clouds, because it's something that
people believe they can do and manage. I know the people that work in
these places; they all come out to the LUG I run. We've talked about
this. OpenStack is basically seen as too complex for them to use as it
stands, and that pains me a ton.

This is a great point, and one that we definitely have to keep in mind.

It's also worth noting that small organisations also get the most benefit. Rather than having to stand up a cluster of reliable message brokers (large organisations are much more likely to need this kind of flexibility anyway) - potentially one cluster per application - they can have their IT department deploy e.g. a single Redis cluster and have messaging handled for every application in their cloud with all the benefits of multitenancy.

Part of the move to the cloud is inevitably going to mean organisational changes in a lot of places, where the operations experts will increasingly focus on maintaining the cloud itself, rather than the applications running in it. We need to be wary of producing a product with a major impedance mismatch to the organisations that will use it, but we should also remember that we are not doing this in a vacuum. Change is coming whether anyone likes it or not; the big question is if we'll get a foot in the door or if everything will switch over to proprietary clouds.

Vish brought up an interesting idea at the TC meeting a couple of weeks back, of having "components" that could be deployed by users instead of _needing_ operators to do it (though on bigger clouds they likely would).

To some extent this is already possible for things like Trove - for example, you can write a Heat template containing a Nova server running MySQL. On a small local cloud, you can pass an environment file that maps the OS::Trove::Instance resource type to this template, so that you get a MySQL server that you administer yourself. Then, when you move to a bigger cloud, you launch the same template without the environment mapping and automatically get the managed Trove service with no changes. (Murano developers will be showing up shortly to tell you that they can make it even easier.)

Unfortunately, this model doesn't work so well for something like Zaqar, which needs to scale at a very fine granularity. Maybe it could be done (Zaqar can run standalone, I believe) if you're willing to give up multitenancy and run one copy for a number of applications... but at that point it's easier to run it as part of the cloud. If we had a Docker driver in Nova - or, preferably, a Nova-like Container API - then I can imagine this concept having more legs. It would still be expensive in the messaging case because of the durability requirements, though. Something to think about.
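To make the Trove example above concrete, here's a rough sketch of the environment override I'm talking about. OS::Trove::Instance and the resource_registry mechanism are real; the file names (mysql_server.yaml, local-env.yaml, app.yaml) are invented for illustration:

    # Hedged sketch: build a Heat environment file that substitutes a
    # user-managed MySQL server template for the managed Trove resource.
    import yaml

    environment = {
        'resource_registry': {
            # On a small cloud with no Trove, map the Trove resource type
            # to a local template (hypothetical mysql_server.yaml) that
            # boots a plain Nova server running MySQL instead.
            'OS::Trove::Instance': 'mysql_server.yaml',
        }
    }

    with open('local-env.yaml', 'w') as f:
        yaml.safe_dump(environment, f, default_flow_style=False)

    # Launch with the override on the small cloud:
    #     heat stack-create -f app.yaml -e local-env.yaml mystack
    # On a bigger cloud, omit "-e local-env.yaml" and the same app.yaml
    # gets the managed Trove service, with no changes to the template.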

So I think Zaqar is good software, and a really useful part of our
ecosystem, but this added step-function burden of a 3rd class of support
software that has to be maintained... seems like it takes us further
away from OpenStack at a small scale. If we were thinking about Zaqar as
a thing that we could replace oslo.messaging with, that becomes
interesting in a different way, because instead of having 3 classes of
support software we could remain at 2, just taking a sideways shift on
one of them. But that's not actually the path we are on.

So, honestly, I'll probably remain -1 on the final integration vote, not
because Zaqar is bad, but because I'm feeling more firmly that for
OpenStack to not leave the small deployers behind we need to redefine
the tightly integrated piece of OpenStack to basically the Layer 1 & 2
parts of my diagram, and consider the rest of the layers exciting parts
of our ecosystem that more advanced users may choose to deploy to meet
their needs. Smaller tent, big ecosystem, easier on ramp.

Let's assume for a moment that I agree with the 'small tent' concept and don't find it in any way appalling.

I would argue that Marconi belongs very close to the centre of even the smallest of pup tents. Just behind Nova, but well ahead of, say, Neutron. Let's not forget that SQS actually pre-dates(!) EC2 by two years.

I'll give you an example of one use case we have for it in Heat. Ceilometer generates alarms that trigger autoscaling events in Heat. We could easily have Ceilometer simply call a Heat API endpoint with some data, but that's actually extremely limiting for the user. What if the user wants a particular alarm to cause a scaling event on the second Tuesday after a full moon and BTW Nagios is going bat****? We have some basic signal conditioning in Heat, but we don't want to turn it into a Turing-complete programming language or anything, and we don't want the user to have to give up on using data from Ceilometer altogether as soon as things get complicated. The best solution available for now, and the one that was implemented, is to make it a webhook - the user can either pass the webhook URL supplied by Heat to Ceilometer, or they can use it themselves and pass their own webhook to Ceilometer to do their own conditioning in between.
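Just to make the current workaround concrete, here's a hedged sketch (the endpoint, port and the conditioning rule are all invented for illustration) of the kind of intermediate service a user has to stand up today to do their own signal conditioning between Ceilometer and Heat:

    # Hedged sketch: a user-run webhook that receives Ceilometer alarm
    # POSTs, applies some arbitrary extra condition, and only then forwards
    # the signal to the pre-signed webhook URL that Heat supplied for the
    # scaling policy.
    import datetime

    import requests
    from flask import Flask

    app = Flask(__name__)

    # Placeholder for the pre-signed scaling webhook URL obtained from Heat.
    HEAT_SCALING_WEBHOOK = 'https://heat.example.com/v1/signal/...'

    def extra_condition():
        # Stand-in for whatever site-specific logic the user wants,
        # e.g. "only scale during business hours".
        return 9 <= datetime.datetime.utcnow().hour < 17

    @app.route('/alarm', methods=['POST'])
    def alarm():
        if extra_condition():
            # Forward the alarm to Heat to trigger the scaling action.
            requests.post(HEAT_SCALING_WEBHOOK)
        return '', 204

    if __name__ == '__main__':
        # This endpoint has to be reachable by Ceilometer, which in practice
        # usually means exposing it publicly - which is part of the problem.
        app.run(host='0.0.0.0', port=8080)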

I submit that this is a horrible solution.

It's horrible because it can potentially turn Ceilometer into an engine for launching DoS attacks at arbitrary servers. (Operators themselves are actually the most vulnerable, though, because it comes from inside their control plane network. They have to be aware not to trust outgoing connections from the machine running Ceilometer.) It's horrible because it requires users to make the endpoint for the signal conditioner public (effectively outsourcing security from the operator, who need only implement SSL and Keystone, to the user, who is much more likely to get it wrong). It's horrible because this operation should have the semantics of a queue, but when it fails the choice is between losing the alarm or effectively reimplementing Zaqar inside Ceilometer.

All of those problems, and more, would be solved by using Zaqar instead, but we couldn't use it at the time because it didn't exist yet. And this use case is just the beginning! I (genuinely) lost count somewhere in the double digits of the number of new Heat features facing the same dilemma - features that users and developers are chomping at the bit for. I announced elsewhere that we decided to just go ahead and implement them using Zaqar, because it's just too much work trying to hold developers and their webhook hacks at bay while we wait for Zaqar to graduate.
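For comparison, here's roughly what the same hand-off looks like with queue semantics. The python-zaqarclient calls below are written from memory, and the endpoint, queue name and handler are invented, so treat this as a sketch of the shape of the interaction rather than a definitive example:

    # Hedged sketch: alarms go onto a durable, multi-tenant queue instead of
    # being POSTed straight at a user-supplied URL.
    from zaqarclient.queues import client

    # Assumed local dev endpoint and default configuration for illustration.
    cli = client.Client('http://localhost:8888', version=1, conf={})
    queue = cli.queue('alarm-conditioning')

    # Producer side (the Ceilometer role): post the alarm as a message.
    queue.post({'body': {'alarm': 'cpu_high', 'group': 'asg-1'}, 'ttl': 3600})

    def handle_alarm(body):
        # Stand-in for whatever conditioning/forwarding the consumer does.
        print(body)

    # Consumer side (the user's conditioner, or Heat itself): claim messages
    # so that a crashed consumer doesn't lose them, and delete each one only
    # after it has actually been acted on.
    for message in queue.claim(ttl=300, grace=60):
        handle_alarm(message.body)
        message.delete()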

There's so much more than just Heat too. We have a "Dashboard" that isn't able to tell users about stuff that happens to their infrastructure for want of asynchronous notifications. That's nuts! This is a critical, fundamental part of a cloud that I would certainly arbitrarily place in layer 2, if not layer 1, of your diagram. Lots of core stuff needs to depend on it, and keeping it outside the tent makes about as much sense to me as keeping, say, Glance outside the tent.

Or, in other words, Zaqar will give us "more features, less janky code work arounds, [and] less unexpected behavior from the stack".

Finally, of course, this is without even considering the *main* use cases for Zaqar. As pointed out elsewhere in this thread, there is an entire class of applications - the most popular class of applications - where substantially every new one written will need something like this. So we have to balance the number of organisations who will be turned off having their own OpenStack because operating it is too complicated (a problem not created by Zaqar, as you note above) with the number who won't even consider it for lack of demand because all their users are comfortably locked in to AWS, which has had this functionality for nigh on 10 years now.

I realize that largely means Zaqar would be caught up in a definition
discussion outside of its control, and that's kind of unfortunate, as
Flavio and team have been doing a bang-up job of late. But we need to
stop considering "integration" as the end game of all interesting
software in the OpenStack ecosystem, and I think it's better to have
that conversation sooner rather than later.

I think it's clear that we need to have that conversation again, but in the specific case of Zaqar I believe it is such a fundamental building block that the question of how many building blocks are allowed inside the tent (sorry) is not relevant to the question at hand.

cheers,
Zane.

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-August/043871.html
[2] http://blog.linux2go.dk/2013/08/30/openstack-design-tenets-part-2/
