Comments below.
On 5/21/2017 1:38 PM, Monty Taylor wrote:
Hi all!
As the discussion around PostgreSQL has progressed, it has become clear
to me that there is a decently deep philosophical question on which we
do not currently share either definition or agreement. I believe that
the lack of clarity on this point is one of the things that makes the
PostgreSQL conversation difficult.
I believe the question is between these two things:
* Should OpenStack assume the existence of an external database
service that it treats as a black box on the other side of a
connection string?
* Should OpenStack take an active and/or opinionated role in managing
the database service?
A potentially obvious question about that (asked by Mike Bayer in a
different thread) is: "what do you mean by managing?"
What I mean by managing is doing all of the things you can do related
to database operational controls short of installing the software,
writing the basic db config files to disk, and stopping and starting
the services. It means being much more prescriptive about what types
of config we support, validating config settings that cannot be
overridden at runtime, and refusing to operate if they are unworkable.
I think it's helpful and important for us to have automation tooling
like tripleo, puppet, etc. that can stand up a MySQL database. But we
also have to realize that as shops mature, they will deploy more
complicated database topologies, clustered configurations, and
replication scenarios. So I think we shouldn't go overboard with being
prescriptive. We also have to realize that in the enterprise space,
databases are usually deployed and managed by a separate database team,
which means less control over that layer. So we shouldn't force people
into this model. We should provide best practice documentation, examples
(tripleo, puppet, ansible, etc.), and leave it up to the operator.
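To make the "validating config settings ... and refusing to operate"
idea above concrete, here is a rough sketch of what such a startup
check could look like. This is purely illustrative - it is not existing
oslo.db code, and the particular setting and threshold are just
examples of the pattern:

    import sqlalchemy as sa

    def assert_strict_sql_mode(connection_url):
        # Sketch of an 'active' startup validation: inspect a server
        # setting the application depends on and hard-stop if it is
        # unworkable, rather than limping along with bad data semantics.
        engine = sa.create_engine(connection_url)
        with engine.connect() as conn:
            mode = conn.execute(sa.text("SELECT @@sql_mode")).scalar() or ""
        if ("STRICT_TRANS_TABLES" not in mode
                and "STRICT_ALL_TABLES" not in mode):
            raise RuntimeError(
                "Unsuitable environment: sql_mode %r is not strict" % mode)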
Why would we want to be 'more active'? When managing and tuning
databases, there are some things that are driven by the environment
and some things that are driven by the application.
Things that are driven by the environment include the amount of RAM
actually available, whether or not the machines running the database
are dedicated or shared, firewall settings, SELinux settings, and what
versions of software are available.
This is a good example of an area where we should focus on documenting
best practices and leave it to the operator to implement. Guidelines
around CPU, memory, security settings, tunables, etc. are what's needed
here. Today there aren't really any guidelines or best practices on even
sizing the database(s) for a given deployment size.
Things that are driven by the application include character set and
collation, schema design, data types, and schema upgrade and HA
strategies.
These are things that we can have a bit more control or direction on.
One might argue that HA strategies are an operator concern, but in
reality the set of workable HA strategies is tightly constrained by
how the application works, and pairing an application that expects one
HA strategy with a deployment that implements a different one can have
negative results ranging from unexpected downtime to data corruption.
For example: an HA strategy using slave promotion and a VIP that
points at the current write master, paired with an application that is
not correctly configured for that setup, can lead to writes to the
wrong host after a failover event, and to an application that seems to
be running fine until the data turns up weird after a while.
This is definitely a more complicated area that becomes more and more
specific to the clustering technology being used. Galera vs. MySQL
Cluster is a good example. Galera has an active/passive architecture
where the above issues are a real concern, while MySQL Cluster (NDB)
has an active/active architecture, so losing a node only affects
uncommitted transactions, which could easily be addressed with a
retry. These topologies will become more complicated as people start
looking at cross-regional replication and DR.
For the areas in which the characteristics of the database are tied
closely to the application behavior, there is a constrained set of
valid choices at the database level. Sometimes that constrained set
only has one member.
The approach to those is what I'm talking about when I ask the
question about "external" or "active".
In the "external" approach, we document the expectations and then
write the code assuming that the database is set up appropriately. We
may provide some helper tools, such as 'nova-manage db sync' and
documentation on the sequence of steps the operator should take.
In the "active" approach, we still document expectations, but we also
validate them. If they are not what we expect but can be changed at
runtime, we change them overriding conflicting environmental config,
and if we can't, we hard-stop indicating an unsuitable environment.
Rather than providing helper tools, we perform the steps needed
ourselves, in the order they need to be performed, ensuring that they
are done in the manner in which they need to be done.
This might be a trickier situation, especially if the database(s) are in
a separate or dedicated environment that the OpenStack service processes
don't have access to. For SQL commands this isn't a problem, of course,
but changing the configuration files and restarting the database may be
a harder thing to expect.
Some examples:
* Character Sets / Collations
We currently enforce at testing time that all database migrations are
explicit about InnoDB. We also validate in oslo.db that table
character sets have the string 'utf8' in them (MySQL only). We do not
have any check for case-sensitive or case-insensitive collations
(these affect sorting and comparison operations). Because we don't,
different server config settings or different database backends for
different clouds can actually behave differently through the REST API.
To deal with that:
First we'd have to decide whether case sensitive or case insensitive
was what we wanted. If we decided we wanted case sensitive, we could
add an enforcement of that in oslo.db, and write migrations to get
from case-insensitive indexes to case-sensitive indexes on tables
where we detected that a case-insensitive collation had been used. If
we decided we wanted to stick with case insensitive, we could
similarly add code to enforce it on MySQL. To enforce it actively on
PostgreSQL, we'd need to either switch our code that's using
comparisons to use the sqlalchemy case-insensitive versions
explicitly, or maybe write some sort of overloaded driver for PG that
turns all comparisons into case-insensitive ones by wrapping both
sides of comparisons in lower() calls (which has some indexing
concerns, but let's ignore that for the moment). We could also take
the 'external' approach and just document it, then define API tests
and try to tie the case-insensitive behavior in the API to Interop
Compliance. I'm not 100% sure how a db operator would remediate this -
but PG has some fancy computed index features - so maybe it would be
possible.
I think that abstraction with oslo.db would be the right path here. But
you are also right that we need to have a consistent compliance policy
at the API layer. We may fix things down at the DB level with oslo.db,
but everything on top of that also needs to fall in line. There is a
very high chance that there are hard-coded workarounds or assumptions in
the services and APIs today.
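As a rough illustration of the kind of check that could live behind an
oslo.db abstraction, something along these lines could flag MySQL
tables created with a case-insensitive collation (a sketch only - the
function name and approach are invented, not existing oslo.db code):

    import sqlalchemy as sa

    def find_case_insensitive_tables(engine, schema):
        # MySQL collation names ending in '_ci' are case-insensitive.
        query = sa.text(
            "SELECT table_name, table_collation "
            "FROM information_schema.tables "
            "WHERE table_schema = :schema")
        with engine.connect() as conn:
            rows = conn.execute(query, {"schema": schema}).fetchall()
        return [(name, coll) for name, coll in rows
                if coll and coll.endswith('_ci')]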
A similar issue lurks with the fact that MySQL unicode storage is
3-byte by default and 4-byte is opt-in. We could take the 'external'
approach, document it, and assume the operator has configured their
my.cnf with the appropriate default, or take an 'active' approach
where we override it in all the models and write migrations to get us
from 3-byte to 4-byte storage.
I think an active approach on this would be ideal, just like the utf8
and InnoDB settings are today. FYI, not all services are enforcing these
in a consistent manner today. This is another example of something that
should be abstracted at the oslo.db layer to take the human element out
of the way.
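As a sketch of what the 'active' variant could look like at the model
level (the table and column names here are invented for illustration),
the charset can be pinned per table instead of relying on the my.cnf
default:

    import sqlalchemy as sa

    metadata = sa.MetaData()

    # Explicitly opt in to 4-byte UTF-8 (utf8mb4) and InnoDB rather than
    # inheriting whatever default the server happens to be configured with.
    instances = sa.Table(
        'instances', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('display_name', sa.String(255)),
        mysql_engine='InnoDB',
        mysql_charset='utf8mb4',
    )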
* Schema Upgrades
The way you roll out online schema changes is highly dependent on your
database architecture.
Just limiting to the MySQL world:
If you do Galera, you can roll them out in Total Order or Rolling
fashion. Total Order locks basically everything while it's happening,
so it isn't a candidate for "online". In Rolling, you apply the schema
change to one node at a time. If you do that, the application has to
be able to deal with both forms of the table, and you have to ensure
that data can replicate appropriately while the schema change is
happening.
If you do DRBD active/passive or a single-node deployment, you only
have one upgrade operation to perform, but you will only lock certain
things, depending on what schema change operations you are performing.
If you do master/slave, you can roll out the schema change to your
slaves one at a time, wait for them all to catch up, then promote a
slave, taking the current master out of commission - update the old
master and then put it into the slave pool. Like Galera rolling, the
app needs to be able to handle old and new versions, and the
replication stream needs to be able to replicate between the versions.
Making sure that the stream is able to replicate puts a set of
limitations on the types of schema changes you can perform, but it is
an understandable, constrained set.
In either approach the OpenStack service has to be able to talk to
both old and new versions of the schema. And in either approach we
need to make sure to limit the schema change operations to the set
that can be accomplished in an online fashion. We also have to be
careful to not start writing values to new columns until all of the
nodes have been updated, because the replication stream can't
replicate the new column value to nodes that don't have the new column.
This is another area where something like MySQL Cluster (NDB) would
operate differently because of its active/active architecture, so
limiting the number of online changes while a table is locked across the
cluster would be very important. There are also timeouts for the
applications to consider, something that could again be abstracted with
oslo.db.
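For what it's worth, the "expand" half of that pattern is the easy part
to express in a migration; the hard part is the discipline around it.
A sketch of an additive change that old and new code can both live with
(Alembic syntax for illustration; the table and column names are made
up):

    import sqlalchemy as sa
    from alembic import op

    def upgrade():
        # Additive, nullable, no server-side backfill: safe to apply
        # while old code is still running. Writing to the column is
        # deferred until every node runs the new code.
        op.add_column('instances',
                      sa.Column('new_flag', sa.Boolean(), nullable=True))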
In either approach we can decide to limit the number of architectures
we support for "online" upgrades.
In an 'external' approach, we make sure to do those things, we write
documentation and we assume the database will be updated
appropriately. We can document that if the deployer chooses to do
Total Order on Galera, they will not have online upgrades. There will
also have to be a deployer step to let the services know that they can
start writing values to the new schema format once the upgrade is
complete.
In an 'active' approach, we can notice that we have an update
available to run, and we can drive it from code. We can check for
Galera, and if it's there we can run the upgrade in Rolling fashion
one node at a time with no work needed on the part of the deployer.
Since we're driving the upgrade, we know when it's done, so we can
signal ourselves to start using the new version. We'd obviously have
to pick the set of acceptable architectures we can handle consistently
orchestrating.
This would be an interesting idea to expand into an autonomic
orchestration framework within the control plane to handle the database
upgrades online and the restarting of the dependent services in the
correct order. If we only focus on the database piece, it may not be as
interesting for operators.
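As a sketch of what the "check for Galera" step above might amount to
(purely illustrative, not proposing this exact code), the wsrep status
variables give a reasonable signal:

    import sqlalchemy as sa

    def is_galera(engine):
        # On a Galera node this returns a row like
        # ('wsrep_cluster_size', '3'); on plain MySQL the status
        # variable simply doesn't exist, so no row comes back.
        with engine.connect() as conn:
            row = conn.execute(
                sa.text("SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'")
            ).fetchone()
        return row is not None and int(row[1]) > 0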
* Versions
It's worth noting that behavior for schema updates and other things
change over time with backend database version. We set minimum
versions of other things, like libvirt and OVS - so we might also want
to set minimum versions for what we can support in the database. That
way we can know for a given release of OpenStack what DDL operations
are safe to use for a rolling upgrade and what are not. That means
detecting such a version and potentially refusing to perform an
upgrade if the version isn't acceptable. That reduces the operator's
ability to choose what version of the database software to run, but
increases our ability to be able to provide tooling and operations
that we can be confident will work.
Validating the MySQL database version is a good idea. The features do
change over time. A good example is how in 5.7 you'll get warnings
about duplicate indexes being dropped in a future release, which will
definitely affect multiple services today.
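A minimal sketch of such a version floor check follows (the version
chosen here is only an example, not a proposal for the actual minimum):

    import sqlalchemy as sa

    MIN_MYSQL_VERSION = (5, 6, 0)  # illustrative only

    def assert_min_mysql_version(engine):
        with engine.connect() as conn:
            reported = conn.execute(sa.text("SELECT VERSION()")).scalar()
        # VERSION() returns strings like '5.7.18-log'; keep the numeric part.
        numeric = reported.split('-')[0]
        parts = tuple(int(p) for p in numeric.split('.')[:3])
        if parts < MIN_MYSQL_VERSION:
            raise RuntimeError(
                "MySQL %s is older than the minimum supported %s"
                % (reported, '.'.join(str(p) for p in MIN_MYSQL_VERSION)))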
== Summary ==
These are just a couple of examples - but I hope they're at least
mildly useful to explain some of the sorts of issues at hand - and why
I think we need to clarify what our intent is separate from the issue
of what databases we "support".
Some operations have one and only one "right" way to be done. For
those operations if we take an 'active' approach, we can implement
them once and not make all of our deployers and distributors each
implement and run them. However, there is a cost to that. Automatic
and prescriptive behavior has a higher dev cost that is proportional
to the number of supported architectures. This then implies a need to
limit deployer architecture choices.
On the other hand, taking an 'external' approach allows us to federate
the work of supporting the different architectures to the deployers.
This means more work on the deployer's part, but also potentially a
greater amount of freedom on their part to deploy supporting services
the way they want. It means that some of the things that have been
requested of us - such as easier operation and an increase in the
number of things that can be upgraded with no-downtime - might become
prohibitively costly for us to implement.
I honestly think that both are acceptable choices we can make and that
for any given topic there are middle grounds to be found at any given
moment in time.
BUT - without a decision as to what our long-term philosophical intent
in this space is, one that is clear and understandable to everyone, we
cannot have successful discussions about the impact of implementation
choices, since we will not have a shared understanding of the problem
space or the solutions we're talking about.
For my part - I hear complaints that OpenStack is 'difficult' to
operate and requests for us to make it easier. This is why I have been
advocating some actions that are clearly rooted in an 'active' worldview.
Finally, this is focused on the database layer, but similar questions
arise in other places. What is our philosophy on prescriptive/active
choices on our part, coupled with automated action and ease of
operation, versus expanded choices for the deployer at the expense of
configuration and operational complexity? For now let's see if we can
answer it for databases, and see where that gets us.
Thanks for reading.
Monty