Re: [openstack-dev] Online Migrations.

Mike Bayer Mon, 15 Jun 2015 15:40:46 -0700


On 6/15/15 4:21 PM, Andrew Laski wrote:

On 06/15/15 at 03:23pm, Mike Bayer wrote:
1. at runtime? e.g. your nova service is running, it's doing "SELECT x, y FROM thing", then some magic thing happens somewhere and the app suddenly sees, hey "y" is gone! change all queries to "SELECT x FROM thing". What would this magic thing be? Are you going to run a reflection of the table schema on every query (you definitely aren't). So I don't know that this is possible.
Would it be dangerous to signal that 'y' is gone by having a query fail and at that point the model could be updated? In other words, is there a chance of a query failing in such a way as to leave data in an inconsistent or undesirable state?
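For illustration, here is a minimal sketch of the "let the query fail, then adapt" idea, using Python's stdlib sqlite3 as a stand-in for MySQL. The `thing` table, the `dropped` set and `select_thing()` are hypothetical. Note that the adaptation only happens after a query has already failed, and that each process has to discover the change on its own.

```python
import sqlite3

# Hypothetical per-process record of columns this process has found to be gone.
# Every process (and, without extra locking, every greenlet) rediscovers this itself.
dropped = set()


def select_thing(conn, wanted=("x", "y")):
    """SELECT from 'thing'; if the query fails, inspect the live schema and
    retry once without any columns that turn out to be missing."""
    live = [c for c in wanted if c not in dropped]
    try:
        return conn.execute(f"SELECT {', '.join(live)} FROM thing").fetchall()
    except sqlite3.OperationalError:
        # The query against the contracted schema has already failed;
        # only now do we go to the schema catalog to find out why.
        present = {row[1] for row in conn.execute("PRAGMA table_info(thing)")}
        missing = set(live) - present
        if not missing:
            raise  # the failure had some other cause
        dropped.update(missing)
        live = [c for c in live if c in present]
        return conn.execute(f"SELECT {', '.join(live)} FROM thing").fetchall()


conn = sqlite3.connect(":memory:")
# Simulate a contract step that has already removed column "y".
conn.execute("CREATE TABLE thing (x INTEGER)")
conn.execute("INSERT INTO thing (x) VALUES (1)")

rows = select_thing(conn)
print(rows, dropped)  # [(1,)] {'y'}
```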

Nova currently breaks up its database activities into many small database transactions, because it calls get_session() anew within most of its methods. So it already has a problem: the failure of a database transaction is not necessarily atomic with respect to other things that have happened in the same API request. We're looking to improve this with enginefacade; however, I can't rule out that some Nova operations currently rely on this transactional structure in order to succeed.
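A small sketch of why many small transactions are not atomic with respect to the overall request. It uses stdlib sqlite3 with explicit BEGIN/COMMIT rather than oslo.db's enginefacade, whose API is not shown here; the `instances`/`audit` tables and the helper names are hypothetical, and the second step is made to fail with a NOT NULL violation.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Explicit BEGIN/COMMIT/ROLLBACK around a block of statements."""
    conn.execute("BEGIN")
    try:
        yield
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def make_db():
    # isolation_level=None: the sqlite3 module issues no implicit BEGINs,
    # so the transaction() helper above is in full control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, state TEXT)")
    conn.execute("CREATE TABLE audit (instance_id INTEGER NOT NULL)")
    return conn


def many_small(conn):
    # One "API request" split across independent transactions, roughly the
    # effect of calling get_session() afresh inside each DB method.
    with transaction(conn):
        conn.execute("INSERT INTO instances (state) VALUES ('building')")
    with transaction(conn):
        conn.execute("INSERT INTO audit (instance_id) VALUES (NULL)")  # NOT NULL violation


def one_unit(conn):
    # The same work inside a single transaction for the whole request.
    with transaction(conn):
        conn.execute("INSERT INTO instances (state) VALUES ('building')")
        conn.execute("INSERT INTO audit (instance_id) VALUES (NULL)")  # NOT NULL violation


def leftover_rows(request):
    """Run a failing request on a fresh DB and count the instances it left behind."""
    conn = make_db()
    try:
        request(conn)
    except sqlite3.IntegrityError:
        pass
    return conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]


print(leftover_rows(many_small))  # 1: the first step stayed committed
print(leftover_rows(one_unit))    # 0: the whole request was rolled back
```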

As for the effects of a transaction that fails because a column was removed while the transaction was in progress: on the MySQL side I would not be surprised if some bad things can happen, as its DDL operations are not transactional, but I don't know of anything specific. As for the case where the column was removed some number of seconds ago and a brand new transaction targets that column unaware that it was removed, that query/transaction just fails in the traditional way, exposing us only to the same issues that any other failure partway through a transaction does right now.

But an approach built this way is, at the very least, far outside the mainstream of how relational databases are normally used. It means that Nova is being built such that wide-scale service failures are now part of its design: any time a table or column is removed, all running nodes are guaranteed to experience failures, because we are relying on a purely optimistic approach. Unless we build in a highly synchronized system, all nodes, and even individual threads/greenlets, will be rushing out to the database to perform live schema inspection in order to literally fix their own bugs on the fly, because we don't have any specific kind of messaging (either versioning, or messages listing the columns that have been dropped) describing what changes have been made. It also means that this step has to take place on application startup in any case, because the schema state is unknown except through live inspection of the DB.

If I had to visualize an approach that does this somewhat cleanly, other than just putting off the contract step until the API has naturally moved beyond it, it would involve a fixed and structured source of truth about the specific changes we care about, such as a versioning table or other data table indicating the specific "remove()" directives we're checking for, and the application would be organized so that it can always get this information from an in-memory cached source before it makes decisions about queries. The information would need to support being pushed in from the outside, such as via a message queue. This would still not protect operations already in progress from failing, but it would at least spare future operations from having to fail once before adapting.
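A rough sketch of such an in-memory source of truth, assuming a monotonically increasing version number and "table.column" strings for the remove() directives. `ContractionRegistry` and its methods are hypothetical; the initial database read and the message-queue consumer are left out, and the caller simply passes in the values they would provide. As noted above, this does nothing for operations already in flight.

```python
import threading


class ContractionRegistry:
    """Hypothetical in-memory cache of schema "remove()" directives.

    Loaded once at startup, then updated by messages pushed from outside.
    Query-building code consults it without touching the database.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._dropped = frozenset()

    def load(self, version, dropped_columns):
        """Initial load; the caller would read these values from a
        hypothetical versioning table at startup."""
        return self.apply_update(version, dropped_columns)

    def apply_update(self, version, dropped_columns):
        """Apply a pushed update; stale or duplicate versions are ignored."""
        with self._lock:
            if version <= self._version:
                return False
            self._version = version
            # Directives accumulate: once dropped, a column stays dropped.
            self._dropped = self._dropped | frozenset(dropped_columns)
            return True

    def is_dropped(self, table, column):
        # Reads see one immutable frozenset, so no lock is taken here.
        return f"{table}.{column}" in self._dropped

    @property
    def version(self):
        return self._version


registry = ContractionRegistry()
registry.load(1, [])
registry.apply_update(2, ["thing.y"])  # e.g. delivered via a message queue
registry.apply_update(2, ["thing.y"])  # duplicate delivery: ignored
cols = [c for c in ("x", "y") if not registry.is_dropped("thing", c)]
print(cols)  # ['x']
```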

We also need to decide between "change the model" and "change the queries". I keep thinking it's going to have to be "change the queries". ORM and schema models aren't designed to be mutable in a subtractive sense at runtime (e.g. there is no "remove column"; removals are much more difficult to book-keep than additions), and even if they were, the whole scheme would not be safe for concurrency: if 10 greenlets/threads all decided to change the model at the same time, only the first one would win, and the operation would certainly fail if multiple threads tried to do it at once. Also, the Nova Cells model, if I understand correctly, means that the same set of model classes can be used to talk to multiple versions of the database at once; so even if we went through all the trouble of changing the models on the fly, that would break in a Cells environment, assuming not every database had the same contract steps run.
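A minimal sketch of "change the queries" rather than "change the model": the column tuple is declared once and never mutated, and each call builds its own SELECT from it plus the dropped-column set for the particular database being queried. All names are hypothetical, and the two in-memory sqlite3 databases stand in for cells whose contract steps differ.

```python
import sqlite3

# The "model": declared once and never mutated at runtime.
THING_COLUMNS = ("id", "x", "y")


def build_select(table, columns, dropped):
    """Build a SELECT for this call only, omitting dropped columns.

    'dropped' holds "table.column" strings for the specific database being
    queried, so cells in different contract states share one model.
    """
    live = [c for c in columns if f"{table}.{c}" not in dropped]
    return f"SELECT {', '.join(live)} FROM {table}"


def make_cell(has_y):
    conn = sqlite3.connect(":memory:")
    cols = "id INTEGER PRIMARY KEY, x INTEGER" + (", y INTEGER" if has_y else "")
    conn.execute(f"CREATE TABLE thing ({cols})")
    conn.execute("INSERT INTO thing (id, x) VALUES (1, 10)")
    return conn


cells = {
    "cell1": (make_cell(has_y=True), set()),          # contract step not yet run
    "cell2": (make_cell(has_y=False), {"thing.y"}),   # contract step already run
}

results = {}
for name, (conn, dropped) in cells.items():
    sql = build_select("thing", THING_COLUMNS, dropped)
    results[name] = conn.execute(sql).fetchall()

print(results)  # {'cell1': [(1, 10, None)], 'cell2': [(1, 10)]}
```

Because build_select() only reads shared state, concurrent greenlets/threads calling it have nothing to race on.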

2. at application start time? e.g. nova service starts up, something happens before "MyThing" is first declared where MyThing knows that "y" is no longer there for this run (or something that will impact all the queries and persistence operations, less desirable).
#2 is much more possible. But still, how does it run? How do we know that "y" is there on one run, and is not there on another? Do we:
2a. When the app starts up, we run reflection queries against the DB (e.g. what autogenerate / OSM does, looking in schema catalogs). This is doable, but can get expensive on startup if we really have lots of columns/tables to worry about; it also means that either the changes to the queries happen entirely at query time (intricate, difficult-ish), or the change happens at model definition time (simple, easy), which means the app needs to be connected to the database before it imports the models, and this is the complete opposite of how Nova's api.py is constructed right now. Plus the feature needs to accommodate Cells, where there's a totally different database in play (maybe this has to be query time for that reason alone).
2b. In a config file somewhere? Some kind of directive that says, "hey, we have now dropped thing.y". What would that look like?
2c. Based on some kind of version number in the database? Not too much different from #2a.
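The three start-time options (2a, 2b, 2c) can be sketched side by side, each producing the same set of dropped "table.column" names. This uses stdlib sqlite3 and configparser; PRAGMA table_info stands in for MySQL's information_schema, and the `[schema_contract]` section, `dropped_columns` option, `schema_version` table and `CONTRACTIONS` mapping are all hypothetical.

```python
import configparser
import sqlite3

# Hypothetical table of contract steps known to this code, keyed by the
# schema version at which each set of columns is gone.
CONTRACTIONS = {42: {"thing.y"}}


def dropped_by_reflection(conn, expected):
    """2a: compare expected columns against the live schema catalog."""
    dropped = set()
    for table, columns in expected.items():
        present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        dropped |= {f"{table}.{c}" for c in columns if c not in present}
    return dropped


def dropped_by_config(text):
    """2b: read a hypothetical 'dropped_columns' directive from a config file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get("schema_contract", "dropped_columns", fallback="")
    return {item.strip() for item in raw.split(",") if item.strip()}


def dropped_by_version(conn):
    """2c: read one version number, then map it through CONTRACTIONS."""
    (version,) = conn.execute("SELECT version FROM schema_version").fetchone()
    dropped = set()
    for at_version, columns in CONTRACTIONS.items():
        if version >= at_version:
            dropped |= columns
    return dropped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (x INTEGER)")  # "y" has already been removed
conn.execute("CREATE TABLE schema_version (version INTEGER)")
conn.execute("INSERT INTO schema_version (version) VALUES (42)")

config_text = """
[schema_contract]
dropped_columns = thing.y
"""

a = dropped_by_reflection(conn, {"thing": ("x", "y")})
b = dropped_by_config(config_text)
c = dropped_by_version(conn)
print(a, b, c)  # {'thing.y'} {'thing.y'} {'thing.y'}
```

Only 2b avoids touching the database at startup; 2a and 2c both need a connection before the query (or model) decisions can be made, which is the constraint described for api.py above.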
That said, I still think we should get the original thing merged. Even
if we did contractions purely with the manual migrations for the
foreseeable future, that'd be something we could deal with.

--Dan
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:[email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev