Hi,

At the recent meeting it was mentioned that our rolling upgrades effort is pursuing an "elusive unicorn" that makes development a lot more complicated and restricted. I want to clarify this a bit, explain the strategy in more detail, and give an update on the status of the whole affair.

So first of all: it's definitely achievable, as Nova supports rolling upgrades from Kilo. It makes developers' lives harder, but the feature is useful; e.g. CERN was able to upgrade their compute nodes after the control plane services in their enormous environment during their Juno->Kilo upgrade [1].

Rolling upgrades are all about interoperability of services running in different versions. We want to give operators the ability to upgrade service instances one by one, starting from c-api, through c-sch, to c-vol and c-bak. Moreover, we want to be sure that old and new versions of a single service can coexist. This means we need to be backward compatible with at least one previous release.

There are 3 planes on which incompatibilities may happen:
* API of RPC methods
* Structure of composite data sent over RPC
* DB schemas

API of RPC methods
------------------
Here we're strictly following Nova's solution described in [2]. We need to support RPC version pinning, so each RPC API addition needs to be versioned, and the rpcapi.py modules need to be able to downgrade a request to the required version. On the other side, manager.py should be able to process the request even when it doesn't receive a newly added parameter. There are already some examples of this approach in tree ([3], [4]). Until the upgrade is completed the RPC API version is pinned, so everything should be compatible with the older release. Once only new services are running, the pin may be released.

Structure of composite data sent over RPC
-----------------------------------------
Again, RPC version pinning is used, with the addition of versioned objects. Before sending an object we will translate it to the lower version, according to the version pin. This makes sure that the object can be understood by older services. Note that newer services can translate the object back to the new version when receiving an old one.

DB schemas
----------
This is a hard one.
We've needed to adapt the approach described in [5] to our needs, as we're calling the DB from all of our services and not only from nova-conductor as Nova does. This means that in case of a non-backward-compatible migration we need to stretch the process over 3 releases. The good news is that we haven't needed such a migration since Juno (in M we have a few candidates… :(). The process for Cinder is described in [6].

In general we want to ban migrations that are non-backward compatible or that exclusively lock a table for an extended period of time ([7] is a good source of truth for MySQL), and allow them only if they follow the 3-release migration period (so that the N+2 release has no notion of the column or table and we can drop it).

Right now we're finishing the oslo.versionedobjects adoption; outstanding patches can be found in [8] (there are still a few to come; look at the table at the bottom of [9]). For DB schema upgrades, we've merged the spec, and a test that bans contracting migrations is in review [10]. For RPC API compatibility, I'm actively reviewing the patches to make sure every change there is done properly. Apart from that, the backlog includes documenting all this in devref and implementing partial-upgrade Grenade tests that will gate on version interoperability.

I hope this clarifies a bit how we're progressing towards being able to upgrade Cinder with minimal or no downtime.

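To make the RPC version pinning from the "API of RPC methods" part a bit more concrete, here is a minimal, self-contained sketch of the pattern. It is not Cinder code: the class names, the version numbers 1.8/1.9, the filter_properties argument and the loopback transport are all made up for illustration, and the version check is simplified (it ignores major-version rules). The real in-tree examples are [3] and [4].

```python
# Illustrative sketch only: every name below is hypothetical, not Cinder's API.


def parse_version(version):
    """Turn '1.8' into (1, 8) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class SchedulerAPI:
    """Client side (the role rpcapi.py plays): downgrades requests to the pin."""

    LATEST = "1.9"  # assumption: 1.9 is the version that added 'filter_properties'

    def __init__(self, transport, version_cap=None):
        self.transport = transport
        # While old services are still running, the version is pinned (capped).
        self.version_cap = version_cap or self.LATEST

    def can_send_version(self, version):
        # Simplified check: only compares against the cap.
        return parse_version(version) <= parse_version(self.version_cap)

    def create_volume(self, volume_id, filter_properties=None):
        msg_args = {"volume_id": volume_id}
        if self.can_send_version("1.9"):
            version = "1.9"
            msg_args["filter_properties"] = filter_properties
        else:
            # Pinned to the older release: drop the argument it does not know.
            version = "1.8"
        return self.transport(version, "create_volume", msg_args)


class SchedulerManager:
    """Server side (the role manager.py plays)."""

    def create_volume(self, volume_id, filter_properties=None):
        # The default keeps the method callable by senders pinned to 1.8.
        return {
            "volume_id": volume_id,
            "filter_properties": filter_properties or {},
        }


def loopback_transport(manager):
    """Stand-in for the message bus: records each call, then invokes the manager."""
    sent = []

    def send(version, method, kwargs):
        sent.append((version, method, dict(kwargs)))
        return getattr(manager, method)(**kwargs)

    return send, sent
```

With the cap set to "1.8" the client silently drops the new argument and the manager falls back to its default; once the cap is removed, the same call goes out as 1.9 with the argument included.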
[1] http://openstack-in-production.blogspot.de/2015/11/our-cloud-in-kilo.html
[2] http://www.danplanet.com/blog/2015/10/05/upgrades-in-nova-rpc-apis/
[3] https://github.com/openstack/cinder/blob/12e4d9236/cinder/scheduler/rpcapi.py#L89-L93
[4] https://github.com/openstack/cinder/blob/12e4d9236/cinder/scheduler/manager.py#L124-L128
[5] http://www.danplanet.com/blog/2015/10/07/upgrades-in-nova-database-migrations/
[6] https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/online-schema-upgrades.html
[7] https://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-overview.html
[8] https://review.openstack.org/#/q/branch:master+topic:bp/cinder-objects,n,z
[9] https://etherpad.openstack.org/p/cinder-rolling-upgrade
[10] https://review.openstack.org/#/q/branch:master+topic:bp/online-schema-upgrades,n,z

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
