Warning: wall of text incoming :-)

On 26/05/2017 03:55, Carter, Kevin wrote:
> If you've taken on an adventure like this how did you approach it? Did
> it work? Any known issues, gotchas, or things folks should be generally
> aware of?
We're fresh out of a Juno-to-Mitaka upgrade. It worked, but it required
significant downtime of the user VMs for an OS upgrade on all compute
nodes (we had fallen behind the CentOS update schedule because some code
required specific kernel versions, so we could not perform a no-downtime
upgrade even though we're using LinuxBridge for the data plane).

We took a significant amount of time to automate almost everything (OS
updates, OpenStack updates and configuration management), but the
control plane migration was performed manually, with a lot of
verification steps to ensure the databases would not end up in shambles
(the procedure was carefully written in a runbook and tested both on a
separate testbed and on a snapshot of all production databases).

As I said, the upgrade worked but we hit a few snags:

1. the glance and neutron DBs were created with latin1 as the default
   charset, so we had to convert both to UTF8 (dump, iconv, fix the
   definition, restore) - this is an operational issue on our side,
   though
2. on the testbed we found that nova created duplicate entries for all
   hypervisors after starting all services; we traced that down to
   compute_nodes.host being NULL for all HVs
3. [cache]/enabled in nova.conf *must* be set to true if there are
   multiple instances of nova-consoleauth/nova-novncproxy; in previous
   releases we'd just point nova at our memcache servers and it would
   work (we probably overlooked something in the docs)

> During our chat today we generally landed on an in-place upgrade with
> known API service downtime and little (at least as little as possible)
> data plane downtime. The process discussed was basically:
> a1. Create utility "thing-a-me" (container, venv, etc) which contains
> the required code to run a service through all of the required
> upgrades.
> a2. Stop service(s).
> a3. Run migration(s)/upgrade(s) for all releases using the utility
> "thing-a-me".
> a4. Repeat for all services.
>
> b1. Once all required migrations are complete run a deployment using
> the target release.
> b2. Ensure all services are restarted.
> b3. Ensure cloud is functional.
> b4. profit!

That was our basic workflow, except the "thing-a-me" was myself :-)

Joking aside, we kept one controller host out of the "mass upgrade" loop
and carefully performed single-version upgrades of the packages, running
all required DB migrations for each version.

> Also, the tooling is not very general purpose or portable outside of
> OSA but it could serve as a guide or just a general talking point.
> Are there other tools out there that solve for the multi-release
> upgrade?

Not that I know of. AFAIR, the BlueBox guys (now IBM) had some
Ansible-based tooling for automating a single-version upgrade, but I
don't know if they ever considered skip-level upgrades.

> Best practices?

1. automate as much as possible
2. use a configuration management tool to deploy the final configuration
   to all nodes (Puppet, Ansible, Chef...)
3. have a testing environment which resembles *as closely as possible*
   the production environment
4. simulate all migrations on a snapshot of all production databases to
   catch any issues early

> Do folks believe tools are the right way to solve this or would
> comprehensive upgrade documentation be better for the general
> community?

Both, actually. A generic upgrade tool would need to cover *a lot* of
deployment scenarios, so it would probably end up being a "reference
implementation" only. Comprehensive skip-level upgrade documentation
would be optimal (in our case we had to rebuild the Kilo and Liberty
docs from sources).

> As most of the upgrade issues center around database migrations, we
> discussed some of the potential pitfalls at length. One approach was to
> roll-up all DB migrations into a single repository and run all upgrades
> for a given project in one step.
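As for snag 3, the relevant nova.conf fragment on Mitaka looks roughly
like this - option names are the oslo.cache ones as I recall them, so
double-check them against the release's configuration reference, and the
memcache addresses below are placeholders:

```ini
[cache]
# required so consoleauth tokens are shared between multiple
# nova-consoleauth/nova-novncproxy instances
enabled = true
backend = oslo_cache.memcache_pool
memcache_servers = 192.0.2.10:11211,192.0.2.11:11211
```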
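To put snag 1 above in concrete terms, the "dump, iconv, fix the
definition, restore" dance looked roughly like this - a sketch only,
with placeholder database and file names; the actual dump/restore
commands are left as comments since they need a live MySQL server, and
the one-line stand-in dump is purely for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of converting a latin1 database dump to UTF8.
# Names and paths are placeholders.
set -eu

DB=glance
SRC=/tmp/${DB}-latin1.sql
DST=/tmp/${DB}-utf8.sql

# 1. dump without letting mysqldump re-encode the data on the way out:
#      mysqldump --default-character-set=latin1 --skip-set-charset "$DB" > "$SRC"
# (stand-in dump, for illustration only)
printf 'CREATE TABLE images (id INT) DEFAULT CHARSET=latin1;\n' > "$SRC"

# 2. re-encode the dump itself from latin1 to UTF-8
iconv -f latin1 -t utf8 "$SRC" > "$DST"

# 3. fix the charset declarations embedded in the dump ("fix the definition")
sed -i 's/CHARSET=latin1/CHARSET=utf8/g' "$DST"

# 4. restore into a re-created UTF-8 database:
#      mysql -e "DROP DATABASE $DB; CREATE DATABASE $DB DEFAULT CHARACTER SET utf8"
#      mysql --default-character-set=utf8 "$DB" < "$DST"
```

Double-check the result against a second snapshot before touching the
production copy.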
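Best practice 4 above (combined with version-specific venvs) can be
sketched as follows. Everything here - paths, releases, constraints
files - is a hypothetical placeholder, so by default the script only
*prints* the commands it would run; set DRY_RUN to the empty string to
actually execute them on a testbed:

```shell
#!/bin/sh
# Hypothetical rehearsal of skip-level DB migrations on a snapshot of a
# production database, stepping one release at a time from
# version-specific venvs.
set -eu

PLAN=/tmp/upgrade-rehearsal.plan
: > "$PLAN"

# print (and log) a command instead of running it while dry-running;
# note ${DRY_RUN-1}: only an explicitly empty DRY_RUN disables dry run
run() { if [ -n "${DRY_RUN-1}" ]; then echo "+ $*" | tee -a "$PLAN"; else "$@"; fi; }

SNAPSHOT=/backup/nova-prod.sql   # snapshot of the production nova DB

# restore the snapshot into a scratch database on the testbed
run mysql -e 'DROP DATABASE IF EXISTS nova; CREATE DATABASE nova'
run sh -c "mysql nova < $SNAPSHOT"

# walk the schema forward one release at a time, each from its own venv
for release in kilo liberty mitaka; do
    venv=/opt/venvs/nova-$release
    run virtualenv "$venv"
    run "$venv/bin/pip" install nova -c "upper-constraints-$release.txt"
    run "$venv/bin/nova-manage" db sync
done
```

Run it as `DRY_RUN= sh rehearse.sh` (again, a placeholder name) once the
printed plan looks sane.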
> Another was to simply have multiple python virtual environments and
> just run in-line migrations from a version specific venv (this is what
> the OSA tooling does). Does one way work better than the other? Any
> thoughts on how this could be better? Would having N+2/3 migrations
> addressable within the projects, even if they're not tested any longer,
> be helpful?

Some projects apparently keep shipping all migrations, even though
they're no longer supported.

> It was our general thought that folks would be interested in having the
> ability to skip releases so we'd like to hear from the community to
> validate our thinking.

That's good to know :-)

-- 
Matteo Panella
INFN CNAF
Via Ranzani 13/2 c - 40127 Bologna, Italy
Phone: +39 051 609 2903
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators