Re: [openstack-dev] [tripleo] Upgrade plans for RDO Manager - Brainstorming

Zane Bitter Wed, 09 Sep 2015 08:37:59 -0700

On 24/08/15 15:12, Emilien Macchi wrote:

Hi,


So I've been working on OpenStack deployments for 4 years now and so far
RDO Manager is the second installer -after SpinalStack [1]- I'm working on.

SpinalStack already had interesting features [2] that allowed us to
upgrade our customer platforms almost every month, with full testing
and automation.

Now that we have RDO Manager, I would be happy to share my little
experience on the topic and help to make it possible in the next cycle.

For that, I created an etherpad [3], which is not too long and focused
on basic topics for now. This is technical and focused on Infrastructure
upgrade automation.

Feel free to continue discussion on this thread or directly in the etherpad.

[1] http://spinalstack.enovance.com
[2] http://spinalstack.enovance.com/en/latest/dev/upgrade.html
[3] https://etherpad.openstack.org/p/rdo-manager-upgrades

I added some notes on the etherpad, but I think this discussion poses a
larger question: what is TripleO? Why are we using Heat? Because to me
the major benefit of Heat is that it maintains a record of the current
state of the system that can be used to manage upgrades. And if we're
not going to make use of that - if we're going to determine the state of
the system by introspecting nodes and update it by using Ansible scripts
without Heat's knowledge, then we probably shouldn't be using Heat at all.

I'm not saying that to close off the option - I think if Heat is not the
best tool for the job then we should definitely consider other options.
And right now it really is not the best tool for the job. Adopting
Puppet (which was a necessary choice IMO) has meant that the
responsibility for what I call "software orchestration"[1] is split
awkwardly between Puppet and Heat. For example, the Puppet manifests are
baked in to images on the servers, so Heat doesn't know when they've
changed and can't retrigger Puppet to update the configuration when they
do. We're left trying to reverse-engineer what is supposed to be a
declarative model from the workflow that we want for things like
updates/upgrades.

That said, I think there's still some cause for optimism: in a world
where every service is deployed in a container and every container has
its own Heat SoftwareDeployment, the boundary between Heat's
responsibilities and Puppet's would be much clearer. The deployment
could conceivably fit a declarative model much better, and even offer a
lot of flexibility in which services run on which nodes. We won't really
know until we try, but it seems distinctly possible to aspire toward
Heat actually making things easier rather than just not making them too
much harder. And there is stuff on the long-term roadmap that could be
really great if only we had time to devote to it - for example, as I
mentioned in the etherpad, I'd love to get Heat's user hooks integrated
with Mistral so that we could have fully-automated, highly-available (in
a hypothetical future HA undercloud) live migration of workloads off
compute nodes during updates.

In the meantime, however, I do think that we have all the tools in Heat
that we need to cobble together what we need to do. In Liberty, Heat
supports batched rolling updates of ResourceGroups, so we won't need to
use user hooks to cobble together poor-man's batched update support any
more. We can use the user hooks for their intended purpose of notifying
the client when to live-migrate compute workloads off a server that is
about to be upgraded. The Heat templates should already tell us exactly
which services are running on which nodes. We can trigger particular
software deployments on a stack update with a parameter value change (as
we already do with the yum update deployment). For operations that
happen in isolation on a single server, we can model them as
SoftwareDeployment resources within the individual server templates. For
operations that are synchronised across a group of servers (e.g.
disabling services on the controller nodes in preparation for a DB
migration) we can model them as a SoftwareDeploymentGroup resource in
the parent template. And for chaining multiple sequential operations
(e.g. disable services, migrate database, enable services), we can chain
outputs to inputs to handle both ordering and triggering. I'm sure there
will be many subtleties, but I don't think we *need* Ansible in the mix.
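
To make the "chain outputs to inputs" idea concrete, here is a rough
sketch in Python rather than in an actual template. It writes three
HOT-style resources as a plain dict and derives the order they would run
in purely from their get_attr references. Everything named here is an
assumption for illustration: the resource names (disable_services,
migrate_db, enable_services), the input names, and the parameter name
UpdateIdentifier are made up for the example, and the ordering function
only mimics the general idea of dependency resolution; it is not Heat's
implementation.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical resources, shaped like the "resources" section of a HOT
# template. All names and input keys below are illustrative only.
resources = {
    "disable_services": {
        "type": "OS::Heat::SoftwareDeploymentGroup",
        "properties": {
            # A parameter value change is what retriggers the first step.
            "input_values": {"update_id": {"get_param": "UpdateIdentifier"}},
        },
    },
    "migrate_db": {
        "type": "OS::Heat::SoftwareDeployment",
        "properties": {
            # Consuming the previous step's output creates the dependency.
            "input_values": {
                "previous": {"get_attr": ["disable_services", "deploy_stdouts"]},
            },
        },
    },
    "enable_services": {
        "type": "OS::Heat::SoftwareDeploymentGroup",
        "properties": {
            "input_values": {
                "previous": {"get_attr": ["migrate_db", "deploy_stdout"]},
            },
        },
    },
}


def referenced_resources(value):
    """Yield the names of resources referenced via get_attr inside value."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if key == "get_attr":
                yield inner[0]  # first element is the resource name
            else:
                yield from referenced_resources(inner)
    elif isinstance(value, list):
        for item in value:
            yield from referenced_resources(item)


def deployment_order(resources):
    """Return resource names ordered so that dependencies come first."""
    graph = {
        name: set(referenced_resources(body["properties"]))
        for name, body in resources.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(deployment_order(resources))
```

The point of the sketch is that no explicit ordering is written down
anywhere: each step depends on the previous one only because it consumes
that step's output as an input, and a new input value is also what would
cause the step to run again on a stack update.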

So it's really up to the wider TripleO project team to decide which path
to go down. I am genuinely not bothered whether we choose Heat or
Ansible. There may even be ways they can work together without
compromising either model. But I would be pretty uncomfortable with a
mix where we use Heat for deployment and Ansible for doing upgrades
behind Heat's back.


cheers,
Zane.

[1]http://www.zerobanana.com/archive/2014/05/08#heat-configuration-management


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
