Re: [openstack-dev] [tripleo] Upgrade plans for RDO Manager - Brainstorming

2015-09-17 Thread Jan Provaznik

On 09/09/2015 05:34 PM, Zane Bitter wrote:

> On 24/08/15 15:12, Emilien Macchi wrote:
>
>> Hi,
>>
>> I've been working on OpenStack deployments for 4 years now, and RDO
>> Manager is the second installer (after SpinalStack [1]) that I have
>> worked on.
>>
>> SpinalStack already had interesting features [2] that allowed us to
>> upgrade our customer platforms almost every month, with full testing
>> and automation.
>>
>> Now that we have RDO Manager, I would be happy to share my limited
>> experience on the topic and help make upgrades possible in the next
>> cycle.
>>
>> For that, I created an etherpad [3], which is not too long and focuses
>> on basic topics for now. It is technical and focused on infrastructure
>> upgrade automation.
>>
>> Feel free to continue the discussion on this thread or directly in the
>> etherpad.
>>
>> [1] http://spinalstack.enovance.com
>> [2] http://spinalstack.enovance.com/en/latest/dev/upgrade.html
>> [3] https://etherpad.openstack.org/p/rdo-manager-upgrades


> I added some notes on the etherpad, but I think this discussion poses a
> larger question: what is TripleO? Why are we using Heat? Because to me
> the major benefit of Heat is that it maintains a record of the current
> state of the system that can be used to manage upgrades. And if we're
> not going to make use of that - if we're going to determine the state of
> the system by introspecting nodes and update it by using Ansible scripts
> without Heat's knowledge, then we probably shouldn't be using Heat at all.
>
> I'm not saying that to close off the option - I think if Heat is not the
> best tool for the job then we should definitely consider other options.
> And right now it really is not the best tool for the job. Adopting
> Puppet (which was a necessary choice IMO) has meant that the
> responsibility for what I call "software orchestration"[1] is split
> awkwardly between Puppet and Heat. For example, the Puppet manifests are
> baked in to images on the servers, so Heat doesn't know when they've
> changed and can't retrigger Puppet to update the configuration when they
> do. We're left trying to reverse-engineer what is supposed to be a
> declarative model from the workflow that we want for things like
> updates/upgrades.
>
> That said, I think there's still some cause for optimism: in a world
> where every service is deployed in a container and every container has
> its own Heat SoftwareDeployment, the boundary between Heat's
> responsibilities and Puppet's would be much clearer. The deployment
> could conceivably fit a declarative model much better, and even offer a
> lot of flexibility in which services run on which nodes. We won't really
> know until we try, but it seems distinctly possible to aspire toward
> Heat actually making things easier rather than just not making them too
> much harder. And there is stuff on the long-term roadmap that could be
> really great if only we had time to devote to it - for example, as I
> mentioned in the etherpad, I'd love to get Heat's user hooks integrated
> with Mistral so that we could have fully-automated, highly-available (in
> a hypothetical future HA undercloud) live migration of workloads off
> compute nodes during updates.



TBH I don't expect that using containers will significantly simplify (or
make clearer) the upgrade process. It would work nicely if an upgrade
meant just replacing one container with another (where a container is
represented by a Heat resource). But I'm convinced that replacing a
container will actually involve a complex workflow of actions that have
to be done before and after the replacement.



> In the meantime, however, I do think that we have all the tools in Heat
> that we need to cobble together what we need to do. In Liberty, Heat
> supports batched rolling updates of ResourceGroups, so we won't need to
> use user hooks to cobble together poor-man's batched update support any
> more. We can use the user hooks for their intended purpose of notifying
> the client when to live-migrate compute workloads off a server that is


Unfortunately, the rolling_update policy supports only a pause time
between update batches, so if any workflow is needed between batches
(e.g. pausing before the next batch until the user validates that the
previous batch was updated successfully), we still have to use user
hooks. But I guess adding hook support to rolling updates wouldn't be
too difficult.



> about to be upgraded. The Heat templates should already tell us exactly
> which services are running on which nodes. We can trigger particular
> software deployments on a stack update with a parameter value change (as
> we already do with the yum update deployment). For operations that
> happen in isolation on a single server, we can model them as
> SoftwareDeployment resources within the individual server templates. For
> operations that are synchronised across a group of servers (e.g.
> disabling services on the controller nodes in preparation for a DB
> migration) we can model them as a SoftwareDeploymentGroup resource in
> the parent template. And for chaining multiple sequential operations
> (e.g. disable services, migrate database, enable

Re: [openstack-dev] [tripleo] Upgrade plans for RDO Manager - Brainstorming

2015-09-09 Thread Zane Bitter

On 24/08/15 15:12, Emilien Macchi wrote:

> Hi,
>
> I've been working on OpenStack deployments for 4 years now, and RDO
> Manager is the second installer (after SpinalStack [1]) that I have
> worked on.
>
> SpinalStack already had interesting features [2] that allowed us to
> upgrade our customer platforms almost every month, with full testing
> and automation.
>
> Now that we have RDO Manager, I would be happy to share my limited
> experience on the topic and help make upgrades possible in the next
> cycle.
>
> For that, I created an etherpad [3], which is not too long and focuses
> on basic topics for now. It is technical and focused on infrastructure
> upgrade automation.
>
> Feel free to continue the discussion on this thread or directly in the
> etherpad.
>
> [1] http://spinalstack.enovance.com
> [2] http://spinalstack.enovance.com/en/latest/dev/upgrade.html
> [3] https://etherpad.openstack.org/p/rdo-manager-upgrades


I added some notes on the etherpad, but I think this discussion poses a 
larger question: what is TripleO? Why are we using Heat? Because to me 
the major benefit of Heat is that it maintains a record of the current 
state of the system that can be used to manage upgrades. And if we're 
not going to make use of that - if we're going to determine the state of 
the system by introspecting nodes and update it by using Ansible scripts 
without Heat's knowledge, then we probably shouldn't be using Heat at all.


I'm not saying that to close off the option - I think if Heat is not the 
best tool for the job then we should definitely consider other options. 
And right now it really is not the best tool for the job. Adopting 
Puppet (which was a necessary choice IMO) has meant that the 
responsibility for what I call "software orchestration"[1] is split 
awkwardly between Puppet and Heat. For example, the Puppet manifests are 
baked in to images on the servers, so Heat doesn't know when they've 
changed and can't retrigger Puppet to update the configuration when they 
do. We're left trying to reverse-engineer what is supposed to be a 
declarative model from the workflow that we want for things like 
updates/upgrades.


That said, I think there's still some cause for optimism: in a world 
where every service is deployed in a container and every container has 
its own Heat SoftwareDeployment, the boundary between Heat's 
responsibilities and Puppet's would be much clearer. The deployment 
could conceivably fit a declarative model much better, and even offer a 
lot of flexibility in which services run on which nodes. We won't really 
know until we try, but it seems distinctly possible to aspire toward 
Heat actually making things easier rather than just not making them too 
much harder. And there is stuff on the long-term roadmap that could be 
really great if only we had time to devote to it - for example, as I 
mentioned in the etherpad, I'd love to get Heat's user hooks integrated 
with Mistral so that we could have fully-automated, highly-available (in 
a hypothetical future HA undercloud) live migration of workloads off 
compute nodes during updates.


In the meantime, however, I do think that we have all the tools in Heat 
that we need to cobble together what we need to do. In Liberty, Heat 
supports batched rolling updates of ResourceGroups, so we won't need to 
use user hooks to cobble together poor-man's batched update support any 
more. We can use the user hooks for their intended purpose of notifying 
the client when to live-migrate compute workloads off a server that is 
about to be upgraded. The Heat templates should already tell us exactly 
which services are running on which nodes. We can trigger particular 
software deployments on a stack update with a parameter value change (as 
we already do with the yum update deployment). For operations that 
happen in isolation on a single server, we can model them as 
SoftwareDeployment resources within the individual server templates. For 
operations that are synchronised across a group of servers (e.g. 
disabling services on the controller nodes in preparation for a DB 
migration) we can model them as a SoftwareDeploymentGroup resource in 
the parent template. And for chaining multiple sequential operations 
(e.g. disable services, migrate database, enable services), we can chain 
outputs to inputs to handle both ordering and triggering. I'm sure there 
will be many subtleties, but I don't think we *need* Ansible in the mix.
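
The chaining idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TripleO's actual templates: the resource names, the config names, the parameter names (UpdateIdentifier, ControllerServers, BootstrapServerId) and the helper functions are hypothetical. The resource types, the get_param/get_resource/get_attr functions and the deploy_stdout/deploy_stdouts attributes are Heat's. The sketch builds a template fragment as a plain dict and derives the order implied by the get_attr references, which is a simplified model of how dependencies follow from chaining outputs to inputs.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def deployment(kind, config, target, inputs):
    """Build a HOT-style software deployment resource as a plain dict.

    kind is "group" (a map of servers) or "single" (one server).
    """
    if kind == "group":
        rtype, target_key = "OS::Heat::SoftwareDeploymentGroup", "servers"
    else:
        rtype, target_key = "OS::Heat::SoftwareDeployment", "server"
    return {
        "type": rtype,
        "properties": {
            "config": {"get_resource": config},  # config resources omitted here
            target_key: {"get_param": target},
            "actions": ["CREATE", "UPDATE"],
            "input_values": inputs,
        },
    }


# Declared in reverse on purpose: the order comes from the references,
# not from the position in the template.
RESOURCES = {
    "enable_services": deployment(
        "group", "EnableServicesConfig", "ControllerServers",
        {"after": {"get_attr": ["migrate_database", "deploy_stdout"]}},
    ),
    "migrate_database": deployment(
        "single", "MigrateDatabaseConfig", "BootstrapServerId",
        {"after": {"get_attr": ["disable_services", "deploy_stdouts"]}},
    ),
    "disable_services": deployment(
        "group", "DisableServicesConfig", "ControllerServers",
        # Hypothetical parameter: changing its value on a stack update
        # is what triggers the first step of the chain.
        {"update_identifier": {"get_param": "UpdateIdentifier"}},
    ),
}


def find_refs(value):
    """Yield resource names referenced through get_attr anywhere in value."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if key == "get_attr" and isinstance(inner, list):
                yield inner[0]
            else:
                yield from find_refs(inner)
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)


def implied_order(resources):
    """Return resource names so that each comes after what it references."""
    graph = {
        name: set(find_refs(body["properties"]))
        for name, body in resources.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(implied_order(RESOURCES))
    print(json.dumps({"resources": RESOURCES}, indent=2))
```

Whether a downstream deployment is actually re-run on update depends on the referenced output value changing between runs (for example, a script that echoes the update identifier); that subtlety is not modelled here.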


So it's really up to the wider TripleO project team to decide which path 
to go down. I am genuinely not bothered whether we choose Heat or 
Ansible. There may even be ways they can work together without 
compromising either model. But I would be pretty uncomfortable with a 
mix where we use Heat for deployment and Ansible for doing upgrades 
behind Heat's back.


cheers,
Zane.


[1] http://www.zerobanana.com/archive/2014/05/08#heat-configuration-management


__
OpenStack Development Mailing List 

Re: [openstack-dev] [tripleo] Upgrade plans for RDO Manager - Brainstorming

2015-09-03 Thread Emilien Macchi


On 08/24/2015 03:12 PM, Emilien Macchi wrote:
> Hi,
> 
> I've been working on OpenStack deployments for 4 years now, and RDO
> Manager is the second installer (after SpinalStack [1]) that I have
> worked on.
> 
> SpinalStack already had interesting features [2] that allowed us to
> upgrade our customer platforms almost every month, with full testing
> and automation.
> 
> Now that we have RDO Manager, I would be happy to share my limited
> experience on the topic and help make upgrades possible in the next
> cycle.
> 
> For that, I created an etherpad [3], which is not too long and focuses
> on basic topics for now. It is technical and focused on infrastructure
> upgrade automation.
> 

It has been a week with no discussion here and no new thoughts in the
etherpad. Could anyone who cares about upgrades take part in this thread?

Thank you,
-- 
Emilien Macchi


