Re: [openstack-dev] [TripleO] our update story: can people live with it?

Angus Thomas Thu, 23 Jan 2014 05:17:58 -0800

On 22/01/14 20:54, Clint Byrum wrote:

>
>I don't understand the aversion to using existing, well-known tools to handle
>this?
>

These tools are of course available to users and nobody is stopping them
from using them. We are optimizing for not needing them. They are there
and we're not going to explode if you use them. You just lose one aspect
of what we're aiming at. I believe that having image based deploys will
be well received as long as it is simple to understand.

>A hybrid model (blending 2 and 3, above) here I think would work best where
>TripleO lays down a baseline image and the cloud operator would employ an
>well-known and support configuration tool for any small diffs.
>

These tools are popular because they control entropy and make it at
least more likely that what you tested ends up on the boxes.

A read-only root partition is a much stronger control on entropy.

>The operator would then be empowered to make the call for any major upgrades
>that would adversely impact the infrastructure (and ultimately the
>users/apps). He/She could say, this is a major release, let's deploy the
>image.
>
>Something logically like this, seems reasonable:
>
>     if (system_change > 10%) {
>       use TripleO;
>       } else {
>       use Existing_Config_Management;
>     }
>

I think we can make deploying minor updates minimally invasive.

Since we've kept it simple enough, this should be a fairly straightforward
optimization cycle. And the win there is that we also improve things
for the 11% change.


Hi Clint,

For deploying minimally-invasive minor updates, the idea, if I've
understood it correctly, would be to deploy a tarball which replaced
selected files on the (usually read-only) root filesystem. That would
allow for selective restarting of only the services which are directly
affected. The alternative, pushing out a complete root filesystem image,
would necessitate the same amount of disruption in all cases.

There are a handful of costs with that approach which concern me: It
simplifies the deployment itself, but increases the complexity of
preparing the deployment. The administrator is going to have to identify
the services which need to be restarted, based on the particular set of
libraries which are touched in their partial update, and put together
the service restart scripts accordingly.
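
To make that preparation cost concrete, here is a rough sketch of the
kind of lookup an administrator would need: given the files in a partial
update, work out which services to restart. The mapping table, the paths
and the service names are all hypothetical and only for illustration; in
practice the mapping would have to be derived and maintained somehow.

```python
from fnmatch import fnmatch

# Hypothetical mapping from services to the file patterns they depend on.
# Note that fnmatch's "*" also matches "/" so a pattern covers a whole tree.
SERVICE_FILES = {
    "nova-compute": ["/usr/lib/python2.7/site-packages/nova/*"],
    "neutron-agent": ["/usr/lib/python2.7/site-packages/neutron/*"],
    "sshd": ["/usr/sbin/sshd", "/usr/lib/libcrypto*"],
}


def services_to_restart(touched_files, service_files=SERVICE_FILES):
    """Return the sorted names of services affected by the touched files."""
    affected = set()
    for service, patterns in service_files.items():
        if any(fnmatch(path, pattern)
               for path in touched_files
               for pattern in patterns):
            affected.add(service)
    return sorted(affected)
```

Anything missing from the table is silently left running on the old
code, which is exactly the sort of mistake that is easy to make here.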

We're also making the administrator responsible for managing the
sequence in which incremental updates are deployed. Since each
incremental update will re-write a particular set of files, any machine
which gets updates 1, 2 and 3 and then, because of an oversight, has
update 5 deployed without update 4 would end up in an odd state, which
would require additional tooling to detect. Package based updates, with
versioning and dependency tracking on each package, mitigate that risk.
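
A minimal sketch of the detection tooling I have in mind, assuming
(hypothetically) that updates are numbered consecutively from 1 and that
each machine records the numbers of the updates applied to it:

```python
def missing_updates(applied, latest=None):
    """Return update numbers that were skipped on a machine.

    `applied` is the collection of update numbers recorded on the machine.
    `latest`, if given, is the newest update published so far, so updates
    that have not reached the machine yet are reported as well.
    """
    top = max(list(applied) + [latest or 0])
    have = set(applied)
    return [n for n in range(1, top + 1) if n not in have]
```

For the case above, `missing_updates([1, 2, 3, 5])` reports `[4]`. It
only detects the gap; recovering a machine that has already had update 5
written over a missing update 4 is a separate problem.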

Then there's the relationship between the state of running machines,
with applied partial updates, and the images which are put onto new
machines by Ironic. We would need to apply the partial updates to the
images which Ironic writes, or to have the tooling to ensure that newly
deployed machines immediately apply the set of applicable partial
updates, in sequence.
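
For the second option, the first-boot logic might look roughly like the
sketch below. It assumes, purely for illustration, that each image
records the number of the last partial update baked into it and that
published updates are numbered consecutively; neither is something
TripleO or Ironic provides today as far as I know.

```python
def updates_for_new_machine(image_level, published):
    """Return the published updates newer than the image, in apply order.

    `image_level` is the last update already baked into the image (0 for
    a pristine image). Raises ValueError if the updates after that level
    are not consecutive, since applying across a gap gives an odd state.
    """
    pending = sorted(n for n in set(published) if n > image_level)
    expected = list(range(image_level + 1, image_level + 1 + len(pending)))
    if pending != expected:
        raise ValueError("gap in update sequence: %r" % (pending,))
    return pending
```

A machine deployed from an image at level 2, with updates 1 to 5
published, would apply 3, 4 and 5 in that order before going into
service.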

Solving these issues feels like it'll require quite a lot of additional
tooling.



Angus




_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
