On 22/01/14 20:54, Clint Byrum wrote:
>
>I don't understand the aversion to using existing, well-known tools to handle this?
>
These tools are of course available, and nobody is stopping users
from using them. We are optimizing for not needing them. They are there,
and nothing is going to explode if you use them; you just lose one aspect
of what we're aiming at. I believe that image based deploys will
be well received as long as they are simple to understand.

>A hybrid model (blending 2 and 3, above) here I think would work best where
>TripleO lays down a baseline image and the cloud operator would employ a
>well-known and supported configuration tool for any small diffs.
>
These tools are popular because they control entropy and make it at
least more likely that what you tested ends up on the boxes.

A read-only root partition is a much stronger control on entropy.

>The operator would then be empowered to make the call for any major upgrades
>that would adversely impact the infrastructure (and ultimately the users/apps).
>He/She could say, this is a major release, let's deploy the image.
>
>Something logically like this, seems reasonable:
>
>     if (system_change > 10%) {
>         use TripleO;
>     } else {
>         use Existing_Config_Management;
>     }
>
I think we can make deploying minor updates minimally invasive.

We've kept it simple enough that this should be a fairly straightforward
optimization cycle. And the win there is that we also improve things
for the 11% change.


Hi Clint,

For deploying minimally invasive minor updates, the idea, if I've understood it correctly, would be to deploy a tarball which replaces selected files on the (usually read-only) root filesystem. That would allow only the services which are directly affected to be restarted. The alternative, pushing out a complete root filesystem image, would cause the same amount of disruption regardless of how small the change is.

That approach has a handful of costs which concern me. It simplifies the deployment itself, but increases the complexity of preparing the deployment: the administrator will have to identify which services need to be restarted, based on the particular set of libraries touched by their partial update, and write the service restart scripts accordingly.
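To illustrate the extra preparation work: before a partial update goes out, something has to map the files it touches onto the services that load them. A minimal Python sketch, assuming a hand-maintained service-to-files table; SERVICE_FILES, services_to_restart and every path below are hypothetical, and keeping such a table accurate is itself part of the tooling burden.

```python
from typing import Dict, List, Set

# Hypothetical table of which files each service loads. Real tooling would
# have to generate and maintain this (e.g. from package metadata).
SERVICE_FILES: Dict[str, Set[str]] = {
    "nova-compute": {"/opt/stack/nova/nova/compute/manager.py",
                     "/usr/lib/libvirt.so.0"},
    "nova-api": {"/opt/stack/nova/nova/api/openstack/__init__.py"},
    "glance-api": {"/opt/stack/glance/glance/api/v2/images.py"},
}


def services_to_restart(updated_files: List[str]) -> List[str]:
    """Return, sorted by name, the services whose files overlap the partial update."""
    touched = set(updated_files)
    return sorted(name for name, files in SERVICE_FILES.items() if files & touched)


if __name__ == "__main__":
    # A partial update that only replaces the libvirt client library.
    print(services_to_restart(["/usr/lib/libvirt.so.0"]))  # ['nova-compute']
```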

We're also making the administrator responsible for managing the sequence in which incremental updates are deployed. Since each incremental update rewrites a particular set of files, any machine which receives updates 1, 2 and 3, misses update 4 through an oversight, and then receives update 5 would end up in an inconsistent state, and detecting that would require additional tooling. Package based updates, with versioning and dependency tracking on each package, mitigate that risk.
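Catching the gap in the 1, 2, 3, 5 example is easy only if updates carry sequence numbers and each machine records which ones it has applied. A minimal sketch under those assumptions; the sequential numbering scheme and the names missing_updates and can_apply are hypothetical, not part of any existing TripleO tooling.

```python
from typing import List


def missing_updates(applied: List[int], latest: int) -> List[int]:
    """Return the update numbers in 1..latest that this machine never received."""
    seen = set(applied)
    return [n for n in range(1, latest + 1) if n not in seen]


def can_apply(applied: List[int], update: int) -> bool:
    """Permit only the next update in sequence, refusing to skip over a gap."""
    last = max(applied, default=0)
    return update == last + 1


if __name__ == "__main__":
    # Updates 1, 2 and 3 were applied, update 4 was overlooked, then 5 was pushed.
    history = [1, 2, 3, 5]
    print(missing_updates(history, 5))  # [4]
    print(can_apply([1, 2, 3], 5))      # False
```

The can_apply guard would have refused update 5 in the first place; missing_updates is the after-the-fact audit for machines that already slipped through.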

Then there's the relationship between the state of running machines, with partial updates applied, and the images which Ironic writes onto new machines. We would need either to apply the partial updates to those images, or to have tooling which ensures that newly deployed machines immediately apply the applicable partial updates, in sequence.
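Bringing a freshly deployed machine up to date amounts to replaying, in order, every partial update newer than the last one baked into its image. A minimal sketch, assuming the image records the number of the last update it contains and that `apply` is a hypothetical hook which unpacks an update's tarball and restarts the affected services.

```python
from typing import Callable, List


def pending_updates(image_version: int, released: List[int]) -> List[int]:
    """Updates released after the image was built, in the order they must be applied."""
    return sorted(u for u in released if u > image_version)


def bring_up_to_date(image_version: int, released: List[int],
                     apply: Callable[[int], None]) -> int:
    """Apply pending updates in sequence and return the machine's new update level."""
    current = image_version
    for update in pending_updates(image_version, released):
        # Hypothetical hook: unpack the update's tarball, restart affected services.
        apply(update)
        current = update
    return current


if __name__ == "__main__":
    # Image built with updates 1-3 baked in; updates 1-6 have been released.
    applied: List[int] = []
    level = bring_up_to_date(3, [1, 2, 3, 4, 5, 6], applied.append)
    print(applied, level)  # [4, 5, 6] 6
```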

Solving these issues feels like it'll require quite a lot of additional tooling.


Angus




_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
