Excerpts from Chris Jones's message of 2014-10-07 04:35:33 -0700:
> Hi
> 
> > On 6 Oct 2014, at 17:41, Clint Byrum <[email protected]> wrote:
> > We have to be _extremely_ careful in how we manage this. I actually think
> > it has potential to really blow up in our faces.
> 
> Yes, anything we do here has the potential to be extremely ruinous for 
> operators, but the reality is that any existing TripleO deployment is at 
> pretty severe risk of blowing up because of UIDs/GIDs changing when they 
> update.
> 
> > We need to give people
> > a way to move forward without us merging a patch, and at the same time
> > we need to make sure we provide a consistent set of UIDs for anything
> > people may want to deploy with diskimage-builder.
> 
> IMO the only desirable option *has* to be that we statically define UIDs and 
> GIDs in the elements, because:
>  1: Requires no data fragments to be kept safe and fed to subsequent build 
> processes
>  2: Doesn't do anything dynamic on first boot that could take hours/days
>  3: Can be thoroughly audited at build time to ensure correctness
> 
> As you rightly point out though, any existing deployments will definitely be 
> disrupted by this, but as I said above, all we'd be doing there is moving the 
> needle from "possible/probable" to "definite".
> 
> Since the only leftovers we have from their previous image builds are the
> images themselves, we could add the ability for a DIB run to extract IDs from
> a previous image. However, this couldn't be a required default build option,
> so we'd still put existing deployments at risk if operators don't notice the
> feature.
> 
> We could create a script that would spider an existing cloud and extract its 
> ID mappings, to produce a fragment to feed into future builds, but again 
> we're relying on operators to know that they need to do this.
> 

Welllll... they'd know they need to do _something_ because their UIDs
and GIDs are all horked up (technical term).

> Instead, I agree with Greg's view that this is our fault and we should fix 
> it. We didn't think of this sooner, and as a result, our users are at risk. 
> If we don't entirely fix this ourselves, we will be both expecting them to 
> become aware of this issue and expecting them to do additional work to 
> mitigate it.
> 
> To that end, I think we should audit all of our elements for use of 
> /mnt/state/ and use the specific knowledge we have of the software they 
> relate to, to build one-time ID migration scripts, which would:
>  1: Execute before any related services start
>  2: Compare the now-static ID mappings against known files in /mnt/state
>  3: chown/chgrp any files/directories that need migrating
>  4: store a flag file in /mnt/state indicating that this process doesn't need 
> to run again
> 
> It does mean they have a potentially painfully long update process once, but 
> the result will be a completely stable, static arrangement that will not 
> require them to preserve precious build fragments for the rest of time. Nor 
> does it require some odd run-time remapping, or any additional mechanisms to 
> centralise user management (e.g. LDAP. Please, no LDAP!)
> 
> I think that tying ourselves and our operators into knots because we're
> afraid of the hit of a one-time data migration is crazy.
> 
> AFAICS, the only risk left at that point is elements that other people
> maintain. If we consider that a sufficient risk, we can still build the
> mechanism for injecting ID values from a previous build (essentially just
> seeding the static values that we'd be setting anyway) and apologise to the
> users who need it, or who don't discover that it exists and break their
> clouds.

I'm not afraid of running migrations once. I want to make sure we never
_plan_ to run migrations as part of regular operation.

I agree with most of what you've written, but first I'd start with this:

* Create an element which exports /etc/passwd and /etc/group from the build
process.

* Create an element which imports /etc/passwd and /etc/group from local
disk into the image. It will list uid-gid-map in its element-provides.

* Create a separate element called 'static-users' which also provides
uid-gid-map. It contains a map of UIDs and GIDs, and creates users early on
with static UIDs/GIDs only. It disables the usual commands used to add users
and groups (the error message should explain clearly that the user can add
their own element that provides uid-gid-map, or switch to importing/exporting).

* Make use-ephemeral depend on uid-gid-map.

* Make tripleo-ci build with static-users, and recommend it in the TripleO
documentation.
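The export/import pair mostly comes down to reading names and numeric IDs
out of /etc/passwd and /etc/group, then noticing when a fresh build disagrees
with an exported one. A minimal Python sketch, assuming the standard
colon-separated formats; parse_ids and id_drift are hypothetical helpers for
illustration, not part of diskimage-builder:

```python
from typing import Dict, Iterable, Tuple


def parse_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Return {name: numeric_id} from passwd- or group-format lines.

    Both formats keep the name in field 0 and the numeric ID in field 2:
      passwd: name:password:UID:GID:GECOS:home:shell
      group:  name:password:GID:member,member,...
    """
    ids: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        ids[fields[0]] = int(fields[2])
    return ids


def id_drift(exported: Dict[str, int],
             fresh: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Names present in both maps whose numeric ID changed: {name: (old, new)}."""
    return {name: (exported[name], fresh[name])
            for name in exported.keys() & fresh.keys()
            if exported[name] != fresh[name]}


if __name__ == "__main__":
    old = parse_ids(["nova:x:999:998::/var/lib/nova:/bin/false"])
    new = parse_ids(["nova:x:997:996::/var/lib/nova:/bin/false"])
    print(id_drift(old, new))  # {'nova': (999, 997)}
```

A non-empty id_drift result is exactly the situation that leaves files in
/mnt/state owned by the wrong account after an update.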

Once that is done, we will be producing builds with static users. If you
want to create a user for base TripleO, you'll need to add it by hand in
the static-users element. If you are downstream and want to do things
differently, that should be easy: just provide your own uid-gid-map
element.
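The static-users behaviour can be sketched the same way: a fixed name-to-ID
map, a build-time audit that rejects two names sharing one ID, and a lookup
that refuses unmapped names with the kind of explanatory error described
above. This is a hypothetical Python illustration; the names and numbers in
the map are placeholders, and static_id only stands in for the disabled
useradd/groupadd, it is not an existing tool:

```python
from typing import Dict

# Placeholder static map: illustrative names and numbers only,
# not values taken from any real element.
STATIC_UIDS: Dict[str, int] = {"nova": 501, "glance": 502}


class UnmappedUserError(Exception):
    """Raised instead of silently allocating a dynamic ID."""


def audit(id_map: Dict[str, int]) -> None:
    """Build-time check: fail if two names share one numeric ID."""
    owner: Dict[int, str] = {}
    for name, num in id_map.items():
        if num in owner:
            raise ValueError(f"{name!r} and {owner[num]!r} both use ID {num}")
        owner[num] = name


def static_id(id_map: Dict[str, int], name: str) -> int:
    """Stand-in for a disabled useradd/groupadd: only mapped names succeed."""
    if name not in id_map:
        raise UnmappedUserError(
            f"{name!r} has no static ID. Add it to the static-users map, "
            "provide your own element that provides uid-gid-map, or switch "
            "to importing/exporting /etc/passwd and /etc/group."
        )
    return id_map[name]
```

Failing loudly on an unmapped name is the point: a dynamically allocated ID
is what drifts between builds.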

As for migrations, they are fairly simple and can be done generically;
I've already written a script that does it fairly reliably. The only
worry, of course, is that large collections of files will take a long
time. I'll submit that as a separate element called 'fix-state-uid-gid'
or something like that. We might as well include it in the default build,
so that our images start fixing this problem now. :-P
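A generic one-time migration can be sketched in Python. This is not the
script mentioned above; it is a hypothetical illustration that assumes old
and new name-to-ID maps are available (for example, parsed from an exported
/etc/passwd plus the static map) and that it runs as root before any related
service starts. It re-owns everything under the state directory once, then
writes a flag file (the name is made up here) so later boots skip it:

```python
import os
import stat
from typing import Dict, Set, Tuple

FLAG_NAME = ".static-uid-gid-migrated"  # hypothetical flag file name


def build_remap(old: Dict[str, int], new: Dict[str, int]) -> Dict[int, int]:
    """Old numeric ID -> new numeric ID, only for names whose ID changed."""
    return {old[name]: new[name] for name in old.keys() & new.keys()
            if old[name] != new[name]}


def _fix_owner(path: str, uid_map: Dict[int, int], gid_map: Dict[int, int],
               seen: Set[Tuple[int, int]]) -> bool:
    st = os.lstat(path)  # lstat: inspect symlinks themselves, never follow
    key = (st.st_dev, st.st_ino)
    if key in seen:  # hard link to an inode we already re-owned
        return False
    seen.add(key)
    uid = uid_map.get(st.st_uid, -1)
    gid = gid_map.get(st.st_gid, -1)
    if uid == -1 and gid == -1:
        return False
    os.lchown(path, uid, gid)  # -1 leaves that ID unchanged
    # chown() can clear setuid/setgid bits on Linux; restore the old mode.
    if not stat.S_ISLNK(st.st_mode) and st.st_mode & (stat.S_ISUID | stat.S_ISGID):
        os.chmod(path, stat.S_IMODE(st.st_mode))
    return True


def migrate(state_dir: str, uid_map: Dict[int, int],
            gid_map: Dict[int, int]) -> int:
    """Re-own every entry under state_dir once; return how many changed."""
    flag = os.path.join(state_dir, FLAG_NAME)
    if os.path.exists(flag):
        return 0  # already migrated on an earlier boot
    seen: Set[Tuple[int, int]] = set()
    changed = int(_fix_owner(state_dir, uid_map, gid_map, seen))
    # Each inode is remapped exactly once from its *original* owner, so
    # swapped IDs (A->B and B->A) come out right.
    for root, dirs, files in os.walk(state_dir):
        for name in dirs + files:
            changed += int(_fix_owner(os.path.join(root, name),
                                      uid_map, gid_map, seen))
    with open(flag, "w") as fh:
        fh.write("done\n")
    return changed
```

Usage would be something like migrate("/mnt/state", build_remap(old_uids,
new_uids), build_remap(old_gids, new_gids)). The flag is only written after
a full pass; if a pass is interrupted and some new ID is also one of the old
IDs, a blind rerun could remap files twice, so a real version would want to
record its progress.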

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
