Excerpts from Chris Jones's message of 2014-10-07 04:35:33 -0700:
> Hi
>
> > On 6 Oct 2014, at 17:41, Clint Byrum wrote:
> > We have to be _extremely_ careful in how we manage this. I actually think
> > it has potential to really blow up in our faces.
>
> Yes, anything we do here has the potential to be extremely ruinous for
> operators, but the reality is that any existing TripleO deployment is at
> pretty severe risk of blowing up because of UIDs/GIDs changing when they
> update.
>
> > We need to give people
> > a way to move forward without us merging a patch, and at the same time
> > we need to make sure we provide a consistent set of UIDs for anything
> > people may want to deploy with diskimage-builder.
>
> IMO the only desirable option *has* to be that we statically define UIDs and
> GIDs in the elements, because:
> 1: Requires no data fragments to be kept safe and fed to subsequent build
>    processes
> 2: Doesn't do anything dynamic on first boot that could take hours/days
> 3: Can be thoroughly audited at build time to ensure correctness
>
> As you rightly point out though, any existing deployments will definitely be
> disrupted by this, but as I said above, all we'd be doing there is moving the
> needle from "possible/probable" to "definite".
>
> Since the only leftovers we have from their previous image builds, are the
> images themselves, we could add the ability for a DIB run to extract IDs from
> a previous image, but this couldn't be required as a default build option, so
> we'd still risk existing deployments if they don't notice this feature.
>
> We could create a script that would spider an existing cloud and extract its
> ID mappings, to produce a fragment to feed into future builds, but again
> we're relying on operators to know that they need to do this.

Welllll... they'd know they need to do _something_ because their UIDs and
GIDs are all horked up (technical term).

> Instead, I agree with Greg's view that this is our fault and we should fix
> it. We didn't think of this sooner, and as a result, our users are at risk.
> If we don't entirely fix this ourselves, we will be both expecting them to
> become aware of this issue and expecting them to do additional work to
> mitigate it.
>
> To that end, I think we should audit all of our elements for use of
> /mnt/state/ and use the specific knowledge we have of the software they
> relate to, to build one-time ID migration scripts, which would:
> 1: Execute before any related services start
> 2: Compare the now-static ID mappings against known files in /mnt/state
> 3: chown/chgrp any files/directories that need migrating
> 4: store a flag file in /mnt/state indicating that this process doesn't need
>    to run again
>
> It does mean they have a potentially painfully long update process once, but
> the result will be a completely stable, static arrangement that will not
> require them to preserve precious build fragments for the rest of time. Nor
> does it require some odd run-time remapping, or any additional mechanisms to
> centralise user management (e.g. LDAP. Please, no LDAP!)
>
> I think that tying ourselves and our operators into knots because we're
> afraid of the hit of one-time data migration, is crazy.
>
> AFAICS, the only risk left at that point, is elements that other people are
> maintaining. If we consider that to be a sufficient risk, we can still build
> the mechanism for injecting ID values from a previous build (essentially just
> seeding the static values that we'd be setting anyway) and apologise to the
> users who need that, or who don't discover its existence and break their
> clouds.

I'm not afraid of running migrations once. I want to make sure we never
_plan_ to run migrations as part of regular operation.

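For reference, here is a rough sketch of what steps 2 to 4 of the quoted one-time migration could look like. It is only an illustration, not the script from any existing element: the flag-file name `.uid-gid-migrated`, the function name `migrate_ownership` and the shape of the remap dictionaries are all assumptions made for this example. Step 1 (running before services start) is a matter of boot ordering and is not shown.

```python
import os

# Hypothetical flag-file name; nothing in this thread specifies one.
FLAG_NAME = ".uid-gid-migrated"


def migrate_ownership(state_dir, uid_remap, gid_remap, chown=os.lchown):
    """Re-own everything under state_dir once.

    uid_remap / gid_remap map an old numeric ID to its new static ID.
    chown is injectable so the walk can be dry-run without root.
    Returns the list of paths whose ownership was changed.
    """
    flag = os.path.join(state_dir, FLAG_NAME)
    if os.path.exists(flag):
        return []  # already migrated on an earlier boot

    # Collect the top directory plus every directory and file below it.
    paths = [state_dir]
    for root, dirs, files in os.walk(state_dir):
        paths.extend(os.path.join(root, name) for name in dirs + files)

    changed = []
    for path in paths:
        st = os.lstat(path)  # lstat: do not follow symlinks
        new_uid = uid_remap.get(st.st_uid, st.st_uid)
        new_gid = gid_remap.get(st.st_gid, st.st_gid)
        if (new_uid, new_gid) != (st.st_uid, st.st_gid):
            chown(path, new_uid, new_gid)
            changed.append(path)

    # Record completion so later boots skip the (possibly long) walk.
    with open(flag, "w") as f:
        f.write("done\n")
    return changed
```

Called as `migrate_ownership("/mnt/state", {998: 1001}, {997: 1001})` (the numbers are made up), it would walk the tree once and return immediately on every later run. Each path is examined exactly once, so an old ID that happens to equal some other user's new ID is not remapped twice.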
I agree with most of what you've written, but first I'd start with this:

* Create an element which exports /etc/passwd and /etc/group from the
  build process.
* Create an element which imports /etc/passwd and /etc/group from local
  disk into the image. This will have an element-provides of uid-gid-map.
* Create a separate element called 'static-users' which also provides
  uid-gid-map. It contains a map of UIDs and GIDs, and creates users early
  on with static UIDs/GIDs only. It disables the usual commands used to add
  users and groups (the error message should explain well enough that the
  user can add their own element that provides uid-gid-map, or switch to
  importing/exporting).
* Make use-ephemeral depend on uid-gid-map.
* Make tripleo-ci build with static-users, and recommend it in the TripleO
  documentation.

Once that is done, we will be producing builds with static users. If you
want to create a user for base TripleO, you'll need to do it by hand in
the static-users element. If you are downstream and want to do things
differently, that should be easy: just provide your own uid-gid-map
element.

As for migrations, that is fairly simple and can be done generically; I've
already written a script that does it fairly reliably. The only worry is
of course that large collections of files will take a long time. I'll
submit that as a separate element called 'fix-state-uid-gid' or something
like that. We might as well include it in the default build, so that our
images start fixing this problem now. :-P
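To make the export/compare idea a bit more concrete, here is a minimal sketch of turning an exported /etc/passwd or /etc/group into a name-to-ID map and diffing it against a static map. This is not the script mentioned above; the function names `parse_ids` and `build_remap`, and every ID value in the example, are invented for illustration. It relies only on the standard passwd(5) and group(5) layout, where the name is field 1 and the numeric UID or GID is field 3.

```python
def parse_ids(text):
    """Parse passwd(5)- or group(5)-style text into {name: numeric_id}."""
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed or non-numeric entries rather than guessing.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        ids[fields[0]] = int(fields[2])
    return ids


def build_remap(old_ids, static_ids):
    """Return {old_id: new_id} for names whose ID differs between the maps."""
    return {
        old: static_ids[name]
        for name, old in old_ids.items()
        if name in static_ids and static_ids[name] != old
    }


# Example with made-up values: 'nova' moved from a dynamic UID to a static one.
exported_passwd = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nova:x:998:997::/var/lib/nova:/bin/false\n"
)
static_uids = {"root": 0, "nova": 1001}  # hypothetical static map
uid_remap = build_remap(parse_ids(exported_passwd), static_uids)
print(uid_remap)  # {998: 1001}
```

The resulting old-to-new dictionary is the kind of input a generic chown pass over /mnt/state would need. Names that appear in only one of the two maps are left alone here; whether that is the right policy is a separate question.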
