Hi

> On 6 Oct 2014, at 17:41, Clint Byrum <[email protected]> wrote:
> We have to be _extremely_ careful in how we manage this. I actually think
> it has potential to really blow up in our faces.

Yes, anything we do here has the potential to be extremely ruinous for 
operators, but the reality is that any existing TripleO deployment is already 
at pretty severe risk of blowing up, because its UIDs/GIDs can change whenever 
it is updated with freshly built images.

> We need to give people
> a way to move forward without us merging a patch, and at the same time
> we need to make sure we provide a consistent set of UIDs for anything
> people may want to deploy with diskimage-builder.

IMO the only desirable option *has* to be that we statically define UIDs and 
GIDs in the elements, because that approach:
 1: Requires no data fragments to be kept safe and fed to subsequent build 
processes
 2: Doesn't do anything dynamic on first boot that could take hours/days
 3: Can be thoroughly audited at build time to ensure correctness
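Point 3 is straightforward to mechanise. A minimal sketch, assuming the static IDs live in a simple name-to-ID table (the names and numbers below are made up for illustration, not values any element uses), of a check a build step could run to catch duplicate IDs and clashes with accounts already present in the base image:

```python
# Hypothetical static UID table; names and numbers are illustrative only.
STATIC_USERS = {
    "mysql": 901,
    "rabbitmq": 902,
    "nova": 903,
}


def audit_ids(table, existing):
    """Return a list of problems found in a name -> ID table.

    existing: name -> ID pairs already present in the base image
    (e.g. parsed from its /etc/passwd), which the table must not clash with.
    """
    problems = []
    seen = {}  # ID -> first static name that claimed it
    for name, ident in sorted(table.items()):
        if ident in seen:
            problems.append(f"ID {ident} assigned to both {seen[ident]} and {name}")
        else:
            seen[ident] = name
        if name in existing and existing[name] != ident:
            problems.append(
                f"{name} exists in base image with ID {existing[name]}, expected {ident}"
            )
    # A static ID already owned by a *different* account in the base image.
    for name, ident in sorted(existing.items()):
        if ident in seen and seen[ident] != name:
            problems.append(
                f"ID {ident} of static {seen[ident]} already used by {name} in base image"
            )
    return problems
```

The same function could be pointed at a group table; failing the build on a non-empty result is what makes the static scheme auditable up front.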

As you rightly point out, though, any existing deployments will definitely be 
disrupted by this; but as I said above, all we'd be doing there is moving the 
needle from "possible/probable" to "definite".

Since the only artifacts left over from their previous image builds are the 
images themselves, we could add the ability for a DIB run to extract IDs from a 
previous image, but this couldn't be the default build behaviour, so we'd still 
put existing deployments at risk if operators don't notice the feature.
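A rough sketch of the extraction half, assuming the previous image has already been mounted or unpacked somewhere readable (how is out of scope here), and inventing a JSON fragment format purely for illustration; nothing below is an existing diskimage-builder feature:

```python
import json
from pathlib import Path


def parse_id_file(text):
    """Map name -> numeric ID from passwd(5)- or group(5)-format text.

    Both formats put the name in the first field and the numeric
    UID/GID in the third.
    """
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            ids[fields[0]] = int(fields[2])
    return ids


def extract_ids(image_root):
    """Read user and group IDs from a previous image's root filesystem."""
    root = Path(image_root)
    return {
        "users": parse_id_file((root / "etc" / "passwd").read_text()),
        "groups": parse_id_file((root / "etc" / "group").read_text()),
    }


def write_fragment(ids, out_path):
    """Save the mapping as a JSON fragment (hypothetical format)."""
    Path(out_path).write_text(json.dumps(ids, indent=2, sort_keys=True))
```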

We could create a script that would spider an existing cloud and extract its ID 
mappings, to produce a fragment to feed into future builds, but again we're 
relying on operators to know that they need to do this.
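For the spidering variant, the interesting step is combining what each node reports. A minimal sketch, assuming the per-host name-to-ID maps have already been gathered somehow (e.g. by reading each node's /etc/passwd; the collection is not shown, and the host and user names are hypothetical), that separates names every node agrees on from names where nodes disagree:

```python
from collections import defaultdict


def merge_host_mappings(per_host):
    """Combine name -> ID maps gathered from several hosts.

    Returns (merged, conflicts): merged holds names whose ID is the same on
    every host that has them; conflicts maps each disagreeing name to
    {ID: [hosts reporting that ID]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for host, ids in sorted(per_host.items()):
        for name, ident in ids.items():
            seen[name][ident].append(host)
    merged, conflicts = {}, {}
    for name, by_id in seen.items():
        if len(by_id) == 1:
            merged[name] = next(iter(by_id))
        else:
            conflicts[name] = {ident: hosts for ident, hosts in by_id.items()}
    return merged, conflicts
```

Any entry in `conflicts` means nodes in the same cloud already disagree, so there is no single fragment that could be fed into future builds for that name without some migration anyway.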

Instead, I agree with Greg's view that this is our fault and we should fix it. 
We didn't think of this sooner, and as a result, our users are at risk. If we 
don't fix this entirely ourselves, we will be expecting them both to become 
aware of the issue and to do additional work to mitigate it.

To that end, I think we should audit all of our elements for use of /mnt/state/ 
and, using the specific knowledge we have of the software each one relates to, 
build one-time ID migration scripts, which would:
 1: Execute before any related services start
 2: Compare the now-static ID mappings against known files in /mnt/state
 3: chown/chgrp any files/directories that need migrating
 4: Store a flag file in /mnt/state indicating that this process doesn't need 
to run again
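The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a real element script: the flag file name, the `known_paths` and `static_ids` inputs, and the heuristic that the current owner of each known top-level path reveals the service's old dynamic IDs are all hypothetical choices made for the example.

```python
import os

FLAG = "id-migration-done"  # hypothetical flag file name


def iter_tree(top):
    """Yield top and everything beneath it, without following symlinks."""
    yield top
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def plan_migration(known_paths, static_ids):
    """Work out which entries need new ownership.

    known_paths: {path under /mnt/state: owning user name}
    static_ids:  {user name: (uid, gid)}, the new static mapping
    Assumes each known top-level path is still owned by the service's old,
    dynamically allocated UID/GID, and only remaps entries owned by those.
    """
    changes = []
    for top, user in sorted(known_paths.items()):
        if not os.path.lexists(top):
            continue
        new_uid, new_gid = static_ids[user]
        top_st = os.lstat(top)
        old_uid, old_gid = top_st.st_uid, top_st.st_gid
        for path in iter_tree(top):
            st = os.lstat(path)
            uid = new_uid if st.st_uid == old_uid else st.st_uid
            gid = new_gid if st.st_gid == old_gid else st.st_gid
            if (uid, gid) != (st.st_uid, st.st_gid):
                changes.append((path, uid, gid))
    return changes


def migrate_once(state_dir, known_paths, static_ids, dry_run=False):
    """Run the migration unless the flag file says it already ran."""
    flag = os.path.join(state_dir, FLAG)
    if os.path.exists(flag):
        return []
    changes = plan_migration(known_paths, static_ids)
    if not dry_run:
        # Reverse order chowns each known top-level path after everything
        # beneath it, so an interrupted run can be safely repeated.
        for path, uid, gid in reversed(changes):
            os.lchown(path, uid, gid)
        with open(flag, "w") as f:
            f.write("done\n")
    return changes
```

On a node this would have to run as root before the services start, e.g. `migrate_once("/mnt/state", {"/mnt/state/var/lib/mysql": "mysql"}, {"mysql": (901, 901)})`, where the path and numbers are purely illustrative. Writing the flag only after every chown means a crash mid-way leaves the top-level owner showing the old IDs, so the next boot finishes the job.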

It does mean operators face one potentially painfully long update, but the 
result will be a completely stable, static arrangement that will not require 
them to preserve precious build fragments for the rest of time. Nor does it 
require any odd run-time remapping, or any additional mechanisms to centralise 
user management (e.g. LDAP. Please, no LDAP!)

I think that tying ourselves and our operators into knots because we're afraid 
of the hit of a one-time data migration is crazy.

AFAICS, the only risk left at that point is in elements that other people 
maintain. If we consider that a sufficient risk, we can still build the 
mechanism for injecting ID values from a previous build (essentially just 
seeding the static values that we'd be setting anyway) and apologise to the 
users who need it, or who don't discover that it exists and break their clouds.
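Because the injected values would only seed the static table, the mechanism can stay small. A minimal sketch, assuming a name-to-ID table of static defaults and a map extracted from a previous build (both inputs and the function name are hypothetical), that lets old IDs win for names we manage and refuses any result where two names would share an ID:

```python
def seed_ids(static_defaults, previous):
    """Overlay IDs from a previous build onto the static defaults.

    Names we manage keep their previous ID if the old build had one;
    names the old build lacked get the static default. Names only present
    in the previous build are ignored. Raises ValueError if two names
    would end up sharing one ID.
    """
    merged = dict(static_defaults)
    for name, ident in previous.items():
        if name in merged:
            merged[name] = ident
    owners = {}  # ID -> name already holding it
    for name, ident in sorted(merged.items()):
        if ident in owners:
            raise ValueError(f"ID {ident} would be shared by {owners[ident]} and {name}")
        owners[ident] = name
    return merged
```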

Cheers,

Chris
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
