Jan Setje-Eilers wrote:
> Thanks to everyone who commented so far.
>
> Here's an attempt to summarize what I've gathered so far on the archive:
>
> The primary concern is the editable files in the archive, not the ones
> delivered and managed by the install and patching tools.
>
> Further, the most serious concern is the system stopping during boot
> and requiring the administrator to sign off on its state. This is
> followed by the concern about the effect of using old data.
>
> It sounds like it would be acceptable for whatever tool is used to
> update these files to do whatever sync is needed atomically as part of
> the update. Sadly, for most of these files this tool is still a text
> editor and we don't get to fix that in a patch release. :(
>
> However it seems like the files of concern fall into a category of
> their own that is currently being dealt with in a manner that should
> actually be reserved for out-of-sync kernel binaries. This suggests we
> have the following three types of files in the archive:
>
> 1) Kernel binaries (including modules and drivers)
>
>    If any of these are out of sync and not just new, the concern
>    is that things with mismatched interfaces may be running and
>    the kernel may act unpredictably.
>
>    So, for these we stop hard in a panic-like fashion and refuse
>    to mount root until someone actively takes responsibility for
>    doing so.
>
>    This is the classic check.
>
> 2) Files that are either caches or only grow or can be safely re-read
>    later.
>
>    These are what's currently in filelist.safe. These files do
>    not cause the system to stop during boot and trigger an
>    archive update later in boot. This is the refinement that went
>    back into nv44 and u4.
>
> 3) Files like etc/system that contain information that can't always
>    be usefully processed later during boot, but if out of date _do
>    not_ leave the system in a dangerously unstable state.
>
>    These files are currently being treated just like the kernel
>    binaries.
>    However since the system is not dangerously unstable
>    at this point, it is reasonable to drive on and mount root
>    read-write.
>
>    This means they should really get their own check.
>
>    If this check fails, I propose the following:
>
>        Print a warning to console.
>
>        Leave a service (which won't block multi-user) in
>        maintenance mode so the state is communicated via
>        svcs -x.
>
>        Drive on and mount root read-write.

If the root is zfs, it will already have been mounted read-write
(there is no need to do a read-only mount since zfs has no fsck), so
there is no need for a remount at this point. I don't think this
changes the overall logic here though.

>        Update the archive.
>
>        And potentially reboot immediately to the device we
>        just booted from now that the archive is updated.
>
>    The auto-reboot still makes us a little nervous, so it may be
>    something that needs to be explicitly enabled based on site
>    policy, but at least on sparc we have a solid idea of what
>    boot device we booted from, so it may turn out to be a reasonable
>    default action to take.
>
>    Clearly the exact service dependencies will differ depending
>    on whether or not the system will automatically reboot to
>    pick up the changed files.
>
> Ideas, thoughts, comments?
>
> -jan
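
To make the proposed three-way split a bit more concrete, here is a
rough sketch in Python of how the boot-time decision might look. This
is only an illustration of the logic described above, not actual
boot archive code: the names FileClass, boot_actions, root_already_rw
and auto_reboot, and all of the action strings, are hypothetical. The
root_already_rw flag stands in for the zfs case, where no remount is
needed.

```python
from enum import Enum


class FileClass(Enum):
    """Hypothetical labels for the three types of archive files."""
    KERNEL = 1   # kernel binaries, modules and drivers
    SAFE = 2     # caches, grow-only or safely re-read files (filelist.safe)
    CONFIG = 3   # editable files such as etc/system


def boot_actions(stale, root_already_rw=False, auto_reboot=False):
    """Return the ordered actions for the most severe stale file class.

    `stale` is the set of FileClass values found out of sync.
    """
    if FileClass.KERNEL in stale:
        # Classic check: stop hard, do not mount root until an admin signs off.
        return ["stop_and_wait_for_admin"]

    if FileClass.CONFIG in stale:
        actions = ["warn_console", "mark_service_maintenance"]
        if not root_already_rw:
            # Skipped when root was already mounted read-write (e.g. zfs).
            actions.append("remount_root_rw")
        actions.append("update_archive")
        if auto_reboot:
            # Assumed to be off unless site policy enables it.
            actions.append("reboot_same_device")
        return actions

    if FileClass.SAFE in stale:
        # Do not stop; just refresh the archive later in boot.
        return ["continue_boot", "update_archive_later"]

    return ["continue_boot"]
```

For example, boot_actions({FileClass.CONFIG}, root_already_rw=True)
yields the warning, the maintenance-mode service and the archive
update, with no remount step. A stale kernel binary always wins over
the other two classes, which matches the idea that the hard stop is
reserved for that case alone.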
