Thanks to everyone who commented so far.
Here's an attempt to summarize what I've gathered so far on the archive.
The primary concern is the editable files in the archive, not the ones
delivered and managed by the install and patching tools.
Further, the most serious concern is the system stopping during boot
and requiring the administrator to sign off on its state. This is
followed by the concern about the effect of using old data.
It sounds like it would be acceptable for whatever tool is used to
update these files to do whatever sync is needed atomically as part of
the update. Sadly, for most of these files this tool is still a text
editor and we don't get to fix that in a patch release. :(
However it seems like the files of concern fall into a category of
their own that is currently being dealt with in a manner that should
actually be reserved for out of sync kernel binaries. This suggests we
have the following three types of files in the archive:
1) Kernel binaries (including modules and drivers)
If any of these are out of sync and not just new, the concern
is that things with mismatched interfaces may be running and
the kernel may act unpredictably.
So, for these we stop hard in a panic-like fashion and refuse
to mount root until someone actively takes responsibility for
doing so.
This is the classic check.
2) Files that are either caches or only grow or can be safely re-read
later.
These are what's currently in filelist.safe. These files do
not cause the system to stop during boot and trigger an
archive update later in boot. This is the refinement that went
back into nv44 and u4.
3) Files like etc/system that contain information that can't always
be usefully processed later during boot, but if out of date _do
not_ leave the system in a dangerously unstable state.
These files are currently being treated just like the kernel
binaries. However since the system is not dangerously unstable
at this point, it is reasonable to drive on and mount root
read-write.
This means they should really get their own check.
If this check fails, I propose the following:
Print a warning to console.
Leave a service (which won't block multi-user) in
maintenance mode so the state is communicated via svcs
-x.
Drive on and mount root read-write.
Update the archive.
And potentially reboot immediately to the device we
just booted from now that the archive is updated.
The auto-reboot still makes us a little nervous, so it may be
something that needs to be explicitly enabled based on site
policy, but at least on sparc we have a solid idea of what
boot device we booted from, so it may turn out to be a reasonable
default action to take.
Clearly the exact service dependencies will differ depending
on whether or not the system will automatically reboot to
pick up the changed files.
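To make the three-way split concrete, here is a rough sketch of the decision logic in Python. It is only an illustration of the proposal above, not the actual boot code: the names FileClass, boot_actions, auto_reboot and the action strings are all made up for this example, and the real checks would live in the boot path and SMF.

```python
from enum import Enum


class FileClass(Enum):
    # Hypothetical labels for the three types of archive files above.
    KERNEL = 1   # kernel binaries, including modules and drivers
    SAFE = 2     # caches, grow-only files, files safely re-read later
    CONFIG = 3   # files like etc/system: stale but not dangerous


def boot_actions(stale, auto_reboot=False):
    """Return the ordered actions for a set of out-of-sync file classes.

    `stale` is a set of FileClass values found out of sync at boot.
    `auto_reboot` stands in for the site policy switch (an assumption).
    """
    if FileClass.KERNEL in stale:
        # The classic check: refuse to mount root until someone signs off.
        return ["stop_before_mounting_root"]

    actions = []
    if FileClass.CONFIG in stale:
        # Proposed new check: warn and flag a service, but keep going.
        actions += ["warn_on_console", "mark_service_maintenance"]
    actions.append("mount_root_read_write")
    if stale:
        # Both type 2 and type 3 files lead to an archive update.
        actions.append("update_archive")
    if FileClass.CONFIG in stale and auto_reboot:
        actions.append("reboot_same_device")
    return actions


if __name__ == "__main__":
    for case in ({FileClass.KERNEL}, {FileClass.SAFE}, {FileClass.CONFIG}):
        print(sorted(c.name for c in case), boot_actions(case))
```

Note that a stale kernel binary wins over everything else in this sketch, and the reboot step only appears when the policy switch is on, which matches the idea that auto-reboot is opt-in.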
Ideas, thoughts, comments?
-jan