Jan Setje-Eilers wrote:

> Thanks to everyone who commented so far.
>
> Here's an attempt to summarize what I've gathered so far on the archive:
>
> The primary concern is the editable files in the archive, not the ones
>delivered and managed by the install and patching tools.
>
> Further, the most serious concern is the system stopping during boot
>and requiring the administrator to sign off on its state. This is
>followed by the concern about the effect of using old data.
>
> It sounds like it would be acceptable for whatever tool is used to
>update these files to do whatever sync is needed atomically as part of
>the update. Sadly for most of these files this tool is still a text
>editor and we don't get to fix that in a patch release. :(
>
> However it seems like the files of concern fall into a category of
>their own that is currently being dealt with in a manner that should
>actually be reserved for out of sync kernel binaries. This suggests we
>have the following three types of files in the archive:
>
> 1) Kernel binaries (including modules and drivers)
>
>       If any of these are out of sync and not just new, the concern
>       is that things with mismatched interfaces may be running and
>       the kernel may act unpredictably.
>
>       So, for these we stop hard in a panic-like fashion and refuse
>       to mount root until someone actively takes responsibility for
>       doing so.
>
>       This is the classic check.
>
> 2) Files that are either caches or only grow or can be safely re-read
>    later. 
>
>       These are what's currently in filelist.safe. These files do
>       not cause the system to stop during boot and trigger an
>       archive update later in boot. This is the refinement that went
>       back into nv44 and u4.
>
> 3) Files like etc/system that contain information that can't always
>    be usefully processed later during boot, but if out of date _do
>    not_ leave the system in a dangerously unstable state.
>
>       These files are currently being treated just like the kernel
>       binaries. However since the system is not dangerously unstable
>       at this point, it is reasonable to drive on and mount root
>       read-write.
>
>       This means they should really get their own check.
>
>       If this check fails, I propose the following:
>
>               Print a warning to console.
>
>               Leave a service (which won't block multi-user) in
>               maintenance mode so the state is communicated via svcs
>               -x.
>
>               Drive on and mount root read-write.
>
If the root is zfs, it will already have been mounted read-write
(there is no need to do a read-only mount since zfs has no fsck),
so no remount is needed at this point.  I don't think this changes
the overall logic here though.

>
>               Update the archive.
>
>               And potentially reboot immediately to the device we
>               just booted from now that the archive is updated.
>
>       The auto-reboot still makes us a little nervous, so it may be
>       something that needs to be explicitly enabled based on site
>       policy, but at least on sparc we have a solid idea of what
>       boot device we booted from, so it may turn out to be a reasonable
>       default action to take.
>
>       Clearly the exact service dependencies will differ depending
>       on whether or not the system will automatically reboot to
>       pick up the changed files.
>
> Ideas, thoughts, comments?
>
>-jan
>
>
>  
>
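
To make sure I'm reading the proposal the same way, here is a rough
sketch in Python of the three-way split, with the zfs case folded in.
It is only an illustration: the names FileClass, plan_boot and the
step strings are made up for this mail and do not correspond to any
existing boot or archive tooling.

```python
from enum import Enum


class FileClass(Enum):
    """Hypothetical labels for the three kinds of archive files."""
    KERNEL_BINARY = 1   # kernel, modules, drivers
    SAFE = 2            # caches, grow-only files (filelist.safe)
    BOOT_CONFIG = 3     # files like etc/system


def plan_boot(stale, root_is_zfs=False, auto_reboot=False):
    """Return an ordered list of (made-up) step names for this boot.

    `stale` is the set of FileClass values found out of sync.
    """
    # Type 1: mismatched kernel bits, stop hard and wait for a human.
    if FileClass.KERNEL_BINARY in stale:
        return ["stop_and_wait_for_admin"]

    steps = []
    config_stale = FileClass.BOOT_CONFIG in stale

    # Type 3: warn, flag a non-blocking service, then drive on.
    if config_stale:
        steps += ["warn_console", "mark_service_maintenance"]

    # A zfs root is already read-write, so there is nothing to remount.
    if not root_is_zfs:
        steps.append("remount_root_read_write")

    # Types 2 and 3 both end with an archive update.
    if config_stale or FileClass.SAFE in stale:
        steps.append("update_archive")

    # The reboot is opt-in, since it depends on site policy.
    if config_stale and auto_reboot:
        steps.append("reboot_same_device")

    return steps
```

With this, a stale etc/system on a zfs root with auto-reboot enabled
would warn, flag the service, update the archive and reboot, skipping
the remount step entirely.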

