Thanks to everyone who commented so far.

 Here's an attempt to summarize what I've gathered so far on the archive.

 The primary concern is the editable files in the archive, not the ones
delivered and managed by the install and patching tools.

 Further, the most serious concern is the system stopping during boot
and requiring the administrator to sign off on its state. This is
followed by the concern about the effect of using old data.

 It sounds like it would be acceptable for whatever tool is used to
update these files to do whatever sync is needed atomically as part of
the update. Sadly for most of these files this tool is still a text
editor and we don't get to fix that in a patch release. :(

 However it seems like the files of concern fall into a category of
their own that is currently being dealt with in a manner that should
actually be reserved for out-of-sync kernel binaries. This suggests we
have the following three types of files in the archive:

 1) Kernel binaries (including modules and drivers)

        If any of these are out of sync and not just new, the concern
        is that things with mismatched interfaces may be running and
        the kernel may act unpredictably.

        So, for these we stop hard in a panic-like fashion and refuse
        to mount root until someone actively takes responsibility for
        doing so.

        This is the classic check.

 2) Files that are caches, only grow, or can be safely re-read
    later.

        These are what's currently in filelist.safe. When out of
        sync, these files do not stop the system during boot; they
        instead trigger an archive update later in boot. This is the
        refinement that went back into nv44 and u4.

 3) Files like etc/system that contain information that can't always
    be usefully processed later during boot, but if out of date _do
    not_ leave the system in a dangerously unstable state.

        These files are currently being treated just like the kernel
        binaries. However since the system is not dangerously unstable
        at this point, it is reasonable to drive on and mount root
        read-write.

        This means they should really get their own check.

        If this check fails, I propose the following:

                Print a warning to console.

                Leave a service (which won't block multi-user) in
                maintenance mode so the state is communicated via
                svcs -x.

                Drive on and mount root read-write.

                Update the archive.

                And potentially reboot immediately to the device we
                just booted from now that the archive is updated.

        The auto-reboot still makes us a little nervous, so it may be
        something that needs to be explicitly enabled based on site
        policy, but at least on sparc we have a solid idea of what
        boot device we booted from, so it may turn out to be a
        reasonable default action to take.

        Clearly the exact service dependencies will differ depending
        on whether or not the system will automatically reboot to
        pick up the changed files.
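
 To make the three-way split concrete, here is a rough sketch of the
decision logic in Python. This is only an illustration of the proposal,
not how the boot code is or would be written; the names FileClass and
boot_plan, and the step strings, are made up for this example.

```python
from enum import Enum


class FileClass(Enum):
    # Hypothetical labels for the three types of archive files.
    KERNEL = 1   # type 1: kernel binaries, modules, drivers
    SAFE = 2     # type 2: caches, grow-only, or re-readable files
    CONFIG = 3   # type 3: files like etc/system


def boot_plan(stale, auto_reboot=False):
    """Return the ordered boot steps for a set of out-of-sync FileClass values.

    auto_reboot stands in for the site-policy switch discussed above.
    """
    # Type 1 always wins: mismatched kernel pieces are the dangerous case.
    if FileClass.KERNEL in stale:
        return ["stop and refuse to mount root until an administrator signs off"]

    steps = []
    if FileClass.CONFIG in stale:
        steps.append("print warning to console")
        steps.append("place marker service in maintenance")

    steps.append("mount root read-write")

    if FileClass.CONFIG in stale:
        steps.append("update archive")
        if auto_reboot:
            steps.append("reboot from the device just booted")
    elif FileClass.SAFE in stale:
        steps.append("update archive later in boot")
    return steps


if __name__ == "__main__":
    for step in boot_plan({FileClass.CONFIG}, auto_reboot=True):
        print(step)
```

 The point of the sketch is only the ordering: the type 1 check
short-circuits everything else, and the type 3 path differs from the
type 2 path in the warning, the maintenance-mode service, and the
optional reboot.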

 Ideas, thoughts, comments?

-jan