I'm submitting the following fast-track on behalf of Jan Setje-Eilers
with time-out set to 05/27/2010.  Minor/patch binding is requested.

-----

1) Summary

        While automated recovery from an out-of-sync boot archive is
        possible, it does require rebooting the system with the updated
        archive. On both OBP-based and fast-reboot-capable systems this
        reboot can be reliably initiated from the previously running
        system. However, BIOS-based x86 systems lack any way to ensure
        that the system will boot from a specific device when coming
        out of reset, although they can be (and often are) configured
        by the administrator to boot from a specific device or an
        ordered list of devices. If a system is configured that way, it
        is highly desirable to allow it to participate in fully
        automated recovery just like OBP-based or fast-reboot-capable
        systems.

        The bulk of the text in this case describes the implementation
        of the automated recovery.

        The interfaces described, however, are just the service FMRI
        and the property name that allow a BIOS-based system to be
        flagged as safe to reboot automatically.

2) Technical Details

 2.1) Automated recovery

   2.1.1) Background:

        The boot_archive verification was designed to catch a problem
        that has little to do with the boot_archive.

        The concern is that the boot_archive may have been generated
        before some kernel components were updated. This would allow
        a random mix of older components, loaded from the boot_archive,
        and newer components, loaded from the filesystem, to interact
        in an unpredictable manner, potentially leading to data
        corruption.

        This scenario represents an unclean shutdown during or after an
        update of kernel components performed without the
        install/upgrade or patch tools, which complete their
        transaction by updating the boot_archive. Given that such
        components need to be updated as a grouped transaction, and
        that updating the archive simply closes the transaction, this
        issue exists with or without the boot_archive.


   2.1.2) Problem:

        Other files in the archive are also updated out of band. Such
        updates leave the archive out of sync, and the check then keeps
        the system from recovering unattended after a crash or power
        failure.

        The majority of these have been addressed with the fixes for
        6256649 and 6803974.


   2.1.3) Remaining problem:

        We will never be able to account for things like third party
        subsystems modifying binaries contained in the archive.

        Such unknown scenarios need to be handled via an automated
        archive rebuild followed by (re)booting with the new
        archive.


   2.1.4) Solution:

        As long as kernel components are upgraded using a supported
        install/upgrade/patch mechanism, it is reasonable to trust the
        running kernel to be sufficiently self-consistent not to be
        dangerous at that point. In fact, a system that suffered a
        power failure or panic after mismatched kernel components were
        copied into place simply has to be assumed trustworthy by the
        check, as that is exactly what would happen, without the
        archive or the check, if the panic or power failure had
        occurred during the update.

        This means it is reasonable to rebuild the archive and reboot
        the system. Alternatively, it would also be possible to provide
        an automated failsafe, reboot into it, rebuild the archive and
        then reboot again, but given the above premise that is not
        necessary.

        Since the point is to provide unattended recovery, the system
        must reboot to the same root fs with the same options. On x86,
        where the OS does not control the firmware's boot device
        selection, this requires fast reboot, which allows a running
        kernel to boot another kernel directly without returning
        control to the firmware (this also bypasses POST, which is a
        significant performance win, but in this case that is
        secondary). On SPARC, where we have sufficient control over
        OBP, it is reasonable to simply reboot through the firmware.

        The inability to reboot to an explicit device/root fs prior to
        fast reboot is what has blocked this work since the check was
        first introduced. Now that fast reboot is available, this work
        can be put in place as well.

        The actual implementation is rather simple. The boot_archive
        service already blocks the system from reaching multi-user, so
        all it needs to do is explicitly rebuild the archive (in the
        UFS case it also needs to mount -o rw / first) and then reboot
        with the exact options used for the current boot.
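        The rebuild-and-reboot sequence above can be sketched as a
        small shell function. This is a minimal sketch under stated
        assumptions, not the actual boot-archive service method: the
        names recover_boot_archive and DRY_RUN are hypothetical, the
        caller is assumed to supply the root file system type and the
        boot arguments of the current boot, and by default the function
        only prints the commands it would run (DRY_RUN=0 executes
        them). bootadm update-archive and reboot are the standard
        commands; mount -o rw / is taken from the text above.

```shell
# Sketch only: recover_boot_archive and DRY_RUN are hypothetical names,
# not part of the actual boot-archive service.

# Print a command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# recover_boot_archive FSTYPE [BOOTARGS]
#   FSTYPE    file system type of / (e.g. ufs or zfs)
#   BOOTARGS  boot arguments of the current boot (assumed to be
#             supplied by the caller)
recover_boot_archive() {
    fstype=$1
    bootargs=$2

    # A UFS root is still read-only at this point, so make it
    # writable before rebuilding the archive.
    if [ "$fstype" = ufs ]; then
        run mount -o rw /
    fi

    # Rebuild the out-of-sync archive; give up if that fails.
    run bootadm update-archive || return 1

    # Reboot with the same boot arguments; "--" separates reboot's
    # own options from the arguments handed to the booted kernel.
    if [ -n "$bootargs" ]; then
        run reboot -- "$bootargs"
    else
        run reboot
    fi
}
```

        For example, recover_boot_archive ufs -v prints the three
        commands in order without running them.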

        Since SPARC does not require fast reboot to reliably reboot to
        a specific device/fs, the s10 back-port retains full value for
        SPARC customers. x86 customers still get the option of setting
        up their systems to boot from the proper device and conveying
        this by setting trust-bios-boot-device to true.
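        The platform rules above can be summarized as a decision
        function. This is a hedged sketch: the function name, its
        arguments and the returned labels are hypothetical, and it
        assumes that on x86 fast reboot is used when available and that
        trust-bios-boot-device only matters when it is not.

```shell
# Sketch only: choose_reboot_method is a hypothetical helper encoding
# the platform rules described above; it is not the service's code.
#
# choose_reboot_method ARCH FAST_REBOOT TRUST_BIOS
#   ARCH         sparc or i386
#   FAST_REBOOT  yes if fast reboot is available (x86 only)
#   TRUST_BIOS   value of config/trust-bios-boot-device (true/false)
# Prints one of: firmware, fast-reboot, bios, none
choose_reboot_method() {
    arch=$1
    fast_reboot=$2
    trust_bios=$3

    case $arch in
    sparc)
        # OBP can be directed to the right device, so an ordinary
        # reboot through the firmware is sufficient.
        echo firmware
        ;;
    i386)
        if [ "$fast_reboot" = yes ]; then
            # Boot the new kernel directly, bypassing the BIOS.
            echo fast-reboot
        elif [ "$trust_bios" = true ]; then
            # The administrator vouched for the BIOS boot order and
            # the default GRUB menu entry.
            echo bios
        else
            # Same root fs cannot be guaranteed: no automatic reboot.
            echo none
        fi
        ;;
    *)
        echo none
        ;;
    esac
}
```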

3) Public Interfaces

        The interface is expected to be documented and used by system
        administrators.

        3.1) svc:/system/boot-config:default

                The service FMRI that stores the property. While the
                service was introduced in PSARC 2008/760, this case
                extends the binding for the service to micro/patch.

        3.2) trust-bios-boot-device

                A boolean property of the system/boot-config service,
                named trust-bios-boot-device, that is set to false by
                default. An administrator can set the property to true
                to convey that both the BIOS and the default GRUB menu
                entry have been configured to boot from an appropriate
                device or list of devices, allowing an automated
                reboot when a recovery is performed.

                Most systems are typically configured that way;
                however, in isolated situations the consequences of
                automatically rebooting into an unknown OS or OS
                configuration may be unpredictable, so explicit
                confirmation from the administrator is required.

                Like other service properties, trust-bios-boot-device
                can be set and queried via svccfg(1M). For example:

                        svccfg -s svc:/system/boot-config:default \
                            listprop config/trust-bios-boot-device

                will list the current value, while:

                        svccfg -s svc:/system/boot-config:default \
                            setprop config/trust-bios-boot-device = true

                will set it to true.
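                A script could test the flag with svcprop(1), which
                reads the property from the instance's running
                snapshot; for that reason a svccfg setprop like the one
                above is normally followed by svcadm refresh on the
                same FMRI before svcprop sees the new value. The sketch
                below is illustrative only: both function names are
                hypothetical, and treating a missing property or any
                value other than true as false is an assumption.

```shell
# Sketch only: enable_bios_trust and bios_boot_trusted are
# hypothetical helper names.
FMRI=svc:/system/boot-config:default

# Set trust-bios-boot-device to true, then refresh the instance so the
# new value reaches the running snapshot that svcprop reads.
enable_bios_trust() {
    svccfg -s "$FMRI" setprop config/trust-bios-boot-device = true &&
        svcadm refresh "$FMRI"
}

# Succeed only if trust-bios-boot-device is currently true; a missing
# property or any other value counts as false.
bios_boot_trusted() {
    value=$(svcprop -p config/trust-bios-boot-device "$FMRI" 2>/dev/null)
    [ "$value" = true ]
}
```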

4) Man page addition

        4.1) boot(1m)

                The service svc:/system/boot-config:default contains
                the boolean property trust-bios-boot-device, which is
                set to false by default. Setting it to true
                communicates that both the system's BIOS and the
                default GRUB menu entry are set to boot from the
                current boot device.

                This allows the system to automatically reboot in
                order to recover from conditions such as an
                out-of-date boot archive. The value of this property
                can be changed using svccfg(1M) and svcadm(1M).

                Most systems are typically configured that way;
                however, in isolated situations the consequences of
                automatically rebooting into an unknown OS or OS
                configuration may be unpredictable, so explicit
                confirmation from the administrator is required.

                Example: Setting trust-bios-boot-device to allow
                automated reboots

                example# svccfg -s svc:/system/boot-config:default \
                         setprop config/trust-bios-boot-device = true

_______________________________________________
opensolaris-arc mailing list