Hi,

Instead of "trust-bios-boot-device", can we name it something like "automatic-boot-archive-recovery"? Future firmware support will not use the bios-boot-device property, and the property name should reflect its actual use instead of its implementation detail.

Also, the following text:
                set to false by default. Setting it to true
                communicates that both the system's BIOS and default
                GRUB menu entry are set to boot from the current boot
                device.

is misleading -- we don't specify a boot "device" in GRUB anymore. We use the findroot/bootfs commands to search for the root pool or (in updates) the appropriate UFS filesystem.

 --S

Quoting Jerry Gilliam, who wrote the following on Thu, 20 May 2010:


I'm submitting the following fast-track on behalf of Jan Setje-Eilers
with time-out set to 05/27/2010.  Minor/patch binding is requested.

-----

1) Summary

       While automated recovery from an out-of-sync boot archive is
       possible, it requires rebooting the system with the updated
       archive. With both OBP and fast reboot this can be reliably
       initiated from the previously running system. However,
       BIOS-based x86 systems have no way to ensure that the system
       will boot from a specific device when coming out of reset,
       although they can be (and often are) configured by the
       administrator to boot from a specific device or an ordered
       list of devices. If a system is configured that way, it is
       highly desirable to let it participate in fully automated
       recovery just like OBP-based or fast-reboot-capable systems.

       The bulk of the text in this case describes the implementation
       of the automated recovery.

       The interface under review, however, is just the service FMRI
       and the property name that allow a BIOS-based system to be
       flagged as safe to reboot automatically.

2) Technical Details

2.1) Automated recovery

  2.1.1) Background:

       The boot_archive verification was designed to catch a problem
       that has little to do with the boot_archive.

       The concern is that the boot_archive may have been generated
       before some kernel components were updated. This would allow
       a random mix of older components, loaded from the
       boot_archive, and newer components, loaded from the
       filesystem, to interact in an unpredictable manner, leading
       to potential data corruption.

       This scenario represents an unclean shutdown during or after
       updating kernel components without the use of the
       install/upgrade or patch tools, which complete their
       transaction by updating the boot_archive. Given that such
       components need to be updated as a grouped transaction, and
       updating the archive simply closes the transaction, this
       issue exists with and without the boot_archive.


  2.1.2) Problem:

       Other files in the archive are updated out of band, causing
       the archive to become out of sync and the check to keep the
       system from recovering unattended after a crash or power
       failure.

       The majority of these have been addressed with the fixes for
       6256649 and 6803974.


  2.1.3) Remaining problem:

       We will never be able to account for things like third party
       subsystems modifying binaries contained in the archive.

       Such unknown scenarios need to be handled via an automated
       archive rebuild followed by (re)booting with the new
       archive.


  2.1.4) Solution:

       As long as kernel components are upgraded using a supported
       install/upgrade/patch mechanism, it is reasonable to trust the
       running kernel to be self-consistent enough not to be
       dangerous at that point. In fact, a system that suffered a
       power failure or panic after mismatched kernel components
       were copied into place simply needs to be treated as
       trustworthy by the check, since that is exactly what would
       happen without the archive or the check if the panic or power
       failure had occurred during the update.

       This means it is reasonable to rebuild the archive and reboot
       the system. Alternatively, it is possible to provide an
       automated failsafe, reboot into it, rebuild the archive and
       then reboot again, but given the above premise that is not
       necessary.

       Since the point is to provide unattended recovery, the system
       must reboot to the same root fs with the same options. On x86,
       where the OS has no control over the firmware's boot device
       selection, this requires fast reboot, which allows a running
       kernel to boot another kernel directly without returning
       control to the firmware (this also bypasses POST, which is a
       significant performance win, but in this case that is
       secondary). On SPARC, where we have sufficient control over
       OBP, it is reasonable to simply reboot through the firmware.

       Before fast reboot, the inability to reboot to an explicit
       device/root fs is what blocked this work ever since the check
       was introduced. Now that fast reboot is available, this work
       can be put in place as well.

       The actual implementation is rather simple. The boot_archive
       service already blocks the system from reaching multi-user, so
       all it needs to do is explicitly rebuild the archive (the ufs
       case would also need to mount -o rw /) and then reboot with
       the exact options used for the current boot.

       Since SPARC does not require fast reboot to reliably reboot to
       a specific device/fs, the s10 back-port retains full value for
       SPARC customers. x86 customers still get the option of setting
       up their systems to boot from the proper device and conveying
       this by setting trust-bios-boot-device to true.
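       The reboot decision described above can be summarized as a
       small shell sketch. It is one reading of this section, not the
       boot-archive service's actual method: the function name
       recovery_action, its arguments, and its result strings are
       hypothetical, and detecting whether fast reboot is available
       is left to the caller. ARCH is assumed to be what uname -p
       reports (sparc or i386).

```shell
#!/bin/sh
# Hypothetical sketch of the reboot decision in 2.1.4. The function
# name, arguments and result strings are illustrative only; this is
# not the boot-archive service's actual method script.
#
# Usage: recovery_action ARCH FAST_REBOOT TRUST_BIOS
#   ARCH         "sparc" or "i386" (e.g. from uname -p)
#   FAST_REBOOT  "true" if fast reboot is available, else "false"
#   TRUST_BIOS   value of config/trust-bios-boot-device
recovery_action() {
    arch=$1
    fast_reboot=$2
    trust_bios=$3

    if [ "$arch" = "sparc" ]; then
        # OBP can be told which device to boot from, so passing
        # through the firmware is safe.
        echo "firmware-reboot"
    elif [ "$fast_reboot" = "true" ]; then
        # Fast reboot hands control directly to the new kernel
        # without going back through the BIOS.
        echo "fast-reboot"
    elif [ "$trust_bios" = "true" ]; then
        # The administrator has asserted that the BIOS and the
        # default GRUB menu entry will bring the same OS back.
        echo "firmware-reboot"
    else
        # No safe way to reboot unattended; leave the system
        # waiting for the administrator.
        echo "wait-for-admin"
    fi
}
```

       For example, recovery_action i386 false true prints
       firmware-reboot: an x86 system without fast reboot whose
       administrator has set trust-bios-boot-device to true.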

3) Public Interfaces

       The interface is expected to be documented and used by system
       administrators.

       3.1) svc:/system/boot-config:default

               The service FMRI under which the property is stored.
               While the service was introduced in PSARC 2008/760, this
               case extends the binding for the service to micro/patch.

       3.2) trust-bios-boot-device

               A boolean property of the system/boot-config service,
               called trust-bios-boot-device, that is set to false by
               default. An administrator can set the property to true
               to convey that both the BIOS and the default GRUB menu
               entry have been configured to boot from an appropriate
               device or list of devices, allowing an automated reboot
               when a recovery is performed.

               Most systems are typically configured that way; however,
               in isolated situations the consequences of automatically
               rebooting into an unknown OS or OS configuration may be
               unpredictable, hence explicit confirmation from the
               administrator is required.

               Like other service properties, trust-bios-boot-device
               can be set and queried via svccfg(1M). For example:

                       svccfg -s svc:/system/boot-config:default \
                           listprop config/trust-bios-boot-device

               lists the current value, while:

                       svccfg -s svc:/system/boot-config:default \
                           setprop config/trust-bios-boot-device = true

               sets it to true.
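               A script that needs to act on the setting can read it
               with svcprop(1), which reads the instance's running
               snapshot; a value changed with svccfg setprop therefore
               becomes visible there after the instance is refreshed
               with svcadm(1M). The helper name bios_boot_trusted is
               hypothetical, and the property is assumed to live in the
               config property group, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this case's interface): succeed
# only if config/trust-bios-boot-device is set to true.
FMRI=svc:/system/boot-config:default
PROP=config/trust-bios-boot-device

bios_boot_trusted() {
    # Any svcprop failure (e.g. the property does not exist) is
    # treated as "not trusted", matching the false default.
    value=$(svcprop -p "$PROP" "$FMRI" 2>/dev/null) || return 1
    [ "$value" = "true" ]
}

# To change the value, set it and then refresh the instance so the
# running snapshot that svcprop reads is updated:
#   svccfg -s svc:/system/boot-config:default \
#       setprop config/trust-bios-boot-device = true
#   svcadm refresh svc:/system/boot-config:default
```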

4) Man page addition

        4.1) boot(1M)

                The service svc:/system/boot-config:default contains
                the boolean property trust-bios-boot-device which is
                set to false by default. Setting it to true
                communicates that both the system's BIOS and default
                GRUB menu entry are set to boot from the current boot
                device.

                This allows the system to automatically reboot in
                order to recover from conditions such as an out-of-date
                boot archive. The value of this property can be
                changed using svccfg(1M) and svcadm(1M).

                Most systems are typically configured that way; however,
                in isolated situations the consequences of automatically
                rebooting into an unknown OS or OS configuration may be
                unpredictable, hence explicit confirmation from the
                administrator is required.

                Example: setting trust-bios-boot-device to allow
                automated reboots.

                example# svccfg -s svc:/system/boot-config:default \
                         setprop config/trust-bios-boot-device = true





_______________________________________________
opensolaris-arc mailing list