I am sponsoring the following fast-track on behalf of Sherry
Moore, with timeout set for one week, June 20 2008.

Supplemental documents in the materials directory include more
detailed design and implementation specs and man page updates
including diffs.

The project requests minor binding only.


-jg




1. Introduction
   1.1. Project/Component Working Name:
        Solaris Fast Reboot

   1.2. Name of Document Author/Supplier:
        Sherry Q. Moore

   1.3. Date of This Document:
        5/29/2008

4. Technical Description

   4.1 Introduction

    Solaris has always striven to be the most reliable and available
    operating system.  Many technologies have been invented to achieve
    this goal; notable ones include Dynamic Reconfiguration (DR),
    Fault Management Architecture (FMA), SMF, and ZFS.  The objective
    of all these projects is to keep systems up and running correctly
    in the face of unexpected hardware and software failures, with as
    little down time as possible.

    System boot/reboot time is considered system down time.  The less
    time a system spends in the boot phase, the more useful work it can
    do.  High availability is extremely important to most, if not all,
    of our customers.  A shorter reboot time also reduces test
    turnaround time and thus improves developer productivity.

   4.2 Background

    The Solaris boot and reboot path involves the following basic
    steps:

        On x86 systems:
            (Hardware reset) -> BIOS -> grub -> dboot -> kernel

        On SPARC systems:
            (Hardware reset) -> POST -> OBP -> dboot -> kernel

    On x86 systems, upon startup or reset, the BIOS code performs
    hardware testing and initialization, then jumps to grub, the boot
    loader.  Grub loads dboot, unix text, data and the boot archive
    into memory, then calls dboot.  dboot does necessary
    initialization, such as building the initial page tables, or
    loading kernel text and data to a different location, then jumps to
    the kernel.

    As computer systems become more complex, the time they spend in the
    BIOS/POST phase to test and initialize hardware gets longer.  In
    the next 12 months we expect to see x86 systems with 1TB of
    memory.  Memory initialization alone will take over 1/2 hour.  It
    becomes more and more desirable to short circuit the reboot path so
    that the firmware and bootloaders can be bypassed.

    The fast reboot code will act as an in-kernel boot loader that
    loads the kernel into memory and switches to it.  The new kernel in
    the context of this write-up includes the dboot that gets tacked on
    during build time.

    The goal of the Solaris Fast Reboot project is to get from
    "rebooting..." to the login prompt within seconds (assuming the
    boot archive has been updated).

    The Solaris implementation will support systems with arbitrarily
    large amounts of memory, and will provide the flexibility to reboot
    to 32-bit or 64-bit kernels.

   4.3 Interface Table

    INTERFACE           COMMITMENT LEVEL        COMMENT

    reboot -f (1M)      Committed       To initiate a fast reboot.

    reboot -f -e (1M)   Committed       To fast reboot to a different BE.

    uadmin(2)           Committed       Added AD_FASTREBOOT and
                                        AD_FASTREBOOT_DRYRUN to facilitate
                                        fast reboot.

    quiesce(9E)         Committed       To quiesce a device.

    ddi_no_quiesce(9F)  Committed       Returns DDI_SUCCESS. No need to quiesce.

    ddi_quiesce_not_\   Committed       Returns DDI_FAILURE. Quiesce needed
    supported(9F)                       but not implemented.

    dev_ops(9S)         Committed       Added devo_quiesce ops for
                                        quiescing devices.

6. Resources and Schedule

   6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON

   6.5. ARC review type: FastTrack

   6.6. ARC Exposure: open


A. Man pages
   A.1 quiesce(9E): new man page
   A.2 reboot(1M)
   A.3 uadmin(2)
   A.4 dev_ops(9S)
 
A.1 Man page for quiesce(9E)

   Man pages for ddi_no_quiesce(9F) and ddi_quiesce_not_supported(9F)
   will be links to the man page for quiesce(9E).


Driver Entry Points                                    quiesce(9E)

NAME
     quiesce - quiesce a device

SYNOPSIS
     #include <sys/ddi.h>
     #include <sys/sunddi.h>

     int prefixquiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);

     int ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);

     int ddi_quiesce_not_supported(dev_info_t *dip, ddi_quiesce_cmd_t cmd,
        void *arg);


INTERFACE LEVEL
     Solaris DDI specific (Solaris DDI)

PARAMETERS
     dip    A pointer to the device's dev_info structure.


     cmd    Type of quiesce operation.  Currently only DDI_QUIESCE
            is supported.

    
     arg    Argument to the quiesce routine if needed.  Can be set
            to NULL.


DESCRIPTION
     The quiesce() function quiesces a device so that it will no longer
     generate interrupts or modify or access memory.  It should reset
     the device to a hardware state from which it can be correctly
     configured by the driver's attach() routine without a system power
     cycle or being configured by the firmware.  For devices that come
     with factory default settings, the driver must also restore those
     settings in its quiesce() routine.

  DDI_QUIESCE
     If cmd is set to DDI_QUIESCE, quiesce() is used to stop devices
     from generating interrupts or modifying or accessing memory.  One
     such use case is Fast Reboot, where firmware is bypassed when
     booting to a new OS image.

     The quiesce() function will be called once for each instance of
     the device for which there has been a successful attach().  The
     system guarantees that the function will only be called for a
     particular dev_info node after a successful attach(9E) of that
     device.  The system is not single-threaded when quiesce() is
     called, so the driver must ensure that concurrent accesses to the
     device when quiesce() is invoked are correctly coordinated.  The
     driver can choose to drop outstanding I/O requests instead of
     waiting for them to complete, as long as it can guarantee on-disk
     data integrity.  The driver must cancel any outstanding timeouts
     and remove outstanding tasks from taskqs before returning
     successfully from quiesce().

     If quiesce() determines a particular instance of the device cannot
     be quiesced when requested because of some exceptional condition,
     quiesce() must return DDI_FAILURE.  This should almost never
     happen.

     For the fast reboot case, if DDI_FAILURE is returned for the
     DDI_QUIESCE cmd, the regular reboot path will be taken.
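
     The fallback rule above can be illustrated with a small userland
     sketch.  It is not the kernel implementation: the DDI types and
     return codes are stand-ins defined locally so the code compiles
     outside the kernel, and struct attached_instance,
     choose_reboot_path() and the two PATH_* values are hypothetical
     names.  Stopping at the first failure is also an assumption.

```c
#include <stddef.h>

/* Stand-ins for DDI definitions so the sketch compiles in userland. */
#define DDI_SUCCESS 0
#define DDI_FAILURE (-1)

typedef enum { DDI_QUIESCE } ddi_quiesce_cmd_t;
typedef struct dev_info dev_info_t;     /* opaque, never dereferenced here */

typedef int (*quiesce_fn_t)(dev_info_t *, ddi_quiesce_cmd_t, void *);

/* Hypothetical record of one successfully attached device instance. */
struct attached_instance {
    dev_info_t   *dip;
    quiesce_fn_t  quiesce;
};

enum reboot_path { PATH_FAST_REBOOT, PATH_REGULAR_REBOOT };

/* Userland stand-ins for the two helper routines named in the SYNOPSIS. */
int
ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return DDI_SUCCESS;
}

int
ddi_quiesce_not_supported(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return DDI_FAILURE;
}

/*
 * Call quiesce() once per attached instance.  Any result other than
 * DDI_SUCCESS selects the regular reboot path.
 */
enum reboot_path
choose_reboot_path(const struct attached_instance *inst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (inst[i].quiesce(inst[i].dip, DDI_QUIESCE, NULL) != DDI_SUCCESS)
            return PATH_REGULAR_REBOOT;
    }
    return PATH_FAST_REBOOT;
}
```

     A single instance whose entry point is ddi_quiesce_not_supported()
     is enough to make this sketch choose the regular reboot path.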

     If a driver has previously implemented the obsolete reset()
     interface, its functionality must be merged into quiesce().  The
     driver's reset() routine will no longer be called if an
     implementation of quiesce() is present.

     The ddi_no_quiesce() function always returns DDI_SUCCESS.  It is
     used to indicate that a device does not need to be quiesced for
     fast reboot.

     The ddi_quiesce_not_supported() function always returns
     DDI_FAILURE.  It is used to indicate that the device needs to be
     quiesced but the device driver has not yet implemented quiesce().
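
     As a rough illustration of the requirements in this section, the
     sketch below models a quiesce() routine for a hypothetical "xx"
     device.  It is a userland model only: dev_info_t is defined here
     as a plain struct whose fields stand in for device registers and
     driver soft state (in the DDI it is opaque), and the field names
     and xx_quiesce() are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for DDI definitions so the sketch compiles in userland. */
#define DDI_SUCCESS 0
#define DDI_FAILURE (-1)

typedef enum { DDI_QUIESCE } ddi_quiesce_cmd_t;

/* Hypothetical device model; a real driver would look up state via dip. */
typedef struct dev_info {
    uint32_t intr_enable;    /* models an interrupt-enable register */
    uint32_t dma_enable;     /* models a DMA-enable register */
    int      timeout_armed;  /* nonzero while a timeout is outstanding */
    int      wedged;         /* models an exceptional hardware condition */
} dev_info_t;

int
xx_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)arg;                  /* unused; callers may pass NULL */

    if (dip == NULL || cmd != DDI_QUIESCE)
        return DDI_FAILURE;     /* request not understood */
    if (dip->wedged)
        return DDI_FAILURE;     /* cannot quiesce; state left untouched */

    dip->timeout_armed = 0;     /* cancel outstanding timeouts */
    dip->intr_enable = 0;       /* device stops generating interrupts */
    dip->dma_enable = 0;        /* device stops accessing memory */
    return DDI_SUCCESS;
}
```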


RETURN VALUES
     DDI_SUCCESS    For DDI_QUIESCE, the device has been successfully
                    quiesced.

     DDI_FAILURE    The operation failed or the request  was  not
                    understood.


CONTEXT
     This function is called from kernel context only.

ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:



     ____________________________________________________________
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    |_____________________________|_____________________________|
    | Interface Stability         | Committed                   |
    |_____________________________|_____________________________|


SEE ALSO
     attach(9E), detach(9E), ddi_add_intr(9F), ddi_map_regs(9F),
     pci_config_setup(9F), ddi_no_quiesce(9F),
     ddi_quiesce_not_supported(9F), timeout(9F), reboot(1M),
     uadmin(1M), uadmin(2)


A.2 Man page for reboot(1M)

System Administration Commands                         reboot(1M)



NAME
     reboot - restart the operating system

SYNOPSIS
     /usr/sbin/reboot [-dlnqf] [-e BE] [boot_arguments]                 |


DESCRIPTION
     The reboot utility restarts the kernel. The kernel is loaded
     into  memory by the PROM monitor, which transfers control to
     the loaded kernel.


     On x86 systems, when the -f flag is specified, the running         |
     kernel will load the next kernel into memory, then transfer        |
     control to the loaded kernel.                                      |


     Although reboot can be run by the super-user  at  any  time,
     shutdown(1M) is normally used first to warn all users logged
     in of the impending loss of service.  See  shutdown(1M)  for
     details.


     The reboot utility performs  a  sync(1M)  operation  on  the
     disks,  and  then  a  multi-user  reboot  is  initiated. See
     init(1M) for details. On x86 systems, reboot may also update
     the boot archive as needed to ensure a successful reboot.


     The reboot utility normally logs the reboot  to  the  system
     log daemon, syslogd(1M), and places a shutdown record in the
     login accounting  file  /var/adm/wtmpx.  These  actions  are
     inhibited if the -n or -q options are present.


     Normally, the system reboots itself  at  power-up  or  after
     crashes.

OPTIONS
     The following options are supported:

     -d    Force  a  system  crash  dump  before  rebooting.  See
           dumpadm(1M)  for  information  on  configuring  system
           crash dumps.


     -e    If -f is present, reboot to the specified boot               |
           environment.                                                 |
                                                                        |
                                                                        |
     -f    Fast reboot bypassing firmware and boot loader.  The         |
           new kernel will be loaded into memory by the running         |
           kernel, and control will be transferred to the loaded        |
           kernel.  If disk or kernel arguments are specified,          |
           they must be specified before other boot arguments.          |
           See Example 3 for details.                                   |
                                                                        |
           Currently only available on x86 systems.                     |
                                                                        |

     -l    Suppress sending a message to the system  log  daemon,
           syslogd(1M) about who executed reboot.


     -n    Avoid calling sync(2) and do not  log  the  reboot  to
           syslogd(1M)  or  to  /var/adm/wtmpx.  The kernel still
           attempts to sync filesystems prior to  reboot,  except
           if  the  -d option is also present. If -d is used with
           -n, the kernel does not attempt to sync filesystems.


     -q    Quick. Reboot quickly and ungracefully, without  shut-
           ting down running processes first.


OPERANDS
     The following operands are supported:

     boot_arguments    An optional boot_arguments specifies argu-
                       ments  to  the uadmin(2) function that are
                       passed to the boot program and kernel upon
                       restart. The form and list of arguments is
                       described in the boot(1M)  and  kernel(1M)
                       man  pages.  If  the arguments are specified,
                       whitespace between them is  replaced
                       by  single spaces unless the whitespace is
                       quoted   for    the    shell.    If    the
                       boot_arguments  begin  with a hyphen, they
                       must be preceded by the -- delimiter  (two
                       hyphens)  to  denote the end of the reboot
                       argument list.
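
     The whitespace rule can be sketched as follows.  The shell has
     already split unquoted whitespace into separate operands, so
     joining the operands with one space each produces the described
     result, while whitespace inside a quoted operand is preserved.
     This is an illustrative sketch, not the reboot source;
     join_boot_args() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Join argv[0..argc-1] into buf, separated by single spaces.
 * Returns 0 on success, -1 if buf is too small to hold the result.
 */
int
join_boot_args(int argc, char *const argv[], char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        size_t sep = (i > 0) ? 1 : 0;   /* one space between operands */

        if (used + sep + len + 1 > buflen)
            return -1;                  /* no room for text plus NUL */
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, argv[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}
```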


EXAMPLES
     Example 1 Passing the -r and -v Arguments to boot


     In the following example, the  delimiter  --  (two  hyphens)
     must  be  used  to  separate  the options of reboot from the
     arguments of boot(1M).


       example# reboot -dl -- -rv



     Example 2 Rebooting Using a Specific Disk and Kernel


     The following example reboots using a specific disk and ker-
     nel.


       example# reboot disk1 kernel.test/unix


     Example 3 Fast reboot                                              |
                                                                        |
     Check if all the drivers on the system are fast reboot capable.    |
                                                                        |
       example# reboot -f dryrun                                        |
                                                                        |
     Rebooting to another UFS root disk.                                |
                                                                        |
       example# reboot -f -- '/dev/dsk/c1d0s0'                          |
                                                                        |
     Rebooting to another ZFS root pool.                                |
                                                                        |
       example# reboot -f -- 'rootpool/root1'                           |
                                                                        |
     Rebooting to "mykernel" on the same disk with "-k" option.         |
                                                                        |
       example# reboot -f -- '/platform/i86pc/mykernel/amd64/unix -k'   |
                                                                        |
     Rebooting to "mykernel" off another root disk mounted on /mnt.     |
                                                                        |
       example# reboot -f -- '/mnt/platform/i86pc/mykernel/amd64/unix -k' |
                                                                        |
     Rebooting to "/platform/i86pc/kernel/$ISADIR/unix" on another boot |
     environment named "second_root".                                   |
                                                                        |
       example# reboot -f -e second_root                                |
                                                                        |
     Rebooting to the same kernel with "-kv" options.                   |
                                                                        |
       example# reboot -f -- '-kv'                                      |


FILES
     /var/adm/wtmpx    login accounting file


ATTRIBUTES

     See attributes(5) for descriptions of the  following  attri-
     butes:

     ____________________________________________________________
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    |_____________________________|_____________________________|
    | Availability                | SUNWcsu                     |
    |_____________________________|_____________________________|


SEE ALSO
     mdb(1), boot(1M), dumpadm(1M), fsck(1M), halt(1M), init(1M),
     kernel(1M),  shutdown(1M),  sync(1M),  syslogd(1M), sync(2),
     uadmin(2), reboot(3C), attributes(5)

NOTES
     The  reboot  utility  does  not  execute  the   scripts   in
     /etc/rcnum.d  or execute shutdown actions in inittab(4).  To
     ensure  a  complete  shutdown  of   system   services,   use
     shutdown(1M) or init(1M) to reboot a Solaris system.


A.3 Man page for uadmin(2)


System Calls                                            uadmin(2)


NAME
     uadmin - administrative control

SYNOPSIS
     #include <sys/uadmin.h>

     int uadmin(int cmd, int fcn, uintptr_t mdep);


DESCRIPTION
     The uadmin() function provides control for basic administra-
     tive functions. This function is tightly coupled to the sys-
     tem administrative procedures and is not intended  for  gen-
     eral  use.  The  argument  mdep  is  provided  for  machine-
     dependent use and is not defined here. It should be initial-
     ized to NULL if not used.


     As specified by cmd, the following commands are available:

     A_SHUTDOWN    The system is shut down.  All  user  processes
                   are  killed,  the buffer cache is flushed, and
                   the root file system is unmounted. The  action
                   to  be  taken  after  the system has been shut
                   down is specified by fcn.  The  functions  are
                   generic;  the  hardware  capabilities  vary on
                   specific machines.

                   AD_HALT        Halt the processor(s).


                   AD_POWEROFF    Halt the processor(s) and  turn
                                  off the power.


                   AD_BOOT        Reboot the  system,  using  the
                                  kernel file.


                   AD_IBOOT       Interactive  reboot;  user   is
                                  prompted  for  bootable program
                                  name.


                   AD_FASTREBOOT  Reboot the system, bypassing          |
                                  BIOS and boot loader.                 |
                                                                        |
                   AD_FASTREBOOT_DRYRUN  Fast reboot dry run to         |
                                  check whether a system supports       |
                                  fast reboot.                          |


     A_REBOOT      The  system  stops  immediately  without   any
                   further  processing.  The  action  to be taken
                   next is specified by fcn as above.


     A_DUMP        The system  is  forced  to  panic  immediately
                   without  any  further  processing  and a crash
                   dump  is  written  to  the  dump  device  (see
                   dumpadm(1M)).  The  action to be taken next is
                   specified by fcn, as above.


     A_REMOUNT     The root file system is  mounted  again  after
                   having  been  fixed.  This should be used only
                   during the startup process.


     A_FREEZE      Suspend the whole system.  The system state is
                   preserved  in  the  state  file. The following
                   subcommands, specified by fcn, are available.

                   AD_SUSPEND_TO_DISK          Save  the   system
                                               state to the state
                                               file. This subcom-
                                               mand is equivalent
                                               to ACPI state S4.


                   AD_CHECK_SUSPEND_TO_DISK    Check if your sys-
                                               tem       supports
                                               suspend  to  disk.
                                               Without performing
                                               a           system
                                               suspend/resume,
                                               this    subcommand
                                               checks   if   this
                                               feature         is
                                               currently   avail-
                                               able on your  sys-
                                               tem.


                   AD_SUSPEND_TO_RAM           Save  the   system
                                               state   to  memory.
                                               This subcommand is
                                               equivalent to ACPI
                                               state S3.


                   AD_CHECK_SUSPEND_TO_RAM     Check if your sys-
                                               tem       supports
                                               suspend to memory.
                                               Without performing
                                               a           system
                                               suspend/resume,
                                               this    subcommand
                                               checks   if   this
                                               feature         is
                                               currently
                                               available on  your
                                               system.

                   The following subcommands, specified  by  fcn,
                   are  obsolete and might be removed in a subse-
                   quent release:

                   AD_COMPRESS    Save the system  state  to  the
                                  state  file with compression of
                                  data. This subcommand has  been
                                  replaced by AD_SUSPEND_TO_DISK,
                                  which should be used instead.


                   AD_CHECK       Check if your  system  supports
                                  suspend  and  resume.   Without
                                  performing      a        system
                                  suspend/resume,   this  command
                                  checks  if  this   feature   is
                                  currently   available  on  your
                                  system.  This  subcommand   has
                                  been         replaced        by
                                  AD_CHECK_SUSPEND_TO_DISK, which
                                  should be used instead.


                   AD_FORCE       Force  AD_COMPRESS  even   when
                                  threads  of  user  applications
                                  are not suspendable. This  sub-
                                  command  should  never be used,
                                  as it might result in undefined
                                  behavior.
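
     As a sketch of how a caller might select the new A_SHUTDOWN
     functions, the helper below only builds the argument triple and
     deliberately does not call uadmin(), since a real call would
     restart the machine.  The numeric values are placeholders for
     illustration (real values come from <sys/uadmin.h>), and struct
     uadmin_request and build_reboot_request() are hypothetical names.
     Treating a dry run without -f as a regular reboot request is an
     assumption of the sketch.

```c
#include <stddef.h>

/* Placeholder values; on Solaris these come from <sys/uadmin.h>. */
enum { A_SHUTDOWN = 2 };
enum { AD_BOOT = 1, AD_FASTREBOOT = 8, AD_FASTREBOOT_DRYRUN = 9 };

/* Hypothetical bundle of the three uadmin(2) arguments. */
struct uadmin_request {
    int         cmd;    /* e.g. A_SHUTDOWN */
    int         fcn;    /* action to take after shutdown */
    const char *mdep;   /* boot arguments, or NULL if unused */
};

/*
 * Pick fcn for a reboot: AD_BOOT by default, AD_FASTREBOOT when a
 * fast reboot is requested, AD_FASTREBOOT_DRYRUN when only checking
 * whether the system supports fast reboot.
 */
struct uadmin_request
build_reboot_request(int fast, int dryrun, const char *bootargs)
{
    struct uadmin_request r = { A_SHUTDOWN, AD_BOOT, bootargs };

    if (fast)
        r.fcn = dryrun ? AD_FASTREBOOT_DRYRUN : AD_FASTREBOOT;
    return r;
}
```

     A caller would then pass r.cmd, r.fcn and (uintptr_t)r.mdep to
     uadmin().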



RETURN VALUES
     Upon successful completion, the value  returned  depends  on
     cmd as follows:

     A_SHUTDOWN    Never returns.


     A_REBOOT      Never returns.


     A_FREEZE      0 upon resume.


     A_REMOUNT     0.

     Otherwise, -1 is returned and errno is set to  indicate  the
     error.

ERRORS
     The uadmin() function will fail if:

     EBUSY      Suspend is already in progress.


     EINVAL     The cmd argument is invalid.


     ENOMEM     Suspend/resume ran out of physical memory.


     ENOSPC     Suspend/resume could not allocate enough space on
                the root file system to store system information.


     ENOTSUP    Suspend/resume is not supported on this  platform
                or the command specified by cmd is not allowed.


     ENXIO      Unable to successfully suspend system.


     EPERM      The {PRIV_SYS_CONFIG} privilege is  not  asserted
                in the effective set of the calling process.


ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:



     ____________________________________________________________
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    |_____________________________|_____________________________|
    | Interface Stability         | See below.                  |
    |_____________________________|_____________________________|



     The A_FREEZE command and its subcommands are Committed.

SEE ALSO
     dumpadm(1M),    kernel(1M),    uadmin(1M),    attributes(5),
     privileges(5)



A.4 Man page for dev_ops(9S)


Data Structures for Drivers                           dev_ops(9S)



NAME
     dev_ops - device operations structure

SYNOPSIS
     #include <sys/conf.h>
     #include <sys/devops.h>


INTERFACE LEVEL
     Solaris DDI specific (Solaris DDI).

DESCRIPTION
     dev_ops contains driver common fields and  pointers  to  the
     bus_ops and cb_ops(9S).


     Following are the device functions provided  in  the  device
     operations  structure.   All  fields  must be set at compile
     time.

     devo_rev          Driver  build   version.   Set   this   to
                       DEVO_REV.


     devo_refcnt       Driver reference count. Set this to 0.


     devo_getinfo      Get   device   driver   information   (see
                       getinfo(9E)).


     devo_identify     This  entry  point  is  obsolete.  Set  to
                       nulldev.


     devo_probe        Probe device. See probe(9E).


     devo_attach       Attach driver to dev_info. See attach(9E).


     devo_detach       Detach/prepare  driver  to   unload.   See
                       detach(9E).


     devo_reset        Reset  device.  (Not  supported  in   this
                       release.) Set this to nodev.


     devo_cb_ops       Pointer to cb_ops(9S) structure  for  leaf
                       drivers.


     devo_bus_ops      Pointer to bus  operations  structure  for
                       nexus drivers. Set this to NULL if this is
                       for a leaf driver.


     devo_power        Power a device  attached  to  system.  See
                       power(9E).


     devo_quiesce      Quiesce a device  attached  to  system.  See     |
                       quiesce(9E).  Can be set to ddi_no_quiesce if    |
                       the device does not generate interrupts or       |
                       perform DMA.                                     |

STRUCTURE MEMBERS
       int              devo_rev;
       int              devo_refcnt;
       int              (*devo_getinfo)(dev_info_t *dip,
                       ddi_info_cmd_t infocmd, void *arg, void **result);
       int              (*devo_identify)(dev_info_t *dip);
       int              (*devo_probe)(dev_info_t *dip);
       int              (*devo_attach)(dev_info_t *dip,
                       ddi_attach_cmd_t cmd);
       int              (*devo_detach)(dev_info_t *dip,
                       ddi_detach_cmd_t cmd);
       int              (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);
       struct cb_ops    *devo_cb_ops;
       struct bus_ops   *devo_bus_ops;
       int              (*devo_power)(dev_info_t *dip, int component,
                        int level);
       int              (*devo_quiesce)(dev_info_t *dip,                |
                        ddi_quiesce_cmd_t cmd, void *arg);              |
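
     A driver whose device neither generates interrupts nor performs
     DMA might fill in the new member as sketched below.  This is a
     userland model, not a loadable driver: struct dev_ops is trimmed
     to three members, ddi_no_quiesce() is a local stand-in that
     mirrors the documented behavior, the DEVO_REV value is a
     placeholder, and xx_dev_ops is a hypothetical name.

```c
#include <stddef.h>

/* Stand-ins for DDI definitions so the sketch compiles in userland. */
#define DDI_SUCCESS 0
#define DEVO_REV    4                   /* placeholder value */

typedef struct dev_info dev_info_t;     /* opaque, never dereferenced here */
typedef enum { DDI_QUIESCE } ddi_quiesce_cmd_t;

/* Local stand-in for ddi_no_quiesce(9F): always returns DDI_SUCCESS. */
int
ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return DDI_SUCCESS;
}

/* Trimmed model of struct dev_ops: only the members used below. */
struct dev_ops {
    int devo_rev;
    int devo_refcnt;
    int (*devo_quiesce)(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);
};

/* All fields are set at compile time, as the man page requires. */
struct dev_ops xx_dev_ops = {
    .devo_rev     = DEVO_REV,
    .devo_refcnt  = 0,
    .devo_quiesce = ddi_no_quiesce,     /* nothing to quiesce */
};
```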


SEE ALSO
     attach(9E), detach(9E), getinfo(9E),  probe(9E),  power(9E),       |
     quiesce(9E), nodev(9F)                                             |


