So one thing that is slightly confusing to me is, why do we need both
ddi_no_quiesce(), and ddi_quiesce_not_supported()? And how is the entry
point being NULL interpreted?
It *seems* (and maybe I'm being naive here), that
ddi_quiesce_not_supported() may not have much value -- drivers that have
a bug where quiesce is necessary but currently lack an implementation can
just implement a trivial "return DDI_FAILURE" quiesce in the meantime --
not much more effort than stubbing in ddi_quiesce_not_supported().
Additionally,
unlike the ddi_no_quiesce() case, the driver is going to need to be
modified at some point anyway, right?
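
To make the comparison concrete, here is a minimal userland sketch (not kernel code). The typedefs and DDI_* macros are stand-ins so it compiles outside the kernel, the two ddi_*() bodies only mirror what the man page text quoted below says they return, and xx_quiesce() is a hypothetical driver's hand-written stub:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real DDI definitions so this compiles in userland. */
#define DDI_SUCCESS 0
#define DDI_FAILURE (-1)
typedef struct dev_info dev_info_t;             /* opaque, never dereferenced */
typedef enum { DDI_QUIESCE } ddi_quiesce_cmd_t;

/* Per the proposal: the device needs no quiescing, so always succeed. */
int
ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return (DDI_SUCCESS);
}

/* Per the proposal: quiesce is needed but not implemented, so always fail. */
int
ddi_quiesce_not_supported(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return (DDI_FAILURE);
}

/* Hypothetical driver-private stub: the "trivial return DDI_FAILURE". */
int
xx_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return (DDI_FAILURE);
}
```

From the caller's side the last two return the same thing; the only visible difference is the function's name.
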
Finally, it may be informative/helpful to list some example drivers in
the case materials that need to have quiesce implemented, as well as
perhaps a couple that don't.
-- Garrett
Jerry Gilliam wrote:
> I am sponsoring the following fast-track on behalf of Sherry
> Moore, with timeout set for one week, June 20 2008.
>
> Supplemental documents in the materials directory include more
> detailed design and implementation specs and man page updates
> including diffs.
>
> The project requests minor binding only.
>
>
> -jg
>
>
>
>
> 1. Introduction
> 1.1. Project/Component Working Name:
> Solaris Fast Reboot
>
> 1.2. Name of Document Author/Supplier:
> Sherry Q. Moore
>
> 1.3. Date of This Document:
> 5/29/2008
>
> 4. Technical Description
>
> 4.1 Introduction
>
> Solaris has always strived to be the most reliable and available
> operating system. Many technologies have been invented to achieve
> this goal; notable ones include Dynamic Reconfiguration (DR),
> Fault Management Architecture (FMA), SMF, and ZFS, to name a few.
> The objectives of all these projects are to keep the systems up and
> running correctly in the face of unexpected hardware and software
> failures with as little down time as possible.
>
> System boot/reboot time is considered system down time. The less
> time a system spends in the boot phase, the more useful work it can
> do. High availability is extremely important to most, if not all,
> of our customers. A shorter reboot time also reduces test
> turnaround time and thus improves developers' productivity.
>
> 4.2 Background
>
> The Solaris boot and reboot path involves the following basic
> steps:
>
> On x86 systems:
> (Hardware reset) -> BIOS -> grub -> dboot -> kernel
>
> On SPARC systems:
> (Hardware reset) -> POST -> OBP -> dboot -> kernel
>
> On x86 systems, upon startup or reset, the BIOS code performs
> hardware testing and initialization, then jumps to "grub", the boot
> loader. Grub loads dboot, unix text, data and the boot archive
> into memory, then calls dboot. dboot does necessary
> initialization, such as building the initial page tables, or
> loading kernel text and data to a different location, then jumps to
> the kernel.
>
> As computer systems become more complex, the time they spend in the
> BIOS/POST phase to test and initialize hardware gets longer. In
> the next 12 months we expect to see x86 systems with 1TB of
> memory. Memory initialization alone will take over 1/2 hour. It
> becomes more and more desirable to short circuit the reboot path so
> that the firmware and bootloaders can be bypassed.
>
> The fast reboot code will act as an in-kernel boot loader that
> loads the kernel into memory and switches to it. The new kernel in
> the context of this write-up includes the dboot that gets tacked on
> during build time.
>
> The goal of the Solaris Fast Reboot project is to get to the login
> prompt from "rebooting..." within seconds (assuming the boot archive
> has been updated).
>
> The Solaris implementation will support systems with arbitrarily
> large amounts of memory, and provide the flexibility to reboot to
> 32-bit or 64-bit kernels.
>
> 4.3 Interface Table
>
> INTERFACE                      COMMITMENT LEVEL  COMMENT
>
> reboot -f (1M)                 Committed         To initiate a fast reboot.
>
> reboot -f -e (1M)              Committed         To fast reboot to a different
>                                                  BE.
>
> uadmin(2)                      Committed         Added AD_FASTREBOOT and
>                                                  AD_FASTREBOOT_DRYRUN to
>                                                  facilitate fast reboot.
>
> quiesce(9E)                    Committed         To quiesce a device.
>
> ddi_no_quiesce(9F)             Committed         Returns DDI_SUCCESS. No need
>                                                  to quiesce.
>
> ddi_quiesce_not_supported(9F)  Committed         Returns DDI_FAILURE. Quiesce
>                                                  needed but not implemented.
>
> dev_ops(9S)                    Committed         Added devo_quiesce ops for
>                                                  quiescing devices.
>
> 6. Resources and Schedule
>
> 6.4. Steering Committee requested information
> 6.4.1. Consolidation C-team Name:
> ON
>
> 6.5. ARC review type: FastTrack
>
> 6.6. ARC Exposure: open
>
>
> A. Man pages
> A.1 quiesce(9E): new man page
> A.2 reboot(1M)
> A.3 uadmin(2)
> A.4 dev_ops(9S)
>
> A.1 Man page for quiesce(9E)
>
> Man pages for ddi_no_quiesce(9F) and ddi_quiesce_not_supported(9F)
> will be links to the man page for quiesce(9E).
>
>
> Driver Entry Points quiesce(9E)
>
> NAME
> quiesce - quiesce a device
>
> SYNOPSIS
> #include <sys/ddi.h>
> #include <sys/sunddi.h>
>
> int prefixquiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);
>
> int ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);
>
> int ddi_quiesce_not_supported(dev_info_t *dip, ddi_quiesce_cmd_t cmd,
> void *arg);
>
>
> INTERFACE LEVEL
> Solaris DDI specific (Solaris DDI)
>
> PARAMETERS
> dip A pointer to the device's dev_info structure.
>
>
> cmd Type of quiesce operation. Currently only DDI_QUIESCE
> is supported.
>
>
> arg Argument to the quiesce routine if needed. Can be set
> to NULL.
>
>
> DESCRIPTION
> The quiesce() function quiesces a device so that it will no longer
> generate interrupts or modify or access memory. It should reset
> the device to a hardware state from which it can be correctly
> configured by the driver's attach() routine without a system power
> cycle or being configured by the firmware. For devices that come
> with factory default settings, drivers must also restore such
> settings in their quiesce() routines.
>
> DDI_QUIESCE
> If cmd is set to DDI_QUIESCE, quiesce() is used to stop devices
> from generating interrupts or modifying or accessing memory. One such
> use case is Fast Reboot where firmware is bypassed when booting to
> a new OS image.
>
> The quiesce() function will be called once for each instance of
> the device for which there has been a successful attach(). The
> system guarantees that the function will only be called for a
> particular dev_info node after a successful attach(9E) of that
> device. The system is not single-threaded when quiesce() is
> called, so the driver must ensure that concurrent accesses to the
> device when quiesce() is invoked are correctly coordinated. The
> driver can choose to drop outstanding I/O instead of waiting for
> it to complete as long as it can guarantee on-disk data
> integrity. The driver must cancel any outstanding timeouts and
> remove outstanding tasks from taskqs before returning successfully
> from quiesce().
>
> If quiesce() determines a particular instance of the device cannot
> be quiesced when requested because of some exceptional condition,
> quiesce() must return DDI_FAILURE. This should almost never
> happen.
>
> For the fast reboot case, if DDI_FAILURE is returned for the
> DDI_QUIESCE cmd, the regular reboot path will be taken.
>
> If a driver has previously implemented the obsolete reset()
> interface, its functionality must be merged into quiesce(). The
> driver's reset() routine will no longer be called if an
> implementation of quiesce() is present.
>
> The ddi_no_quiesce() function always returns DDI_SUCCESS. It is
> used to indicate that a device does not need to be quiesced for
> fast reboot.
>
> The ddi_quiesce_not_supported() function always returns
> DDI_FAILURE. It is used to indicate that the device needs to be
> quiesced but the device driver has not implemented the function
> yet.
>
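
As a rough illustration of the contract described above, the following userland sketch models a quiesce routine and the fall-back decision. Everything named xx_* is hypothetical, the soft-state fields stand in for real register writes and timeout/taskq cancellation, xx_quiesce() takes the soft state directly rather than a dev_info_t pointer, and can_fast_reboot() is only an assumed shape for the framework's check, not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the real DDI definitions so this compiles in userland. */
#define DDI_SUCCESS 0
#define DDI_FAILURE (-1)
typedef enum { DDI_QUIESCE } ddi_quiesce_cmd_t;

/* Hypothetical per-instance soft state; a real driver touches hardware. */
typedef struct xx_state {
    bool intr_enabled;      /* device may raise interrupts */
    bool dma_active;        /* device may read or write memory */
    int  pending_timeouts;  /* outstanding timeout(9F) callbacks */
    bool wedged;            /* models the "exceptional condition" */
} xx_state_t;

/* Simplified quiesce: stop interrupts and DMA, cancel timeouts. */
int
xx_quiesce(xx_state_t *sp, ddi_quiesce_cmd_t cmd)
{
    if (sp == NULL || cmd != DDI_QUIESCE)
        return (DDI_FAILURE);   /* request not understood */
    if (sp->wedged)
        return (DDI_FAILURE);   /* cannot be quiesced right now */

    sp->intr_enabled = false;
    sp->dma_active = false;
    sp->pending_timeouts = 0;
    return (DDI_SUCCESS);
}

/*
 * Assumed framework behaviour: fast reboot proceeds only if every
 * attached instance quiesces; otherwise take the regular reboot path.
 */
bool
can_fast_reboot(xx_state_t *inst, size_t n)
{
    assert(inst != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (xx_quiesce(&inst[i], DDI_QUIESCE) != DDI_SUCCESS)
            return (false);
    }
    return (true);
}
```

A single instance returning DDI_FAILURE is enough to send the whole system down the regular reboot path in this model.
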
>
> RETURN VALUES
> DDI_SUCCESS For DDI_QUIESCE, the device has been successfully
> quiesced.
>
> DDI_FAILURE The operation failed or the request was not
> understood.
>
>
> CONTEXT
> This function is called from kernel context only.
>
> ATTRIBUTES
> See attributes(5) for descriptions of the following attri-
> butes:
>
>
>
> ____________________________________________________________
> | ATTRIBUTE TYPE | ATTRIBUTE VALUE |
> |_____________________________|_____________________________|
> | Interface Stability | Committed |
> |_____________________________|_____________________________|
>
>
> SEE ALSO
> attach(9E), detach(9E), ddi_add_intr(9F), ddi_map_regs(9F),
> pci_config_setup(9F), ddi_no_quiesce(9F),
> ddi_quiesce_not_supported(9F), timeout(9F), reboot(1M),
> uadmin(1M), uadmin(2)
>
>
> A.2 Man page for reboot(1M)
>
> System Administration Commands reboot(1M)
>
>
>
> NAME
> reboot - restart the operating system
>
> SYNOPSIS
> /usr/sbin/reboot [-dlnqf] [-e BE] [boot_arguments] |
>
>
> DESCRIPTION
> The reboot utility restarts the kernel. The kernel is loaded
> into memory by the PROM monitor, which transfers control to
> the loaded kernel.
>
>
> On x86 systems, when the -f flag is specified, the running |
> kernel will load the next kernel into memory, then transfer |
> control to the loaded kernel. |
>
>
> Although reboot can be run by the super-user at any time,
> shutdown(1M) is normally used first to warn all users logged
> in of the impending loss of service. See shutdown(1M) for
> details.
>
>
> The reboot utility performs a sync(1M) operation on the
> disks, and then a multi-user reboot is initiated. See
> init(1M) for details. On x86 systems, reboot may also update
> the boot archive as needed to ensure a successful reboot.
>
>
> The reboot utility normally logs the reboot to the system
> log daemon, syslogd(1M), and places a shutdown record in the
> login accounting file /var/adm/wtmpx. These actions are
> inhibited if the -n or -q options are present.
>
>
> Normally, the system reboots itself at power-up or after
> crashes.
>
> OPTIONS
> The following options are supported:
>
> -d Force a system crash dump before rebooting. See
> dumpadm(1M) for information on configuring system
> crash dumps.
>
>
> -e If -f is present, reboot to the specified boot |
> environment. |
> |
> |
> -f Fast reboot bypassing firmware and boot loader. The |
> new kernel will be loaded into memory by the running |
> kernel, and control will be transferred to the loaded |
> kernel. If disk or kernel arguments are specified, |
> they must be specified before other boot arguments. |
> See Example 3 for details. |
> |
> Currently only available on x86 systems. |
> |
>
> -l Suppress sending a message to the system log daemon,
> syslogd(1M) about who executed reboot.
>
>
> -n Avoid calling sync(2) and do not log the reboot to
> syslogd(1M) or to /var/adm/wtmpx. The kernel still
> attempts to sync filesystems prior to reboot, except
> if the -d option is also present. If -d is used with
> -n, the kernel does not attempt to sync filesystems.
>
>
> -q Quick. Reboot quickly and ungracefully, without shut-
> ting down running processes first.
>
>
> OPERANDS
> The following operands are supported:
>
> boot_arguments An optional boot_arguments specifies argu-
> ments to the uadmin(2) function that are
> passed to the boot program and kernel upon
> restart. The form and list of arguments are
> described in the boot(1M) and kernel(1M)
> man pages. If the arguments are specified,
> whitespace between them is replaced
> by single spaces unless the whitespace is
> quoted for the shell. If the
> boot_arguments begin with a hyphen, they
> must be preceded by the -- delimiter (two
> hyphens) to denote the end of the reboot
> argument list.
>
>
> EXAMPLES
> Example 1 Passing the -r and -v Arguments to boot
>
>
> In the following example, the delimiter -- (two hyphens)
> must be used to separate the options of reboot from the
> arguments of boot(1M).
>
>
> example# reboot -dl -- -rv
>
>
>
> Example 2 Rebooting Using a Specific Disk and Kernel
>
>
> The following example reboots using a specific disk and ker-
> nel.
>
>
> example# reboot disk1 kernel.test/unix
>
>
> Example 3 Fast reboot |
> |
> Check if all the drivers on the system are fast reboot capable. |
> |
> example# reboot -f dryrun |
> |
> Rebooting to another UFS root disk. |
> |
> example# reboot -f -- '/dev/dsk/c1d0s0' |
> |
> Rebooting to another ZFS root pool. |
> |
> example# reboot -f -- 'rootpool/root1' |
> |
> Rebooting to "mykernel" on the same disk with "-k" option. |
> |
> example# reboot -f -- '/platform/i86pc/mykernel/amd64/unix -k' |
> |
> Rebooting to "mykernel" off another root disk mounted on /mnt. |
> |
> example# reboot -f -- '/mnt/platform/i86pc/mykernel/amd64/unix -k' |
> |
> Rebooting to "/platform/i86pc/kernel/$ISADIR/unix" on another boot |
> environment named "second_root". |
> |
> example# reboot -f -e second_root |
> |
> Rebooting to the same kernel with "-kv" options. |
> |
> example# reboot -f -- '-kv' |
>
>
> FILES
> /var/adm/wtmpx login accounting file
>
>
> ATTRIBUTES
> See attributes(5) for descriptions of the following attri-
> butes:
>
> ____________________________________________________________
> | ATTRIBUTE TYPE | ATTRIBUTE VALUE |
> |_____________________________|_____________________________|
> | Availability | SUNWcsu |
> |_____________________________|_____________________________|
>
>
> SEE ALSO
> mdb(1), boot(1M), dumpadm(1M), fsck(1M), halt(1M), init(1M),
> kernel(1M), shutdown(1M), sync(1M), syslogd(1M), sync(2),
> uadmin(2), reboot(3C), attributes(5)
>
> NOTES
> The reboot utility does not execute the scripts in
> /etc/rcnum.d or execute shutdown actions in inittab(4). To
> ensure a complete shutdown of system services, use
> shutdown(1M) or init(1M) to reboot a Solaris system.
>
>
> A.3 Man page for uadmin(2)
>
>
> System Calls uadmin(2)
>
>
> NAME
> uadmin - administrative control
>
> SYNOPSIS
> #include <sys/uadmin.h>
>
> int uadmin(int cmd, int fcn, uintptr_t mdep);
>
>
> DESCRIPTION
> The uadmin() function provides control for basic administra-
> tive functions. This function is tightly coupled to the sys-
> tem administrative procedures and is not intended for gen-
> eral use. The argument mdep is provided for machine-
> dependent use and is not defined here. It should be initial-
> ized to NULL if not used.
>
>
> As specified by cmd, the following commands are available:
>
> A_SHUTDOWN The system is shut down. All user processes
> are killed, the buffer cache is flushed, and
> the root file system is unmounted. The action
> to be taken after the system has been shut
> down is specified by fcn. The functions are
> generic; the hardware capabilities vary on
> specific machines.
>
> AD_HALT Halt the processor(s).
>
>
> AD_POWEROFF Halt the processor(s) and turn
> off the power.
>
>
> AD_BOOT Reboot the system, using the
> kernel file.
>
>
> AD_IBOOT Interactive reboot; user is
> prompted for bootable program
> name.
>
>
> AD_FASTREBOOT Bypass BIOS and boot loader |
> |
> AD_FASTREBOOT_DRYRUN Fast reboot dry run to |
> check whether a system supports |
> fast reboot. |
>
>
> A_REBOOT The system stops immediately without any
> further processing. The action to be taken
> next is specified by fcn as above.
>
>
> A_DUMP The system is forced to panic immediately
> without any further processing and a crash
> dump is written to the dump device (see
> dumpadm(1M)). The action to be taken next is
> specified by fcn, as above.
>
>
> A_REMOUNT The root file system is mounted again after
> having been fixed. This should be used only
> during the startup process.
>
>
> A_FREEZE Suspend the whole system. The system state is
> preserved in the state file. The following
> subcommands, specified by fcn, are available.
>
> AD_SUSPEND_TO_DISK Save the system
> state to the state
> file. This subcom-
> mand is equivalent
> to ACPI state S4.
>
>
> AD_CHECK_SUSPEND_TO_DISK Check if your sys-
> tem supports
> suspend to disk.
> Without performing
> a system
> suspend/resume,
> this subcommand
> checks if this
> feature is
> currently avail-
> able on your sys-
> tem.
>
>
> AD_SUSPEND_TO_RAM Save the system
> state to memory.
> This subcommand is
> equivalent to ACPI
> state S3.
>
>
> AD_CHECK_SUSPEND_TO_RAM Check if your sys-
> tem supports
> suspend to memory.
> Without performing
> a system
> suspend/resume,
> this subcommand
> checks if this
> feature is
> currently
> available on your
> system.
>
> The following subcommands, specified by fcn,
> are obsolete and might be removed in a subse-
> quent release:
>
> AD_COMPRESS Save the system state to the
> state file with compression of
> data. This subcommand has been
> replaced by AD_SUSPEND_TO_DISK,
> which should be used instead.
>
>
> AD_CHECK Check if your system supports
> suspend and resume. Without
> performing a system
> suspend/resume, this command
> checks if this feature is
> currently available on your
> system. This subcommand has
> been replaced by
> AD_CHECK_SUSPEND_TO_DISK, which
> should be used instead.
>
>
> AD_FORCE Force AD_COMPRESS even when
> threads of user applications
> are not suspendable. This sub-
> command should never be used,
> as it might result in undefined
> behavior.
>
>
>
> RETURN VALUES
> Upon successful completion, the value returned depends on
> cmd as follows:
>
> A_SHUTDOWN Never returns.
>
>
> A_REBOOT Never returns.
>
>
> A_FREEZE 0 upon resume.
>
>
> A_REMOUNT 0.
>
> Otherwise, -1 is returned and errno is set to indicate the
> error.
>
> ERRORS
> The uadmin() function will fail if:
>
> EBUSY Suspend is already in progress.
>
>
> EINVAL The cmd argument is invalid.
>
>
> ENOMEM Suspend/resume ran out of physical memory.
>
>
> ENOSPC Suspend/resume could not allocate enough space on
> the root file system to store system information.
>
>
> ENOTSUP Suspend/resume is not supported on this platform
> or the command specified by cmd is not allowed.
>
>
> ENXIO Unable to successfully suspend system.
>
>
> EPERM The {PRIV_SYS_CONFIG} privilege is not asserted
> in the effective set of the calling process.
>
>
> ATTRIBUTES
> See attributes(5) for descriptions of the following attri-
> butes:
>
>
>
> ____________________________________________________________
> | ATTRIBUTE TYPE | ATTRIBUTE VALUE |
> |_____________________________|_____________________________|
> | Interface Stability | See below. |
> |_____________________________|_____________________________|
>
>
>
> The A_FREEZE command and its subcommands are Committed.
>
> SEE ALSO
> dumpadm(1M), kernel(1M), uadmin(1M), attributes(5),
> privileges(5)
>
>
>
> A.4 Man page for dev_ops(9S)
>
>
> Data Structures for Drivers dev_ops(9S)
>
>
>
> NAME
> dev_ops - device operations structure
>
> SYNOPSIS
> #include <sys/conf.h>
> #include <sys/devops.h>
>
>
> INTERFACE LEVEL
> Solaris DDI specific (Solaris DDI).
>
> DESCRIPTION
> dev_ops contains driver common fields and pointers to the
> bus_ops and cb_ops(9S).
>
>
> Following are the device functions provided in the device
> operations structure. All fields must be set at compile
> time.
>
> devo_rev Driver build version. Set this to
> DEVO_REV.
>
>
> devo_refcnt Driver reference count. Set this to 0.
>
>
> devo_getinfo Get device driver information (see
> getinfo(9E)).
>
>
> devo_identify This entry point is obsolete. Set to
> nulldev.
>
>
> devo_probe Probe device. See probe(9E).
>
>
> devo_attach Attach driver to dev_info. See attach(9E).
>
>
> devo_detach Detach/prepare driver to unload. See
> detach(9E).
>
>
> devo_reset Reset device. (Not supported in this
> release.) Set this to nodev.
>
>
> devo_cb_ops Pointer to cb_ops(9S) structure for leaf
> drivers.
>
>
> devo_bus_ops Pointer to bus operations structure for
> nexus drivers. Set this to NULL if this is
> for a leaf driver.
>
>
> devo_power Power a device attached to system. See
> power(9E).
>
>
> devo_quiesce Quiesce a device attached to system. See |
> quiesce(9E). Can be set to ddi_no_quiesce if |
> the device does not generate interrupts or |
> perform DMA. |
>
> STRUCTURE MEMBERS
> int devo_rev;
> int devo_refcnt;
> int (*devo_getinfo)(dev_info_t *dip,
> ddi_info_cmd_t infocmd, void *arg, void **result);
> int (*devo_identify)(dev_info_t *dip);
> int (*devo_probe)(dev_info_t *dip);
> int (*devo_attach)(dev_info_t *dip,
> ddi_attach_cmd_t cmd);
> int (*devo_detach)(dev_info_t *dip,
> ddi_detach_cmd_t cmd);
> int (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);
> struct cb_ops *devo_cb_ops;
> struct bus_ops *devo_bus_ops;
> int (*devo_power)(dev_info_t *dip, int component, int
> level);
> int (*devo_quiesce)(dev_info_t *dip, |
> ddi_quiesce_cmd_t cmd, void *arg); |
>
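
For a device that neither generates interrupts nor performs DMA, wiring this up might look like the sketch below. The struct is a trimmed userland stand-in for dev_ops (only three of its fields), DEVO_REV_SKETCH is a placeholder value rather than the real DEVO_REV, and pseudo_ops is a hypothetical driver's ops vector:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real DDI definitions so this compiles in userland. */
#define DDI_SUCCESS 0
#define DEVO_REV_SKETCH 1                       /* placeholder, not DEVO_REV */
typedef struct dev_info dev_info_t;             /* opaque, never dereferenced */
typedef enum { DDI_QUIESCE } ddi_quiesce_cmd_t;

/* Trimmed stand-in for struct dev_ops: only the fields relevant here. */
struct dev_ops_sketch {
    int devo_rev;
    int devo_refcnt;
    int (*devo_quiesce)(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);
};

/* Mirrors the proposal's description: always returns DDI_SUCCESS. */
int
ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
    (void)dip; (void)cmd; (void)arg;
    return (DDI_SUCCESS);
}

/* Hypothetical pseudo driver: nothing to quiesce, so use the helper. */
struct dev_ops_sketch pseudo_ops = {
    .devo_rev = DEVO_REV_SKETCH,
    .devo_refcnt = 0,
    .devo_quiesce = ddi_no_quiesce,
};
```

All fields are set at compile time, matching the requirement stated in the DESCRIPTION above.
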
>
> SEE ALSO
> attach(9E), detach(9E), getinfo(9E), probe(9E), power(9E), |
> quiesce(9E), nodev(9F) |
>
>
>