I am sponsoring the following fast-track on behalf of Sherry
Moore, with timeout set for one week, June 20 2008.
Supplemental documents in the materials directory include more
detailed design and implementation specs and man page updates
including diffs.
The project requests minor binding only.
-jg
1. Introduction
1.1. Project/Component Working Name:
Solaris Fast Reboot
1.2. Name of Document Author/Supplier:
Sherry Q. Moore
1.3. Date of This Document:
5/29/2008
4. Technical Description
4.1 Introduction
Solaris has always strived to be the most reliable and available
operating system. Many technologies have been invented to achieve
this goal; notable ones include Dynamic Reconfiguration (DR),
Fault Management Architecture (FMA), SMF, and ZFS, to name a few.
The objectives of all these projects are to keep the systems up and
running correctly in the face of unexpected hardware and software
failures with as little down time as possible.
System boot/reboot time is considered system down time. The less
time a system spends in the boot phase, the more useful work it can
do. High availability is extremely important to most, if not all,
of our customers. Shorter reboot time also reduces test
turnaround time, thus improving developers' productivity.
4.2 Background
The Solaris boot and reboot path involves the following basic
steps:
On x86 systems:
(Hardware reset) -> BIOS -> grub -> dboot -> kernel
On SPARC systems:
(Hardware reset) -> POST -> OBP -> dboot -> kernel
On x86 systems, upon startup or reset, the BIOS code performs
hardware testing and initialization, then jumps to "grub", the boot
loader. Grub loads dboot, unix text, data and the boot archive
into memory, then calls dboot. dboot does necessary
initialization, such as building the initial page tables, or
loading kernel text and data to a different location, then jumps to
the kernel.
As computer systems become more complex, the time they spend in the
BIOS/POST phase to test and initialize hardware gets longer. In
the next 12 months we expect to see x86 systems with 1TB of
memory. Memory initialization alone will take over 1/2 hour. It
becomes more and more desirable to short circuit the reboot path so
that the firmware and bootloaders can be bypassed.
The fast reboot code will act as an in-kernel boot loader that
loads the kernel into memory and switches to it. The new kernel in
the context of this write-up includes the dboot that gets tacked on
during build time.
The goal of the Solaris Fast Reboot project is to get to the login
prompt from "rebooting..." within seconds (assuming the boot archive
has been updated).
The Solaris implementation will support systems with arbitrarily
large amounts of memory, and provide flexibility to reboot to 32-bit
or 64-bit kernels.
4.3 Interface Table
INTERFACE                       COMMITMENT LEVEL  COMMENT
reboot -f (1M)                  Committed         To initiate a fast reboot.
reboot -f -e (1M)               Committed         To fast reboot to a different BE.
uadmin(2)                       Committed         Added AD_FASTREBOOT and
                                                  AD_FASTREBOOT_DRYRUN to facilitate
                                                  fast reboot.
quiesce(9E)                     Committed         To quiesce a device.
ddi_no_quiesce(9F)              Committed         Returns DDI_SUCCESS. No need to
                                                  quiesce.
ddi_quiesce_not_supported(9F)   Committed         Returns DDI_FAILURE. Quiesce needed
                                                  but not implemented.
dev_ops(9S)                     Committed         Added devo_quiesce ops for
                                                  quiescing devices.
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
A. Man pages
A.1 quiesce(9E): new man page
A.2 reboot(1M)
A.3 uadmin(2)
A.4 dev_ops(9S)
A.1 Man page for quiesce(9E)
Man pages for ddi_no_quiesce(9F) and ddi_quiesce_not_supported(9F)
will be links to the man page for quiesce(9E).
Driver Entry Points quiesce(9E)
NAME
quiesce - quiesce a device
SYNOPSIS
#include <sys/ddi.h>
#include <sys/sunddi.h>
int prefixquiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);
int ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg);
int ddi_quiesce_not_supported(dev_info_t *dip, ddi_quiesce_cmd_t cmd,
void *arg);
INTERFACE LEVEL
Solaris DDI specific (Solaris DDI)
PARAMETERS
dip A pointer to the device's dev_info structure.
cmd Type of quiesce operation. Currently only DDI_QUIESCE
is supported.
arg Argument to the quiesce routine if needed. Can be set
to NULL.
DESCRIPTION
The quiesce() function quiesces a device so that it will no longer
generate interrupts or modify or access memory. It should reset
the device to a hardware state from which it can be correctly
configured by the driver's attach() routine without a system power
cycle or being configured by the firmware. For devices that come
with factory default settings, drivers must also restore such
settings in their quiesce() routines.
DDI_QUIESCE
If cmd is set to DDI_QUIESCE, quiesce() is used to stop devices
from generating interrupts or modifying or accessing memory. One such
use case is Fast Reboot, where firmware is bypassed when booting to
a new OS image.
The quiesce() function will be called once for each instance of
the device for which there has been a successful attach(). The
system guarantees that the function will only be called for a
particular dev_info node after a successful attach(9E) of that
device. The system is not single-threaded when quiesce() is
called, so the driver must ensure that concurrent accesses to the
device when quiesce() is invoked are correctly coordinated. The
driver can choose to drop outstanding I/O requests instead of waiting
for them to complete as long as it can guarantee on-disk data
integrity. The driver must cancel any outstanding timeouts and
remove outstanding tasks from taskqs before returning successfully
from quiesce().
If quiesce() determines a particular instance of the device cannot
be quiesced when requested because of some exceptional condition,
quiesce() must return DDI_FAILURE. This should almost never
happen.
For the fast reboot case, if DDI_FAILURE is returned for the
DDI_QUIESCE cmd, the regular reboot path will be taken.
If a driver has previously implemented the obsolete reset()
interface, its functionality must be merged into quiesce(). The
driver's reset() routine will no longer be called if an
implementation of quiesce() is present.
The ddi_no_quiesce() function always returns DDI_SUCCESS. It is
used to indicate that a device does not need to be quiesced for
fast reboot.
The ddi_quiesce_not_supported() function always returns DDI_FAILURE. It
is used to indicate that the device needs to be quiesced but
the device driver has not implemented the function yet.
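As an illustration of the requirements above, the following is a minimal
sketch of a quiesce() routine for an imaginary device "xx". It is not taken
from any real driver. The xx_state_t structure, its fields, and the
driver_private member of dev_info_t are hypothetical stand-ins so that the
fragment compiles outside the kernel; a real driver would include
<sys/ddi.h> and <sys/sunddi.h> and program actual device registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for DDI definitions normally supplied by <sys/sunddi.h>. */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
typedef enum { DDI_QUIESCE = 0 } ddi_quiesce_cmd_t;

/* Hypothetical: the real dev_info_t is opaque to drivers. */
typedef struct dev_info {
	void *driver_private;
} dev_info_t;

/* Hypothetical per-instance state for the imaginary "xx" device. */
typedef struct xx_state {
	volatile uint32_t intr_enable;	/* models an interrupt-enable register */
	volatile uint32_t dma_ctrl;	/* models a DMA control register */
	int timeout_pending;		/* models an outstanding timeout */
	int hung;			/* models hardware that does not respond */
} xx_state_t;

int
xx_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
	xx_state_t *sp;

	(void) arg;			/* unused in this sketch */
	assert(dip != NULL);

	if (cmd != DDI_QUIESCE)
		return (DDI_FAILURE);	/* request not understood */

	sp = dip->driver_private;
	if (sp == NULL || sp->hung)
		return (DDI_FAILURE);	/* exceptional condition */

	sp->timeout_pending = 0;	/* cancel outstanding timeouts */
	sp->intr_enable = 0;		/* stop generating interrupts */
	sp->dma_ctrl = 0;		/* stop memory access by the device */
	return (DDI_SUCCESS);
}
```

The sketch returns DDI_FAILURE only for an unknown cmd or a device that
cannot be reached, which matches the expectation that failure should almost
never happen.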
RETURN VALUES
DDI_SUCCESS For DDI_QUIESCE, the device has been successfully
quiesced.
DDI_FAILURE The operation failed or the request was not
understood.
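The way these return values feed the fast reboot decision can be sketched as
follows. This is an assumption-laden illustration, not the kernel's actual
code: attached_inst_t, reboot_path_t and choose_reboot_path() are
hypothetical names, and treating a missing quiesce entry point as a failure
is a choice made only for this sketch. The two ddi_*() functions are
stand-ins that mirror the documented behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDI definitions normally supplied by <sys/sunddi.h>. */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
typedef enum { DDI_QUIESCE = 0 } ddi_quiesce_cmd_t;
typedef struct dev_info dev_info_t;	/* opaque, as in the DDI */

typedef int (*quiesce_fn_t)(dev_info_t *, ddi_quiesce_cmd_t, void *);

/* Stand-in: device does not need to be quiesced. */
int
ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
	(void) dip; (void) cmd; (void) arg;
	return (DDI_SUCCESS);
}

/* Stand-in: quiesce is needed but the driver has not implemented it. */
int
ddi_quiesce_not_supported(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
	(void) dip; (void) cmd; (void) arg;
	return (DDI_FAILURE);
}

/* Hypothetical record of one successfully attached device instance. */
typedef struct attached_inst {
	dev_info_t *dip;
	quiesce_fn_t quiesce;
} attached_inst_t;

typedef enum { REBOOT_FAST, REBOOT_REGULAR } reboot_path_t;

/*
 * Call quiesce once per attached instance. Any DDI_FAILURE (or, by
 * assumption here, a missing entry point) selects the regular path.
 */
reboot_path_t
choose_reboot_path(const attached_inst_t *insts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (insts[i].quiesce == NULL ||
		    insts[i].quiesce(insts[i].dip, DDI_QUIESCE, NULL) !=
		    DDI_SUCCESS)
			return (REBOOT_REGULAR);
	}
	return (REBOOT_FAST);
}
```

A single instance registered with ddi_quiesce_not_supported() is enough to
make the sketch fall back to the regular reboot path.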
CONTEXT
This function is called from kernel context only.
ATTRIBUTES
See attributes(5) for descriptions of the following attri-
butes:
____________________________________________________________
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
|_____________________________|_____________________________|
| Interface Stability | Committed |
|_____________________________|_____________________________|
SEE ALSO
attach(9E), detach(9E), ddi_add_intr(9F), ddi_map_regs(9F),
pci_config_setup(9F), ddi_no_quiesce(9F),
ddi_quiesce_not_supported(9F), timeout(9F), reboot(1M),
uadmin(1M), uadmin(2)
A.2 Man page for reboot(1M)
System Administration Commands reboot(1M)
NAME
reboot - restart the operating system
SYNOPSIS
/usr/sbin/reboot [-dlnqf] [-e BE] [boot_arguments] |
DESCRIPTION
The reboot utility restarts the kernel. The kernel is loaded
into memory by the PROM monitor, which transfers control to
the loaded kernel.
On x86 systems, when the -f flag is specified, the running |
kernel will load the next kernel into memory, then transfer |
control to the loaded kernel. |
Although reboot can be run by the super-user at any time,
shutdown(1M) is normally used first to warn all users logged
in of the impending loss of service. See shutdown(1M) for
details.
The reboot utility performs a sync(1M) operation on the
disks, and then a multi-user reboot is initiated. See
init(1M) for details. On x86 systems, reboot may also update
the boot archive as needed to ensure a successful reboot.
The reboot utility normally logs the reboot to the system
log daemon, syslogd(1M), and places a shutdown record in the
login accounting file /var/adm/wtmpx. These actions are
inhibited if the -n or -q options are present.
Normally, the system reboots itself at power-up or after
crashes.
OPTIONS
The following options are supported:
-d Force a system crash dump before rebooting. See
dumpadm(1M) for information on configuring system
crash dumps.
-e If -f is present, reboot to the specified boot |
environment. |
|
|
-f Fast reboot bypassing firmware and boot loader. The |
new kernel will be loaded into memory by the running |
kernel, and control will be transferred to the loaded |
kernel. If disk or kernel arguments are specified, |
they must be specified before other boot arguments. |
See Example 3 for details. |
|
Currently only available on x86 systems. |
|
-l Suppress sending a message to the system log daemon,
syslogd(1M) about who executed reboot.
-n Avoid calling sync(2) and do not log the reboot to
syslogd(1M) or to /var/adm/wtmpx. The kernel still
attempts to sync filesystems prior to reboot, except
if the -d option is also present. If -d is used with
-n, the kernel does not attempt to sync filesystems.
-q Quick. Reboot quickly and ungracefully, without shut-
ting down running processes first.
OPERANDS
The following operands are supported:
boot_arguments An optional boot_arguments specifies argu-
ments to the uadmin(2) function that are
passed to the boot program and kernel upon
restart. The form and list of arguments is
described in the boot(1M) and kernel(1M)
man pages. If the arguments are specified,
whitespace between them is replaced
by single spaces unless the whitespace is
quoted for the shell. If the
boot_arguments begin with a hyphen, they
must be preceded by the -- delimiter (two
hyphens) to denote the end of the reboot
argument list.
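The whitespace rule for boot_arguments can be illustrated with a small
sketch. Once the shell has split the command line, the operands arrive as
separate strings, and joining them with single spaces yields the string
handed on to the boot program. join_boot_args() is a hypothetical helper
written for this example; it is not the function used by reboot(1M).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join argv[0..argc-1] into buf, separated by single spaces.
 * Returns 0 on success, or -1 if buf is too small.
 */
int
join_boot_args(int argc, char *const argv[], char *buf, size_t buflen)
{
	size_t used = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';

	for (int i = 0; i < argc; i++) {
		size_t len = strlen(argv[i]);
		size_t need = len + (i > 0 ? 1 : 0);

		/* Leave room for the terminating NUL. */
		if (used + need + 1 > buflen)
			return (-1);
		if (i > 0)
			buf[used++] = ' ';
		memcpy(buf + used, argv[i], len);
		used += len;
		buf[used] = '\0';
	}
	return (0);
}
```

An operand quoted for the shell, such as
'/platform/i86pc/mykernel/amd64/unix -k', arrives as one string, so its
embedded whitespace is passed through unchanged.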
EXAMPLES
Example 1 Passing the -r and -v Arguments to boot
In the following example, the delimiter -- (two hyphens)
must be used to separate the options of reboot from the
arguments of boot(1M).
example# reboot -dl -- -rv
Example 2 Rebooting Using a Specific Disk and Kernel
The following example reboots using a specific disk and ker-
nel.
example# reboot disk1 kernel.test/unix
Example 3 Fast reboot |
|
Check if all the drivers on the system are fast reboot capable. |
|
example# reboot -f dryrun |
|
Rebooting to another UFS root disk. |
|
example# reboot -f -- '/dev/dsk/c1d0s0' |
|
Rebooting to another ZFS root pool. |
|
example# reboot -f -- 'rootpool/root1' |
|
Rebooting to "mykernel" on the same disk with "-k" option. |
|
example# reboot -f -- '/platform/i86pc/mykernel/amd64/unix -k' |
|
Rebooting to "mykernel" off another root disk mounted on /mnt. |
|
example# reboot -f -- '/mnt/platform/i86pc/mykernel/amd64/unix -k' |
|
Rebooting to "/platform/i86pc/kernel/$ISADIR/unix" on another boot |
environment named "second_root". |
|
example# reboot -f -e second_root |
|
Rebooting to the same kernel with "-kv" options. |
|
example# reboot -f -- '-kv' |
FILES
/var/adm/wtmpx login accounting file
ATTRIBUTES
See attributes(5) for descriptions of the following attri-
butes:
____________________________________________________________
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
|_____________________________|_____________________________|
| Availability | SUNWcsu |
|_____________________________|_____________________________|
SEE ALSO
mdb(1), boot(1M), dumpadm(1M), fsck(1M), halt(1M), init(1M),
kernel(1M), shutdown(1M), sync(1M), syslogd(1M), sync(2),
uadmin(2), reboot(3C), attributes(5)
NOTES
The reboot utility does not execute the scripts in
/etc/rcnum.d or execute shutdown actions in inittab(4). To
ensure a complete shutdown of system services, use
shutdown(1M) or init(1M) to reboot a Solaris system.
A.3 Man page for uadmin(2)
System Calls uadmin(2)
NAME
uadmin - administrative control
SYNOPSIS
#include <sys/uadmin.h>
int uadmin(int cmd, int fcn, uintptr_t mdep);
DESCRIPTION
The uadmin() function provides control for basic administra-
tive functions. This function is tightly coupled to the sys-
tem administrative procedures and is not intended for gen-
eral use. The argument mdep is provided for machine-
dependent use and is not defined here. It should be initial-
ized to NULL if not used.
As specified by cmd, the following commands are available:
A_SHUTDOWN The system is shut down. All user processes
are killed, the buffer cache is flushed, and
the root file system is unmounted. The action
to be taken after the system has been shut
down is specified by fcn. The functions are
generic; the hardware capabilities vary on
specific machines.
AD_HALT Halt the processor(s).
AD_POWEROFF Halt the processor(s) and turn
off the power.
AD_BOOT Reboot the system, using the
kernel file.
AD_IBOOT Interactive reboot; user is
prompted for bootable program
name.
AD_FASTREBOOT Bypass BIOS and boot loader |
|
AD_FASTREBOOT_DRYRUN Fast reboot dry run to |
check whether a system supports |
fast reboot. |
A_REBOOT The system stops immediately without any
further processing. The action to be taken
next is specified by fcn as above.
A_DUMP The system is forced to panic immediately
without any further processing and a crash
dump is written to the dump device (see
dumpadm(1M)). The action to be taken next is
specified by fcn, as above.
A_REMOUNT The root file system is mounted again after
having been fixed. This should be used only
during the startup process.
A_FREEZE Suspend the whole system. The system state is
preserved in the state file. The following
subcommands, specified by fcn, are available.
AD_SUSPEND_TO_DISK Save the system
state to the state
file. This subcom-
mand is equivalent
to ACPI state S4.
AD_CHECK_SUSPEND_TO_DISK Check if your sys-
tem supports
suspend to disk.
Without performing
a system
suspend/resume,
this subcommand
checks if this
feature is
currently avail-
able on your sys-
tem.
AD_SUSPEND_TO_RAM Save the system
state to memory.
This subcommand is
equivalent to ACPI
state S3.
AD_CHECK_SUSPEND_TO_RAM Check if your sys-
tem supports
suspend to memory.
Without performing
a system
suspend/resume,
this subcommand
checks if this
feature is
currently
available on your
system.
The following subcommands, specified by fcn,
are obsolete and might be removed in a subse-
quent release:
AD_COMPRESS Save the system state to the
state file with compression of
data. This subcommand has been
replaced by AD_SUSPEND_TO_DISK,
which should be used instead.
AD_CHECK Check if your system supports
suspend and resume. Without
performing a system
suspend/resume, this command
checks if this feature is
currently available on your
system. This subcommand has
been replaced by
AD_CHECK_SUSPEND_TO_DISK, which
should be used instead.
AD_FORCE Force AD_COMPRESS even when
threads of user applications
are not suspendable. This sub-
command should never be used,
as it might result in undefined
behavior.
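For the AD_FASTREBOOT and AD_FASTREBOOT_DRYRUN functions listed under
A_SHUTDOWN above, one possible calling pattern is to try the dry run first
and fall back to AD_BOOT if it fails. The sketch below makes several
assumptions that this page does not state: that the dry run returns 0 when
fast reboot is supported, that boot arguments travel in mdep, and the
numeric values of the constants, which are placeholders rather than the
values in <sys/uadmin.h>. The uadmin call is injected as a function pointer
so the logic can be exercised without rebooting anything; fake_uadmin() is
a test double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values only; real definitions live in <sys/uadmin.h>. */
enum { A_SHUTDOWN = 1 };
enum { AD_BOOT = 100, AD_FASTREBOOT = 101, AD_FASTREBOOT_DRYRUN = 102 };

typedef int (*uadmin_fn_t)(int cmd, int fcn, uintptr_t mdep);

/* Prefer fast reboot when the dry run succeeds, else reboot normally. */
int
reboot_prefer_fast(uadmin_fn_t do_uadmin, const char *bootargs)
{
	uintptr_t mdep = (uintptr_t)bootargs;	/* assumption: args in mdep */

	assert(do_uadmin != NULL);
	if (do_uadmin(A_SHUTDOWN, AD_FASTREBOOT_DRYRUN, mdep) == 0)
		return (do_uadmin(A_SHUTDOWN, AD_FASTREBOOT, mdep));
	return (do_uadmin(A_SHUTDOWN, AD_BOOT, mdep));
}

/* Test double: records the last fcn and can simulate a failed dry run. */
int fake_last_fcn;
int fake_dryrun_ok;

int
fake_uadmin(int cmd, int fcn, uintptr_t mdep)
{
	(void) mdep;
	assert(cmd == A_SHUTDOWN);
	fake_last_fcn = fcn;
	if (fcn == AD_FASTREBOOT_DRYRUN)
		return (fake_dryrun_ok ? 0 : -1);
	return (0);	/* a real A_SHUTDOWN reboot would never return */
}
```

On a real system the function pointer would be uadmin itself, and the
successful A_SHUTDOWN calls would not return.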
RETURN VALUES
Upon successful completion, the value returned depends on
cmd as follows:
A_SHUTDOWN Never returns.
A_REBOOT Never returns.
A_FREEZE 0 upon resume.
A_REMOUNT 0.
Otherwise, -1 is returned and errno is set to indicate the
error.
ERRORS
The uadmin() function will fail if:
EBUSY Suspend is already in progress.
EINVAL The cmd argument is invalid.
ENOMEM Suspend/resume ran out of physical memory.
ENOSPC Suspend/resume could not allocate enough space on
the root file system to store system information.
ENOTSUP Suspend/resume is not supported on this platform
or the command specified by cmd is not allowed.
ENXIO Unable to successfully suspend system.
EPERM The {PRIV_SYS_CONFIG} privilege is not asserted
in the effective set of the calling process.
ATTRIBUTES
See attributes(5) for descriptions of the following attri-
butes:
____________________________________________________________
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
|_____________________________|_____________________________|
| Interface Stability | See below. |
|_____________________________|_____________________________|
The A_FREEZE command and its subcommands are Committed.
SEE ALSO
dumpadm(1M), kernel(1M), uadmin(1M), attributes(5),
privileges(5)
A.4 Man page for dev_ops(9S)
Data Structures for Drivers dev_ops(9S)
NAME
dev_ops - device operations structure
SYNOPSIS
#include <sys/conf.h>
#include <sys/devops.h>
INTERFACE LEVEL
Solaris DDI specific (Solaris DDI).
DESCRIPTION
dev_ops contains driver common fields and pointers to the
bus_ops and cb_ops(9S).
Following are the device functions provided in the device
operations structure. All fields must be set at compile
time.
devo_rev Driver build version. Set this to
DEVO_REV.
devo_refcnt Driver reference count. Set this to 0.
devo_getinfo Get device driver information (see
getinfo(9E)).
devo_identify This entry point is obsolete. Set to
nulldev.
devo_probe Probe device. See probe(9E).
devo_attach Attach driver to dev_info. See attach(9E).
devo_detach Detach/prepare driver to unload. See
detach(9E).
devo_reset Reset device. (Not supported in this
release.) Set this to nodev.
devo_cb_ops Pointer to cb_ops(9S) structure for leaf
drivers.
devo_bus_ops Pointer to bus operations structure for
nexus drivers. Set this to NULL if this is
for a leaf driver.
devo_power Power a device attached to system. See
power(9E).
devo_quiesce Quiesce a device attached to system. See |
quiesce(9E). Can be set to ddi_no_quiesce if |
the device does not generate interrupts or |
perform DMA. |
STRUCTURE MEMBERS
int devo_rev;
int devo_refcnt;
int (*devo_getinfo)(dev_info_t *dip,
ddi_info_cmd_t infocmd, void *arg, void **result);
int (*devo_identify)(dev_info_t *dip);
int (*devo_probe)(dev_info_t *dip);
int (*devo_attach)(dev_info_t *dip,
ddi_attach_cmd_t cmd);
int (*devo_detach)(dev_info_t *dip,
ddi_detach_cmd_t cmd);
int (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);
struct cb_ops *devo_cb_ops;
struct bus_ops *devo_bus_ops;
int (*devo_power)(dev_info_t *dip, int component, int
level);
int (*devo_quiesce)(dev_info_t *dip, |
ddi_quiesce_cmd_t cmd, void *arg); |
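As a rough illustration of how a driver might fill in the new member, the
sketch below initializes a dev_ops for an imaginary pseudo device "xx" that
neither generates interrupts nor performs DMA, and therefore points
devo_quiesce at ddi_no_quiesce. The structure shown is a trimmed stand-in
containing only a few of the members listed above, the DEVO_REV value is a
placeholder, and xx_attach() and xx_detach() are hypothetical; a real driver
would use the definitions from <sys/devops.h> and <sys/sunddi.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDI definitions; values are placeholders. */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
#define	DEVO_REV	4
typedef enum { DDI_ATTACH = 0 } ddi_attach_cmd_t;
typedef enum { DDI_DETACH = 0 } ddi_detach_cmd_t;
typedef enum { DDI_QUIESCE = 0 } ddi_quiesce_cmd_t;
typedef struct dev_info dev_info_t;	/* opaque, as in the DDI */

/* Trimmed stand-in for struct dev_ops; most members are omitted. */
struct dev_ops {
	int devo_rev;
	int devo_refcnt;
	int (*devo_attach)(dev_info_t *dip, ddi_attach_cmd_t cmd);
	int (*devo_detach)(dev_info_t *dip, ddi_detach_cmd_t cmd);
	int (*devo_quiesce)(dev_info_t *dip, ddi_quiesce_cmd_t cmd,
	    void *arg);
};

/* Stand-in mirroring the documented behavior: always DDI_SUCCESS. */
int
ddi_no_quiesce(dev_info_t *dip, ddi_quiesce_cmd_t cmd, void *arg)
{
	(void) dip; (void) cmd; (void) arg;
	return (DDI_SUCCESS);
}

/* Hypothetical entry points for the imaginary "xx" pseudo device. */
int
xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	assert(dip != NULL);
	return (cmd == DDI_ATTACH ? DDI_SUCCESS : DDI_FAILURE);
}

int
xx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	assert(dip != NULL);
	return (cmd == DDI_DETACH ? DDI_SUCCESS : DDI_FAILURE);
}

/* All fields are set at compile time, as the DESCRIPTION requires. */
struct dev_ops xx_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_attach = xx_attach,
	.devo_detach = xx_detach,
	.devo_quiesce = ddi_no_quiesce,	/* no interrupts, no DMA */
};
```

A driver whose device does generate interrupts or perform DMA would supply
its own quiesce(9E) routine in place of ddi_no_quiesce.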
SEE ALSO
attach(9E), detach(9E), getinfo(9E), probe(9E), power(9E), |
quiesce(9E), nodev(9F) |