Here are the updated materials for this fast-track.  All updates
are marked with change bars.  The updates include input from FWARC.
I've also bumped the timeout for a week, to 8/16/2007.


-jg



#ident  "@(#)design     1.4     07/08/09 SMI"

New Solaris SPARC Boot Architecture

Summary

The Solaris SPARC bootstrap process is being redesigned, both to
increase the commonality with Solaris x86, and to enable ITUs (install
time updates) on SPARC.  Secondary goals are to combine inetboot and
wanboot into a single network boot architecture, and to provide a
simplified architecture for disk filesystems other than UFS (e.g.,
ZFS).

The Solaris x86 newboot project has already been delivered to Nevada
and S10u1.  That project was described by PSARC 2004/454.  This
project is a follow-on to that project.

This design specification only covers the architecture for Solaris
SPARC.  For details of the Solaris x86 design, see PSARC 2004/454.


1. Introduction
---------------

The Solaris boot process was designed in the early 1990s for desktops
which were tiny by today's standards. On the one hand, the design was
constrained by a relatively small amount of system memory. On the
other hand, the design took advantage of the presence of OpenBoot
PROM (OBP) on all Sun platforms. The resulting implementation involves
a complex sequence of control handoffs between the kernel and OBP to
load a minimum amount of text and data from the root device into memory.


2. Motivation
-------------

A change in the Solaris boot architecture is motivated by the
following problems:


2.1 Supporting new hardware

Solaris SPARC does not support ITUs, so a platform that requires any
Solaris changes requires a full Solaris update release.  This
requirement causes pain for both the Solaris and SPARC HW groups, as
the two must fight over Update schedules for almost every HW platform.


2.2 Commonality with Solaris x86

The newboot project on Solaris x86 has made both the administrative
and boot processes visibly different between Solaris SPARC and Solaris
x86.  This is in part due to the addition of a boot archive to Solaris
x86.  It is in Sun's interest that the look and feel of Solaris on
different platforms be as common as feasible.


2.3 Ease of booting different FS types

The current boot architecture requires multiple filesystem readers,
making adding new ones a difficult process.  The goal of this project
is to only require one phase to read the root FS before the kernel
mounts it.  This will enable new filesystem types like ZFS to be more
easily supported as root filesystems.


2.5 Common network boot process

The Solaris SPARC network boot process is very different between
booting over a LAN versus booting over a WAN.  The WAN process boots
via a miniroot that looks much like how Solaris x86 installs over a
network, but there is no commonality in how the miniroots are created
or administered.  This project will unify SPARC network booting around
a single architecture.


3. Proposed Architecture
------------------------

3.1 Boot Phase Independence

The primary design center is to make the phases of the boot process be
independent of each other.  This allows the addition of new features
(e.g., new filesystem types) without requiring changes to multiple
parts of the boot chain.

This project envisions 4 phases of booting on SPARC machines:

OBP

The OBP phase of boot is unchanged.  In fact, it's a requirement that
SPARC newboot not require new OBP functionality.  OBP will continue to
load and execute a booter from a disk or network device.

booter

The booter phase is responsible for reading in the boot archive and
executing it.  This is the only phase of the boot process that
requires knowledge of the root filesystem format.

ramdisk

The ramdisk is a boot archive containing either kernel modules or an
install miniroot.  This boot archive is the same boot archive as is
used on Solaris x86.  Its FS format is private to itself; i.e.,
neither the booter nor the kernel needs to know whether the archive is
HSFS or UFS (or ZFS for that matter).  The ramdisk will extract the
kernel image from the boot archive and execute it.

In order to minimize the size of the ramdisk, in particular the
install miniroot, which must reside in memory, the contents of the
miniroot will be compressed. This compression is on a per-file level
and is implemented within the filesystem. To create compressed files,
a userland utility compresses each file in place; the file is then
marked as compressed in the filesystem metadata via the
_FIO_COMPRESSED (private) ioctl.
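
As a rough illustration of the per-file scheme, the Python sketch below
models a filesystem as a dictionary of entries, each with data and a
"compressed" flag.  The names compress_in_place and read_file are
hypothetical, zlib stands in for whatever algorithm the real utility
uses, and the flag stands in for the metadata bit set via
_FIO_COMPRESSED.  It does not reflect the actual on-disk format.

```python
import zlib

def compress_in_place(fs, path):
    """Replace a file's data with its compressed form and flag it."""
    entry = fs[path]
    if entry["compressed"]:
        return  # already compressed; nothing to do
    entry["data"] = zlib.compress(entry["data"])
    # Stand-in for marking the file via the _FIO_COMPRESSED ioctl.
    entry["compressed"] = True

def read_file(fs, path):
    """Return file contents, decompressing transparently when flagged."""
    entry = fs[path]
    if entry["compressed"]:
        return zlib.decompress(entry["data"])
    return entry["data"]

# Hypothetical miniroot contents.
miniroot = {
    "usr/bin/ls": {"data": b"\x7fELF" + b"\0" * 4096, "compressed": False},
}
original = read_file(miniroot, "usr/bin/ls")
compress_in_place(miniroot, "usr/bin/ls")
```

Readers of the file see the same bytes before and after compression;
only the stored size changes.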

kernel

The final stage is the kernel.  The kernel extracts the rest of the
primary modules from the boot archive, initializes itself, mounts the
real root file system, and throws away the boot archive.  This process
is also the same as on Solaris x86.


3.2 Removal of bootops (and ufsboot)

The bootops vector on SPARC originally existed to support platforms
with either OBP or SUNMON FW.  The second stage booter (e.g., ufsboot)
presented a common set of operations to the kernel so the kernel
didn't need to know either what prom version was running or what
filesystem type root was.  The last Solaris release to support a
SUNMON platform was Solaris 2.4, and the last one to support
pre-IEEE 1275 OBP was Solaris 7, so it's time for the bootops to go
gently into that good night.

3.2.3 unix and krtld combined

The Xen project (PSARC 2006/260) has combined the unix and krtld
modules in order to enable booting from either BIOS or the Xen
hypervisor.  Since combining these two makes the unix ELF header far
simpler to parse and load, this project has adapted this change for
SPARC.


3.2.4 boot properties                                                       |
                                                                            |
The booter will publish properties previously passed to the kernel via      |
bootops in the OBP /chosen node.                                            |
                                                                            |
Properties for both disk and network boot:                                  |
                                                                            |
"archive-fstype"                                                            |
        String, encoded with encode-string, which contains the name of      |
        the boot archive filesystem type (e.g., "hsfs")                     |
                                                                            |
"bootarchive"                                                               |
        String, encoded with encode-string, which contains the path to      |
        the boot archive ramdisk device (e.g., "/ramdisk-root")             |
                                                                            |
"bootfs"                                                                    |
        Integer, encoded with encode-int, which contains an ihandle         |
        of the package whence the kernel was loaded.                        |
                                                                            |
"elfheader-address"                                                         |
        Integer, encoded with encode-int, which contains the virtual        |
        address of the kernel ELF image.                                    |
                                                                            |
"elfheader-length"                                                          |
        Integer, encoded with encode-int, which contains the size of        |
        the kernel ELF image.                                               |
                                                                            |
"fstype"                                                                    |
        String, encoded with encode-string, which contains the name         |
        of the root filesystem type (e.g., "ufs")                           |
                                                                            |
"fs-package"                                                                |
        String, encoded with encode-string, which contains the name         |
        of the root file system reader package.                             |
        (e.g., "ufs-file-system")                                           |
                                                                            |
"impl-arch-name"                                                            |
        String, encoded with encode-string, which contains the              |
        platform name.  (e.g., "SUNW,Sun-Blade-1000")                       |
                                                                            |
"whoami"                                                                    |
        String, encoded with encode-string, which contains the file         |
        system name of the kernel.                                          |
        (e.g., "/platform/sun4u/kernel/unix")                               |
                                                                            |
Property only for wanboot:                                                  |
                                                                            |
"netdev-path"                                                               |
        String, encoded with encode-string, which contains the path         |
        to the network device the boot archive is loaded from.              |
        (e.g., "/pci@8,700000/network@5,1")                                 |
                                                                            |
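The property encodings above can be sketched in Python, assuming the
usual IEEE 1275 conventions: encode-string yields the string bytes
followed by a NUL, and encode-int yields one 32-bit big-endian cell.
The helper names and the sample address are made up for illustration.

```python
import struct

def encode_string(text):
    """Property value for a string: its bytes plus a terminating NUL."""
    return text.encode("ascii") + b"\0"

def encode_int(value):
    """Property value for an integer: one 32-bit big-endian cell."""
    return struct.pack(">I", value)

def decode_string(prop):
    """Recover the text from an encode_string() value."""
    return prop.rstrip(b"\0").decode("ascii")

def decode_int(prop):
    """Recover the integer from an encode_int() value."""
    return struct.unpack(">I", prop)[0]

# Hypothetical /chosen contents after the booter and ramdisk phases.
chosen = {
    "archive-fstype": encode_string("hsfs"),
    "bootarchive": encode_string("/ramdisk-root"),
    "fstype": encode_string("ufs"),
    "whoami": encode_string("/platform/sun4u/kernel/unix"),
    "elfheader-address": encode_int(0x01000000),  # made-up address
}
```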

3.3 Differences from Solaris x86

The primary difference from PSARC 2004/454 is that there is no
dependence on grub. This decision was made for both practical and
functional reasons.  The practical reason is that grub 0.95 - which was
used by Solaris x86 - is not available on SPARC.  The functional
reason is that most of the reasons grub was used for Solaris x86
(e.g., eliminate real-mode, eliminate the boot shell, 3rd party device
support) are not applicable to Solaris SPARC.  Making grub work in the
network boot case on SPARC is not a trivial exercise, as grub does not
currently support NFS or HTTP.                                              |

The boot menu provided by grub would be an interesting feature to
provide Solaris SPARC users.  If grub2 were to become available on
SPARC in a stable, supportable release, it could be incorporated into
the booter phase above.  This should not require much change to existing
code since the booter retains its ability to load alternate secondary
booters such as cprboot.

On SPARC the ability to read potentially volatile files directly from       |
the filesystem rather than the archive is exploited to avoid                |
triggering a boot archive check failure if those files get out of           |
date. This mechanism does not exist on x86.                                 |

4. Interfaces
-------------

4.1 Interfaces Exported
-----------------------------------------------------------------------
Name                    Level           Comments
-----------------------------------------------------------------------
Boot files              Evolving
  platform/<platform>/boot_archive      default name of boot archive
                                        (initially may be sun4u/sun4v only)
Boot args
  kernel args           Evolving        see kernel(1M)
  -F <alternate file>   Evolving        alternate booter or boot archive

usr/sbin/bootadm(1M)        Stable
usr/sbin/root_archive(1M)  Stable
boot/solaris/bin
  create_ramdisk        Proj. Priv.

Install-time Update                     different user prompts

boot/solaris
    filelist.ramdisk    Proj. Priv.     boot archive content
    filestat.ramdisk    Proj. Priv.     boot archive file status

kernel/fs/dcfs          Proj. Priv.     compression file system
sbin/fiocompress        Proj. Priv.     file compression utility
usr/sbin/fiocompress    Proj. Priv.     link to sbin/fiocompress
usr/include/sys/fs/decomp.h   Proj. Priv.    dcfs header file
_FIO_COMPRESSED         Proj. Priv.     file compression ioctl

usr/platform/sun4[uv]/lib/fs
        hsfs/bootblk    Proj. Priv.     filesystem readers
        zfs/bootblk     Proj. Priv.     filesystem readers

platform/sun4[uv]/ufsboot  Proj. Priv.  removed
kernel/misc/sparcv9/krtld  Proj. Priv.  merged into unix

-----------------------------------------------------------------------


4.2 Interfaces Reimplemented
-----------------------------------------------------------------------
Name                    Level           Comments
-----------------------------------------------------------------------
reboot(1M)              Stable?         update boot archive as needed
halt(1M)
poweroff(1M)
shutdown(1M)
init(1M)
pkgadd(1M)
patchadd(1M)

add_install_client      Evolving        modified server setup

smdiskless(1M)          Evolving        setup /tftpboot area


Install                                 consume bootadm and installgrub
Upgrade
Live Upgrade
Flash Install
Net Install
Jumpstart


Release engineering tools
-------------------------
modified miniroot construction




5. User Experience
------------------

5.1 System startup

The system startup changes are not visible to the user.


5.2 Installation and Upgrade

The ITU menu which currently only exists on x86 may be exposed on
SPARC as well.

Customers who rely on undocumented non-interfaced implementation
details of add_install_client may need to amend their procedures.

If we deliver the wanboot component, wanboot customers will see a
simplification of the deployment procedures.

5.3 Internal tools

The bfu scripts will be updated to permit developers to transition
from old boot to new boot. For a system initially installed with old
boot, it will be possible to bfu back and forth across the
boundary. For a system initially installed with new boot, bfu back to
old boot is not supported. Booting a glommed kernel is supported, with
the same restrictions regarding the compatibility of userland and
kernel.

If the root file system goes into a non-bootable state, a user may
boot the failsafe archive to perform manual recovery operations. The
failsafe archive contains files normally present in a CD or
netinstall miniroot.

If the boot archive goes into a non-bootable state, a user may bypass
the boot archive and directly boot the kernel with the -F <kernel FS
path> option.


5.4 Coexistence with other OSes

The only other OS of note for SPARC is Linux, and they currently have
no grub plans.  SPARC / Linux boots from a loader called SILO, which
already has a grub-like menu facility.

In general, OBP is considered well suited to loading multiple OSes or
versions of OSes from various devices on the system, so no additional
work beyond what is provided by OBP/the system is desirable.


6. Technical Details
--------------------

6.1 OBP phase

This project does not change the OBP phase of SPARC boot; this section
is included for reference only.

When a user types "boot" on an OBP-based system, the device selected -
either from the command line or via the "boot-device" nvram variable -
has its "open" and "load" methods called.  The program loaded by this
process is then executed, and the boot process enters the booter
phase.

6.1.1 Disk

For disk devices, the FW driver usually uses the OBP label package's
"load" method, which parses the VTOC label at the beginning of the
disk to locate the specified partition, then reads sectors 1-15 of
that partition into memory. This area is commonly called the boot
block and usually contains a filesystem reader.
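
The boot block location is simple arithmetic.  The Python sketch below
assumes 512-byte sectors and a partition start sector already obtained
from the VTOC; read_boot_block and the in-memory disk image are
hypothetical and only illustrate the offsets involved.

```python
import io

SECTOR_SIZE = 512          # assumed sector size
BOOTBLK_FIRST_SECTOR = 1   # sectors 1-15 of the partition
BOOTBLK_LAST_SECTOR = 15

def read_boot_block(disk, partition_start_sector):
    """Read the boot block area of a partition from a seekable image."""
    offset = (partition_start_sector + BOOTBLK_FIRST_SECTOR) * SECTOR_SIZE
    length = (BOOTBLK_LAST_SECTOR - BOOTBLK_FIRST_SECTOR + 1) * SECTOR_SIZE
    disk.seek(offset)
    return disk.read(length)

# Hypothetical in-memory disk: a partition starting at sector 100 whose
# sector n is filled with the byte value n.
image = bytearray(200 * SECTOR_SIZE)
for n in range(16):
    start = (100 + n) * SECTOR_SIZE
    image[start:start + SECTOR_SIZE] = bytes([n]) * SECTOR_SIZE
bootblk = read_boot_block(io.BytesIO(bytes(image)), 100)
```

With these assumptions the boot block is 15 sectors, or 7680 bytes,
starting 512 bytes into the partition.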

6.1.2 Net

For network devices, the process is slightly different between booting
over a LAN versus booting over a WAN.  In both cases, however, the
prom will download a booter from a boot or install server (inetboot in
this case).

6.1.2.1 LAN boot

When booting over a LAN, the FW uses either RARP and BOOTP or DHCP to
discover its boot or install server.  It then uses TFTP to download
the booter (inetboot in this case).

6.1.2.2 WAN boot

When booting over a WAN, the FW uses either DHCP or nvram properties
to discover its install server, and the router and proxies needed to
connect to it.  It then uses HTTP to download the booter, and may
optionally check the booter's signature with a predefined private key.
For more details, see PSARC 2001/009.


6.2 Booter phase

This phase is derived from the SPARC wanboot ramdisk process (see PSARC
2001/009), and is responsible for reading the boot archive from the
root file system (or install server in wanboot's case) into a ramdisk
device.  It does this by:

1) opening the boot-device (which it found as the "bootpath" property
   in the OBP "/chosen" node)

2) using its file system specific reader to read the boot archive (by
   default, /platform/`uname -m`/boot_archive)

3) creating a ramdisk device in "/ramdisk"

4) creating "bootarchive" and "fstype" properties in                        |
   "/chosen"                                                                |

5) booting the archive (a ramdisk is just another type of disk, so
   executing the boot block area serves this purpose)
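
The five steps above can be sketched in Python using dictionaries in
place of OBP devices and the /chosen node.  The function name, the
device path, and the ramdisk node name (borrowed from the "bootarchive"
example in section 3.2.4) are assumptions for illustration, not the
booter's actual interfaces.

```python
def booter_phase(devices, chosen, machine, root_fstype,
                 ramdisk_path="/ramdisk-root"):
    """Model steps 1-5 of the booter phase using plain dictionaries."""
    # 1) Open the boot device named by the "bootpath" property.
    root_fs = devices[chosen["bootpath"]]
    # 2) Read the boot archive from its default location.
    archive = root_fs["/platform/%s/boot_archive" % machine]
    # 3) Create a ramdisk device holding the archive image.
    devices[ramdisk_path] = archive
    # 4) Publish the properties later phases rely on.
    chosen["bootarchive"] = ramdisk_path
    chosen["fstype"] = root_fstype
    # 5) Return the device to boot next; the real booter would execute
    #    the ramdisk's boot block at this point.
    return ramdisk_path

# Hypothetical device path and archive contents.
disk = "/pci@1f,0/ide@d/disk@0,0:a"
devices = {disk: {"/platform/sun4u/boot_archive": b"archive image"}}
chosen = {"bootpath": disk}
next_device = booter_phase(devices, chosen, "sun4u", "ufs")
```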


6.3 Ramdisk phase

The ramdisk is self-describing in the same sense any disk image is by
virtue of having a filesystem reader in its boot block.  This reader
is over 90% the same as the disk boot block for a given filesystem so
the same program is re-used.  Its job is to load and execute the
kernel from the archive by:

1) opening the boot archive (the "bootarchive" property from the
   previous phase)

2) using its file system specific reader to read the kernel (by
   default, /platform/`uname -i`/kernel/unix)

3) creating "impl-arch-name", "whoami" and "elfheader-address" properties   |

4) executing the kernel
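
A matching sketch of the ramdisk phase, again with dictionaries standing
in for the ramdisk device and the /chosen node.  The function name, the
load address, and the archive contents are hypothetical.

```python
def ramdisk_phase(devices, chosen, platform, load_address):
    """Model steps 1-4 of the ramdisk phase using plain dictionaries."""
    # 1) Open the boot archive named by the booter phase.
    archive = devices[chosen["bootarchive"]]
    # 2) Read the kernel from its default location in the archive.
    kernel_path = "/platform/%s/kernel/unix" % platform
    kernel_image = archive[kernel_path]
    # 3) Publish the properties the kernel will look for.
    chosen["impl-arch-name"] = platform
    chosen["whoami"] = kernel_path
    chosen["elfheader-address"] = load_address
    # 4) Return the image; the real reader would jump to its entry point.
    return kernel_image

platform = "SUNW,Sun-Blade-1000"
devices = {
    "/ramdisk-root": {"/platform/%s/kernel/unix" % platform: b"\x7fELF..."},
}
chosen = {"bootarchive": "/ramdisk-root"}
kernel = ramdisk_phase(devices, chosen, platform, load_address=0x01000000)
```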


6.4 Kernel phase

When krtld gains control, it mounts the boot archive and loads
additional kernel modules from the boot archive via the
ramdisk. Subsequent kernel initialization procedures remain the same
until after the kernel mounts the root file system.  At that point,
the kernel throws away the boot archive and reclaims the memory it
occupies.  Note that in the install case, the ramdisk actually
contains the root file system, and is not thrown away.  The kernel
ramdisk driver simply takes over control of the ramdisk image.


6.5 Chained booters

When booting from a disk, the booter will support chained booters both
for cprboot (see PSARC 1992/201) and for situations where the file
system reader cannot fit in the boot block.  In a future project,
grub2 can use this facility to add a graphical user menu to the booter
phase.  This facility will not be available when booting from a
network, since the chained booter usually uses the same virtual
address space as the original booter.


6.6 Boot archive management

There are two kinds of boot archive: failsafe and normal.  A failsafe
archive is self-sufficient and bootable by itself.  It is created at
install time and requires no maintenance.  A normal archive shadows a
root filesystem, so it contains all kernel modules, driver.conf files,
and a few configuration files in /etc which are read by the kernel
before root is mounted.  Once the root filesystem is mounted, the
kernel discards the boot archive from memory and file I/O will be
performed against the root device.

By default, the normal archive contains the following files and             |
directories:                                                                |
                                                                            |
etc/dacf.conf                                                               |
etc/devices/devid_cache                                                     |
etc/devices/mdi_scsi_vhci_cache                                             |
etc/devices/mdi_ib_cache                                                    |
etc/cluster/nodeid                                                          |
etc/zfs/zpool.cache                                                         |
kernel                                                                      |
platform                                                                    |
                                                                            |
 Excluding any editable driver.conf files.                                  |
                                                                            |
 While on x86 it also contains the following editable files:                |
                                                                            |
etc/system                                                                  |
etc/name_to_major                                                           |
etc/driver_aliases                                                          |
etc/name_to_sysnum                                                          |
etc/driver_classes                                                          |
etc/path_to_inst                                                            |
                                                                            |
 As well as a number of editable driver.conf files.                         |

The contents under the platform directory will be segregated into
those needed for a sun4u boot archive and those needed for a sun4v
boot archive.  Further per-platform differentiated boot archives may
be considered if that helps us gain faster booting via faster archive
load, trading off a more complicated archive construction process.

If any file in this list (or under the directories listed) is updated,
the boot archive must be rebuilt prior to the next reboot for the
modification to take effect. The package and patch tools are updated
to update the boot archive whenever needed. In addition, the boot
archive is updated as necessary on an orderly system shutdown to catch
files modified manually.

The boot archive can be out of sync with the root filesystem if the         |
system panics partway through an update, before the archive update has      |
completed.  We check for this condition on every boot, before the root      |
filesystem is mounted read/write.  If an inconsistency is detected, the     |
system will stop in single-user mode, similar to current behavior when      |
fsck fails on the root filesystem.  The recommended recovery method is      |
to boot the failsafe archive and recreate the boot_archive. An expert       |
user may decide to continue booting if the out-of-sync files are not        |
critical.                                                                   |
                                                                            |
There are a number of files in the archive (as listed in                    |
boot/solaris/filelist.safe):                                                |
                                                                            |
etc/devices/devid_cache                                                     |
etc/devices/mdi_scsi_vhci_cache                                             |
etc/devices/mdi_ib_cache                                                    |
etc/path_to_inst                                                            |
etc/rtc_config                                                              |
etc/zfs/zpool.cache                                                         |
                                                                            |
Each is a pure cache: keeping it up to date can improve boot time, but      |
it can simply be rebuilt with no ill effects on the system.                 |
These files being out of sync does not cause an archive check failure.      |
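
One way to picture the check is the Python sketch below.  It assumes the
archive records a (size, mtime) pair per file and compares it with the
current state of the root filesystem, skipping the cache files listed
above.  The record format, the function name, and the sample paths are
assumptions; the actual bootadm(1M) logic may differ.

```python
# Files whose staleness must not fail the check (from the list above).
SAFE = {
    "etc/devices/devid_cache",
    "etc/devices/mdi_scsi_vhci_cache",
    "etc/devices/mdi_ib_cache",
    "etc/path_to_inst",
    "etc/rtc_config",
    "etc/zfs/zpool.cache",
}

def out_of_sync(recorded, current, safe=SAFE):
    """Return archived files whose status changed, ignoring pure caches."""
    changed = []
    for path, status in sorted(recorded.items()):
        if path in safe:
            continue  # caches can be rebuilt; never fail the check on them
        if current.get(path) != status:
            changed.append(path)
    return changed

# Hypothetical (size, mtime) records taken when the archive was built...
recorded = {
    "etc/dacf.conf": (1024, 100),
    "etc/zfs/zpool.cache": (2048, 100),
    "kernel/drv/sparcv9/sd": (300000, 100),
}
# ...and the state of the root filesystem at boot.
current = {
    "etc/dacf.conf": (1024, 100),
    "etc/zfs/zpool.cache": (2048, 200),      # stale cache, ignored
    "kernel/drv/sparcv9/sd": (300512, 300),  # updated driver, reported
}
stale = out_of_sync(recorded, current)
```

An empty result means the boot can proceed; anything else corresponds to
the archive check failure described above.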
                                                                            |
Further, an out-of-sync state due to editable files that are                |
updated during system administration tasks in the archive is avoided        |
by not placing such files in the archive. These files are read              |
directly from the filesystem by kobj_open() if it fails to find them        |
in the archive. This mechanism only exists on SPARC.                        |
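
The lookup order can be modelled as follows.  This Python sketch only
illustrates the archive-first, filesystem-second order described above;
open_boot_file and the dictionaries are hypothetical and do not reflect
the real kobj_open() signature.

```python
def open_boot_file(path, archive, root_fs):
    """Look in the boot archive first, then fall back to the filesystem."""
    if path in archive:
        return archive[path]
    if path in root_fs:
        return root_fs[path]
    raise FileNotFoundError(path)

# Hypothetical contents: the editable file lives only on the filesystem.
archive = {"kernel/genunix": b"archived module"}
root_fs = {
    "kernel/genunix": b"module on disk",
    "etc/system": b"* administrator edits",
}
module = open_boot_file("kernel/genunix", archive, root_fs)
settings = open_boot_file("etc/system", archive, root_fs)
```

Because the editable file is never copied into the archive, editing it
cannot make the archive stale.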
                                                                            |
The bootadm(1M) command will handle the details of archive update and
verification.


6.7 Install and upgrade

Normal install and upgrade is achieved by booting the miniroot from
either CDROM/DVD or from the network. In both cases, the root
filesystem of the miniroot is the ramdisk. This allows the Solaris
boot CD to be ejected without rebooting the system.  The boot archive
contains the entire miniroot.

The construction of the install CD is modified to use an hsfs boot
block. The miniroot is packed into a single file in ufs format, to be
loaded as the ramdisk image.

The setup of the net boot server is also modified. The boot server
will serve a boot strap as well as the ramdisk image which is
downloaded and then booted from.

The netinstall image will be packed using root_archive(1M).

The process for installing the OS to disk remains the same except that
the boot blocks are different and a boot archive must be constructed
prior to booting the install target disk.  The boot archives are
created using bootadm(1M). The rest of the code changes come with
packages and patches, and no special treatment is required.


6.8 Diskless clients

Diskless boot is similar to booting the miniroot for net install
except that the root filesystem is on NFS instead of on UFS.


6.9 Install-time Update (ITU)

New-boot (phase 1: x86) reduced the x86 ITU mechanism described in
PSARC 1997/059 to simply adding Solaris binaries (drivers, kernel
modules, commands, libraries, symlinks and the like) to the running
miniroot and then the target install environment. It also extended the
supported media that an ITU could be supplied on to include CD/DVD,
memory sticks and similar devices.

This mechanism will now be made available on SPARC as well to allow
platform support to be delivered out of band of a regularly scheduled
release.

If a core kernel component needs to be updated, say unix or something
else that needs to be loaded before an ITU can be added, then a
pre-patched miniroot image needs to be made available (50MB download)
along with the patch.


References
----------

1. Shudong Zhou         PSARC 2004/454  Solaris Boot Architecture
2. Carl Smith           PSARC 2001/009  WAN-boot
3. Clark Dong           PSARC 1992/201  Checkpoint Resume (Reanimator)
4. Allan McKillop       PSARC 2006/260  Solaris on Xen


