I am sponsoring this project as a fast-track on behalf of Jan
Setje-Eilers and John Johnson.  The case timer is set to 8/7/2007.

The project desires patch/update binding.

(For those who have already seen this, this re-introduction is
to transition the case from closed to open.)


thx,
-jg


#ident  "@(#)design     1.2     07/07/31 SMI"

New Solaris SPARC Boot Architecture

Summary

The Solaris SPARC bootstrap process is being redesigned, both to increase the
commonality with Solaris x86, and to enable ITUs (install time updates) on
SPARC.  Secondary goals are to combine inetboot and wanboot into a single
network boot architecture, and to provide a simplified architecture for disk
filesystems other than UFS (e.g., ZFS).

The Solaris x86 newboot project has already been delivered to Nevada and S10u1.
That project was described by PSARC 2004/454.  This project is a follow-on to
that project.

This design specification only covers the architecture for Solaris SPARC.  For
details of the Solaris x86 design, see PSARC 2004/454.


1. Introduction
---------------

The Solaris boot process was designed in the early 1990s for desktops that were
tiny by today's standards. On the one hand, the design was constrained by a
relatively small amount of system memory. On the other hand, the design took
advantage of the presence of the OpenBoot PROM (OBP) on all Sun platforms. The
resulting implementation involves a complex sequence of control handoff between
kernel and OBP to load a minimum amount of text and data from the root device
into memory.


2. Motivation
-------------

A change in the Solaris boot architecture is motivated by the following
problems:


2.1 Supporting new hardware

Solaris SPARC does not support ITUs, so a platform that requires any Solaris
changes requires a full Solaris update release.  This requirement causes pain
for both the Solaris and SPARC HW groups, as the two must fight over Update
schedules for almost every HW platform.


2.2 Commonality with Solaris x86

The newboot project on Solaris x86 has made both the administrative and boot
processes visibly different between Solaris SPARC and Solaris x86.  This is in
part due to the addition of a boot archive to Solaris x86.  It is in Sun's
interest that the look and feel of Solaris on different platforms be as common
as feasible.


2.3 Ease of booting different FS types

The current boot architecture requires multiple filesystem readers, which makes
adding new ones difficult.  The goal of this project is to require only one
phase to read the root FS before the kernel mounts it.  This will enable new
filesystem types like ZFS to be more easily supported as root filesystems.


2.4 Common network boot process

The Solaris SPARC network boot process is very different between booting over a
LAN versus booting over a WAN.  The WAN process boots via a miniroot that looks
much like how Solaris x86 installs over a network, but there is no commonality
in how the miniroots are created or administered.  This project will unify
SPARC network booting around a single architecture.


3. Proposed Architecture
------------------------

3.1 Boot Phase Independence

The primary design goal is to make the phases of the boot process independent
of each other.  This allows the addition of new features (e.g., new
filesystem types) without requiring changes to multiple parts of the boot chain.

This project envisions 4 phases of booting on SPARC machines:

OBP

The OBP phase of boot is unchanged.  In fact, it's a requirement that SPARC
newboot not require new OBP functionality.  OBP will continue to load and
execute a booter from a disk or network device.

booter

The booter phase is responsible for reading in the boot archive and executing
it.  This is the only phase of the boot process that requires knowledge of the
root filesystem format.

ramdisk

The ramdisk is a boot archive containing either kernel modules or an install
miniroot.  This boot archive is the same boot archive as is used on Solaris x86.
Its FS format is private to itself; i.e., neither the booter nor the kernel
needs to know whether the archive is HSFS or UFS (or ZFS for that matter).  The
ramdisk will extract the kernel image from the boot archive and execute it.

In order to minimize the size of the ramdisk, in particular the install
miniroot, which must reside in memory, the contents of the miniroot will be
compressed.  This compression is done per file and is implemented within the
filesystem.  To create compressed files, a userland utility compresses each
file in place; the file is then marked as compressed in the filesystem
metadata via the (private) _FIO_COMPRESSED ioctl.
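
The per-file scheme can be illustrated with a small sketch.  This is a
hypothetical Python model, not the fiocompress or dcfs implementation: the
MiniFile class and its "compressed" flag are invented stand-ins for a file and
the metadata state set via _FIO_COMPRESSED, and zlib is assumed as the codec
purely for illustration.

```python
import zlib


class MiniFile:
    """Hypothetical stand-in for a file plus its filesystem metadata."""

    def __init__(self, data: bytes):
        self.data = data
        self.compressed = False  # models the flag set via _FIO_COMPRESSED


def compress_in_place(f: MiniFile) -> None:
    """Replace the file body with its compressed form and mark it."""
    if f.compressed:
        return  # already compressed; nothing to do
    f.data = zlib.compress(f.data)
    f.compressed = True


def read_file(f: MiniFile) -> bytes:
    """Readers get the original bytes back; decompression is transparent."""
    return zlib.decompress(f.data) if f.compressed else f.data


original = b"module text " * 1000
f = MiniFile(original)
compress_in_place(f)
```

The point of the model is that only the stored bytes and one metadata flag
change; anything reading the file through the filesystem sees the same
contents as before.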

kernel

The final stage is the kernel.  The kernel extracts the rest of the primary
modules from the boot archive, initializes itself, mounts the real root file
system, and throws away the boot archive.  This process is also the same as on
Solaris x86.
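
The independence property of the four phases can be stated as a small table.
The sketch below is a hypothetical Python model (the Phase class and its field
names are not part of the design); it only records which phase has to parse
the root filesystem format before the kernel mounts root, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    loads: str            # what this phase reads in and hands control to
    reads_root_fs: bool   # must it parse the root FS format before mount?


# Ordering and contents follow the phase descriptions in this section.
BOOT_CHAIN = (
    Phase("OBP", "booter", False),
    Phase("booter", "boot archive", True),
    Phase("ramdisk", "kernel image", False),
    Phase("kernel", "remaining primary modules", False),
)


def phases_needing_new_reader(chain):
    """Phases that would change when a new root filesystem type is added."""
    return [p.name for p in chain if p.reads_root_fs]
```

Under this model, adding a new root filesystem type touches a single entry in
the chain.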


3.2 Removal of bootops (and ufsboot)

The bootops vector on SPARC originally existed to support platforms with either
OBP or SUNMON FW.  The second stage booter (e.g., ufsboot) presented a common
set of operations to the kernel so the kernel didn't need to know either what
prom version was running or what filesystem type root was.  The last Solaris
release to support a SUNMON platform was Solaris 2.4, and the last one to
support pre-ieee1275 OBP was Solaris 7, so it's time for the bootops to go
gently into that good night.

3.2.3 unix and krtld combined

The Xen project (PSARC 2006/260) has combined the unix and krtld modules in
order to enable booting from either BIOS or the Xen hypervisor.  Since combining
these two makes the unix ELF header far simpler to parse and load, this project
has adapted this change for SPARC.

3.2.4 boot properties

There are three properties retrieved via the current bootops that neither OBP
nor the kernel has any knowledge of.  These are:

"fstype"                name of root filesystem type (e.g., "ufs")
"impl-arch-name"        platform name (aka `uname -i`)
"whoami"                file system name of booted kernel

A new node (/packages/boot-properties) will be added to the OBP device tree by
the booter which contains these properties.

In addition to the above compatibility properties, several new properties are
needed:

"bootarchive"           boot archive path (as opposed to "bootpath", the root
                        file system path)
"elfheader-address"     address of kernel ELF header, used by krtld in lieu of
                        the bootaux vector passed up via the second-stage booter
"elfheader-length"      length of kernel ELF header
"archive-fstype"        fstype of boot archive

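As a rough picture of how these properties are published and consumed, the
following Python sketch models the device tree as nested dictionaries.  The
helper names and every property value shown are hypothetical and chosen only
for illustration; the real node is created through OBP client interfaces.

```python
BOOT_PROPERTIES = "/packages/boot-properties"


def set_prop(tree, node, name, value):
    """Create the node if needed and store one property on it."""
    tree.setdefault(node, {})[name] = value


def get_prop(tree, node, name):
    """Look up one property; raises KeyError if node or name is missing."""
    return tree[node][name]


tree = {}

# Compatibility properties formerly obtained through bootops.
set_prop(tree, BOOT_PROPERTIES, "fstype", "ufs")
set_prop(tree, BOOT_PROPERTIES, "impl-arch-name", "SUNW,Example-Platform")
set_prop(tree, BOOT_PROPERTIES, "whoami", "/platform/sun4u/kernel/unix")

# New properties introduced by this project (values are made up).
set_prop(tree, BOOT_PROPERTIES, "bootarchive", "/ramdisk")
set_prop(tree, BOOT_PROPERTIES, "elfheader-address", 0x1000000)
set_prop(tree, BOOT_PROPERTIES, "elfheader-length", 64)
set_prop(tree, BOOT_PROPERTIES, "archive-fstype", "hsfs")
```

A consumer such as krtld would then read "elfheader-address" and
"elfheader-length" from this node instead of receiving them from a
second-stage booter.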

3.3 Differences from Solaris x86

The primary difference from PSARC 2004/454 is that there is no dependence on
grub. This decision was made for both practical and functional reasons.  The
practical reason is that grub 0.95, which was used by Solaris x86, is not
available on SPARC.  The functional reason is that most of the reasons grub was
used for Solaris x86 (e.g., eliminate real-mode, eliminate the boot shell, 3rd
party device support) are not applicable to Solaris SPARC.  Making grub work in
the network boot case on SPARC is not a trivial exercise, as grub does not
currently support NFS or HTTP, and adding wanboot's non-exportable cryptography
code is problematic with respect to the GPL.

The boot menu provided by grub would be an interesting feature to provide to
Solaris SPARC users.  If grub2 were to become available on SPARC in a stable,
supportable release, it could be incorporated in the booter phase above.  This
should not require much change to existing code since the booter retains its
ability to load alternate secondary booters such as cprboot.


4. Interfaces
-------------

4.1 Interfaces Exported
-----------------------------------------------------------------------
Name                    Level           Comments
-----------------------------------------------------------------------
Boot files              Evolving
  platform/<platform>/boot_archive      default name of boot archive
                                        (initially may be sun4u/sun4v only)
Boot args
  kernel args           Evolving        see kernel(1M)
  -F <alternate file>   Evolving        alternate booter or boot archive

usr/sbin/bootadm(1M)        Stable
usr/sbin/root_archive(1M)  Stable
boot/solaris/bin
  create_ramdisk        Proj. Priv.

Install-time Update                     different user prompts

boot/solaris
    filelist.ramdisk    Proj. Priv.     boot archive content
    filestat.ramdisk    Proj. Priv.     boot archive file status

kernel/fs/dcfs          Proj. Priv.     compression file system
sbin/fiocompress        Proj. Priv.     file compression utility
usr/sbin/fiocompress    Proj. Priv.     link to sbin/fiocompress
usr/include/sys/fs/decomp.h   Proj. Priv.    dcfs header file
_FIO_COMPRESSED         Proj. Priv.     file compression ioctl

usr/platform/sun4[uv]/lib/fs
        hsfs/bootblk    Proj. Priv.     filesystem readers
        zfs/bootblk     Proj. Priv.     filesystem readers

platform/sun4[uv]/ufsboot  Proj. Priv.  removed
kernel/misc/sparcv9/krtld  Proj. Priv.  merged into unix

-----------------------------------------------------------------------


4.2 Interfaces Reimplemented
-----------------------------------------------------------------------
Name                    Level           Comments
-----------------------------------------------------------------------
reboot(1M)              Stable?         update boot archive as needed
halt(1M)
poweroff(1M)
shutdown(1M)
init(1M)
pkgadd(1M)
patchadd(1M)

add_install_client      Evolving        modified server setup

smdiskless(1M)          Evolving        setup /tftpboot area


Install                                 consume bootadm and installgrub
Upgrade
Live Upgrade
Flash Install
Net Install
Jumpstart


Release engineering tools
-------------------------
modified miniroot construction




5. User Experience
------------------

5.1 System startup

The system startup changes are not visible to the user.


5.2 Installation and Upgrade

The ITU menu, which currently exists only on x86, may be exposed on SPARC as
well.

Customers who rely on undocumented non-interfaced implementation details
of add_install_client may need to amend their procedures.

If we deliver the wanboot component, wanboot customers will see a
simplification of the deployment procedures.

5.3 Internal tools

The bfu scripts will be updated to permit developers to transition from old boot
to new boot. For a system initially installed with old boot, it will be possible
to bfu back and forth across the boundary. For a system initially installed with
new boot, bfu back to old boot is not supported. Booting a glommed kernel is
supported, with the same restrictions regarding the compatibility of userland
and kernel.

If the root file system goes into a non-bootable state, a user may boot
the failsafe archive to perform manual recovery operations. The failsafe
archive contains files normally present in a CD or netinstall miniroot.

If the boot archive goes into a non-bootable state, a user may bypass the boot
archive and directly boot the kernel with the -F <kernel FS path> option.


5.4 Coexistence with other OSes

The only other OS of note for SPARC is Linux, which currently has no grub
plans.  SPARC / Linux boots from a loader called SILO, which already has a
grub-like menu facility.

In general, OBP is considered well suited to loading multiple OSes or versions
of OSes from various devices on the system, so no additional work beyond what
OBP and the system already provide is desirable.


6. Technical Details
--------------------

6.1 OBP phase

This project does not change the OBP phase of SPARC boot; this section is
included for reference only.

When a user types "boot" on an OBP-based system, the device selected - either
from the command line or via the "boot-device" nvram variable - has its "open"
and "load" methods called.  The program loaded by this process is then executed,
and the boot process enters the booter phase.

6.1.1 Disk

For disk devices, the FW driver usually uses the OBP label package's "load"
method, which parses the VTOC label at the beginning of the disk to locate the
specified partition, then reads sectors 1-15 of that partition into memory. This
area is commonly called the boot block and usually contains a filesystem reader.
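
As a rough illustration of the read described above, the following Python
sketch pulls sectors 1-15 out of a partition image.  The 512-byte sector size
and the in-memory image are assumptions made for the example; the real work is
done by the OBP label package in firmware, and locating the partition via the
VTOC is omitted.

```python
import io

SECTOR_SIZE = 512                    # assumption: 512-byte disk sectors
FIRST_SECTOR, LAST_SECTOR = 1, 15    # boot block area within the partition


def read_boot_block(partition) -> bytes:
    """Read sectors 1-15 from a file-like object positioned on a partition."""
    partition.seek(FIRST_SECTOR * SECTOR_SIZE)
    length = (LAST_SECTOR - FIRST_SECTOR + 1) * SECTOR_SIZE
    return partition.read(length)


# Fake 16-sector partition in which every sector is filled with its own index.
image = b"".join(bytes([n]) * SECTOR_SIZE for n in range(16))
block = read_boot_block(io.BytesIO(image))
```

With these assumptions the boot block is 15 * 512 = 7680 bytes, which is the
space a filesystem reader has to fit into unless a chained booter is used.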

6.1.2 Net

For network devices, the process differs slightly between booting over a LAN
and booting over a WAN.  In both cases, however, the prom will download a
booter from a boot or install server.

6.1.2.1 LAN boot

When booting over a LAN, the FW uses either RARP and BOOTP or DHCP to discover
its boot or install server.  It then uses TFTP to download the booter (inetboot
in this case).

6.1.2.2 WAN boot

When booting over a WAN, the FW uses either DHCP or nvram properties to discover
its install server, and the router and proxies needed to connect to it.  It then
uses HTTP to download the booter, and may optionally check the booter's
signature with a predefined private key.  For more details, see PSARC 2001/009.
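
The optional signature check can be sketched as a keyed-hash comparison.  This
is only an illustration in Python: the use of HMAC with SHA-1, the function
names, and the sample key are assumptions for the example, and PSARC 2001/009
remains the authority on the actual scheme.

```python
import hashlib
import hmac


def sign(image: bytes, key: bytes) -> bytes:
    """Compute a keyed digest over the booter image (algorithm assumed)."""
    return hmac.new(key, image, hashlib.sha1).digest()


def verify(image: bytes, key: bytes, signature: bytes) -> bool:
    """Accept the booter only if its digest matches the supplied signature."""
    return hmac.compare_digest(sign(image, key), signature)


key = b"example-key-stored-on-client"      # hypothetical predefined key
booter = b"\x7fELF...booter image bytes"   # placeholder download
signature = sign(booter, key)              # what the server would publish
```

A client holding the same key accepts the unmodified download and rejects one
that was altered in transit.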


6.2 Booter phase

This phase is derived from the SPARC wanboot ramdisk process (see PSARC
2001/009), and is responsible for reading the boot archive from the root file
system (or install server in wanboot's case) into a ramdisk device.  It does
this by:

1) opening the boot-device (which it found as the "bootpath" property in the OBP
"/chosen" node)
2) using its file system specific reader to read the boot archive (by default,
/platform/`uname -m`/boot_archive)
3) creating a ramdisk device in "/ramdisk"
4) creating "bootarchive" and "fstype" properties in "/packages/boot-properties"
5) booting the archive (a ramdisk is just another type of disk, so executing the
boot block area serves this purpose)
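
The five steps above can be sketched as follows.  This is a hypothetical
Python model in which the device tree and the devices are plain dictionaries;
the function name, the sample device path, and the choice to store the ramdisk
node path in "bootarchive" are assumptions made for illustration.

```python
def booter_phase(tree, devices, fstype, machine):
    """Model of the booter: returns the device whose boot block runs next."""
    # 1) open the boot device named by "bootpath" in the /chosen node
    root_fs = devices[tree["/chosen"]["bootpath"]]

    # 2) use the filesystem-specific reader to read the boot archive
    archive = root_fs[f"/platform/{machine}/boot_archive"]

    # 3) create a ramdisk device holding the archive image
    tree["/ramdisk"] = {"image": archive}

    # 4) publish properties for the later phases
    props = tree.setdefault("/packages/boot-properties", {})
    props["bootarchive"] = "/ramdisk"   # assumed value: path of the ramdisk
    props["fstype"] = fstype

    # 5) boot the archive: the ramdisk is treated like any other disk
    return "/ramdisk"


tree = {"/chosen": {"bootpath": "/hypothetical/disk@0:a"}}
devices = {
    "/hypothetical/disk@0:a": {"/platform/sun4u/boot_archive": b"ARCHIVE"},
}
next_device = booter_phase(tree, devices, fstype="ufs", machine="sun4u")
```

Only step 2 depends on the root filesystem format, which is what keeps the
later phases independent of it.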


6.3 Ramdisk phase

The ramdisk is self-describing in the same sense any disk image is by virtue of
having a filesystem reader in its boot block.  This reader is over 90% the same
as the disk boot block for a given filesystem so the same program is re-used.
Its job is to load and execute the kernel from the archive by:

1) opening the boot archive (the "bootarchive" property from the previous phase)
2) using its file system specific reader to read the kernel (by default,
/platform/`uname -i`/kernel/unix)
3) creating "impl-arch-name", "whoami", "elfheader-address" and
"elfheader-length" properties
4) executing the kernel
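
A matching sketch of the ramdisk phase is shown below.  As before, this is a
hypothetical Python model: the archive is a dictionary of files, the platform
name is made up, and the load address and header length are passed in as
parameters rather than derived from a real ELF image.

```python
def ramdisk_phase(tree, impl_arch, load_address, header_length):
    """Model of the ramdisk boot block: returns the kernel image to execute."""
    props = tree["/packages/boot-properties"]

    # 1) open the boot archive named by the "bootarchive" property
    archive = tree[props["bootarchive"]]["files"]

    # 2) use the archive's filesystem reader to read the kernel
    whoami = f"/platform/{impl_arch}/kernel/unix"
    kernel = archive[whoami]

    # 3) publish the properties the kernel will look up
    props["impl-arch-name"] = impl_arch
    props["whoami"] = whoami
    props["elfheader-address"] = load_address
    props["elfheader-length"] = header_length

    # 4) hand the image back; the caller transfers control to it
    return kernel


arch = "SUNW,Example-Platform"   # hypothetical `uname -i` value
tree = {
    "/packages/boot-properties": {"bootarchive": "/ramdisk", "fstype": "ufs"},
    "/ramdisk": {"files": {f"/platform/{arch}/kernel/unix": b"KERNEL"}},
}
kernel = ramdisk_phase(tree, arch, load_address=0x1000000, header_length=64)
```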


6.4 Kernel phase

When krtld gains control, it mounts the boot archive and loads additional kernel
modules from the boot archive via the ramdisk. Subsequent kernel initialization
procedures remain the same until after the kernel mounts the root file system.
At that point, the kernel throws away the boot archive and reclaims the memory
it occupies.  Note that in the install case, the ramdisk actually contains the
root file system, and is not thrown away.  The kernel ramdisk driver simply
takes over control of the ramdisk image.


6.5 Chained booters

When booting from a disk, the booter will support chained booters both for
cprboot (see PSARC 1992/201) and for situations where the file system reader
cannot fit in the boot block.  In a future project, grub2 can use this facility
to add a graphical user menu to the booter phase.  This facility will not be
available when booting from a network, since the chained booter usually uses the
same virtual address space as the original booter.


6.6 Boot archive management

There are two kinds of boot archive: failsafe and normal.  A failsafe archive is
self-sufficient and bootable by itself.  It is created at install time and
requires no maintenance.  A normal archive shadows a root filesystem, so it
contains all kernel modules, driver.conf files, and a few configuration files in
/etc which are read by the kernel before root is mounted.  Once the root
filesystem is mounted, the kernel discards the boot archive from memory and
file I/O will be performed against the root device.

By default, the normal archive contains the following files and directories:

etc/system
etc/name_to_major
etc/driver_aliases
etc/name_to_sysnum
etc/dacf.conf
etc/driver_classes
etc/path_to_inst
etc/devices/devid_cache
etc/devices/mdi_scsi_vhci_cache
etc/devices/mdi_ib_cache
etc/cluster/nodeid
etc/zfs/zpool.cache
kernel
platform

The contents under the platform directory will be segregated into those needed
for a sun4u boot archive and those needed for a sun4v boot archive.  Further
per-platform differentiated boot archives may be considered if that helps
us gain faster booting via faster archive load, trading off a more complicated
archive construction process.

If any file in this list (or under the directories listed) is updated, the boot
archive must be rebuilt prior to the next reboot for the modification to take
effect. The package and patch tools are updated to update the boot archive
whenever needed. In addition, the boot archive is updated as necessary on an
orderly system shutdown to catch files modified manually.
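
One simple way to decide whether a rebuild is needed is to compare
modification times.  The Python sketch below is an illustration under that
assumption only; it is not how bootadm(1M) is specified to work, and the
function names and the temporary directory layout are invented for the
example.

```python
import os
import tempfile


def listed_files(root, filelist):
    """Yield every regular file named by, or found under, a list entry."""
    for rel in filelist:
        path = os.path.join(root, rel)
        if os.path.isdir(path):
            for dirpath, _dirs, names in os.walk(path):
                for name in names:
                    yield os.path.join(dirpath, name)
        elif os.path.isfile(path):
            yield path


def archive_is_stale(root, filelist, archive):
    """True if the archive is missing or any listed file is newer than it."""
    if not os.path.exists(archive):
        return True
    built = os.stat(archive).st_mtime
    return any(os.stat(p).st_mtime > built
               for p in listed_files(root, filelist))


def demo():
    """Build a tiny fake root, then modify a driver file after the archive."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "etc"))
        os.makedirs(os.path.join(root, "kernel", "drv"))
        system = os.path.join(root, "etc", "system")
        driver = os.path.join(root, "kernel", "drv", "sd.conf")
        archive = os.path.join(root, "boot_archive")
        for path in (system, driver, archive):
            with open(path, "w") as fh:
                fh.write("x")
        os.utime(system, (1000, 1000))
        os.utime(driver, (1000, 1000))
        os.utime(archive, (2000, 2000))    # archive built after both files
        before = archive_is_stale(root, ["etc/system", "kernel"], archive)
        os.utime(driver, (3000, 3000))     # driver.conf edited afterwards
        after = archive_is_stale(root, ["etc/system", "kernel"], archive)
        return before, after
```

Running the same check at orderly shutdown is what would catch files that were
edited by hand rather than through the package and patch tools.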

The boot archive could be out of sync with the root filesystem if the system
panics after files have been updated but before the archive update is
completed. We check for such conditions on every boot before the root filesystem
is mounted writable. If an inconsistency is detected, the system will stop in single-user
mode, similar to current behavior when fsck fails on the root filesystem.  The
recommended recovery method is to boot the failsafe archive and recreate the
boot_archive. An expert user may decide to continue booting if the out-of-sync
files are not critical.

As noted above, if the boot archive cannot be fixed, the kernel can be booted
directly from the root filesystem via the -F <kernel> argument.  This option
will not work when booting via HTTP, since there is no underlying FS to read
from.  It will also be noticeably slower than booting via the archive, since
the relatively simple booters do not implement a buffer cache.

The bootadm(1M) command will handle the details of archive update and
verification.


6.7 Install and upgrade

Normal install and upgrade is achieved by booting the miniroot from either
CDROM/DVD or from the network. In both cases, the root filesystem of the
miniroot is the ramdisk. This allows the Solaris boot CD to be ejected without
rebooting the system.  The boot archive contains the entire miniroot.

The construction of the install CD is modified to use an hsfs boot block. The
miniroot is packed into a single file in ufs format, to be loaded as the ramdisk
image.

The setup of the net boot server is also modified. The boot server will serve a
boot strap as well as the ramdisk image which is downloaded and then booted
from.

The netinstall image will be packed using root_archive(1M).

The process for installing the OS to disk remains the same except that the boot
blocks are different and a boot archive must be constructed prior to booting the
install target disk.  The boot archives are created using bootadm(1M). The rest
of the code changes come with packages and patches, and no special treatment is
required.


6.8 Diskless clients

Diskless boot is similar to booting the miniroot for net install
except that the root filesystem is on NFS instead of on UFS.


6.9 Install-time Update (ITU)

New-boot (phase 1: x86) reduced the x86 ITU mechanism described in PSARC
1997/059 to simply adding Solaris binaries (drivers, kernel modules, commands,
libraries, symlinks and the like) to the running miniroot and then the target
install environment. It also extended the supported media that an ITU could
be supplied on to include CD/DVD, memory sticks and similar devices.

This mechanism will now be made available on SPARC as well to allow platform
support to be delivered out of band of a regularly scheduled release.

If a core kernel component needs to be updated, say unix or something else that
needs to be loaded before an ITU can be added, then a pre-patched miniroot image
needs to be made available (a 50MB download) along with the patch.



References
----------

1. Shudong Zhou         PSARC 2004/454  Solaris Boot Architecture
2. Carl Smith           PSARC 2001/009  WAN-boot
3. Clark Dong           PSARC 1992/201  Checkpoint Resume (Reanimator)
4. Allan McKillop       PSARC 2006/260  Solaris on Xen

