Design Failure Mode Effects Analysis

System: Boot

Potential Failure Mode: /boot partition overfills

Effects of Failure: To return the system to a usable state, the user
must have advanced knowledge and an available system recovery disk. The
procedure involves disk mounting, chroot, package management, and deep
file system knowledge. This is outside the range of most target users.

Severity: 10 (system is completely unusable until recovery, and recovery
is very expensive)

Causes of Failure:
1. Multiple kernel updates between reboots can overfill the standard 705M
/boot partition once more than 3 kernels are installed.
2. Users install both the generic and lowlatency hwe kernels. They may also
have transient kernels installed while switching to another kernel flavor,
such as oem.
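
To see how much space each installed kernel actually occupies, the files in
/boot can be grouped by kernel version. The following is a minimal sketch, not
an existing tool: the function name kernel_footprints is hypothetical, and it
assumes the per-kernel files follow the usual vmlinuz-, initrd.img-,
System.map- and config- naming with the version as the suffix.

```python
import os
from collections import defaultdict

# Assumption: a kernel's file set consists of files named <prefix><version>.
KERNEL_PREFIXES = ("vmlinuz-", "initrd.img-", "System.map-", "config-")


def kernel_footprints(boot_dir="/boot"):
    """Return {kernel_version: total_bytes} for versioned files in boot_dir."""
    totals = defaultdict(int)
    with os.scandir(boot_dir) as entries:
        for entry in entries:
            # Skip directories and symlinks such as /boot/vmlinuz.
            if not entry.is_file(follow_symlinks=False):
                continue
            for prefix in KERNEL_PREFIXES:
                if entry.name.startswith(prefix):
                    version = entry.name[len(prefix):]
                    totals[version] += entry.stat(follow_symlinks=False).st_size
                    break
    return dict(totals)
```

Calling kernel_footprints() on a running system and counting the keys shows
how many kernels are installed at once, including any transient ones.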

Preventative Activities:
1. There does not appear to be any tool that guides users in kernel selection
or ensures that the latest and penultimate kernel versions are retained.
2. There does not appear to be any test that prevents installation of a
kernel that would overfill the disk.
3. unattended-upgrades tries to clean up images that fill the /boot disk, but
it does not consider disk space. Even when it works (which is a whole other
issue), the /boot disk is required to hold 4 images at times, which is not
feasible with the current 705M partition and 180M file set size.
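
The arithmetic behind item 3, and the kind of pre-install check that item 2
says is missing, can be sketched as follows. This is an illustration only and
not part of any existing package tooling: the function names are hypothetical,
the 180M figure is assumed to be the file set size of one kernel, and
filesystem overhead and bootloader files are ignored, so the result is an
upper bound.

```python
import shutil

MIB = 2**20


def max_kernels(partition_mib, per_kernel_mib):
    """Upper bound on whole kernel file sets that fit in the partition."""
    return partition_mib // per_kernel_mib


def fits_in_boot(free_bytes, new_kernel_bytes, reserve_bytes=0):
    """True if a new kernel file set fits in the remaining free space."""
    return free_bytes - reserve_bytes >= new_kernel_bytes


def boot_has_room(new_kernel_bytes, boot_dir="/boot", reserve_bytes=0):
    """Hypothetical pre-install check against the real free space."""
    free = shutil.disk_usage(boot_dir).free
    return fits_in_boot(free, new_kernel_bytes, reserve_bytes)


# 705M partition, 180M per kernel: 4 * 180 = 720 > 705, so only 3 fit.
print(max_kernels(705, 180))  # prints 3
```

With three kernels already installed, at most 705 - 540 = 165M remains, so a
fourth 180M file set cannot be written.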

Occurrence: 5 (even users with a single kernel flavor encounter this)

Detection Rating: 9

Risk Priority Number: Severity * Occurrence * Detection = 450
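
The calculation above can be reproduced with a short helper. This is a sketch
only: the function name is hypothetical, and the 1 to 10 range check assumes
the common DFMEA rating scale, which this analysis does not state explicitly.

```python
def risk_priority_number(severity, occurrence, detection):
    """Multiply the three DFMEA ratings, assuming each is on a 1-10 scale."""
    ratings = {
        "severity": severity,
        "occurrence": occurrence,
        "detection": detection,
    }
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating {value} is outside 1-10")
    return severity * occurrence * detection


print(risk_priority_number(10, 5, 9))  # prints 450
```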

Take Away Points:

* The DFMEA indicates this is a severe issue that should be considered critical.
* When the /boot disk overfills, the effect is catastrophic for the average
user, and for many it is simply unrecoverable. This can and does drive users
to abandon the OS.
* Existing controls to prevent occurrence are either inadequate (automated
upgrades still allow the disk to overfill when using a single kernel flavor,
and do not consider disk space) or missing entirely (users are not guided on
kernel management). The volume of forum posts about this issue over the years
illustrates that it is a substantial problem.
* This issue goes beyond /boot partition size, but increasing the size to
handle all possible transient states is required for a complete solution.
* Disk space is cheap these days. On consumer desktop solutions, 2.0GB is a
small price to pay to avoid a catastrophic failure that is otherwise
unrecoverable for many users.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1959971

Title:
  increase /boot partition size
