Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
On Fri, 14 Dec 2018 10:22:49 +0100 Ralf Jung wrote:
> Hi,
>
> > Fixing this does seem like it would be a good idea for general
> > robustness against dodgy firmware (this is not the first iteration of
> > problems along these lines). It would take some development work, but
> > hopefully not too much.
> >
> > Things that GRUB can't do, as far as I can tell:
> >
> >   * I don't think there's a way for GRUB to check whether it will be
> >     possible to recreate a boot entry later; as I understand it, that
> >     depends on various low-level details, including firmware-specific
> >     quirks.
> >
> >   * Even detecting that nothing changed would require cooperation from
> >     efibootmgr, since the encoding of the EFI variable is an
> >     implementation detail there (so we can't just read it out and
> >     compare), and efibootmgr doesn't expose a way for GRUB to say "set
> >     this configuration, but only if it's different from what's already
> >     there".
> >
> > However, I think GRUB can at least manage to delete all but one entry
> > from the same distributor rather than all of them, and if it finds one
> > remaining entry then it can modify that rather than writing a brand new
> > variable. As I understand it, that would probably be enough to fix this
> > bug?
>
> Assuming that modification works even when the variable storage is (close to)
> full, then yes, that would at least keep the device bootable which would be a
> big improvement.
>
> Kind regards,
> Ralf

Hi Colin,

Thanks for proposing this solution. :)
I also think it would be a good solution for now that will hopefully avoid
most of these errors. :)

Thanks,
~Niels
Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
Hi,

> Fixing this does seem like it would be a good idea for general
> robustness against dodgy firmware (this is not the first iteration of
> problems along these lines). It would take some development work, but
> hopefully not too much.
>
> Things that GRUB can't do, as far as I can tell:
>
>   * I don't think there's a way for GRUB to check whether it will be
>     possible to recreate a boot entry later; as I understand it, that
>     depends on various low-level details, including firmware-specific
>     quirks.
>
>   * Even detecting that nothing changed would require cooperation from
>     efibootmgr, since the encoding of the EFI variable is an
>     implementation detail there (so we can't just read it out and
>     compare), and efibootmgr doesn't expose a way for GRUB to say "set
>     this configuration, but only if it's different from what's already
>     there".
>
> However, I think GRUB can at least manage to delete all but one entry
> from the same distributor rather than all of them, and if it finds one
> remaining entry then it can modify that rather than writing a brand new
> variable. As I understand it, that would probably be enough to fix this
> bug?

Assuming that modification works even when the variable storage is (close to)
full, then yes, that would at least keep the device bootable which would be a
big improvement.

Kind regards,
Ralf
Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
On Sun, Feb 25, 2018 at 04:13:13PM +0100, Ralf Jung wrote:
> earlier today I did a system update, which completed successfully (as in, dpkg
> didn't stop due to an error). I then rebooted my machine. This left Linux
> unable to boot; only the Windows entry was left in the boot menu. After some
> hours of debugging, the problem turned out to be that writing an EFI variable
> fails with "No space left on the device". I did a firmware update (from
> Windows), to no avail. In the end I booted into a live system, deleted some of
> the "dump-type0-*" variables, rebooted, and then ran "grub-install" from a
> chroot to fix the situation.
>
> I'm not exactly sure what went wrong here, but clearly the system shouldn't be
> put into an unbootable state ever. I see two bugs here:
>
> * First, it looks like something is filling up the EFI variable space. I've
>   added an `ls -lah` of the efivars folder below. This is after I deleted
>   roughly 20-30 "dump-type0-*" variables. Is this the kernel dumping
>   information (about crashes or so)? If yes, it seems to do so without ever
>   cleaning up or taking free space into account, which I'd consider a serious
>   bug. Should I report this against the kernel? I don't even know what creates
>   those EFI variables.

Those are created by the efi_pstore_write function in the kernel. Beyond
that I'm not really familiar with what's going on - you should ask
Debian's kernel folks if you need to pursue this.

> * Second, does grub-install really have to delete and create EFI variables even
>   when nothing changed? It seems to me that writing an EFI variable is only
>   necessary when initially installing GRUB. Even if writing is necessary, a
>   check could be done *before* deleting the boot entry whether it will be
>   possible to write it again later. Right now, it seems that grub will happily
>   delete the debian boot entry and then fail to create it again -- and this
>   doesn't even make the system update fail.
Fixing this does seem like it would be a good idea for general
robustness against dodgy firmware (this is not the first iteration of
problems along these lines). It would take some development work, but
hopefully not too much.

Things that GRUB can't do, as far as I can tell:

  * I don't think there's a way for GRUB to check whether it will be
    possible to recreate a boot entry later; as I understand it, that
    depends on various low-level details, including firmware-specific
    quirks.

  * Even detecting that nothing changed would require cooperation from
    efibootmgr, since the encoding of the EFI variable is an
    implementation detail there (so we can't just read it out and
    compare), and efibootmgr doesn't expose a way for GRUB to say "set
    this configuration, but only if it's different from what's already
    there".

However, I think GRUB can at least manage to delete all but one entry
from the same distributor rather than all of them, and if it finds one
remaining entry then it can modify that rather than writing a brand new
variable. As I understand it, that would probably be enough to fix this
bug?

-- 
Colin Watson [cjwat...@debian.org]
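
The "keep one entry from the same distributor and modify it" idea can be sketched as a small planning step. This is only an illustration in Python, not GRUB's actual implementation: `parse_boot_entries` and `plan_cleanup` are hypothetical names, and the parser assumes that `efibootmgr` prints each entry as a line of the form `Boot0001* label`.

```python
import re

# Assumed line format of `efibootmgr` output: "Boot0001* debian".
# Some versions may append a tab and a device path after the label.
_ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")


def parse_boot_entries(output):
    """Return a list of (bootnum, label) pairs found in efibootmgr output."""
    entries = []
    for line in output.splitlines():
        match = _ENTRY_RE.match(line)
        if match:
            # Drop anything after a tab (e.g. a device path), keep the label.
            label = match.group(2).split("\t")[0].strip()
            entries.append((match.group(1).upper(), label))
    return entries


def plan_cleanup(entries, distributor):
    """Pick one entry of `distributor` to keep and list the others for deletion.

    Returns (keep, delete): `keep` is the boot number to modify in place,
    or None if no matching entry exists; `delete` lists the duplicates.
    """
    wanted = distributor.lower()
    own = [num for num, label in entries if label.lower() == wanted]
    if not own:
        return None, []
    return own[0], own[1:]
```

Each number in `delete` would correspond to one `efibootmgr -b <num> -B` call. The point of the sketch is that the distributor's last remaining entry is never removed before its replacement has been written successfully.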
Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
Just had this again. This time, even after I repaired Debian, Windows disappeared completely from the start menu. I do not yet know how to get it back.

What does it take to get attention to a bug that completely breaks the system?

Kind regards,
Ralf
Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
Package: grub-efi-amd64
Version: 2.02+dfsg1-4
Followup-For: Bug #891434

I just ran into this same issue and it is not specific to grub: refind-install
also has similar issues, so this is specific to the state of the computer.

I found this answer helpful: https://unix.stackexchange.com/a/379824/79267

In particular deleting dump files helped:

    rm /sys/firmware/efi/efivars/dump-*

and then grub-install worked fine.

As a fix, perhaps grub could issue a message to look into the
/sys/firmware/efi/efivars directory, because it was not trivial to find it
(all the mounted file systems have plenty of space as reported by `df -h`, so
the message "No space left on device" is not helpful).

-- Package-specific info:

*** BEGIN /proc/mounts
/dev/sda8 / ext4 rw,noatime,nodiratime,discard,errors=remount-ro,data=ordered 0 0
/dev/sda9 /home ext4 rw,noatime,nodiratime,discard,data=ordered 0 0
/dev/sda2 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0
*** END /proc/mounts

*** BEGIN /boot/grub/grub.cfg
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  set have_grubenv=true
  load_env
fi
if [ "${next_entry}" ] ; then
  set default="${next_entry}"
  set next_entry=
  save_env next_entry
  set boot_once=true
else
  set default="0"
fi

if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi

export menuentry_id_option

if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}
function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}

if [ x$feature_default_font_path = xy ] ; then
  font=unicode
else
  insmod part_gpt
  insmod ext2
  set root='hd0,gpt8'
  if [ x$feature_platform_search_hint = xy ]; then
    search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt8 --hint-efi=hd0,gpt8 --hint-baremetal=ahci0,gpt8 b6485691-c9ee-4aad-85ba-4d9a8032e2c7
  else
    search --no-floppy --fs-uuid --set=root b6485691-c9ee-4aad-85ba-4d9a8032e2c7
  fi
  font="/usr/share/grub/unicode.pf2"
fi

if loadfont $font ; then
  set gfxmode=800x600
  load_video
  insmod gfxterm
  set locale_dir=$prefix/locale
  set lang=en_US
  insmod gettext
fi
terminal_output gfxterm
if [ "${recordfail}" = 1 ] ; then
  set timeout=30
else
  if [ x$feature_timeout_style = xy ] ; then
    set timeout_style=menu
    set timeout=5
  # Fallback normal timeout code in case the timeout_style feature is
  # unavailable.
  else
    set timeout=5
  fi
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
insmod part_gpt
insmod ext2
set root='hd0,gpt8'
if [ x$feature_platform_search_hint = xy ]; then
  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt8 --hint-efi=hd0,gpt8 --hint-baremetal=ahci0,gpt8 b6485691-c9ee-4aad-85ba-4d9a8032e2c7
else
  search --no-floppy --fs-uuid --set=root b6485691-c9ee-4aad-85ba-4d9a8032e2c7
fi
insmod png
if background_image /usr/share/desktop-base/softwaves-theme/grub/grub-16x9.png; then
  set color_normal=white/black
  set color_highlight=black/white
else
  set menu_color_normal=cyan/blue
  set menu_color_highlight=white/blue
fi
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
function gfxmode {
  set gfxpayload="${1}"
}
set linux_gfx_mode=
export linux_gfx_mode
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-b6485691-c9ee-4aad-85ba-4d9a8032e2c7' {
  load_video
  insmod gzio
  if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
  insmod part_gpt
  insmod ext2
  set root='hd0,gpt8'
  if [ x$feature_platform_search_hint = xy ]; then
    search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt8 --hint-efi=hd0,gpt8 --hint-baremetal=ahci0,gpt8 b6485691-c9ee-4aad-85ba-4d9a8032e2c7
  else
    search --no-floppy --fs-uuid --set=root b6485691-c9ee-4aad-85ba-4d9a8032e2c7
  fi
  echo 'Loading Linux 4.15.0-2-amd64 ...'
  linux /boot/vmlinuz-4.15.0-2-amd64 root=UUID=b6485691-c9ee-4aad-85ba-4d9a8032e2c7 ro quiet
  echo 'Loading initial ramdisk ...'
  initrd /boot/initrd.img-4.15.0-2-amd64
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option 'gnulinux-advanced-b6485691-c9ee-4aad-85ba-4d9a8032e2c7' {
  menuentry 'Debia
Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
Hi,

I experienced the same today after a grub update to '2.02+dfsg1-1' on testing.

Looking back at logs, grub-install reported an error but the upgrade process as
a whole didn't fail, so I missed it at first:

```
Could not prepare Boot variable: No space left on device
grub-install: error: efibootmgr failed to register the boot entry: Input/output error.
Failed: grub-install --target=x86_64-efi
WARNING: Bootloader is not properly installed, system may not be bootable
```

However, after manually recovering via efibootmgr, the ESP doesn't seem to be
full nor close to:

```
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       3.7G  238M  3.2G   7% /boot
/dev/sda1       256M   24M  233M  10% /boot/efi
```

My pstore has ~150 entries, all quite small (~1Kb), and none of them are recent.
So I'm not sure why this specific upgrade got stuck on ENOSPC.

Ciao, Luca

-- 
"If you build a wall, think of what you leave outside it" - Italo Calvino
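
Since `df` only covers mounted file systems such as the ESP, it says nothing about how full the firmware's variable store is. One rough way to see how much of it the pstore dumps take up is to add up the sizes of the `dump-*` files under efivarfs. The following is a minimal Python sketch; `pstore_dump_usage` is a hypothetical helper name, and the file sizes it sums are only an approximation of what the firmware itself accounts for.

```python
from pathlib import Path


def pstore_dump_usage(efivars_dir="/sys/firmware/efi/efivars"):
    """Return (count, total_bytes) for the dump-* variables in efivars_dir."""
    sizes = [
        path.stat().st_size
        for path in Path(efivars_dir).glob("dump-*")
        if path.is_file()
    ]
    return len(sizes), sum(sizes)
```

Calling `pstore_dump_usage()` on an affected machine gives the number of dump variables and their combined size, which is a better hint than `df -h` when `grub-install` reports "No space left on device".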
Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage
Package: grub-efi
Version: 2.02+dfsg1-1
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

earlier today I did a system update, which completed successfully (as in, dpkg
didn't stop due to an error). I then rebooted my machine. This left Linux
unable to boot; only the Windows entry was left in the boot menu. After some
hours of debugging, the problem turned out to be that writing an EFI variable
fails with "No space left on the device". I did a firmware update (from
Windows), to no avail. In the end I booted into a live system, deleted some of
the "dump-type0-*" variables, rebooted, and then ran "grub-install" from a
chroot to fix the situation.

I'm not exactly sure what went wrong here, but clearly the system shouldn't be
put into an unbootable state ever. I see two bugs here:

* First, it looks like something is filling up the EFI variable space. I've
  added an `ls -lah` of the efivars folder below. This is after I deleted
  roughly 20-30 "dump-type0-*" variables. Is this the kernel dumping
  information (about crashes or so)? If yes, it seems to do so without ever
  cleaning up or taking free space into account, which I'd consider a serious
  bug. Should I report this against the kernel? I don't even know what creates
  those EFI variables.

* Second, does grub-install really have to delete and create EFI variables even
  when nothing changed? It seems to me that writing an EFI variable is only
  necessary when initially installing GRUB. Even if writing is necessary, a
  check could be done *before* deleting the boot entry whether it will be
  possible to write it again later. Right now, it seems that grub will happily
  delete the debian boot entry and then fail to create it again -- and this
  doesn't even make the system update fail.

This is all on a Lenovo P50. Initially I used the firmware version from last
fall, and then updated it to the latest one (from last December).
Kind regards,
Ralf

-- Package-specific info:

*** BEGIN /sys/firmware/efi/efivars
$ ls -lah
total 0
drwxr-xr-x 2 root root   0 Feb 25 14:25 .
drwxr-xr-x 6 root root   0 Feb 25 14:25 ..
-rw-r--r-- 1 root root  26 Feb 25 14:25 AppName-1fd8b79f-0be2-4d57-b241-81c5e24e01a1
-rw-r--r-- 1 root root  36 Feb 25 14:25 AppPlatform-1fd8b79f-0be2-4d57-b241-81c5e24e01a1
-rw-r--r-- 1 root root   5 Feb 25 14:25 AuthVarKeyDatabase-aaf32c78-947b-439a-a180-2e144ec37792
-rw-r--r-- 1 root root 304 Feb 25 14:25 Boot-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root 122 Feb 25 14:25 Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  46 Feb 25 14:25 Boot0010-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  54 Feb 25 14:25 Boot0011-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  84 Feb 25 14:25 Boot0012-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  72 Feb 25 14:25 Boot0013-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  80 Feb 25 14:25 Boot0014-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  74 Feb 25 14:25 Boot0015-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  60 Feb 25 14:25 Boot0016-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  64 Feb 25 14:25 Boot0017-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  66 Feb 25 14:25 Boot0018-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  63 Feb 25 14:25 Boot0019-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  63 Feb 25 14:25 Boot001A-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  69 Feb 25 14:25 Boot001B-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  69 Feb 25 14:25 Boot001C-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  69 Feb 25 14:25 Boot001D-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  69 Feb 25 14:25 Boot001E-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  66 Feb 25 14:25 Boot001F-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  66 Feb 25 14:25 Boot0020-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  70 Feb 25 14:25 Boot0021-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  72 Feb 25 14:25 Boot0022-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  66 Feb 25 14:25 Boot0023-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  68 Feb 25 14:25 Boot0024-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   6 Feb 25 14:25 BootCurrent-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   8 Feb 25 14:25 BootOptionSupport-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  28 Feb 25 14:25 BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  24 Feb 25 14:25 BootOrderDefault-0b7646a4-6b44-4332-8588-c8998117f2ef
-rw-r--r-- 1 root root   5 Feb 25 14:25 BootState-60b5e939-0fcf-4227-ba83-6bbed45bc0e3
-rw-r--r-- 1 root root  28 Feb 25 14:25 CapsuleLongModeBuffer-711c7