Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2019-02-10 Thread Niels Thykier
On Fri, 14 Dec 2018 10:22:49 +0100 Ralf Jung  wrote:
> Hi,
> 
> > Fixing this does seem like it would be a good idea for general
> > robustness against dodgy firmware (this is not the first iteration of
> > problems along these lines).  It would take some development work, but
> > hopefully not too much.
> > 
> > Things that GRUB can't do, as far as I can tell:
> > 
> >  * I don't think there's a way for GRUB to check whether it will be
> >    possible to recreate a boot entry later; as I understand it, that
> >    depends on various low-level details, including firmware-specific
> >    quirks.
> >
> >  * Even detecting that nothing changed would require cooperation from
> >    efibootmgr, since the encoding of the EFI variable is an
> >    implementation detail there (so we can't just read it out and
> >    compare), and efibootmgr doesn't expose a way for GRUB to say "set
> >    this configuration, but only if it's different from what's already
> >    there".
> > 
> > However, I think GRUB can at least manage to delete all but one entry
> > from the same distributor rather than all of them, and if it finds one
> > remaining entry then it can modify that rather than writing a brand new
> > variable.  As I understand it, that would probably be enough to fix this
> > bug?
> 
> Assuming that modification works even when the variable storage is (close to)
> full, then yes, that would at least keep the device bootable which would be a
> big improvement.
> 
> Kind regards,
> Ralf
> 
> 

Hi Colin,

Thanks for proposing this solution. :)

I also think it would be a good solution for now that will hopefully
avoid most of these errors. :)

Thanks,
~Niels



Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2018-12-14 Thread Ralf Jung
Hi,

> Fixing this does seem like it would be a good idea for general
> robustness against dodgy firmware (this is not the first iteration of
> problems along these lines).  It would take some development work, but
> hopefully not too much.
> 
> Things that GRUB can't do, as far as I can tell:
> 
>  * I don't think there's a way for GRUB to check whether it will be
>    possible to recreate a boot entry later; as I understand it, that
>    depends on various low-level details, including firmware-specific
>    quirks.
>
>  * Even detecting that nothing changed would require cooperation from
>    efibootmgr, since the encoding of the EFI variable is an
>    implementation detail there (so we can't just read it out and
>    compare), and efibootmgr doesn't expose a way for GRUB to say "set
>    this configuration, but only if it's different from what's already
>    there".
> 
> However, I think GRUB can at least manage to delete all but one entry
> from the same distributor rather than all of them, and if it finds one
> remaining entry then it can modify that rather than writing a brand new
> variable.  As I understand it, that would probably be enough to fix this
> bug?

Assuming that modification works even when the variable storage is (close to)
full, then yes, that would at least keep the device bootable which would be a
big improvement.

Kind regards,
Ralf



Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2018-10-28 Thread Colin Watson
On Sun, Feb 25, 2018 at 04:13:13PM +0100, Ralf Jung wrote:
> earlier today I did a system update, which completed successfully (as in, dpkg
> didn't stop due to an error).  I then rebooted my machine.  This left Linux
> unable to boot; only the Windows entry was left in the boot menu.  After some
> hours of debugging, the problem turned out to be that writing an EFI variable
> fails with "No space left on the device".  I did a firmware update (from
> Windows), to no avail.  In the end I booted into a live system, deleted some of
> the "dump-type0-*" variables, rebooted, and then ran "grub-install" from a
> chroot to fix the situation.
> 
> I'm not exactly sure what went wrong here, but clearly the system shouldn't be
> put into an unbootable state ever.  I see two bugs here:
> 
> * First, it looks like something is filling up the EFI variable space.  I've
>   added an `ls -lah` of the efivars folder below.  This is after I deleted
>   roughly 20-30 "dump-type0-*" variables.  Is this the kernel dumping
>   information (about crashes or so)?  If yes, it seems to do so without ever
>   cleaning up or taking free space into account, which I'd consider a serious
>   bug.  Should I report this against the kernel?  I don't even know what creates
>   those EFI variables.

Those are created by the efi_pstore_write function in the kernel.
Beyond that I'm not really familiar with what's going on - you should
ask Debian's kernel folks if you need to pursue this.

> * Second, does grub-install really have to delete and create EFI variables even
>   when nothing changed?  It seems to me that writing an EFI variable is only
>   necessary when initially installing GRUB.  Even if writing is necessary, a
>   check could be done *before* deleting the boot entry whether it will be
>   possible to write it again later.  Right now, it seems that grub will happily
>   delete the debian boot entry and then fail to create it again -- and this
>   doesn't even make the system update fail.

Fixing this does seem like it would be a good idea for general
robustness against dodgy firmware (this is not the first iteration of
problems along these lines).  It would take some development work, but
hopefully not too much.

Things that GRUB can't do, as far as I can tell:

 * I don't think there's a way for GRUB to check whether it will be
   possible to recreate a boot entry later; as I understand it, that
   depends on various low-level details, including firmware-specific
   quirks.
   
 * Even detecting that nothing changed would require cooperation from
   efibootmgr, since the encoding of the EFI variable is an
   implementation detail there (so we can't just read it out and
   compare), and efibootmgr doesn't expose a way for GRUB to say "set
   this configuration, but only if it's different from what's already
   there".

However, I think GRUB can at least manage to delete all but one entry
from the same distributor rather than all of them, and if it finds one
remaining entry then it can modify that rather than writing a brand new
variable.  As I understand it, that would probably be enough to fix this
bug?
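
The plan in the paragraph above can be sketched roughly as follows. This is only an illustration in Python, not GRUB's actual code: `BootEntry` and `plan_update` are made-up names, and identifying "entries from the same distributor" by a case-insensitive label match is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BootEntry:
    """Hypothetical in-memory view of one EFI boot entry."""
    number: str       # e.g. "0010", as in the Boot0010 variable name
    distributor: str  # label of the entry, e.g. "debian" (assumed field)


def plan_update(
    entries: List[BootEntry], distributor: str
) -> Tuple[Optional[BootEntry], List[BootEntry]]:
    """Return (entry_to_modify, entries_to_delete) for one distributor.

    The first matching entry is kept so that it can be modified in place;
    only the surplus matching entries are scheduled for deletion.  If there
    is no matching entry, nothing is deleted and the caller would have to
    create a new variable.
    """
    ours = [e for e in entries if e.distributor.lower() == distributor.lower()]
    if not ours:
        return None, []
    keep, *extra = ours
    return keep, extra


if __name__ == "__main__":
    entries = [
        BootEntry("0001", "Windows Boot Manager"),
        BootEntry("0010", "debian"),
        BootEntry("0011", "debian"),
    ]
    keep, delete = plan_update(entries, "debian")
    print(keep.number, [e.number for e in delete])  # prints: 0010 ['0011']
```

The point of the ordering is that at least one entry for the distributor exists at every step, so a failed write no longer leaves the system without any boot entry.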

-- 
Colin Watson   [cjwat...@debian.org]



Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2018-08-10 Thread Ralf Jung
I just hit this again.  This time, even after I repaired Debian, Windows had
disappeared completely from the boot menu.  I do not yet know how to get it back.

What does it take to get attention to a bug that completely breaks the system?

Kind regards,
Ralf



Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2018-04-14 Thread Marius Mikucionis
Package: grub-efi-amd64
Version: 2.02+dfsg1-4
Followup-For: Bug #891434

I just ran into this same issue, and it is not specific to grub:
refind-install has similar issues, so this is specific to the state of the
computer.

I found this answer helpful:
https://unix.stackexchange.com/a/379824/79267

In particular deleting dump files helped:

rm /sys/firmware/efi/efivars/dump-*

and then grub-install worked fine.

As a fix, perhaps grub could issue a message to look into the
/sys/firmware/efi/efivars directory, because it was not trivial to find it (all
the mounted file systems have plenty of space as reported by `df -h`, so the
message "No space left on device" is not helpful).
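
As a rough illustration of the kind of hint such a message could give, the sketch below counts the `dump-*` variables and adds up their file sizes. It is not part of grub. `summarize_dump_vars` is a made-up helper, and the sum of file sizes is only an approximation of how much variable storage the firmware considers used.

```python
import glob
import os
from typing import List, Tuple

EFIVARS_DIR = "/sys/firmware/efi/efivars"  # directory named in this report


def summarize_dump_vars(efivars_dir: str = EFIVARS_DIR) -> Tuple[int, int, List[str]]:
    """Return (count, total_bytes, sorted names) of dump-* variables.

    A missing or empty directory simply yields (0, 0, []).
    """
    pattern = os.path.join(glob.escape(efivars_dir), "dump-*")
    paths = sorted(glob.glob(pattern))
    total = sum(os.path.getsize(p) for p in paths)
    return len(paths), total, [os.path.basename(p) for p in paths]


if __name__ == "__main__":
    count, total, _names = summarize_dump_vars()
    print(f"{count} dump-* variables, {total} bytes in {EFIVARS_DIR}")
```

This only reads the directory; deleting variables is left to the administrator, as with the `rm` command above.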


-- Package-specific info:

*** BEGIN /proc/mounts
/dev/sda8 / ext4 rw,noatime,nodiratime,discard,errors=remount-ro,data=ordered 0 0
/dev/sda9 /home ext4 rw,noatime,nodiratime,discard,data=ordered 0 0
/dev/sda2 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0
*** END /proc/mounts

*** BEGIN /boot/grub/grub.cfg
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  set have_grubenv=true
  load_env
fi
if [ "${next_entry}" ] ; then
   set default="${next_entry}"
   set next_entry=
   save_env next_entry
   set boot_once=true
else
   set default="0"
fi

if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi

export menuentry_id_option

if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}
function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}

if [ x$feature_default_font_path = xy ] ; then
   font=unicode
else
insmod part_gpt
insmod ext2
set root='hd0,gpt8'
if [ x$feature_platform_search_hint = xy ]; then
  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt8 --hint-efi=hd0,gpt8 --hint-baremetal=ahci0,gpt8  b6485691-c9ee-4aad-85ba-4d9a8032e2c7
else
  search --no-floppy --fs-uuid --set=root b6485691-c9ee-4aad-85ba-4d9a8032e2c7
fi
font="/usr/share/grub/unicode.pf2"
fi

if loadfont $font ; then
  set gfxmode=800x600
  load_video
  insmod gfxterm
  set locale_dir=$prefix/locale
  set lang=en_US
  insmod gettext
fi
terminal_output gfxterm
if [ "${recordfail}" = 1 ] ; then
  set timeout=30
else
  if [ x$feature_timeout_style = xy ] ; then
    set timeout_style=menu
    set timeout=5
  # Fallback normal timeout code in case the timeout_style feature is
  # unavailable.
  else
    set timeout=5
  fi
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
insmod part_gpt
insmod ext2
set root='hd0,gpt8'
if [ x$feature_platform_search_hint = xy ]; then
  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt8 --hint-efi=hd0,gpt8 --hint-baremetal=ahci0,gpt8  b6485691-c9ee-4aad-85ba-4d9a8032e2c7
else
  search --no-floppy --fs-uuid --set=root b6485691-c9ee-4aad-85ba-4d9a8032e2c7
fi
insmod png
if background_image /usr/share/desktop-base/softwaves-theme/grub/grub-16x9.png; then
  set color_normal=white/black
  set color_highlight=black/white
else
  set menu_color_normal=cyan/blue
  set menu_color_highlight=white/blue
fi
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
function gfxmode {
        set gfxpayload="${1}"
}
set linux_gfx_mode=
export linux_gfx_mode
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-b6485691-c9ee-4aad-85ba-4d9a8032e2c7' {
        load_video
        insmod gzio
        if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
        insmod part_gpt
        insmod ext2
        set root='hd0,gpt8'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt8 --hint-efi=hd0,gpt8 --hint-baremetal=ahci0,gpt8  b6485691-c9ee-4aad-85ba-4d9a8032e2c7
        else
          search --no-floppy --fs-uuid --set=root b6485691-c9ee-4aad-85ba-4d9a8032e2c7
        fi
        echo    'Loading Linux 4.15.0-2-amd64 ...'
        linux   /boot/vmlinuz-4.15.0-2-amd64 root=UUID=b6485691-c9ee-4aad-85ba-4d9a8032e2c7 ro  quiet
        echo    'Loading initial ramdisk ...'
        initrd  /boot/initrd.img-4.15.0-2-amd64
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option 'gnulinux-advanced-b6485691-c9ee-4aad-85ba-4d9a8032e2c7' {
menuentry 

Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2018-02-26 Thread Luca BRUNO
Hi,
I experienced the same today after a grub update to '2.02+dfsg1-1' on testing.
Looking back at logs, grub-install reported an error but the upgrade process
as a whole didn't fail, so I missed it at first:
```
Could not prepare Boot variable: No space left on device
grub-install: error: efibootmgr failed to register the boot entry: Input/output error.
Failed: grub-install --target=x86_64-efi  
WARNING: Bootloader is not properly installed, system may not be bootable
```
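
The log above shows a failed step that the surrounding upgrade did not treat as fatal. As a generic illustration only (this is not how the Debian maintainer scripts are written, and `run_or_fail` is a made-up helper), a wrapper that refuses to swallow a failing command could look like this in Python:

```python
import subprocess
import sys
from typing import Sequence


def run_or_fail(cmd: Sequence[str]) -> None:
    """Run cmd and abort with its exit status if it fails.

    The child's own output goes straight to the terminal; on a non-zero
    exit status a short error is printed and SystemExit is raised, so the
    calling process fails too instead of continuing silently.
    """
    result = subprocess.run(list(cmd))
    if result.returncode != 0:
        print(
            f"error: {cmd[0]} exited with status {result.returncode}",
            file=sys.stderr,
        )
        raise SystemExit(result.returncode)


# Example call, using the command line from the log above:
# run_or_fail(["grub-install", "--target=x86_64-efi"])
```

Whether a failing grub-install should abort the whole upgrade is a policy decision; the sketch only shows the mechanism.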

However, after manually recovering via efibootmgr, the ESP doesn't seem
to be full, or even close to it:
```
# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda4   3.7G  238M  3.2G   7% /boot
/dev/sda1   256M   24M  233M  10% /boot/efi
```

My pstore has ~150 entries, all quite small (~1Kb), and none of them are recent.
So I'm not sure why this specific upgrade got stuck on ENOSPC.

Ciao, Luca

-- 
"If you build a wall, think of what you leave outside it" - Italo Calvino




Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

2018-02-25 Thread Ralf Jung
Package: grub-efi
Version: 2.02+dfsg1-1
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

earlier today I did a system update, which completed successfully (as in, dpkg
didn't stop due to an error).  I then rebooted my machine.  This left Linux
unable to boot; only the Windows entry was left in the boot menu.  After some
hours of debugging, the problem turned out to be that writing an EFI variable
fails with "No space left on the device".  I did a firmware update (from
Windows), to no avail.  In the end I booted into a live system, deleted some of
the "dump-type0-*" variables, rebooted, and then ran "grub-install" from a
chroot to fix the situation.

I'm not exactly sure what went wrong here, but clearly the system shouldn't be
put into an unbootable state ever.  I see two bugs here:

* First, it looks like something is filling up the EFI variable space.  I've
  added an `ls -lah` of the efivars folder below.  This is after I deleted
  roughly 20-30 "dump-type0-*" variables.  Is this the kernel dumping
  information (about crashes or so)?  If yes, it seems to do so without ever
  cleaning up or taking free space into account, which I'd consider a serious
  bug.  Should I report this against the kernel?  I don't even know what creates
  those EFI variables.

* Second, does grub-install really have to delete and create EFI variables even
  when nothing changed?  It seems to me that writing an EFI variable is only
  necessary when initially installing GRUB.  Even if writing is necessary, a
  check could be done *before* deleting the boot entry whether it will be
  possible to write it again later.  Right now, it seems that grub will happily
  delete the debian boot entry and then fail to create it again -- and this
  doesn't even make the system update fail.

This is all on a Lenovo P50.  Initially I used the firmware version from last
fall, and then updated it to the latest one (from last December).

Kind regards,
Ralf

-- Package-specific info:

*** BEGIN /sys/firmware/efi/efivars

$ ls -lah
total 0
drwxr-xr-x 2 root root    0 Feb 25 14:25 .
drwxr-xr-x 6 root root    0 Feb 25 14:25 ..
-rw-r--r-- 1 root root   26 Feb 25 14:25 AppName-1fd8b79f-0be2-4d57-b241-81c5e24e01a1
-rw-r--r-- 1 root root   36 Feb 25 14:25 AppPlatform-1fd8b79f-0be2-4d57-b241-81c5e24e01a1
-rw-r--r-- 1 root root    5 Feb 25 14:25 AuthVarKeyDatabase-aaf32c78-947b-439a-a180-2e144ec37792
-rw-r--r-- 1 root root  304 Feb 25 14:25 Boot-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root  122 Feb 25 14:25 Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   46 Feb 25 14:25 Boot0010-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   54 Feb 25 14:25 Boot0011-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   84 Feb 25 14:25 Boot0012-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   72 Feb 25 14:25 Boot0013-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   80 Feb 25 14:25 Boot0014-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   74 Feb 25 14:25 Boot0015-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   60 Feb 25 14:25 Boot0016-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   64 Feb 25 14:25 Boot0017-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   66 Feb 25 14:25 Boot0018-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   63 Feb 25 14:25 Boot0019-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   63 Feb 25 14:25 Boot001A-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   69 Feb 25 14:25 Boot001B-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   69 Feb 25 14:25 Boot001C-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   69 Feb 25 14:25 Boot001D-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   69 Feb 25 14:25 Boot001E-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   66 Feb 25 14:25 Boot001F-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   66 Feb 25 14:25 Boot0020-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   70 Feb 25 14:25 Boot0021-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   72 Feb 25 14:25 Boot0022-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   66 Feb 25 14:25 Boot0023-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   68 Feb 25 14:25 Boot0024-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root    6 Feb 25 14:25 BootCurrent-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root    8 Feb 25 14:25 BootOptionSupport-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   28 Feb 25 14:25 BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r-- 1 root root   24 Feb 25 14:25 BootOrderDefault-0b7646a4-6b44-4332-8588-c8998117f2ef
-rw-r--r-- 1 root root    5 Feb 25 14:25 BootState-60b5e939-0fcf-4227-ba83-6bbed45bc0e3
-rw-r--r-- 1 root root   28 Feb 25 14:25