Bug#1033913: partman-auto-lvm: Broken "Guided - use entire disk and set up LVM" in UEFI mode

2023-04-10 Thread Steve McIntyre
I've just pushed an update to the code here...

On Mon, Apr 10, 2023 at 05:45:15PM +0200, Pascal Hambourg wrote:
>On 10/04/2023 at 15:13, Steve McIntyre wrote:
>> 
>> Overall comment: I'm not trying to make the heuristics 100% reliable
>> here, as I don't think that's actually possible. Instead, I'm trying
>> to tread the fine line of:
>> 
>>   * minimising false negatives - let's try to pick up on the most
>> common cases where people are dual-booting with other systems and
>> might not understand the issues here. That's 99%+ going to be
>> people with Windows installed
>> 
>>   * minimising false positives - the issue that angered Cyril in
>> particular, with an incomplete LVM setup triggering the "bios
>> bootable OS" warning
>
>IMO it is more important to avoid false positives, because switching to a
>BIOS installation on systems which are not BIOS-boot capable would create a
>non bootable system. In case oft is easier to install GRUB for BIOS boot on
>an running EFI system than the other way around.

No. The reason I added this check and warning in the first place is to
avoid breaking existing (all-too-common) systems where Windows users
have a BIOS-booting installation but their BIOS is set to boot both
UEFI and BIOS. That's a stupid combination, but again all too
common. :-( New users who are just trying to install Debian dual-boot
are much less likely to be able to diagnose this kind of problem.

>> > - Other BIOS boot loaders such as syslinux/extlinux do not need or use a 
>> > BIOS
>> > boot partition.
>> 
>> Also not a use case I'm particularly caring about, I'll be
>> honest. They're also *really* not likely to work well without another
>> filesystem in use, which I expect we'll detect anyway.
>
>Indeed other partitions are needed and will be detected, but they will not
>increment $NUM_NOT_ESP if the disk is GPT and has no BIOS boot partition (so
>$DISK_BIOS_BOOT=no), so it might cause a false negative. So why not just
>treat MSDOS and GPT disk labels equally and treat BIOS boot partitions like
>any other non-ESP ?

It's a false negative that I really don't believe or care about very
much, I'll be honest. This is getting to be an edge case on an edge
case.

>> > 1b) IIUC the patch fixes #1033913 because the disk selected for 
>> > installation
>> > has received a new GPT disklabel without a BIOS boot partition, so further
>> > checking is skipped. But IMO the root cause of #1033913 is that changes are
>> > not committed to disk after setting the 'boot' and 'esp' flags to the newly
>> > created ESP partition before stopping parted_server.
>
>I originally thought about fixing partman-auto-lvm but it appears that other
>transient states can also trigger the "force UEFI installation" dialog during
>partitioning, for example after setting up LVM in manual partitioning if
>there is no ESP partition yet. As discussed in #debian-boot, a more general
>fix might be to run the check only once because only existing partitions
>before partitioning are relevant. Are there any use cases for which this
>might cause a false negative ?

So I've now modded the code to add a flag file - it'll only run the
check and (maybe) raise the warning on the first entry into
partman. Thanks for the suggection, this is clearly the correct
answer.

>> > 4) It appears that partman fails to detect the specially crafted partition
>> > table on the installation media created with a debian image. Is it intended
>> > or fortunately unintentional ? If partman could see the EFI partition on 
>> > the
>> > installation media, the detection of BIOS-bootable systems would fail.
>> 
>> That's not a worry for today... :-)
>
>Sure, but the issue can also happen if another removable media is present.
>For instance the USB drive I use to provide missing firmware has an ESP
>partition (and a regular partition table) thus can cause a false negative.

Again, we're hitting edge cases. We can't know for sure what the user
wants here, so we can't just ignore removable media (for example).

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
"Since phone messaging became popular, the young generation has lost the
 ability to read or write anything that is longer than one hundred and sixty
 characters."  -- Ignatios Souvatzis



Bug#1033913: partman-auto-lvm: Broken "Guided - use entire disk and set up LVM" in UEFI mode

2023-04-10 Thread Pascal Hambourg

On 10/04/2023 at 15:13, Steve McIntyre wrote:


Overall comment: I'm not trying to make the heuristics 100% reliable
here, as I don't think that's actually possible. Instead, I'm trying
to tread the fine line of:

  * minimising false negatives - let's try to pick up on the most
common cases where people are dual-booting with other systems and
might not understand the issues here. That's 99%+ going to be
people with Windows installed

  * minimising false positives - the issue that angered Cyril in
particular, with an incomplete LVM setup triggering the "bios
bootable OS" warning


IMO it is more important to avoid false positives, because switching to 
a BIOS installation on systems which are not BIOS-boot capable would 
create a non bootable system. In case oft is easier to install GRUB for 
BIOS boot on an running EFI system than the other way around.



- Other BIOS boot loaders such as syslinux/extlinux do not need or use a BIOS
boot partition.


Also not a use case I'm particularly caring about, I'll be
honest. They're also *really* not likely to work well without another
filesystem in use, which I expect we'll detect anyway.


Indeed other partitions are needed and will be detected, but they will 
not increment $NUM_NOT_ESP if the disk is GPT and has no BIOS boot 
partition (so $DISK_BIOS_BOOT=no), so it might cause a false negative. 
So why not just treat MSDOS and GPT disk labels equally and treat BIOS 
boot partitions like any other non-ESP ?



1b) IIUC the patch fixes #1033913 because the disk selected for installation
has received a new GPT disklabel without a BIOS boot partition, so further
checking is skipped. But IMO the root cause of #1033913 is that changes are
not committed to disk after setting the 'boot' and 'esp' flags to the newly
created ESP partition before stopping parted_server.


I originally thought about fixing partman-auto-lvm but it appears that 
other transient states can also trigger the "force UEFI installation" 
dialog during partitioning, for example after setting up LVM in manual 
partitioning if there is no ESP partition yet. As discussed in 
#debian-boot, a more general fix might be to run the check only once 
because only existing partitions before partitioning are relevant. Are 
there any use cases for which this might cause a false negative ?



4) It appears that partman fails to detect the specially crafted partition
table on the installation media created with a debian image. Is it intended
or fortunately unintentional ? If partman could see the EFI partition on the
installation media, the detection of BIOS-bootable systems would fail.


That's not a worry for today... :-)


Sure, but the issue can also happen if another removable media is 
present. For instance the USB drive I use to provide missing firmware 
has an ESP partition (and a regular partition table) thus can cause a 
false negative.




Bug#1033913: partman-auto-lvm: Broken "Guided - use entire disk and set up LVM" in UEFI mode

2023-04-10 Thread Steve McIntyre
Hey Pascal, and thanks for the review!

Overall comment: I'm not trying to make the heuristics 100% reliable
here, as I don't think that's actually possible. Instead, I'm trying
to tread the fine line of:

 * minimising false negatives - let's try to pick up on the most
   common cases where people are dual-booting with other systems and
   might not understand the issues here. That's 99%+ going to be
   people with Windows installed

 * minimising false positives - the issue that angered Cyril in
   particular, with an incomplete LVM setup triggering the "bios
   bootable OS" warning

On Mon, Apr 10, 2023 at 01:01:01PM +0200, Pascal Hambourg wrote:
>partman-efi "Fix detection of BIOS-bootable systems" provides a significant
>improvement over previous behaviour. However I have a few comments.
>
>1a) The patch assumes that a GPT disk may be BIOS-bootable only if it has a
>BIOS boot partition. But a GPT disk can be BIOS-bootable even without a BIOS
>boot partition:
>- GRUB may be installed without a BIOS boot partition if /boot is a plain
>partition (using blocklists), even though it is less reliable so a BIOS boot
>partition is strongly recommended.

Yeah, GRUB installed using blocklists is so much *not* a thing anybody
should be doing these days.

>- Other BIOS boot loaders such as syslinux/extlinux do not need or use a BIOS
>boot partition.

Also not a use case I'm particularly caring about, I'll be
honest. They're also *really* not likely to work well without another
filesystem in use, which I expect we'll detect anyway.

>1b) IIUC the patch fixes #1033913 because the disk selected for installation
>has received a new GPT disklabel without a BIOS boot partition, so further
>checking is skipped. But IMO the root cause of #1033913 is that changes are
>not committed to disk after setting the 'boot' and 'esp' flags to the newly
>created ESP partition before stopping parted_server.
>This can be seen in /var/log/partman:
>
>/bin/autopartition-lvm
>NEW_LABEL sda gpt
>NEW_PARTITION 1 sda ext2 538MB (future ESP)
>NEW_PARTITION 2 sda ext2 512MB (future /boot)
>NEW_PARTITION 3 sda ext3 159GB (future LVM)
>SET_FLAGS sda3 lvm
>(user prompt to write changes to the disk)
>COMMIT sda
>...
>/lib/partman/update.d/21efi_sync_flag
>SET_FLAGS sda1 boot esp
>...
>/bin/perform_recipe_by_lvm
>QUIT <- esp and boot flags have not been committed yet so are lost
>...
>/lib/partman/init.d/50efi
>GET_FLAGS sda1 -> none
>
>2) The patch considers the 'esp' and 'boot' flags to be equal. But this is
>true only with GPT. With MSDOS, they have totally different meanings:
>- 'esp' means that the partition has the ESP type identifier.
>- 'boot' means that the partition has the active/boot indicator set. The UEFI
>specification says that this indicator is ignored by EFI boot.

ACK, I think you're correct here. Yay parted and its inconsistent
"flags" concept. :-(

>3) The patch considers LVM and RAID partitions not bootable. But both LVM and
>RAID superblocks can have a boot loader reserved area. Also, GRUB may boot
>them directly without a /boot partition.

Hmmm, maybe.

>4) It appears that partman fails to detect the specially crafted partition
>table on the installation media created with a debian image. Is it intended
>or fortunately unintentional ? If partman could see the EFI partition on the
>installation media, the detection of BIOS-bootable systems would fail.

That's not a worry for today... :-)

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
You lock the door
And throw away the key
There's someone in my head but it's not me 



Bug#1033913: partman-auto-lvm: Broken "Guided - use entire disk and set up LVM" in UEFI mode

2023-04-10 Thread Pascal Hambourg
partman-efi "Fix detection of BIOS-bootable systems" provides a 
significant improvement over previous behaviour. However I have a few 
comments.


1a) The patch assumes that a GPT disk may be BIOS-bootable only if it 
has a BIOS boot partition. But a GPT disk can be BIOS-bootable even 
without a BIOS boot partition:
- GRUB may be installed without a BIOS boot partition if /boot is a 
plain partition (using blocklists), even though it is less reliable so a 
BIOS boot partition is strongly recommended.
- Other BIOS boot loaders such as syslinux/extlinux do not need or use a 
BIOS boot partition.


1b) IIUC the patch fixes #1033913 because the disk selected for 
installation has received a new GPT disklabel without a BIOS boot 
partition, so further checking is skipped. But IMO the root cause of 
#1033913 is that changes are not committed to disk after setting the 
'boot' and 'esp' flags to the newly created ESP partition before 
stopping parted_server.

This can be seen in /var/log/partman:

/bin/autopartition-lvm
NEW_LABEL sda gpt
NEW_PARTITION 1 sda ext2 538MB (future ESP)
NEW_PARTITION 2 sda ext2 512MB (future /boot)
NEW_PARTITION 3 sda ext3 159GB (future LVM)
SET_FLAGS sda3 lvm
(user prompt to write changes to the disk)
COMMIT sda
...
/lib/partman/update.d/21efi_sync_flag
SET_FLAGS sda1 boot esp
...
/bin/perform_recipe_by_lvm
QUIT <- esp and boot flags have not been committed yet so are lost
...
/lib/partman/init.d/50efi
GET_FLAGS sda1 -> none

2) The patch considers the 'esp' and 'boot' flags to be equal. But this 
is true only with GPT. With MSDOS, they have totally different meanings:

- 'esp' means that the partition has the ESP type identifier.
- 'boot' means that the partition has the active/boot indicator set. The 
UEFI specification says that this indicator is ignored by EFI boot.


3) The patch considers LVM and RAID partitions not bootable. But both 
LVM and RAID superblocks can have a boot loader reserved area. Also, 
GRUB may boot them directly without a /boot partition.


4) It appears that partman fails to detect the specially crafted 
partition table on the installation media created with a debian image. 
Is it intended or fortunately unintentional ? If partman could see the 
EFI partition on the installation media, the detection of BIOS-bootable 
systems would fail.




Bug#1033913: partman-auto-lvm: Broken "Guided - use entire disk and set up LVM" in UEFI mode

2023-04-03 Thread Pascal Hambourg

Hello,

On 03/04/2023 at 21:50, Cyril Brulebois wrote:


  - It very much looks like the intermediary states are slightly
different when setting up LVM and when setting up encrypted LVM, and
the LVM case case leads to some confusion in partman-efi's
/lib/partman/init.d/50efi (which logs to /var/log/partman rather than
to /var/log/syslog): “Found 0 ESPs, 3 non-ESPs”.


Not sure if it is relevant, but the non-EFI partition calculation is 
wrong. Patch attached.diff --git a/init.d/efi b/init.d/efi
index 7eb8d2e..092ba80 100755
--- a/init.d/efi
+++ b/init.d/efi
@@ -71,23 +71,22 @@ for dev in /var/lib/partman/devices/*; do
 	close_dialog
 
 	for id in $partitions; do
-	efi=no
-	open_dialog GET_FLAGS $id
-	while { read_line flag; [ "$flag" ]; }; do
-		if [ "$flag" = boot ]; then
-			efi=yes
+		efi=no
+		open_dialog GET_FLAGS $id
+		while { read_line flag; [ "$flag" ]; }; do
+			if [ "$flag" = boot ]; then
+efi=yes
+			fi
+		done
+		close_dialog
+		if [ "$efi" = yes ]; then
+			mkdir -p $id
+			echo efi >$id/method
 			NUM_ESP=$(($NUM_ESP + 1))
-			# cannot break here
 		else
 			NUM_NO=$(($NUM_NO + 1))
 		fi
 	done
-	close_dialog
-	if [ "$efi" = yes ]; then
-		mkdir -p $id
-		echo efi >$id/method
-	fi
-	done
 done
 
 log "Found $NUM_ESP ESPs, $NUM_NO non-ESPs"


Bug#1033913: partman-auto-lvm: Broken "Guided - use entire disk and set up LVM" in UEFI mode

2023-04-03 Thread Cyril Brulebois
Package: partman-auto-lvm
Version: 87
Severity: serious
Justification: Maintainer says so

TL;DR: Answering “Yes” to the “Force UEFI installation?” makes sure the
installer pulls the right bootloader packages, despite misreading the
situation.

I've discovered this while testing D-I Bookworm RC 1 but also confirmed
it already existed in D-I Bookworm Alpha 2, and I'm therefore filing it
against the version found in the previous release (and deciding not to
block the Bookworm RC 1 release on it).



For baremetal tests on laptops requiring various firmware packages, I've
been using guided partitioning since forever, with one of these:
 - Guided - use entire disk
 - Guided - use entire disk and set up encrypted LVM

The former is used most of the time since it's slightly faster (fewer
prompts), while the latter is only used once in a while, to make sure a
“real” laptop-oriented install works fine (since every laptop should be
encrypted in my opinion).

Since I had just tested “Guided - use entire disk” in a virtual machine,
I decided to pick this instead when switching to the first laptop
(Asus Vivobook S14/S15 but that's very likely not a factor):
 - Guided - use entire disk and set up LVM

And… *WOW!*

The first surprise is this prompt:

Force UEFI installation?

This machine's firmware has started the installer in UEFI mode but
it looks like there may be existing operating systems already
installed using "BIOS compatibility mode". If you continue to
install Debian in UEFI mode, it might be difficult to reboot the
machine into any BIOS-mode operating systems later.

If you wish to install in UEFI mode and don't care about keeping the
ability to boot one of the existing systems, you have the option to
force that here. If you wish to keep the option to boot an existing
operating system, you should choose NOT to force UEFI installation
here.

which defaults to No.

That's very surprising since the only operating system prior to the
installation was a Debian system, which was getting entirely erased (due
to using the full disk), and was installed in UEFI mode anyway.

I went for the default choice, since we expect the installer to make
smart suggestions, and unsuspecting users shouldn't have to know better.

That means we end up with installing grub-pc instead of grub-efi-amd64
and shim, being prompted where to install GRUB, and of course when it's
time to reboot, the UEFI firmware rightfully refuses to boot anything
since there's absolutely no signature whatsoever, which isn't a great
idea under Secure Boot:

Secure Boot Violation

Invalid signature detected. Check Secure Boot Policy in Setup.


Some additional info:
 - As mentioned in TL;DR, this can be worked around by answering Yes to
   “Force UEFI installation?”.
 - It doesn't seem to be dependent on possible traces of an existing
   system prior to the installation: Debian installed on the entire disk
   or with encrypted LVM on the entire disk doesn't seem to make a
   difference. Starting with a wiped disk (writing ~ 2 GB worth of
   zeros at the beginning of the disk) doesn't make a difference either.
 - It very much looks like the intermediary states are slightly
   different when setting up LVM and when setting up encrypted LVM, and
   the LVM case case leads to some confusion in partman-efi's
   /lib/partman/init.d/50efi (which logs to /var/log/partman rather than
   to /var/log/syslog): “Found 0 ESPs, 3 non-ESPs”.
 - I'm filing this issue against partman-auto-lvm though, for
   discoverability purposes.


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant