Re: grub_probe/grub-mkimage does not find all drives in BTRFS RAID1

2018-03-22 Thread Duncan
Matthew Hawn posted on Thu, 22 Mar 2018 00:13:38 + as excerpted:

> This is almost definitely a bug in GRUB, but I wanted to get the btrfs
> mailing list opinion first.
> 
> Symptoms:
> I have a btrfs raid1 /boot and root filesystem.  Ever since I replaced a
> drive, when I run the grub utilities to create my grub.cfg and install
> to boot sector, it only recognizes one of the drives.

> So, is this a grub bug? If so, any suggestions before I submit to the
> grub-bug list?  Also, as I wait until a fix is published (or I rebuild
> grub with my own patch), any ideas to workaround this?

It does appear to be a grub bug, yes.  I'm not a dev, just a btrfs user 
and list regular, so won't comment further on that, but I do have a 
suggested workaround based on what I do here.

> Grub:  2.02~beta2-36ubuntu3.17

FWIW, 2.02 is out (and installed here on gentoo, gentoo git log says it 
was committed to the gentoo tree on 2017-04-27, and ftp.gnu.org has a 
date of 2017-04-26, so it's out nearing a year now, and gentoo picked it 
up right away =:^), but I don't know if that bug is fixed.

> $ btrfs fi show
> Label: none  uuid: 84c8e78b-9d7f-4451-966d-3c25154e89b8
>   Total devices 2 FS bytes used 22.16GiB
>   devid    2 size 100.00GiB used 25.03GiB path /dev/mapper/VG_BTRFS2-LV_ROOT2
>   devid    3 size 100.00GiB used 25.03GiB path /dev/mapper/VG_BTRFS3-LV_ROOT3
> 
> 
> Label: none  uuid: 059ab98f-eb63-471d-b099-6561baf39040
>   Total devices 2 FS bytes used 61.04GiB
>   devid    2 size 200.00GiB used 62.03GiB path /dev/mapper/VG_BTRFS2-LV_HOME2
>   devid    3 size 200.00GiB used 62.03GiB path /dev/mapper/VG_BTRFS3-LV_HOME3
> 
> 
> Label: none  uuid: ffe8b1a0-030c-42c2-94f5-b7e8e54b1439
>   Total devices 2 FS bytes used 342.04MiB
>   devid    2 size 1.00GiB used 693.62MiB path /dev/mapper/VG_BTRFS2-LV_BOOT2
>   devid    3 size 1.00GiB used 693.62MiB path /dev/mapper/VG_BTRFS3-LV_BOOT3

Good, you have a separate dedicated boot.  That will make my suggestion 
/much/ easier. =:^)

OT: I still remember the big deal of my first GB drive upgrade, not even 
a full GiB, and get a bit of "historical vertigo" seeing "a whole gig!!" 
used as a dedicated /boot...


So my setup here now uses mostly btrfs raid1, with a separate /boot, but 
I came up with the general layout years ago, before btrfs, when I was 
using mdraid, then with grub1 (technically 0.97 with a bunch of patches 
after upstream abandoned further development, as they never released 
grub1, dumping it for what became grub2 before 1.0 release).

And the much simpler grub1 didn't really understand mdraid, tho it could 
work with mdraid1 simply because grub treated each mdraid1 device as a 
single device.  But you had to install grub1 to each of the mdraid1 
devices separately if you wanted to be able to boot any of them, as it 
didn't have grub2-core's ability to dynamically detect other devices to 
load further grub modules from.

Meanwhile, while I was running mdraid and additionally had a backup raid 
for most of my filesystems, including root, that I could boot to if my 
primary/working raid and filesystem wouldn't assemble and/or mount for 
whatever reason, because grub1 could only point to one /boot for its 
stage2, I couldn't have a backup /boot raid1 on the same set of physical 
devices as my primary, which meant that if something other than a bad 
device happened to that /boot that made it unbootable, say I fat-fingered 
a mkfs and overwrote /boot, or if a grub upgrade went bad, I was stuck.

But at the time I was using 4-way mdraid1, 4 mirrors, for my primary 
raid, and that gave me an idea.  Instead of the usual 4 mirrors for each 
filesystem and its backup, as I had for most filesystems, for /boot I did 
two, two-way raid1s, each on two devices, with one of the two two-way 
raid1s being the primary /boot and the second being the backup.  That way 
I could have the BIOS boot selector default to one of the primaries, and 
could still have it select one of the secondaries if I needed to boot the 
backup grub and its stage2 located in the backup /boot raid.

That came in quite handy when I upgraded to grub2, since I could upgrade 
the backup /boot raid1 and its grub to grub2, then test and configure it 
to my satisfaction, all the while keeping the grub1 installed in my 
normal working copy raid1 /boot untouched until I was satisfied that the 
grub2 install on the backup was functional and configured as I wanted, 
and upgrading the working copy raid1 /boot to grub2 only after I knew 
things were working well.


Much later, after some hardware upgrades so I wasn't using the 4-way 
raid1 setup any longer, I switched to btrfs and btrfs raid1.  But I so 
well liked the concept of having a backup /boot so I didn't have to worry 
about bad upgrades or fat-fingering my /boot, that I kept it!

But in addition to two-way raid1 redundancy on multiple devices, btrfs 
has the dup mode, two-way dup redundancy on a si

grub_probe/grub-mkimage does not find all drives in BTRFS RAID1

2018-03-21 Thread Matthew Hawn
This is almost definitely a bug in GRUB, but I wanted to get the btrfs mailing 
list opinion first.

Symptoms:
I have a btrfs raid1 /boot and root filesystem.  Ever since I replaced a drive, 
when I run the grub utilities to create my grub.cfg and install to boot sector, 
it only recognizes one of the drives.

$ sudo grub-probe /boot/grub -t device
/dev/mapper/VG_BTRFS2-LV_BOOT2

$ sudo grub-probe /boot/grub -t bios_hints
lvmid/gEfhOx-J9hr-8tkA-OgjD-Aqqu-XR2T-sFB4me/oNnMDp-Rit5-P0qs-QZlf-bQQe-tZU7-Wwmz8z

This also prevents booting if the above drive is disconnected: GRUB reports 
an error locating 
lvmid/gEfhOx-J9hr-8tkA-OgjD-Aqqu-XR2T-sFB4me/oNnMDp-Rit5-P0qs-QZlf-bQQe-tZU7-Wwmz8z

Boot works fine if both drives, or only the above drive, are present.

Before drive replacement, the above commands returned both drives that were 
part of the RAID1 mirror.  I never tried booting with a device disconnected, 
but both showed up in my grub.cfg.   Replacement was not standard since the 
prior drive was developing bad sectors, but had not failed. Replacement was 
done by adding a third disk to the mirror, then removing the 1st disk.

Probable Cause:
To determine the boot drive, grub-probe and grub-mkimage make several ioctl 
calls in grub_find_root_devices_from_btrfs (osdep/linux/getroot.c).
A call to BTRFS_IOC_FS_INFO gets the max_id and num_devices.  It then iterates 
from 1 to max_id, calling BTRFS_IOC_DEV_INFO to get the path.

For my system, max_id = 3 and num_devices = 2.  Requesting BTRFS_IOC_DEV_INFO 
for device 1 yields a "No Such Device".  Instead of continuing on to device 2 
and 3 (which return without error), grub treats all ioctl errors as fatal, 
exits the btrfs specific code with a failure, then falls back to generic linux 
code that only detects the single drive. 
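
To make the failure mode concrete, here is a minimal sketch in C of the loop 
as I think it ought to behave.  This is not GRUB's code and not a tested 
patch: collect_devices, dev_info_fn and fake_lookup are hypothetical names, 
and the lookup callback merely stands in for the BTRFS_IOC_DEV_INFO ioctl so 
the logic can be exercised without a real filesystem.  It assumes the missing 
devid is reported via errno ENODEV, as seen on my system.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the BTRFS_IOC_DEV_INFO ioctl: returns 0 and
   sets *path on success, or -1 with errno set (ENODEV for an unused devid). */
typedef int (*dev_info_fn)(uint64_t devid, const char **path);

/* Walk devids 1..max_id, skipping ids that no longer exist.
   Returns the number of paths stored in out[], or -1 on a real error. */
int collect_devices(uint64_t max_id, uint64_t num_devices,
                    dev_info_fn lookup, const char **out, size_t out_len)
{
    size_t found = 0;

    for (uint64_t id = 1; id <= max_id && found < num_devices; id++) {
        const char *path = NULL;

        if (lookup(id, &path) != 0) {
            if (errno == ENODEV)
                continue;       /* hole in the devid sequence: keep scanning */
            return -1;          /* any other failure is still fatal */
        }
        if (found == out_len)
            return -1;          /* caller's buffer is too small */
        out[found++] = path;
    }
    return (int)found;
}

/* Fake lookup mirroring my /boot filesystem: devid 1 was removed,
   devids 2 and 3 remain. */
int fake_lookup(uint64_t devid, const char **path)
{
    switch (devid) {
    case 2: *path = "/dev/mapper/VG_BTRFS2-LV_BOOT2"; return 0;
    case 3: *path = "/dev/mapper/VG_BTRFS3-LV_BOOT3"; return 0;
    default: errno = ENODEV; return -1;
    }
}
```

Calling collect_devices(3, 2, fake_lookup, paths, 4) with this fake returns 
both remaining devices, whereas treating the ENODEV on devid 1 as fatal finds 
none and triggers the fallback.  Stopping once num_devices paths have been 
found is my own choice for the sketch, not something taken from GRUB.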

So, is this a grub bug? If so, any suggestions before I submit to the grub-bug 
list?  Also, as I wait until a fix is published (or I rebuild grub with my own 
patch), any ideas to workaround this?


Info:
Kernel:  4.15.0-12-generic #13-Ubuntu SMP (based on 4.15.7 mainline)
btrfs-progs: 4.15.1-1
Grub:  2.02~beta2-36ubuntu3.17 

$ btrfs fi show
Label: none  uuid: 84c8e78b-9d7f-4451-966d-3c25154e89b8
  Total devices 2 FS bytes used 22.16GiB
  devid    2 size 100.00GiB used 25.03GiB path /dev/mapper/VG_BTRFS2-LV_ROOT2
  devid    3 size 100.00GiB used 25.03GiB path /dev/mapper/VG_BTRFS3-LV_ROOT3


Label: none  uuid: 059ab98f-eb63-471d-b099-6561baf39040
  Total devices 2 FS bytes used 61.04GiB
  devid    2 size 200.00GiB used 62.03GiB path /dev/mapper/VG_BTRFS2-LV_HOME2
  devid    3 size 200.00GiB used 62.03GiB path /dev/mapper/VG_BTRFS3-LV_HOME3


Label: none  uuid: ffe8b1a0-030c-42c2-94f5-b7e8e54b1439
  Total devices 2 FS bytes used 342.04MiB
  devid    2 size 1.00GiB used 693.62MiB path /dev/mapper/VG_BTRFS2-LV_BOOT2
  devid    3 size 1.00GiB used 693.62MiB path /dev/mapper/VG_BTRFS3-LV_BOOT3
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html