Bug#1028541: lvm2: LVM filters render server unbootable
Hi,

On Tue, 29 Aug 2023 17:25:23 +0200 Friedrich Weber wrote:
> I'm seeing this bug in a different usecase on Debian Bookworm with LVM
> 2.03.16-2: multipath is set up, the multipath device is an LVM
> physical volume in a volume group with a thin pool. To prevent LVM from
> picking up on the multipath components, /etc/lvm/lvm.conf has a
> global_filter that rejects the multipath components by matching on their
> /dev/disk/by-id symlink paths.

FWIW, for this usecase there seems to be a viable workaround: Instead of manually adding a global_filter that ignores multipath components, rely on LVM's own multipath component detection (available since LVM 2.03.13 [1]) that reads /etc/multipath/wwids. Installing multipath-tools-boot makes this file available in initramfs, and then detection also works in early boot. The description of multipath-tools-boot states that it should not be installed if not booting from a multipath device, but currently I don't see any downside of installing it here (not booting from a multipath device).

Still, is there a chance the mentioned patches could be backported? Without them, global_filter is not functioning as expected.

> https://sourceware.org/git/?p=lvm2.git;a=commit;h=17a3585cbb55d9a15ced9775a18b50c53a50ee8e
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=c9fdc828ff0504bc2e57f65862bc382f7663a8a2
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=6d14144d311fb347e4225ad6a48d4900b39445c4
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=bd05318ba2fc588be6339f5dc61f09195996b0e9

Best,
Friedrich

[1] https://gitlab.com/lvmteam/lvm2/-/commit/90485650931d3fc04d00c92a729050c8743969e5
[2] https://packages.debian.org/bookworm/multipath-tools-boot
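The workaround above depends on /etc/multipath/wwids being present (also inside the initramfs) and listing the multipath device. A minimal sketch for checking the contents of such a file follows. It assumes the usual layout of that file, with `#` comment lines and one WWID per line wrapped in slashes; the function name and the sample WWID are hypothetical and only serve as illustration.

```python
def parse_wwids(text):
    """Return the WWIDs listed in the text of a multipath wwids file.

    Assumption: comments start with '#', and each entry is a WWID
    wrapped in slashes, for example /3600.../ on a line of its own.
    """
    wwids = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Keep only well-formed /WWID/ entries.
        if len(line) > 2 and line.startswith("/") and line.endswith("/"):
            wwids.append(line[1:-1])
    return wwids


# Hypothetical sample content; the WWID below is made up.
SAMPLE = """\
# Multipath wwids, Version : 1.0
# Valid WWIDs:
/3600a0b80000000000000000000000001/
"""

if __name__ == "__main__":
    print(parse_wwids(SAMPLE))
```

On a real system one would pass the contents of /etc/multipath/wwids (or of the copy unpacked from the initramfs) instead of `SAMPLE`.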
Bug#1028541: lvm2: LVM filters render server unbootable
Hi,

I'm seeing this bug in a different usecase on Debian Bookworm with LVM 2.03.16-2: multipath is set up, the multipath device is an LVM physical volume in a volume group with a thin pool. To prevent LVM from picking up on the multipath components, /etc/lvm/lvm.conf has a global_filter that rejects the multipath components by matching on their /dev/disk/by-id symlink paths.

I have replicated this setup in a VM, with the following global_filter in /etc/lvm/lvm.conf:

    devices {
        global_filter=["r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|","r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2|"]
    }

The relevant portion of /dev/disk/by-id:

    lrwxrwxrwx 1 root root 9 Aug 29 16:31 scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
    lrwxrwxrwx 1 root root 9 Aug 29 16:31 scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 -> ../../sdc

After running update-initramfs and rebooting, pvs and other LVM tooling reports the following warning:

    # pvs
      WARNING: Device mismatch detected for somegroup/somethinpool_tmeta which is accessing /dev/sdb instead of /dev/mapper/mpatha.
      WARNING: Device mismatch detected for somegroup/somethinpool_tdata which is accessing /dev/sdb instead of /dev/mapper/mpatha.
      PV                 VG        Fmt  Attr PSize  PFree
      /dev/mapper/mpatha somegroup lvm2 a--  <4.00g <2.99g

From reading this report and the now-resolved upstream report, this seems to happen because the /dev/disk/by-id symlinks are not available by the time the LVM udev hooks run, so the r|...| filters do not have any effect. Indeed, if I use r|/dev/sdb| and r|/dev/sdc| instead, run update-initramfs and reboot, the warning does not appear anymore. However, being able to use the /dev/disk/by-id paths would be preferable.

With the following four patches applied, I can use /dev/disk/by-id in the filters and the warning does not appear:

https://sourceware.org/git/?p=lvm2.git;a=commit;h=17a3585cbb55d9a15ced9775a18b50c53a50ee8e
https://sourceware.org/git/?p=lvm2.git;a=commit;h=c9fdc828ff0504bc2e57f65862bc382f7663a8a2
https://sourceware.org/git/?p=lvm2.git;a=commit;h=6d14144d311fb347e4225ad6a48d4900b39445c4
https://sourceware.org/git/?p=lvm2.git;a=commit;h=bd05318ba2fc588be6339f5dc61f09195996b0e9

The first three patches are mentioned in the upstream bug report [1] and cause pvscan to read symlink names from udev's DEVLINKS environment variable under certain conditions. One of the conditions is that at least one of the filter regexes refers to a symlink. However, this check only considers a|...| filters [2], so it doesn't trigger if only r|...| filters are used as above. Hence, in my case the fourth patch is also needed, as it removes the filter regex check altogether.

Is there a chance the patches could be backported? All four patches seem to be included in upstream release 2.03.19 [3].

Happy to provide any more information if needed!

Thanks and best wishes,
Friedrich

[1] https://github.com/lvmteam/lvm2/issues/104
[2] https://sourceware.org/git/?p=lvm2.git;a=blob;f=lib/filters/filter-regex.c;h=ecc32914b0e15ba9cbac5c101cffddf25eddd8ad;hb=6d14144d311fb347e4225ad6a48d4900b39445c4#l272
[3] https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/tags/v2_03_19
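To illustrate why the first three patches alone do not help with a reject-only filter, here is a small model of the check described above. It is not the upstream code: `looks_like_symlink` is a hypothetical stand-in for whatever test filter-regex.c applies, and all function names are made up for this sketch. Only the point that r|...| entries are skipped is taken from the description above.

```python
def split_filter_entry(entry):
    """Split an lvm.conf filter entry such as 'r|/dev/sdb|' into ('r', '/dev/sdb')."""
    kind, delim = entry[0], entry[1]
    end = entry.rfind(delim)
    if kind not in ("a", "r") or end <= 1:
        raise ValueError(f"malformed filter entry: {entry!r}")
    return kind, entry[2:end]


def looks_like_symlink(regex):
    # Hypothetical heuristic for this sketch only; the real check in
    # filter-regex.c may look at different path prefixes.
    return "/dev/disk/" in regex


def wants_devlinks(filters, accept_only):
    """Model of the condition 'some filter regex refers to a symlink'.

    accept_only=True mimics the behaviour described for the first three
    patches, where only a|...| entries are inspected.
    """
    for entry in filters:
        kind, regex = split_filter_entry(entry)
        if accept_only and kind != "a":
            continue  # r|...| entries are ignored by the check
        if looks_like_symlink(regex):
            return True
    return False


# The reject-only global_filter from the VM setup above.
REJECT_ONLY = [
    "r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|",
    "r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2|",
]

if __name__ == "__main__":
    print(wants_devlinks(REJECT_ONLY, accept_only=True))   # check never triggers
    print(wants_devlinks(REJECT_ONLY, accept_only=False))  # would trigger if r entries counted
```

Note that the fourth patch does not extend the check to r|...| entries; as described above, it removes the check altogether, so the `accept_only=False` case only shows what the check misses.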
Bug#1028541: lvm2: LVM filters render server unbootable
On Tue, Jan 17, 2023 at 08:13:33AM +0100, Christian Herzog wrote:
> update: we were told by upstream that there is a known instability between lvm
> and udev-generated symlinks and a devices file should be used instead. So
> that's what we're going to do.

I think I actually know what the problem is.

pvscan is run during the udev event handling, especially in the initramfs, where no systemd is available to move that out. Modifications to devices and symlinks are only applied at the end of the event. So symlinks will always be missing on the first event.

If you have systemd running, it uses systemd-run, and then it is just a race condition between udev and systemd as to which one finishes first.

The only way to fix this is to provide the symlink information to pvscan in addition to the device itself and let it figure that out.

Regards,
Bastian

-- 
No problem is insoluble.
		-- Dr. Janet Wallace, "The Deadly Years", stardate 3479.4
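The effect of the missing symlinks can be sketched with a small model of the regex filter. It follows the rule documented in lvm.conf (a device is accepted if any of its names hits an "a" pattern first, rejected if names only hit "r" patterns, and accepted if nothing matches) and takes the device names from the udev variables DEVNAME and DEVLINKS. This is an illustration, not LVM's implementation: the function names are hypothetical, and Python's `re` is assumed to be close enough to LVM's regex dialect for these simple patterns.

```python
import re


def split_filter_entry(entry):
    """Split an lvm.conf filter entry such as 'r|/dev/sdb|' into ('r', '/dev/sdb')."""
    kind, delim = entry[0], entry[1]
    end = entry.rfind(delim)
    if kind not in ("a", "r") or end <= 1:
        raise ValueError(f"malformed filter entry: {entry!r}")
    return kind, entry[2:end]


def first_verdict(filters, name):
    """Return 'a' or 'r' for the first pattern matching name, or None."""
    for entry in filters:
        kind, regex = split_filter_entry(entry)
        if re.search(regex, name):
            return kind
    return None


def filter_accepts(filters, names):
    """Apply the documented accept/reject rule to all known names of a device."""
    verdicts = [first_verdict(filters, name) for name in names]
    if "a" in verdicts:
        return True
    if "r" in verdicts:
        return False
    return True  # no name matched any pattern


def names_from_udev(env):
    """Device node plus any symlinks udev passes in DEVLINKS (space separated)."""
    return [env["DEVNAME"]] + env.get("DEVLINKS", "").split()


# Accept-style filter from the original report, reject-style from the multipath case.
ACCEPT_ROOT = ["a|pci-:04.*|", "r|.*|"]
REJECT_COMPONENT = ["r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|"]

# Only the kernel name is known (symlinks not yet visible):
early_root = {"DEVNAME": "/dev/sda3"}
early_comp = {"DEVNAME": "/dev/sdb"}
# The same devices once the symlink names are passed along:
late_root = dict(early_root, DEVLINKS="/dev/disk/by-path/pci-:04:00.0-sas-phy0-lun-0-part3")
late_comp = dict(early_comp, DEVLINKS="/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1")

if __name__ == "__main__":
    for label, filters, env in [
        ("root PV, no symlinks", ACCEPT_ROOT, early_root),
        ("root PV, with DEVLINKS", ACCEPT_ROOT, late_root),
        ("mpath component, no symlinks", REJECT_COMPONENT, early_comp),
        ("mpath component, with DEVLINKS", REJECT_COMPONENT, late_comp),
    ]:
        print(label, "->", "accepted" if filter_accepts(filters, names_from_udev(env)) else "rejected")
```

In this model the root PV is rejected and the multipath component is accepted as long as only the kernel name is known, which matches both symptoms reported in this thread; once the symlink names are supplied, both filters behave as intended.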
Bug#1028541: lvm2: LVM filters render server unbootable
Dear Bastian,

update: we were told by upstream that there is a known instability between lvm and udev-generated symlinks and a devices file should be used instead. So that's what we're going to do.

In related news, I'll create another bug report shortly, but it's a small one.

thanks,
-Christian

-- 
Dr. Christian Herzog                   support: +41 44 633 26 68
Head, IT Services Group, HPT H 8         voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland                 http://isg.phys.ethz.ch/
Bug#1028541: lvm2: LVM filters render server unbootable
Dear Bastian,

thanks for picking up on this. We've done some more research, and we now believe the issue to be upstream, so we've opened a bug report directly with lvm:

https://github.com/lvmteam/lvm2/issues/104

If you check the lvm debug log we posted there, you'll see that it correctly picks up the filter, finds and scans the right device (sda3), but then rejects it since at the time of scanning, /dev/disk/by-path/pci-:04:00.0-sas-phy0-lun-0-part3 (the one in the filter) doesn't exist. This might be a race condition, since on some reboots it sees part1 and part2, on some only part1, but never part3.

I could also reproduce the problem in Arch (Fedora, surprisingly, has too old of an LVM version).

to your questions:

> >- manually activating the root VG in busybox allows us to boot
> >  (by copy/pasting the IMPORT{program} lines from the udev rule)
>
> Which one? "pvscan"? That one does not activate anything.

correct, but I don't think that's relevant any longer.

> >- replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
> >  bookworm with the bullseye versions fixes the problem
>
> What are you replacing exactly? The bullseye version did not include
> /lib/udev/rules.d/69-lvm.rules at all, see
> https://packages.debian.org/bullseye/amd64/lvm2/filelist.

correct, I used bullseye's 69-lvm-metad.rules and renamed it to 69-lvm.rules on bookworm.

> Please provide the output of "pvs", "vgs", "lvs" and the kernel log.

again, I don't think it's relevant, but to help understand the situation better:

  PV                                                    VG               Fmt  Attr PSize  PFree
  /dev/disk/by-path/pci-:04:00.0-sas-phy0-lun-0-part3   test-bookworm-vg lvm2 a--  <2.73t <2.45t

  VG               #PV #LV #SN Attr   VSize  VFree
  test-bookworm-vg   1  10   4 wz--n- <2.73t <2.45t

  LV     VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   test-bookworm-vg owi-aos---  10.00g
  root   test-bookworm-vg owi-aos---  23.28g
  swap_1 test-bookworm-vg -wi-ao     976.00m
  var    test-bookworm-vg owi-aos---   9.31g

and

  pci-:04:00.0-sas-phy0-lun-0 -> ../../sda
  pci-:04:00.0-sas-phy0-lun-0-part1 -> ../../sda1
  pci-:04:00.0-sas-phy0-lun-0-part2 -> ../../sda2
  pci-:04:00.0-sas-phy0-lun-0-part3 -> ../../sda3

  Device       Start        End    Sectors  Size Type
  /dev/sda1     2048       4095       2048    1M BIOS boot
  /dev/sda2     4096    1003519     999424  488M Linux filesystem
  /dev/sda3  1003520 5860532223 5859528704  2.7T Linux LVM

thanks and kind regards,
-Christian

-- 
Dr. Christian Herzog                   support: +41 44 633 26 68
Head, IT Services Group, HPT H 8         voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland                 http://isg.phys.ethz.ch/
Bug#1028541: lvm2: LVM filters render server unbootable
Hi

On Thu, Jan 12, 2023 at 03:18:55PM +0100, Christian Herzog wrote:
> on our storage servers, we employ LVM filters to hide data partitions
> from the OS (since they're iSCSI exported to the frontend
> fileserver). With bookworm, lvm does not activate the root VG when
> filters are in place. So far we have been able to establish the
> following facts:
> - with the default global_filter settings, it does boot

Okay.

> - with global_filter = [ "a|pci-:04.*|", "r|.*|" ] (to only
>   activate the root VG) bookworm drops into busybox (no root fs
>   found)

So it could be that the filter does not apply that early.

> - manually activating the root VG in busybox allows us to boot
>   (by copy/pasting the IMPORT{program} lines from the udev rule)

Which one? "pvscan"? That one does not activate anything.

> - replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
>   bookworm with the bullseye versions fixes the problem

What are you replacing exactly? The bullseye version did not include /lib/udev/rules.d/69-lvm.rules at all, see https://packages.debian.org/bullseye/amd64/lvm2/filelist.

> - the problem seems to be related (but not identical) to #1018730

This one is about partial VG.

> We've already spent 2 days trying to narrow down the underlying cause as
> much as possible and we'd be happy to provide any additional information
> since for us this is a bookworm deal breaker.

Please provide the output of "pvs", "vgs", "lvs" and the kernel log.

Bastian

-- 
I'm a soldier, not a diplomat. I can only tell the truth.
		-- Kirk, "Errand of Mercy", stardate 3198.9
Bug#1028541: lvm2: LVM filters render server unbootable
Package: lvm2
Version: 2.03.16-2
Severity: important

Dear Maintainer,

   * What led up to the situation?

on our storage servers, we employ LVM filters to hide data partitions from the OS (since they're iSCSI exported to the frontend fileserver). With bookworm, lvm does not activate the root VG when filters are in place. So far we have been able to establish the following facts:

- with the default global_filter settings, it does boot
- with global_filter = [ "a|pci-:04.*|", "r|.*|" ] (to only activate the root VG) bookworm drops into busybox (no root fs found)
- manually activating the root VG in busybox allows us to boot (by copy/pasting the IMPORT{program} lines from the udev rule)
- replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on bookworm with the bullseye versions fixes the problem
- the problem seems to be related (but not identical) to #1018730

We've already spent 2 days trying to narrow down the underlying cause as much as possible and we'd be happy to provide any additional information since for us this is a bookworm deal breaker.

thanks,
-Christian

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-6-amd64 (SMP w/40 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages lvm2 depends on:
ii  dmeventd                   2:1.02.185-2
ii  dmsetup                    2:1.02.185-2
ii  libaio1                    0.3.113-3
ii  libblkid1                  2.38.1-4
ii  libc6                      2.36-7
ii  libdevmapper-event1.02.1   2:1.02.185-2
ii  libedit2                   3.1-20221030-2
ii  libselinux1                3.4-1+b4
ii  libsystemd0                252.4-1
ii  libudev1                   252.4-1
ii  lsb-base                   11.5
ii  sysvinit-utils [lsb-base]  3.06-2

Versions of packages lvm2 recommends:
pn  thin-provisioning-tools

lvm2 suggests no packages.
-- Configuration Files:
/etc/lvm/lvm.conf changed:
config {
    # Configuration option config/checks.
    # If enabled, any LVM configuration mismatch is reported.
    # This implies checking that the configuration key is understood by
    # LVM and that the value of the key is the proper type. If disabled,
    # any configuration mismatch is ignored and the default value is used
    # without any warning (a message about the configuration key not being
    # found is issued in verbose mode only).
    # This configuration option has an automatic default value.
    # checks = 1

    # Configuration option config/abort_on_errors.
    # Abort the LVM process if a configuration mismatch is found.
    # This configuration option has an automatic default value.
    # abort_on_errors = 0

    # Configuration option config/profile_dir.
    # Directory where LVM looks for configuration profiles.
    # This configuration option has an automatic default value.
    # profile_dir = "/etc/lvm/profile"
}
devices {
    # Configuration option devices/dir.
    # Directory in which to create volume group device nodes.
    # Commands also accept this as a prefix on volume group names.
    # This configuration option is advanced.
    # This configuration option has an automatic default value.
    # dir = "/dev"

    # Configuration option devices/scan.
    # Directories containing device nodes to use with LVM.
    # This configuration option is advanced.
    # This configuration option has an automatic default value.
    # scan = [ "/dev" ]

    # Configuration option devices/obtain_device_list_from_udev.
    # Obtain the list of available devices from udev.
    # This avoids opening or using any inapplicable non-block devices or
    # subdirectories found in the udev directory. Any device node or
    # symlink not managed by udev in the udev directory is ignored. This
    # setting applies only to the udev-managed device directory; other
    # directories will be scanned fully. LVM needs to be compiled with
    # udev support for this setting to apply.
    # This configuration option has an automatic default value.
    obtain_device_list_from_udev = 1

    # Configuration option devices/external_device_info_source.
    # Enable device information from udev.
    # If set to "udev", lvm will supplement its own native device information
    # with information from libudev. This can potentially improve the detection
    # of MD component devices and multipath component devices.
    # This configuration option has an automatic default value.
    external_device_info_source = "udev"

    # Configuration option