Bug#1028541: lvm2: LVM filters render server unbootable

2024-01-10 Thread Friedrich Weber
Hi,

On Tue, 29 Aug 2023 17:25:23 +0200 Friedrich Weber wrote:
> I'm seeing this bug in a different use case on Debian Bookworm with LVM
> 2.03.16-2: multipath is set up, the multipath device is an LVM
> physical volume in a volume group with a thin pool. To prevent LVM from
> picking up on the multipath components, /etc/lvm/lvm.conf has a
> global_filter that rejects the multipath components by matching on their
> /dev/disk/by-id symlink paths.

FWIW, for this use case there seems to be a viable workaround: instead of
manually adding a global_filter that ignores multipath components, rely
on LVM's own multipath component detection (available since LVM 2.03.13
[1]), which reads /etc/multipath/wwids. Installing multipath-tools-boot
makes this file available in the initramfs, so detection also works in
early boot. The description of multipath-tools-boot [2] states that it
should not be installed if not booting from a multipath device, but
currently I don't see any downside to installing it here (not booting
from a multipath device).
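For illustration, the wwids-based check can be pictured roughly like this. It is a minimal Python sketch, not LVM's code: it assumes the usual layout of /etc/multipath/wwids (one WWID per line enclosed in slashes, comment lines starting with #). The helper names and the example WWID are hypothetical, and how LVM obtains a device's WWID is not modelled; the WWID is simply passed in.

```python
def read_wwids(text: str) -> set[str]:
    """Parse the contents of a multipath wwids file (assumed format: /WWID/ per line)."""
    wwids = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumed format: the WWID is enclosed in slashes, e.g. /3600.../
        if len(line) > 2 and line.startswith("/") and line.endswith("/"):
            wwids.add(line[1:-1])
    return wwids


def is_multipath_component(device_wwid: str, wwids: set[str]) -> bool:
    """Hypothetical check: a device whose WWID is listed counts as a multipath component."""
    return device_wwid in wwids


if __name__ == "__main__":
    sample = """# Multipath wwids, Version : 1.0
# made-up example entry
/3600140500000000000000000000000a1/
"""
    known = read_wwids(sample)
    print(is_multipath_component("3600140500000000000000000000000a1", known))  # True
```

The point of the workaround is that this lookup needs only the wwids file, not any /dev/disk/by-id symlink, which is why it keeps working in early boot once the file is present in the initramfs.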

Still, is there a chance the mentioned patches could be backported?
Without them, a global_filter that matches on /dev/disk/by-id symlinks
does not work as expected.

> https://sourceware.org/git/?p=lvm2.git;a=commit;h=17a3585cbb55d9a15ced9775a18b50c53a50ee8e
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=c9fdc828ff0504bc2e57f65862bc382f7663a8a2
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=6d14144d311fb347e4225ad6a48d4900b39445c4
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=bd05318ba2fc588be6339f5dc61f09195996b0e9

Best,

Friedrich

[1] https://gitlab.com/lvmteam/lvm2/-/commit/90485650931d3fc04d00c92a729050c8743969e5
[2] https://packages.debian.org/bookworm/multipath-tools-boot



Bug#1028541: lvm2: LVM filters render server unbootable

2023-08-29 Thread Friedrich Weber
Hi,

I'm seeing this bug in a different use case on Debian Bookworm with LVM
2.03.16-2: multipath is set up, the multipath device is an LVM
physical volume in a volume group with a thin pool. To prevent LVM from
picking up on the multipath components, /etc/lvm/lvm.conf has a
global_filter that rejects the multipath components by matching on their
/dev/disk/by-id symlink paths.

I have replicated this setup in a VM, with the following global_filter
in /etc/lvm/lvm.conf:

devices {
    global_filter=["r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|","r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2|"]
}

The relevant portion of /dev/disk/by-id:

lrwxrwxrwx 1 root root  9 Aug 29 16:31 scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
lrwxrwxrwx 1 root root  9 Aug 29 16:31 scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 -> ../../sdc

After running update-initramfs and rebooting, pvs and other LVM tools
report the following warnings:

# pvs
  WARNING: Device mismatch detected for somegroup/somethinpool_tmeta which is accessing /dev/sdb instead of /dev/mapper/mpatha.
  WARNING: Device mismatch detected for somegroup/somethinpool_tdata which is accessing /dev/sdb instead of /dev/mapper/mpatha.
  PV                 VG        Fmt  Attr PSize  PFree
  /dev/mapper/mpatha somegroup lvm2 a--  <4.00g <2.99g

From reading this report and the now-resolved upstream report, this
seems to happen because the /dev/disk/by-id symlinks are not available
by the time the LVM udev hooks run, so the r|...| filters do not have
any effect. Indeed, if I use r|/dev/sdb| and r|/dev/sdc| instead, run
update-initramfs and reboot, the warning does not appear anymore.
However, being able to use the /dev/disk/by-id paths would be preferable.
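To make the filter behaviour concrete, here is a minimal Python model of how accept/reject patterns are applied to all known names of one device. It assumes the rules described in the lvm.conf comments for devices/filter: a device is accepted if any of its names hits an "a" pattern before an "r" pattern, rejected if every matching name hits an "r" pattern first, and accepted if no name matches at all. It is only a sketch: Python's re module stands in for LVM's own regex engine, and device_accepted is a hypothetical helper, not an LVM function.

```python
import re


def device_accepted(patterns: list[str], names: list[str]) -> bool:
    """Sketch of LVM-style accept/reject filtering for a single device.

    patterns: lvm.conf style entries such as "a|regex|" or "r|regex|".
    names:    every path known for the device (kernel name plus any symlinks).
    """
    # "r|/dev/sdb|" -> ("r", compiled "/dev/sdb"); assumes a one-character
    # delimiter that also closes the pattern.
    rules = [(p[0], re.compile(p[2:-1])) for p in patterns]
    any_name_matched = False
    for name in names:
        for action, regex in rules:
            if regex.search(name):
                any_name_matched = True
                if action == "a":
                    return True  # this name hits an accept pattern first
                break  # this name hits a reject pattern first; try the next name
    # No name was accepted: reject if some name matched, accept if none did.
    return not any_name_matched


by_id = ["r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|",
         "r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2|"]
# Early boot: only the kernel name is known, so the by-id reject rules never fire.
print(device_accepted(by_id, ["/dev/sdb"]))  # True
# Once the by-id symlink is known, the same device is rejected.
print(device_accepted(by_id, ["/dev/sdb",
                              "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1"]))  # False
```

The same model also covers the original report's [ "a|pci-:04.*|", "r|.*|" ] filter: with only /dev/sda3 known, the catch-all r|.*| rejects the root PV; with the by-path symlink known, the a|...| pattern accepts it.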

With the following four patches applied, I can use /dev/disk/by-id in
the filters and the warning does not appear:

https://sourceware.org/git/?p=lvm2.git;a=commit;h=17a3585cbb55d9a15ced9775a18b50c53a50ee8e
https://sourceware.org/git/?p=lvm2.git;a=commit;h=c9fdc828ff0504bc2e57f65862bc382f7663a8a2
https://sourceware.org/git/?p=lvm2.git;a=commit;h=6d14144d311fb347e4225ad6a48d4900b39445c4
https://sourceware.org/git/?p=lvm2.git;a=commit;h=bd05318ba2fc588be6339f5dc61f09195996b0e9

The first three patches are mentioned in the upstream bug report [1] and
cause pvscan to read symlink names from udev's DEVLINKS environment
variable under certain conditions. One of the conditions is that at
least one of the filter regexes refers to a symlink. However, this check
only considers a|...| filters [2], so it doesn't trigger if only r|...|
filters are used as above. Hence, in my case the fourth patch is also
needed, as it removes the filter regex check altogether.
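The effect of that condition can be sketched as follows. This is a hypothetical Python model, not the code from the patches: it models only the regex condition (none of the other conditions pvscan checks), and it approximates "a filter regex refers to a symlink" as "the regex matches one of the names in DEVLINKS". old_gate mimics the pre-patch behaviour described above, where only a|...| patterns are considered; new_gate mimics the fourth patch, which drops the check. All function names are made up for illustration.

```python
import re

Rule = tuple[str, re.Pattern]


def parse_filters(patterns: list[str]) -> list[Rule]:
    # "a|regex|" / "r|regex|" -> (action, compiled regex)
    return [(p[0], re.compile(p[2:-1])) for p in patterns]


def old_gate(rules: list[Rule], links: list[str]) -> bool:
    # Hypothetical pre-patch condition: only accept ("a") patterns are checked
    # for referring to a symlink.
    return any(action == "a" and regex.search(link) is not None
               for action, regex in rules for link in links)


def new_gate(rules: list[Rule], links: list[str]) -> bool:
    # With the fourth patch the regex check is gone, so DEVLINKS is always used.
    return True


def names_for_filtering(devname: str, devlinks: str, rules: list[Rule], gate) -> list[str]:
    # udev's DEVLINKS property is a space-separated list of symlink paths.
    links = devlinks.split()
    return [devname] + links if gate(rules, links) else [devname]


rules = parse_filters(["r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|"])
links = "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1"
print(names_for_filtering("/dev/sdb", links, rules, old_gate))  # ['/dev/sdb']
print(names_for_filtering("/dev/sdb", links, rules, new_gate))
```

With only r|...| patterns, the old gate never opens, so the by-id name is never seen and the reject rule cannot fire; removing the gate makes the symlink names available regardless of which kind of pattern is used.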

Is there a chance the patches could be backported? All four patches seem
to be included in upstream release 2.03.19 [3].

Happy to provide any more information if needed!

Thanks and best wishes,

Friedrich

[1] https://github.com/lvmteam/lvm2/issues/104
[2] https://sourceware.org/git/?p=lvm2.git;a=blob;f=lib/filters/filter-regex.c;h=ecc32914b0e15ba9cbac5c101cffddf25eddd8ad;hb=6d14144d311fb347e4225ad6a48d4900b39445c4#l272
[3] https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/tags/v2_03_19



Bug#1028541: lvm2: LVM filters render server unbootable

2023-01-17 Thread Bastian Blank
On Tue, Jan 17, 2023 at 08:13:33AM +0100, Christian Herzog wrote:
> update: we were told by upstream that there is a known instability between lvm
> and udev-generated symlinks and a devices file should be used instead. So
> that's what we're going to do.

I think I actually know what the problem is.  pvscan is run during
udev event handling, especially in the initramfs, where no systemd is
available to move it out of the event.  Modifications to devices and
symlinks are only applied at the end of the event, so the symlinks will
always be missing on the first event.

If systemd is running, the rule uses systemd-run, and then it is just a
race condition between udev and systemd over which one finishes first.

The only way to fix this is to provide the symlink information to pvscan
in addition to the device itself and let it figure that out.
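The ordering problem can be illustrated with a toy model. This Python sketch is hypothetical and greatly simplified (FakeUdev, pvscan_from_dev and pvscan_from_env are made-up names, not udev or LVM code): it captures only the point that pvscan runs while an event is being processed and that the event's symlinks are created afterwards, as in the initramfs case. The systemd-run case, where the outcome depends on timing, is not modelled.

```python
class FakeUdev:
    """Toy model of udev event handling (hypothetical, not udev's implementation)."""

    def __init__(self) -> None:
        self.symlinks: dict[str, str] = {}  # symlink path -> device node present in /dev

    def process_event(self, env: dict[str, str], pvscan) -> list[str]:
        # pvscan is called while the event is being handled ...
        seen = pvscan(env, dict(self.symlinks))
        # ... and only at the end of the event are its symlinks created.
        for link in env.get("DEVLINKS", "").split():
            self.symlinks[link] = env["DEVNAME"]
        return seen


def pvscan_from_dev(env: dict[str, str], symlinks: dict[str, str]) -> list[str]:
    # Only looks at symlinks that already exist in /dev.
    dev = env["DEVNAME"]
    return [dev] + sorted(link for link, target in symlinks.items() if target == dev)


def pvscan_from_env(env: dict[str, str], symlinks: dict[str, str]) -> list[str]:
    # Takes the symlink names from the event itself (DEVLINKS).
    return [env["DEVNAME"]] + env.get("DEVLINKS", "").split()


event = {"DEVNAME": "/dev/sda3",
         "DEVLINKS": "/dev/disk/by-path/pci-:04:00.0-sas-phy0-lun-0-part3"}
udev = FakeUdev()
print(udev.process_event(event, pvscan_from_dev))  # first event: ['/dev/sda3']
print(udev.process_event(event, pvscan_from_dev))  # a later event also sees the symlink
print(FakeUdev().process_event(event, pvscan_from_env))  # first event, using DEVLINKS
```

Reading the names from the event's own properties removes the dependency on /dev already containing the symlinks, which is the approach the upstream patches take with DEVLINKS.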

Regards,
Bastian

-- 
No problem is insoluble.
-- Dr. Janet Wallace, "The Deadly Years", stardate 3479.4



Bug#1028541: lvm2: LVM filters render server unbootable

2023-01-16 Thread Christian Herzog
Dear Bastian,

update: we were told by upstream that there is a known instability between lvm
and udev-generated symlinks and a devices file should be used instead. So
that's what we're going to do.
In related news, I'll create another bug report shortly, but it's a small one.

thanks,
-Christian


-- 
Dr. Christian Herzog   support: +41 44 633 26 68
Head, IT Services Group, HPT H 8  voice: +41 44 633 39 50
Department of Physics, ETH Zurich   
8093 Zurich, Switzerland http://isg.phys.ethz.ch/



Bug#1028541: lvm2: LVM filters render server unbootable

2023-01-16 Thread Christian Herzog
Dear Bastian,

thanks for picking up on this. We've done some more research, and we now
believe the issue to be upstream, so we've opened a bug report directly with
lvm: https://github.com/lvmteam/lvm2/issues/104
If you check the lvm debug log we posted there, you'll see that it correctly
picks up the filter, finds and scans the right device (sda3), but then rejects
it since at the time of scanning,
/dev/disk/by-path/pci-:04:00.0-sas-phy0-lun-0-part3 (the one in the
filter) doesn't exist. This might be a race condition, since on some reboots
it sees part1 and part2, on some only part1, but never part3.
I could also reproduce the problem in Arch (Fedora, surprisingly, ships too
old an LVM version).

to your questions:

> >- manually activating the root VG in busybox allows us to boot
> >  (by copy/pasting the IMPORT{program} lines from the udev rule)
> 
> Which one?  "pvscan"?  That one does not activate anything.
correct, but I don't think that's relevant any longer.

> >- replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
> >  bookworm with the bullseye versions fixes the problem
> 
> What are you replacing exactly?  The bullseye version did not include
> /lib/udev/rules.d/69-lvm.rules at all, see
> https://packages.debian.org/bullseye/amd64/lvm2/filelist.
correct, I used bullseye's 69-lvm-metad.rules and renamed it to 69-lvm.rules
on bookworm.

> Please provide the output of "pvs", "vgs", "lvs" and the kernel log.
again, I don't think it's relevant, but to help understand the situation
better:

  PV                                                  VG               Fmt  Attr PSize  PFree
  /dev/disk/by-path/pci-:04:00.0-sas-phy0-lun-0-part3 test-bookworm-vg lvm2 a--  <2.73t <2.45t

  VG               #PV #LV #SN Attr   VSize  VFree
  test-bookworm-vg   1  10   4 wz--n- <2.73t <2.45t

  LV     VG               Attr         LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   test-bookworm-vg owi-aos---  10.00g
  root   test-bookworm-vg owi-aos---  23.28g
  swap_1 test-bookworm-vg -wi-ao     976.00m
  var    test-bookworm-vg owi-aos---   9.31g

and 

pci-:04:00.0-sas-phy0-lun-0 -> ../../sda
pci-:04:00.0-sas-phy0-lun-0-part1 -> ../../sda1
pci-:04:00.0-sas-phy0-lun-0-part2 -> ../../sda2
pci-:04:00.0-sas-phy0-lun-0-part3 -> ../../sda3

Device      Start        End    Sectors  Size Type
/dev/sda1    2048       4095       2048    1M BIOS boot
/dev/sda2    4096    1003519     999424  488M Linux filesystem
/dev/sda3 1003520 5860532223 5859528704  2.7T Linux LVM


thanks and kind regards,
-Christian






Bug#1028541: lvm2: LVM filters render server unbootable

2023-01-14 Thread Bastian Blank
Hi

On Thu, Jan 12, 2023 at 03:18:55PM +0100, Christian Herzog wrote:
>on our storage servers, we employ LVM filters to hide data partitions
>from the OS (since they're iSCSI exported to the frontend
>fileserver). With bookworm, lvm does not activate the root VG when
>filters are in place. So far we have been able to establish the
>following facts:
>- with the default global_filter settings, it does boot

Okay.

>- with global_filter = [ "a|pci-:04.*|", "r|.*|" ] (to only
>  activate the root VG) bookworm drops into busybox (no root fs
>  found)

So it could be that the filter does not apply that early.

>- manually activating the root VG in busybox allows us to boot
>  (by copy/pasting the IMPORT{program} lines from the udev rule)

Which one?  "pvscan"?  That one does not activate anything.

>- replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
>  bookworm with the bullseye versions fixes the problem

What are you replacing exactly?  The bullseye version did not include
/lib/udev/rules.d/69-lvm.rules at all, see
https://packages.debian.org/bullseye/amd64/lvm2/filelist.

>- the problem seems to be related (but not identical) to #1018730

This one is about a partial VG.

> We've already spent 2 days trying to narrow down the underlying cause as
> much as possible and we'd be happy to provide any additional information
> since for us this is a bookworm deal breaker.

Please provide the output of "pvs", "vgs", "lvs" and the kernel log.

Bastian

-- 
I'm a soldier, not a diplomat.  I can only tell the truth.
-- Kirk, "Errand of Mercy", stardate 3198.9



Bug#1028541: lvm2: LVM filters render server unbootable

2023-01-12 Thread Christian Herzog
Package: lvm2
Version: 2.03.16-2
Severity: important

Dear Maintainer,

   * What led up to the situation?
   on our storage servers, we employ LVM filters to hide data partitions
   from the OS (since they're iSCSI exported to the frontend
   fileserver). With bookworm, lvm does not activate the root VG when
   filters are in place. So far we have been able to establish the
   following facts:
   - with the default global_filter settings, it does boot
   - with global_filter = [ "a|pci-:04.*|", "r|.*|" ] (to only
     activate the root VG) bookworm drops into busybox (no root fs
     found)
   - manually activating the root VG in busybox allows us to boot
     (by copy/pasting the IMPORT{program} lines from the udev rule)
   - replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
     bookworm with the bullseye versions fixes the problem
   - the problem seems to be related (but not identical) to #1018730

We've already spent 2 days trying to narrow down the underlying cause as
much as possible and we'd be happy to provide any additional information
since for us this is a bookworm deal breaker.


thanks,
-Christian




-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-6-amd64 (SMP w/40 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages lvm2 depends on:
ii  dmeventd                   2:1.02.185-2
ii  dmsetup                    2:1.02.185-2
ii  libaio1                    0.3.113-3
ii  libblkid1                  2.38.1-4
ii  libc6                      2.36-7
ii  libdevmapper-event1.02.1   2:1.02.185-2
ii  libedit2                   3.1-20221030-2
ii  libselinux1                3.4-1+b4
ii  libsystemd0                252.4-1
ii  libudev1                   252.4-1
ii  lsb-base                   11.5
ii  sysvinit-utils [lsb-base]  3.06-2

Versions of packages lvm2 recommends:
pn  thin-provisioning-tools  

lvm2 suggests no packages.

-- Configuration Files:
/etc/lvm/lvm.conf changed:
config {
# Configuration option config/checks.
# If enabled, any LVM configuration mismatch is reported.
# This implies checking that the configuration key is understood by
# LVM and that the value of the key is the proper type. If disabled,
# any configuration mismatch is ignored and the default value is used
# without any warning (a message about the configuration key not being
# found is issued in verbose mode only).
# This configuration option has an automatic default value.
# checks = 1
# Configuration option config/abort_on_errors.
# Abort the LVM process if a configuration mismatch is found.
# This configuration option has an automatic default value.
# abort_on_errors = 0
# Configuration option config/profile_dir.
# Directory where LVM looks for configuration profiles.
# This configuration option has an automatic default value.
# profile_dir = "/etc/lvm/profile"
}
devices {
# Configuration option devices/dir.
# Directory in which to create volume group device nodes.
# Commands also accept this as a prefix on volume group names.
# This configuration option is advanced.
# This configuration option has an automatic default value.
# dir = "/dev"
# Configuration option devices/scan.
# Directories containing device nodes to use with LVM.
# This configuration option is advanced.
# This configuration option has an automatic default value.
# scan = [ "/dev" ]
# Configuration option devices/obtain_device_list_from_udev.
# Obtain the list of available devices from udev.
# This avoids opening or using any inapplicable non-block devices or
# subdirectories found in the udev directory. Any device node or
# symlink not managed by udev in the udev directory is ignored. This
# setting applies only to the udev-managed device directory; other
# directories will be scanned fully. LVM needs to be compiled with
# udev support for this setting to apply.
# This configuration option has an automatic default value.
obtain_device_list_from_udev = 1
# Configuration option devices/external_device_info_source.
# Enable device information from udev.
# If set to "udev", lvm will supplement its own native device information
# with information from libudev. This can potentially improve the detection
# of MD component devices and multipath component devices.
# This configuration option has an automatic default value.
external_device_info_source = "udev"
# Configuration option