Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On 19. 02. 21 at 17:37, David Teigland wrote:
> On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > > autoactivation.
> > > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > > No such file or directory
> >
> > What's going on here? pvscan trying to start dmeventd? Why? There's a
> > dedicated service for starting dmeventd (lvm2-monitor.service). I can
> > see that running dmeventd makes sense as you have thin pools, but I'm
> > at a loss why it has to be started at that early stage during boot
> > already.
> >
> > This is a curious message; it looks as if pvscan was running from an
> > environment (initramfs??) where dmeventd wasn't available. The message
> > is repeated, and after that, pvscan appears to hang...
>
> I've found that when pvscan activates a VG, there's a bit of code that
> attempts to monitor any LVs that are already active in the VG.
> Monitoring means interacting with dmeventd. I don't know why it's doing
> that - it seems strange - but the logic around monitoring in lvm seems
> ad hoc and in need of serious reworking. In this case I'm guessing
> there's already an LV active in "sys", perhaps from direct activation
> in the initrd, and when pvscan activates that VG it attempts to monitor
> the already active LV.

The existing design for an lvm2 rootfs works like this: dracut activates
the LV within the ramdisk - it discovers the rootfs VG/LV and activates
it (by a rather 'brute-force', naive approach). Such activation is
WITHOUT monitoring, as the ramdisk has no 'dmeventd' and we do not want
to 'lock' the binary from the ramdisk into memory. Once the system
switches to the rootfs, 'vgchange --monitor y' enables monitoring for
all LVs activated from the ramdisk, and the process continues.

Event-based activation within the ramdisk is a third-party initiative by
Arch Linux, and thus needs to be 'reinvented', with its own problems
arising from that. So far, the current dracut method is the more
maintainable one in lvm2.
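The two-phase scheme described above can be sketched with illustrative commands (a rough sketch only - 'sys' stands in for the rootfs VG here, and the real dracut hook is more involved):

```
# Inside the initramfs: dracut activates the root VG without monitoring,
# since dmeventd does not exist in the ramdisk.
lvm vgchange -ay --monitor n sys

# After switch-root, with the real rootfs and /usr available,
# lvm2-monitor.service enables monitoring for the already active LVs:
lvm vgchange --monitor y sys
```

The point of the split is that monitoring is only ever switched on once the dmeventd binary can be pinned from the real root filesystem, never from the ramdisk copy.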
> Another missing piece in lvm monitoring is that we don't have a way to
> start lvm2-monitor/dmeventd at the right time (I'm not sure anyone
> even knows when the right time is), so we get random behavior
> depending on whether it's running or not at a given point. In this
> case, it looks like it happens to not be running yet. I sometimes
> suggest disabling lvm2-monitor and starting it manually once the
> system is up, to avoid having it interfere during startup.

The right time is when the switch is finished and we have a rootfs with
/usr available - this should be ensured by lvm2-monitor.service and its
dependencies.

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
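For what it's worth, the manual workaround David suggests above would amount to something like this (a sketch; unit name as shipped by the stock lvm2 packaging on systemd-based distros):

```
# Take lvm2-monitor out of the boot sequence:
systemctl disable lvm2-monitor.service

# Later, once the system is fully up, start monitoring by hand:
systemctl start lvm2-monitor.service
# or equivalently, for all active VGs:
vgchange --monitor y
```

This trades automatic monitoring during early boot for predictability: dmeventd is then only ever started from a fully populated root filesystem.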
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > autoactivation.
> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
>
> What's going on here? pvscan trying to start dmeventd? Why? There's a
> dedicated service for starting dmeventd (lvm2-monitor.service). I can
> see that running dmeventd makes sense as you have thin pools, but I'm
> at a loss why it has to be started at that early stage during boot
> already.
>
> This is a curious message; it looks as if pvscan was running from an
> environment (initramfs??) where dmeventd wasn't available. The message
> is repeated, and after that, pvscan appears to hang...

I've found that when pvscan activates a VG, there's a bit of code that
attempts to monitor any LVs that are already active in the VG.
Monitoring means interacting with dmeventd. I don't know why it's doing
that - it seems strange - but the logic around monitoring in lvm seems
ad hoc and in need of serious reworking. In this case I'm guessing
there's already an LV active in "sys", perhaps from direct activation in
the initrd, and when pvscan activates that VG it attempts to monitor the
already active LV.

Another missing piece in lvm monitoring is that we don't have a way to
start lvm2-monitor/dmeventd at the right time (I'm not sure anyone even
knows when the right time is), so we get random behavior depending on
whether it's running or not at a given point. In this case, it looks
like it happens to not be running yet. I sometimes suggest disabling
lvm2-monitor and starting it manually once the system is up, to avoid
having it interfere during startup.

> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
> > Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> > sys/pool.
> > Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation
> > on device 9:0...
The unwanted failed monitoring seems to have caused the pvscan command
to exit with an error, which then leads to further mess and confusion:
systemd then thinks it should stop or kill the pvscan service, whatever
that means. pvscan should probably never exit with an error.

Dave
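For context, the unit systemd is acting on here is the lvm2-pvscan@.service template; reconstructed roughly from memory of the lvm2 tree (treat paths and exact options as approximate), it looks like this. Because it is a oneshot unit with RemainAfterExit=yes, a non-zero exit from pvscan is enough for systemd to consider the whole unit failed:

```
[Unit]
Description=LVM event activation on device %i
DefaultDependencies=no

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i
ExecStop=/usr/sbin/lvm pvscan --cache %i
```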
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Wed, 2021-02-17 at 14:38 +0100, Oleksandr Natalenko wrote:
> Hi.

Thanks for the logs!

> I'm not sure this issue is reproducible with any kind of LVM layout.
> What I have is thin-LVM-on-LUKS-on-LVM:

I saw MD in your other logs...? More comments below.

> With regard to the journal, here it is (from the same machine as in
> the Arch bug report; matches the second layout above):
>
> [~]> LC_TIME=C sudo journalctl -b -10 -u lvm2-pvscan@\*
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed
> 2021-02-17 14:28:05 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Starting LVM event activation
> on device 9:0...
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] PV /dev/md0 online,
> VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] VG base run
> autoactivation.
> Feb 10 17:24:17 archlinux lvm[463]: 2 logical volume(s) in volume
> group "base" now active
> Feb 10 17:24:17 archlinux systemd[1]: Finished LVM event activation
> on device 9:0.
> Feb 10 17:24:26 archlinux systemd[1]: Starting LVM event activation
> on device 253:2...
> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] PV /dev/mapper/sys
> online, VG sys is complete.

All good up to here, but then...

> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> autoactivation.
> Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> No such file or directory

What's going on here? pvscan trying to start dmeventd? Why? There's a
dedicated service for starting dmeventd (lvm2-monitor.service). I can
see that running dmeventd makes sense as you have thin pools, but I'm
at a loss why it has to be started at that early stage during boot
already.

This is a curious message; it looks as if pvscan was running from an
environment (initramfs??) where dmeventd wasn't available. The message
is repeated, and after that, pvscan appears to hang...
> Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> sys/pool.
> Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation
> on device 9:0...

Here I suppose systemd is switching root and trying to stop jobs,
including the pvscan job.

> Feb 10 17:24:26 archlinux lvm[720]: pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> sys/pool.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: State
> 'stop-sigterm' timed out. Killing.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Killing
> process 643 (lvm) with signal SIGKILL.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Main
> process exited, code=killed, status=9/KILL
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Failed
> with result 'timeout'.
> Feb 10 17:24:56 spock systemd[1]: Stopped LVM event activation on
> device 253:2.

So what's timing out here is the attempt to _stop_ pvscan. That's
curious. It looks like a problem in pvscan to me, not having reacted to
a TERM signal for 30s.

It's also worth noting that the parallel pvscan process for device 9:0
terminated correctly (didn't hang).

> [~]> LC_TIME=C sudo journalctl -b -10 --grep pvscan
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed
> 2021-02-17 14:31:27 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Created slice
> system-lvm2\x2dpvscan.slice.
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] PV /dev/md0 online,
> VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] VG base run
> autoactivation.
> Feb 10 17:24:17 archlinux audit[1]: SERVICE_START pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0
> comm="systemd" exe="/init" hostname=? addr=? terminal=?
> res=success'
> Feb 10 17:24:17 archlinux kernel: audit: type=1130
> audit(1612974257.986:6): pid=1 uid=0 auid=4294967295 ses=4294967295
> msg='unit=lvm2-pvscan@9:0 comm="systemd" exe="/init" hostname=?
> addr=? terminal=? res=success'
> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] PV /dev/mapper/sys
> online, VG sys is complete.
> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> autoactivation.
> Feb 10 17:24:26 archlinux lvm[720]: pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Control
> process exited, code=killed, status=15/TERM
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Failed
> with result 'signal'.
> Feb 10 17:24:26 spock audit[1]: SERVICE_STOP pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0
> comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
> terminal=? res=failed'
> Feb 10 17:24:27 spock systemd[1]: Requested transaction contradicts
> existing jobs: Transaction for lvm2-pvscan@253:2.service/start is
> destructive (lvm2-pvscan@253:2.service has 'stop' job queued, but >
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Hi.

(will comment only on what I can comment on, dropping the rest)

On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > I'm not sure this issue is reproducible with any kind of LVM layout.
> > What I have is thin-LVM-on-LUKS-on-LVM:
>
> I saw MD in your other logs...?

FWIW, one of the machines has soft RAID and another does not; the issue
is reproducible regardless of whether there's soft RAID or not.

> > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > autoactivation.
> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
>
> What's going on here? pvscan trying to start dmeventd? Why? There's a
> dedicated service for starting dmeventd (lvm2-monitor.service). I can
> see that running dmeventd makes sense as you have thin pools, but I'm
> at a loss why it has to be started at that early stage during boot
> already.
>
> This is a curious message; it looks as if pvscan was running from an
> environment (initramfs??) where dmeventd wasn't available. The message
> is repeated, and after that, pvscan appears to hang...

Not sure either. FWIW, real root is on a thin volume (everything is, in
fact, except /boot and swap).

> > Feb 10 17:24:26 archlinux lvm[720]: pvscan[720] PV /dev/md0 online.
> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
> > Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> > sys/pool.
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: State
> > 'stop-sigterm' timed out. Killing.
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Killing
> > process 643 (lvm) with signal SIGKILL.
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Main
> > process exited, code=killed, status=9/KILL
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Failed
> > with result 'timeout'.
> > Feb 10 17:24:56 spock systemd[1]: Stopped LVM event activation on
> > device 253:2.
> So what's timing out here is the attempt to _stop_ pvscan. That's
> curious. It looks like a problem in pvscan to me, not having reacted
> to a TERM signal for 30s.
>
> It's also worth noting that the parallel pvscan process for device 9:0
> terminated correctly (didn't hang).

Yes, pvscan seems not to react to SIGTERM. I have
DefaultTimeoutStopSec=30s; if I set it to 90s, pvscan hangs for 90s,
respectively.

--
Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer
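The stop-sigterm timeout and SIGKILL escalation seen in the journal can be mimicked outside systemd with a toy process that, like the stuck pvscan, ignores SIGTERM (a minimal sketch, nothing lvm-specific; an ignored signal disposition survives exec, so the sleep below inherits the SIG_IGN):

```shell
# Stand-in for the unresponsive pvscan: a process with SIGTERM ignored.
sh -c 'trap "" TERM; exec sleep 60' &
pid=$!
sleep 0.2                       # let the trap install before signalling

kill -TERM "$pid"               # systemd's 'stop-sigterm' phase
sleep 0.5
if kill -0 "$pid" 2>/dev/null; then
    echo "still alive after SIGTERM"
    kill -KILL "$pid"           # escalation after DefaultTimeoutStopSec
fi
wait "$pid" 2>/dev/null
echo "child reaped"
```

With a cooperative child (no trap), the TERM alone would suffice and systemd would report a clean stop instead of 'timeout'.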