Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On 22 February 2021 14:04:17 CET, Christian Hesse wrote:
> Zdenek Kabelac on Mon, 2021/02/22 10:57:
> > > I've gone through the various tasks that dmeventd is responsible for,
> > > and I couldn't see anything that'd be strictly necessary during early
> > > boot. I may be overlooking something of course. Couldn't the monitoring
> >
> > As said - during ramdisk boot - monitor shall not be used (AFAIK - dracut
> > is supposed to use disabled monitoring in its modified copy of lvm.conf
> > within ramdisk)
>
> I could not find anything in dracut that modifies lvm.conf, but it looks
> like dracut calls the lvm commands with `--ignoremonitoring`.
>
> To date this is not handled in lvm2's mkinitcpio hook... Wondering if
> that would help. Oleksandr, you could undo the udev workaround, then apply
> the following diff to /usr/lib/initcpio/install/lvm2, regenerate the
> initramfs and retry?
>
> --- lvm2_install (revision 408582)
> +++ lvm2_install (working copy)
> @@ -34,6 +34,7 @@
>  add_file "/usr/lib/udev/rules.d/95-dm-notify.rules"
>  add_file "/usr/lib/initcpio/udev/11-dm-initramfs.rules" "/usr/lib/udev/rules.d/11-dm-initramfs.rules"
>  add_file "/etc/lvm/lvm.conf"
> +sed -i '/^\smonitoring =/s/1/0/' "${BUILDROOT}/etc/lvm/lvm.conf"
>
>  # this udev rule is specific for systemd and non-systemd systems
>  if command -v add_systemd_unit >/dev/null; then

Hi. This changes nothing, sorry. The issue is still there.

--
Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Zdenek Kabelac on Mon, 2021/02/22 10:57:
> > I've gone through the various tasks that dmeventd is responsible for,
> > and I couldn't see anything that'd be strictly necessary during early
> > boot. I may be overlooking something of course. Couldn't the monitoring
>
> As said - during ramdisk boot - monitor shall not be used (AFAIK - dracut
> is supposed to use disabled monitoring in its modified copy of lvm.conf
> within ramdisk)

I could not find anything in dracut that modifies lvm.conf, but it looks like dracut calls the lvm commands with `--ignoremonitoring`.

To date this is not handled in lvm2's mkinitcpio hook... Wondering if that would help. Oleksandr, you could undo the udev workaround, then apply the following diff to /usr/lib/initcpio/install/lvm2, regenerate the initramfs and retry?

--- lvm2_install (revision 408582)
+++ lvm2_install (working copy)
@@ -34,6 +34,7 @@
 add_file "/usr/lib/udev/rules.d/95-dm-notify.rules"
 add_file "/usr/lib/initcpio/udev/11-dm-initramfs.rules" "/usr/lib/udev/rules.d/11-dm-initramfs.rules"
 add_file "/etc/lvm/lvm.conf"
+sed -i '/^\smonitoring =/s/1/0/' "${BUILDROOT}/etc/lvm/lvm.conf"

 # this udev rule is specific for systemd and non-systemd systems
 if command -v add_systemd_unit >/dev/null; then

--
main(a){char*c=/*Schoene Gruesse */"B?IJj;MEH" "CX:;",b;for(a/*Best regards my address:*/=0;b=c[a++];) putchar(b-1/(/*Chris cc -ox -xc - && ./x*/b/42*2-3)*42);}
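For anyone who wants to sanity-check the proposed substitution before rebuilding the initramfs, it can be exercised on a scratch copy of lvm.conf. This is only a sketch: the /tmp path is illustrative, and GNU sed (which understands `\s` and `-i`) is assumed, as in the hook itself:

```shell
# Scratch file standing in for ${BUILDROOT}/etc/lvm/lvm.conf
cat > /tmp/lvm.conf.demo <<'EOF'
activation {
 monitoring = 1
}
EOF

# Same substitution as the proposed hook change: on the indented
# "monitoring =" line, flip the value 1 to 0 (GNU sed, \s = whitespace)
sed -i '/^\smonitoring =/s/1/0/' /tmp/lvm.conf.demo

grep 'monitoring' /tmp/lvm.conf.demo   # now reads: monitoring = 0
```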
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Fri, 2021-02-19 at 23:47 +0100, Zdenek Kabelac wrote:
>
> Right time is when switch is finished and we have rootfs with /usr
> available - should be ensured by lvm2-monitor.service and its
> dependencies.

While we're at it - I'm wondering why dmeventd is started so early. dm-event.service on recent installations has only "Requires=dm-event.socket", so it'll be started almost immediately after switching root. In particular, it doesn't wait for any sort of device initialization or udev initialization.

I've gone through the various tasks that dmeventd is responsible for, and I couldn't see anything that'd be strictly necessary during early boot. I may be overlooking something of course. Couldn't the monitoring be delayed to after local-fs.target, for example?

(This is also related to our previous discussion about external_device_info_source=udev; we found that dmeventd was one of the primary sources of strange errors with that setting).

Regards
Martin
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Fri, 2021-02-19 at 10:37 -0600, David Teigland wrote:
> On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > > autoactivation.
> > > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > > No such file or directory
> >
> > What's going on here? pvscan trying to start dmeventd? Why? There's a
> > dedicated service for starting dmeventd (lvm2-monitor.service). I can
> > see that running dmeventd makes sense as you have thin pools, but I'm
> > at a loss why it has to be started at that early stage during boot
> > already.
> >
> > This is a curious message, it looks as if pvscan was running from an
> > environment (initramfs??) where dmeventd wasn't available. The message
> > is repeated, and after that, pvscan appears to hang...
>
> I've found that when pvscan activates a VG, there's a bit of code that
> attempts to monitor any LVs that are already active in the VG. Monitoring
> means interacting with dmeventd. I don't know why it's doing that, it
> seems strange, but the logic around monitoring in lvm seems ad hoc and in
> need of serious reworking. In this case I'm guessing there's already an
> LV active in "sys", perhaps from direct activation in initrd, and when
> pvscan activates that VG it attempts to monitor the already active LV.
>
> Another missing piece in lvm monitoring is that we don't have a way to
> start lvm2-monitor/dmeventd at the right time (I'm not sure anyone even
> knows when the right time is), so we get random behavior depending on if
> it's running or not at a given point. In this case, it looks like it
> happens to not be running yet. I sometimes suggest disabling lvm2-monitor
> and starting it manually once the system is up, to avoid having it
> interfere during startup.

That sounds familiar.
> > > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > > No such file or directory
> > > Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> > > sys/pool.
> > > Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation
> > > on device 9:0...
>
> The unwanted failed monitoring seems to have caused the pvscan command to
> exit with an error, which then leads to further mess and confusion where
> systemd then thinks it should stop or kill the pvscan service, whatever
> that means.

The way I read Oleksandr's logs, systemd is killing all processes because it wants to switch root, not because of errors in the pvscan service. The weird thing is that that fails for one of the pvscan tasks (253:2), and that that service continues to "run" (rather, "hang") long after the root switch has happened.

Thanks,
Martin
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Thu, 2021-02-18 at 16:30 +0100, Oleksandr Natalenko wrote:
> > So what's timing out here is the attempt to _stop_ pvscan. That's
> > curious. It looks like a problem in pvscan to me, not having reacted to
> > a TERM signal for 30s.
> >
> > It's also worth noting that the parallel pvscan process for device 9:0
> > terminated correctly (didn't hang).
>
> Yes, pvscan seems to not react to SIGTERM. I have
> DefaultTimeoutStopSec=30s; if I set this to 90s, pvscan hangs for 90s
> respectively.

Good point. That allows us to conclude that pvscan may hang on exit when udevd isn't available at the time (has already been stopped). That positively looks like an lvm problem. The After= is a viable workaround, nothing more and nothing less.

We'd need to run pvscan with increased debug/log level to figure out why it doesn't stop. Given that you have a workaround, I'm not sure it's worth the effort for you.

What strikes me more in your logs is the fact that systemd proceeds with switching root even though the pvscan@253:2 service hasn't terminated yet. That looks a bit fishy, really. systemd should have KILLed pvscan before proceeding.

Martin
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Dne 21. 02. 21 v 21:23 Martin Wilck napsal(a):
> On Fri, 2021-02-19 at 23:47 +0100, Zdenek Kabelac wrote:
> > Right time is when switch is finished and we have rootfs with /usr
> > available - should be ensured by lvm2-monitor.service and its
> > dependencies.
>
> While we're at it - I'm wondering why dmeventd is started so early.
> dm-event.service on recent installations has only "Requires=dm-event.socket",
> so it'll be started almost immediately after switching root. In particular,
> it doesn't wait for any sort of device initialization or udev initialization.

Hi

Dmeventd alone does not depend on lvm2 in any way - it's the lvm2 plugin which then does all the 'scanning' for VGs/LVs and gets loaded when lvm2 connects to the monitoring socket. That's also why dmeventd belongs to the dm subsystem.

Dmeventd is nothing else than a process to check DM devices periodically - and can be used by e.g. dmraid or others... So as such it doesn't need any devices - but it needs to be initialized early so it can accept connections from tools like lvm2 and start monitoring a device without delaying the command (as lvm2 waits for confirmation that the device is monitored).

> I've gone through the various tasks that dmeventd is responsible for,
> and I couldn't see anything that'd be strictly necessary during early
> boot. I may be overlooking something of course. Couldn't the monitoring

As said - during ramdisk boot - the monitor shall not be used (AFAIK - dracut is supposed to use disabled monitoring in its modified copy of lvm.conf within the ramdisk).

But we want to switch to monitoring ASAP when we switch to rootfs - so the 'unmonitored' window is as small as possible - there are still some 'grey' areas in the correct logic though...

Zdenek
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Dne 19. 02. 21 v 17:37 David Teigland napsal(a):
> On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > > autoactivation.
> > > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > > No such file or directory
> >
> > What's going on here? pvscan trying to start dmeventd? Why? There's a
> > dedicated service for starting dmeventd (lvm2-monitor.service). I can
> > see that running dmeventd makes sense as you have thin pools, but I'm
> > at a loss why it has to be started at that early stage during boot
> > already.
> >
> > This is a curious message, it looks as if pvscan was running from an
> > environment (initramfs??) where dmeventd wasn't available. The message
> > is repeated, and after that, pvscan appears to hang...
>
> I've found that when pvscan activates a VG, there's a bit of code that
> attempts to monitor any LVs that are already active in the VG. Monitoring
> means interacting with dmeventd. I don't know why it's doing that, it
> seems strange, but the logic around monitoring in lvm seems ad hoc and in
> need of serious reworking. In this case I'm guessing there's already an
> LV active in "sys", perhaps from direct activation in initrd, and when
> pvscan activates that VG it attempts to monitor the already active LV.

The existing design for using lvm2 for the rootfs was like this:

Activate the 'LV' within the ramdisk by dracut - which discovers the rootfs VG/LV and activates it (by a rather 'brute-force' naive approach). Such activation is WITHOUT monitoring - as the ramdisk is without 'dmeventd' and we do not want to 'lock' the binary from the ramdisk into memory.

So once the system switches to rootfs - 'vgchange --monitor y' enables monitoring for all LVs activated from the ramdisk and the process continues.

Event-based activation within the ramdisk is a third-party initiative by Arch Linux and thus needs to be 'reinvented', with its own problems arising from this. So far - in lvm2 - the current dracut method is more maintainable.
> Another missing piece in lvm monitoring is that we don't have a way to
> start lvm2-monitor/dmeventd at the right time (I'm not sure anyone even
> knows when the right time is), so we get random behavior depending on if
> it's running or not at a given point. In this case, it looks like it
> happens to not be running yet. I sometimes suggest disabling lvm2-monitor
> and starting it manually once the system is up, to avoid having it
> interfere during startup.

The right time is when the switch is finished and we have rootfs with /usr available - should be ensured by lvm2-monitor.service and its dependencies.

Zdenek
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > autoactivation.
> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
>
> What's going on here? pvscan trying to start dmeventd? Why? There's a
> dedicated service for starting dmeventd (lvm2-monitor.service). I can
> see that running dmeventd makes sense as you have thin pools, but I'm
> at a loss why it has to be started at that early stage during boot
> already.
>
> This is a curious message, it looks as if pvscan was running from an
> environment (initramfs??) where dmeventd wasn't available. The message
> is repeated, and after that, pvscan appears to hang...

I've found that when pvscan activates a VG, there's a bit of code that attempts to monitor any LVs that are already active in the VG. Monitoring means interacting with dmeventd. I don't know why it's doing that, it seems strange, but the logic around monitoring in lvm seems ad hoc and in need of serious reworking. In this case I'm guessing there's already an LV active in "sys", perhaps from direct activation in initrd, and when pvscan activates that VG it attempts to monitor the already active LV.

Another missing piece in lvm monitoring is that we don't have a way to start lvm2-monitor/dmeventd at the right time (I'm not sure anyone even knows when the right time is), so we get random behavior depending on whether it's running or not at a given point. In this case, it looks like it happens to not be running yet. I sometimes suggest disabling lvm2-monitor and starting it manually once the system is up, to avoid having it interfere during startup.

> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
> > Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> > sys/pool.
> > Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation
> > on device 9:0...
The unwanted failed monitoring seems to have caused the pvscan command to exit with an error, which then leads to further mess and confusion where systemd then thinks it should stop or kill the pvscan service, whatever that means. pvscan should probably never exit with an error.

Dave
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Wed, 2021-02-17 at 14:38 +0100, Oleksandr Natalenko wrote:
> Hi.

Thanks for the logs!

> I'm not sure this issue is reproducible with any kind of LVM layout.
> What I have is thin-LVM-on-LUKS-on-LVM:

I saw MD in your other logs...? More comments below.

> With regard to the journal, here it is (from the same machine in the
> Arch bugreport; matches the second layout above):
>
> [~]> LC_TIME=C sudo journalctl -b -10 -u lvm2-pvscan@\*
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-02-17 14:28:05 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Starting LVM event activation on device 9:0...
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] PV /dev/md0 online, VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] VG base run autoactivation.
> Feb 10 17:24:17 archlinux lvm[463]: 2 logical volume(s) in volume group "base" now active
> Feb 10 17:24:17 archlinux systemd[1]: Finished LVM event activation on device 9:0.
> Feb 10 17:24:26 archlinux systemd[1]: Starting LVM event activation on device 253:2...
> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] PV /dev/mapper/sys online, VG sys is complete.

All good up to here, but then...

> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run autoactivation.
> Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed: No such file or directory

What's going on here? pvscan trying to start dmeventd? Why? There's a dedicated service for starting dmeventd (lvm2-monitor.service). I can see that running dmeventd makes sense as you have thin pools, but I'm at a loss why it has to be started at that early stage during boot already.

This is a curious message, it looks as if pvscan was running from an environment (initramfs??) where dmeventd wasn't available. The message is repeated, and after that, pvscan appears to hang...
> Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed: No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor sys/pool.
> Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation on device 9:0...

Here I suppose systemd is switching root, and trying to stop jobs, including the pvscan job.

> Feb 10 17:24:26 archlinux lvm[720]: pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed: No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor sys/pool.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: State 'stop-sigterm' timed out. Killing.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Killing process 643 (lvm) with signal SIGKILL.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Main process exited, code=killed, status=9/KILL
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Failed with result 'timeout'.
> Feb 10 17:24:56 spock systemd[1]: Stopped LVM event activation on device 253:2.

So what's timing out here is the attempt to _stop_ pvscan. That's curious. It looks like a problem in pvscan to me, not having reacted to a TERM signal for 30s.

It's also worth noting that the parallel pvscan process for device 9:0 terminated correctly (didn't hang).

> [~]> LC_TIME=C sudo journalctl -b -10 --grep pvscan
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-02-17 14:31:27 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] PV /dev/md0 online, VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]: pvscan[463] VG base run autoactivation.
> Feb 10 17:24:17 archlinux audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0 comm="systemd" exe="/init" hostname=? addr=? terminal=? res=success'
> Feb 10 17:24:17 archlinux kernel: audit: type=1130 audit(1612974257.986:6): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0 comm="systemd" exe="/init" hostname=? addr=? terminal=? res=success'
> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] PV /dev/mapper/sys online, VG sys is complete.
> Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run autoactivation.
> Feb 10 17:24:26 archlinux lvm[720]: pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Control process exited, code=killed, status=15/TERM
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Failed with result 'signal'.
> Feb 10 17:24:26 spock audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
> Feb 10 17:24:27 spock systemd[1]: Requested transaction contradicts existing jobs: Transaction for lvm2-pvscan@253:2.service/start is destructive (lvm2-pvscan@253:2.service has 'stop' job queued, but 'sta
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Hi.

(will comment only on what I can comment, dropping the rest)

On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > I'm not sure this issue is reproducible with any kind of LVM layout.
> > What I have is thin-LVM-on-LUKS-on-LVM:
>
> I saw MD in your other logs...?

FWIW, one of the machines has soft RAID, the other does not; the issue is reproducible regardless of whether there's soft RAID or not.

> > Feb 10 17:24:26 archlinux lvm[643]: pvscan[643] VG sys run
> > autoactivation.
> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
>
> What's going on here? pvscan trying to start dmeventd? Why? There's a
> dedicated service for starting dmeventd (lvm2-monitor.service). I can
> see that running dmeventd makes sense as you have thin pools, but I'm
> at a loss why it has to be started at that early stage during boot
> already.
>
> This is a curious message, it looks as if pvscan was running from an
> environment (initramfs??) where dmeventd wasn't available. The message
> is repeated, and after that, pvscan appears to hang...

Not sure either. FWIW, real root is on a thin volume (everything is, in fact, except /boot and swap).

> > Feb 10 17:24:26 archlinux lvm[720]: pvscan[720] PV /dev/md0 online.
> > Feb 10 17:24:26 archlinux lvm[643]: /usr/bin/dmeventd: stat failed:
> > No such file or directory
> > Feb 10 17:24:26 archlinux lvm[643]: WARNING: Failed to monitor
> > sys/pool.
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: State
> > 'stop-sigterm' timed out. Killing.
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Killing
> > process 643 (lvm) with signal SIGKILL.
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Main
> > process exited, code=killed, status=9/KILL
> > Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Failed
> > with result 'timeout'.
> > Feb 10 17:24:56 spock systemd[1]: Stopped LVM event activation on
> > device 253:2.
>
> So what's timing out here is the attempt to _stop_ pvscan. That's
> curious. It looks like a problem in pvscan to me, not having reacted to
> a TERM signal for 30s.
>
> It's also worth noting that the parallel pvscan process for device 9:0
> terminated correctly (didn't hang).

Yes, pvscan seems to not react to SIGTERM. I have DefaultTimeoutStopSec=30s; if I set this to 90s, pvscan hangs for 90s respectively.

--
Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Hello.

On Wed, Feb 17, 2021 at 02:49:00PM +0100, Martin Wilck wrote:
> On Wed, 2021-02-17 at 13:03 +0100, Christian Hesse wrote:
> > Let's keep this in mind. Now let's have a look at udevd startup: It signals
> > being ready by calling sd_notifyf(), but it loads rules and applies
> > permissions before doing so [0].
> > Even before that we have some code about handling events and monitoring stuff.
>
> It loads the rules, but events will only be processed after entering
> sd_event_loop(), which happens after the sd_notify() call.
>
> Anyway, booting the system with "udev.log-priority=debug" might provide
> further insight. Oleksandr, could you try that (without the After=
> directive)?

Yes.

pvscan: http://ix.io/2PLK
udev: http://ix.io/2PLL
lvm: http://ix.io/2PLM

Let me know if I can collect something else. Thanks.

--
Oleksandr Natalenko (post-factum)
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Hi. Please check my log excerpts and comments below.

On Wed, Feb 17, 2021 at 01:03:29PM +0100, Christian Hesse wrote:
> > But in general, I think this needs deeper analysis. Looking at
> > https://bugs.archlinux.org/task/69611, the workaround appears to have
> > been found simply by drawing an analogy to a previous similar case.
> > I'd like to understand what happened on the arch system when the error
> > occured, and why this simple ordering directive avoided it.
>
> As said I can not reproduce it myself... Oleksandr, can you give more details?
> Possibly everything from journal regarding systemd-udevd.service (and
> systemd-udevd.socket) and lvm2-pvscan@*.service could help.

I'm not sure this issue is reproducible with any kind of LVM layout. What I have is thin-LVM-on-LUKS-on-LVM:

```
[~]> lsblk
NAME                                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme0n1                             259:0    0 238,5G  0 disk
├─nvme0n1p1                         259:1    0     8M  0 part  /boot/EFI
└─nvme0n1p2                         259:2    0 238,5G  0 part
  ├─base-boot                       254:0    0   256M  0 lvm   /boot
  └─base-sys                        254:1    0 238,2G  0 lvm
    └─sys                           254:2    0 238,2G  0 crypt
      ├─sys-swap                    254:3    0     2G  0 lvm   [SWAP]
      ├─sys-pool_tmeta              254:4    0   120M  0 lvm
      │ └─sys-pool-tpool            254:6    0   236G  0 lvm
      │   ├─sys-pool                254:7    0   236G  1 lvm
      │   ├─sys-root                254:8    0   235G  0 lvm   /
      │   ├─sys-home                254:9    0   235G  0 lvm   /home
      │   ├─sys-containers          254:10   0   235G  0 lvm   /mnt/containers
      │   ├─sys-vms                 254:11   0   235G  0 lvm   /mnt/vms
      │   ├─sys-email               254:12   0   235G  0 lvm   /mnt/email
      │   ├─sys-vm--01--rhel6--sda      254:13   0   8G  0 lvm
      │   ├─sys-vm--02--rhel7--sda      254:14   0  16G  0 lvm
      │   ├─sys-vm--03--rhel8--sda      254:15   0   8G  0 lvm
      │   ├─sys-vm--04--rhel6--32--sda  254:16   0   8G  0 lvm
      │   ├─sys-vm--05--rhel5--sda      254:17   0   8G  0 lvm
      │   ├─sys-vm--06--fedora--sda     254:18   0   8G  0 lvm
      │   └─sys-vm--02--rhel7--sdb      254:19   0   1G  0 lvm
      └─sys-pool_tdata              254:5    0   236G  0 lvm
        └─sys-pool-tpool            254:6    0   236G  0 lvm
          ├─sys-pool                254:7    0   236G  1 lvm
          ├─sys-root                254:8    0   235G  0 lvm   /
          ├─sys-home                254:9    0   235G  0 lvm   /home
          ├─sys-containers          254:10   0   235G  0 lvm   /mnt/containers
          ├─sys-vms                 254:11   0   235G  0 lvm   /mnt/vms
          ├─sys-email               254:12   0   235G  0 lvm   /mnt/email
          ├─sys-vm--01--rhel6--sda      254:13   0   8G  0 lvm
          ├─sys-vm--02--rhel7--sda      254:14   0  16G  0 lvm
          ├─sys-vm--03--rhel8--sda      254:15   0   8G  0 lvm
          ├─sys-vm--04--rhel6--32--sda  254:16   0   8G  0 lvm
          ├─sys-vm--05--rhel5--sda      254:17   0   8G  0 lvm
          ├─sys-vm--06--fedora--sda     254:18   0   8G  0 lvm
          └─sys-vm--02--rhel7--sdb      254:19   0   1G  0 lvm
```

(you may ask why? because full disk encryption is mandated by my employer, which is, surprise-surprise, Red Hat ☺; I'll Cc myself to the @rh address as well just to be sure I do not miss anything in the future)

Also, this doesn't happen if the number of thin volumes is small (like, fewer than 5). So, in order to reproduce this locally you may need to throw quite a few LVs into the system.

The issue is reproducible with this layout as well:

```
[~]> lsblk
NAME                                MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
nvme0n1                             259:0    0 465,8G  0 disk
├─nvme0n1p1                         259:1    0     8M  0 part   /boot/EFI
└─nvme0n1p2                         259:2    0   465G  0 part
  └─md0                               9:0    0 464,9G  0 raid10
    ├─base-boot                     253:0    0   384M  0 lvm    /boot
    └─base-sys                      253:1    0 464,5G  0 lvm
      └─sys                         253:2    0 464,5G  0 crypt
        ├─sys-swap                  253:3    0     2G  0 lvm    [SWAP]
        ├─sys-pool_tmeta            253:4    0   116M  0 lvm
        │ └─sys-pool-tpool          253:6    0 462,3G  0 lvm
        │   ├─sys-pool              253:7    0 462,3G  1 lvm
        │   ├─sys-root              253:8    0   462G  0 lvm    /
        │   ├─sys-home              253:9    0   462G  0 lvm    /home
        │   ├─sys-vms               253:10   0   462G  0 lvm    /mnt/vms
        │   ├─sys-vm--01--archlinux--sda  253:11   0   8G  0 lvm
```
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Wed, 2021-02-17 at 13:03 +0100, Christian Hesse wrote:
>
> Let's keep this in mind. Now let's have a look at udevd startup: It signals
> being ready by calling sd_notifyf(), but it loads rules and applies
> permissions before doing so [0].
> Even before that we have some code about handling events and monitoring stuff.

It loads the rules, but events will only be processed after entering sd_event_loop(), which happens after the sd_notify() call.

Anyway, booting the system with "udev.log-priority=debug" might provide further insight. Oleksandr, could you try that (without the After= directive)?

> So I guess pvscan is started in the initialization phase before udevd
> signals being ready. And obviously there is some kind of race condition.

Right. Some uevent might arrive between the creation of the monitor socket in monitor_new() and entering the event loop. Such an event would be handled immediately, and possibly before systemd receives the sd_notify message, so a race condition looks possible.

> With the ordering "After=" in `lvm2-pvscan@.service` the service start is
> queued at the initialization phase, but the actual start and pvscan execution
> is delayed until udevd has signaled being ready.
>
> > But in general, I think this needs deeper analysis. Looking at
> > https://bugs.archlinux.org/task/69611, the workaround appears to have
> > been found simply by drawing an analogy to a previous similar case.
> > I'd like to understand what happened on the arch system when the error
> > occurred, and why this simple ordering directive avoided it.
>
> As said I can not reproduce it myself... Oleksandr, can you give more
> details? Possibly everything from the journal regarding systemd-udevd.service
> (and systemd-udevd.socket) and lvm2-pvscan@*.service could help.
>
> > 1. How had the offending pvscan process been started? I'd expect that
> > "pvscan" (unlike "lvm monitor" in our case) was started by an udev
> > rule. If udevd hadn't started yet, how would that udev rule have been
> > executed? OTOH, if pvscan had not been started by udev but by another
> > systemd service, then *that* service would probably need to get the
> > After=systemd-udevd.service directive.
>
> To my understanding it was started from udevd by a rule in
> `69-dm-lvm-metad.rules`.
>
> (BTW, renaming that rule file may make sense now that lvm2-metad is gone...)
>
> > 2. Even without the "After=" directive, I'd assume that pvscan wasn't
> > started "before" systemd-udevd, but rather "simultaneously" (i.e. in
> > the same systemd transaction). Thus systemd-udevd should have started
> > up while pvscan was running, and pvscan should have noticed that udevd
> > eventually became available. Why did pvscan time out? What was it
> > waiting for? We know that lvm checks for the existence of
> > "/run/udev/control", but that should have become available after some
> > fraction of a second of waiting.
>
> I do not think there is anything starting pvscan before udevd.

I agree. The race described above looks at least possible.

I would go one step further and say that *every* systemd service that might be started from an udev rule should have an "After=systemd-udevd.service".

Martin
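The window described above can be caricatured in shell. Nothing below is udevd code - the spool file is a stand-in for the monitor socket, and the marker file a stand-in for sd_notify("READY=1"); the point is merely that the queue exists and accepts events before readiness is announced:

```shell
# monitor_new(): the event queue exists from this point on
spool=$(mktemp)

(
  sleep 1                  # rule loading etc., before sd_notify()
  touch "$spool.ready"     # stand-in for sd_notify("READY=1")
  cat "$spool"             # sd_event_loop(): drains events queued earlier
) &

# A "uevent" sent before READY is still accepted and later processed
echo "add@/devices/virtual/block/dm-2" >> "$spool"

wait                       # the daemon prints the pre-READY event
```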
Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
On Wed, 2021-02-17 at 09:49 +0800, heming.z...@suse.com wrote:
> On 2/11/21 7:16 PM, Christian Hesse wrote:
> > From: Christian Hesse
> >
> > Running the scan before udevd finished startup may result in failure.
> > This has been reported for Arch Linux [0] and proper ordering fixes
> > the issue.
> >
> > [0] https://bugs.archlinux.org/task/69611
> >
> > Signed-off-by: Christian Hesse
> > ---
> >  scripts/lvm2-pvscan.service.in | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/scripts/lvm2-pvscan.service.in b/scripts/lvm2-pvscan.service.in
> > index 09753e8c9..7b4ace551 100644
> > --- a/scripts/lvm2-pvscan.service.in
> > +++ b/scripts/lvm2-pvscan.service.in
> > @@ -4,6 +4,7 @@ Documentation=man:pvscan(8)
> >  DefaultDependencies=no
> >  StartLimitIntervalSec=0
> >  BindsTo=dev-block-%i.device
> > +After=systemd-udevd.service
> >  Before=shutdown.target
> >  Conflicts=shutdown.target
>
> I watched a similar issue with lvm2-monitor.service.
> On a very old machine (i586), udevd took too much time to finish, which
> triggered the lvm2-monitor timeout and then reported:
> > WARNING: Device /dev/sda not initialized in udev database even
> > after waiting 1000 microseconds.
>
> One workable solution is to add "systemd-udev-settle.service" (obsoleted) or
> "local-fs.target" to the "After" of lvm2-monitor.service.

We have to differentiate here. In our case we had to wait for "systemd-udev-settle.service". In the arch case, it was only necessary to wait for systemd-udevd.service itself. "After=systemd-udevd.service" just means that the daemon is up; it says nothing about any device initialization being completed.

But in general, I think this needs deeper analysis. Looking at https://bugs.archlinux.org/task/69611, the workaround appears to have been found simply by drawing an analogy to a previous similar case. I'd like to understand what happened on the arch system when the error occurred, and why this simple ordering directive avoided it.

1. How had the offending pvscan process been started? I'd expect that "pvscan" (unlike "lvm monitor" in our case) was started by an udev rule. If udevd hadn't started yet, how would that udev rule have been executed? OTOH, if pvscan had not been started by udev but by another systemd service, then *that* service would probably need to get the After=systemd-udevd.service directive.

2. Even without the "After=" directive, I'd assume that pvscan wasn't started "before" systemd-udevd, but rather "simultaneously" (i.e. in the same systemd transaction). Thus systemd-udevd should have started up while pvscan was running, and pvscan should have noticed that udevd eventually became available. Why did pvscan time out? What was it waiting for? We know that lvm checks for the existence of "/run/udev/control", but that should have become available after some fraction of a second of waiting.

Regards,
Martin
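For what it's worth, the ordering being discussed can also be tried without patching the shipped unit, via a systemd drop-in. The file name below is arbitrary, and this mirrors the proposed one-line patch rather than any snippet shipped by lvm2:

```ini
# /etc/systemd/system/lvm2-pvscan@.service.d/10-after-udevd.conf
# Ordering only: waits for udevd to signal readiness. It says nothing about
# device initialization being complete (that would need
# systemd-udev-settle.service instead).
[Unit]
After=systemd-udevd.service
```

After dropping the file in place, `systemctl daemon-reload` makes it effective for subsequently started instances of the template.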
[linux-lvm] [PATCH 1/1] pvscan: wait for udevd
From: Christian Hesse

Running the scan before udevd finished startup may result in failure. This has been reported for Arch Linux [0] and proper ordering fixes the issue.

[0] https://bugs.archlinux.org/task/69611

Signed-off-by: Christian Hesse
---
 scripts/lvm2-pvscan.service.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/lvm2-pvscan.service.in b/scripts/lvm2-pvscan.service.in
index 09753e8c9..7b4ace551 100644
--- a/scripts/lvm2-pvscan.service.in
+++ b/scripts/lvm2-pvscan.service.in
@@ -4,6 +4,7 @@ Documentation=man:pvscan(8)
 DefaultDependencies=no
 StartLimitIntervalSec=0
 BindsTo=dev-block-%i.device
+After=systemd-udevd.service
 Before=shutdown.target
 Conflicts=shutdown.target