Bug#867368: systemd-udevd: delays boot for 2+ minutes when LVM PV is inside a LV
Hi all, I recently ran into this problem on my KVM host after I migrated all my VMs to use unpartitioned virtual disks for everything except the /boot disk. A little investigation revealed that the problem was related to pvscan processes started by the 69-lvm-metad.rules udev script. At the end of that script there is a conditional check that results in the two branches "systemd_background" and "direct_pvscan" (lines 92 and 93). The error case uses the "direct_pvscan" branch and results in the delays. Changing the script so that the "systemd_background" branch is always used instead (just add a 'GOTO="systemd_background"' line before line 92) completely solves the delays for me, both on virtual and physical systems. I also tested online hotplug and did not notice regressions with the change. Reverting the change from line 475 in the debian-patches file (https://sources.debian.org/src/lvm2/2.03.07-1/debian/patches/debian-changes/#L475) however does not bring any improvement. I hope this helps anyone running into this problem. Regards Oliver
Bug#867368: systemd-udevd: delays boot for 2+ minutes when LVM PV is inside a LV
Control: reassign -1 lvm2 On Thu, 6 Jul 2017 13:22:38 -0700 Rob Lesliewrote: > > On Jul 6, 2017, at 8:03 AM, Felipe Sateler wrote: > > > > > % systemd-analyze > > > Startup finished in 2min 55.491s (kernel) + 13.076s (userspace) = 3min > > > 8.568s > > > > This suggests the kernel is the one slowing things down. > > I initially thought so too, but attempts to debug the kernel with e.g. > 'initcall_debug' were not fruitful. Instead indications point to something > happening in early initramfs userspace. > > > Can you try booting with the old jessie kernel? This looks like a kernel > > regression. > > I have confirmed that the problem still exists even when booting a stretch > system using the most recent jessie kernel. > > I have tried modifying the initramfs scripts to arrange for systemd-udevd to > be > passed the '--debug' option. Here are some relevant excerpts: > > > % grep 793 systemd-udevd.debug > > seq 793 queued, 'change' 'block' > > seq 793 running > > seq 793 '/devices/virtual/block/dm-2' is taking a long time > > > % tail -n 5 systemd-udevd.debug > > giving up waiting for workers to finish > > event loop failed: Connection timed out > > Unload module index > > Unloaded link configuration context. > > timeout '/sbin/lvm pvscan --cache --activate ay --major 253 --minor 2' > > > This of course suggests the problem may actually lie with lvm2. Please feel > free > to reassign if you agree, or suggest a further course of investigation. This does look like it might be a lvm2 bug indeed, which is why I'm going to reassign this bug now, as I think the lvm2 maintainers have a better chance helping you debug this. If this turns out to be a bug in udev, please reassign back. Regards, Michael -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? signature.asc Description: OpenPGP digital signature
Bug#867368: systemd-udevd: delays boot for 2+ minutes when LVM PV is inside a LV
I'm experiencing the same behavior with a lvm pv on top of a md mirror: Nov 2 22:03:13 zero systemd-udevd[515]: seq 4167 '/devices/virtual/block/md4' is taking a long time pvscan PV /dev/sda VG BACKUP0 lvm2 [9.10 TiB / 98.00 GiB free] PV /dev/md4 VG MAIN0 lvm2 [9.10 TiB / 273.93 GiB free] Total: 2 [18.19 TiB] / in use: 2 [18.19 TiB] / in no VG: 0 [0 ] cat /proc/mdstat Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md4 : active raid1 sdc4[1] sdb4[0] 9765778432 blocks super 1.2 [2/2] [UU] bitmap: 4/73 pages [16KB], 65536KB chunk I first thought, the delay has something to do with the network setup, as it appears during the 'Raise network interfaces' message when systemd counts about two and a half minute before continuing, but I think that's more a coincidence.
Bug#867368: systemd-udevd: delays boot for 2+ minutes when LVM PV is inside a LV
> On Jul 6, 2017, at 8:03 AM, Felipe Satelerwrote: > > > % systemd-analyze > > Startup finished in 2min 55.491s (kernel) + 13.076s (userspace) = 3min > > 8.568s > > This suggests the kernel is the one slowing things down. I initially thought so too, but attempts to debug the kernel with e.g. 'initcall_debug' were not fruitful. Instead indications point to something happening in early initramfs userspace. > Can you try booting with the old jessie kernel? This looks like a kernel > regression. I have confirmed that the problem still exists even when booting a stretch system using the most recent jessie kernel. I have tried modifying the initramfs scripts to arrange for systemd-udevd to be passed the '--debug' option. Here are some relevant excerpts: > % grep 793 systemd-udevd.debug > seq 793 queued, 'change' 'block' > seq 793 running > seq 793 '/devices/virtual/block/dm-2' is taking a long time > % tail -n 5 systemd-udevd.debug > giving up waiting for workers to finish > event loop failed: Connection timed out > Unload module index > Unloaded link configuration context. > timeout '/sbin/lvm pvscan --cache --activate ay --major 253 --minor 2' This of course suggests the problem may actually lie with lvm2. Please feel free to reassign if you agree, or suggest a further course of investigation. Thanks. -- Rob Leslie r...@mars.org
Bug#867368: systemd-udevd: delays boot for 2+ minutes when LVM PV is inside a LV
On Wed, Jul 5, 2017 at 11:38 PM, Rob Lesliewrote: > Package: udev > Version: 232-25 > Severity: important > File: /lib/systemd/systemd-udevd > > Dear Maintainer, > > After upgrading to stretch I've discovered that some machines are taking a > long time to boot -- more than two minutes -- with no immediately obvious > cause. For example: > > > % systemd-analyze > > Startup finished in 2min 55.491s (kernel) + 13.076s (userspace) = 3min > 8.568s > This suggests the kernel is the one slowing things down. > > During most of this time, nothing appears on the console except a cursor. > Kernel and system logs show nothing to pinpoint the issue other than long > time gaps, e.g.: > > > [3.194336] clocksource: Switched to clocksource tsc > > [ 121.499692] md: linear personality registered for level -1 > > Adding 'debug' to the kernel command line however I got this clue: > > > [ 64.386629] systemd-udevd[88]: seq 905 '/devices/virtual/block/dm-5' > is taking a long time > This is just udev complaining. Udev does not generate the devices, it waits for the kernel to notify it. > > After some further investigation it appears the problem stems from having > an > LVM PV inside another LV. Here is how I am able to recreate the problem on > a > fresh stretch install: > > > # lvcreate -n pvtest -L 1g $vg > > # pvcreate /dev/$vg/pvtest > > (where $vg is some existing volume group name) > > Now rebooting the system will result in the long boot delay. > > This sort of setup did not cause any problems in jessie. An obvious > workaround > is not to put PVs inside other LVs, but this is sometimes a convenient > arrangement and I would hope it could be made to work without an excessive > boot delay. > Can you try booting with the old jessie kernel? This looks like a kernel regression. -- Saludos, Felipe Sateler
Bug#867368: systemd-udevd: delays boot for 2+ minutes when LVM PV is inside a LV
Package: udev Version: 232-25 Severity: important File: /lib/systemd/systemd-udevd Dear Maintainer, After upgrading to stretch I've discovered that some machines are taking a long time to boot -- more than two minutes -- with no immediately obvious cause. For example: > % systemd-analyze > Startup finished in 2min 55.491s (kernel) + 13.076s (userspace) = 3min 8.568s During most of this time, nothing appears on the console except a cursor. Kernel and system logs show nothing to pinpoint the issue other than long time gaps, e.g.: > [3.194336] clocksource: Switched to clocksource tsc > [ 121.499692] md: linear personality registered for level -1 Adding 'debug' to the kernel command line however I got this clue: > [ 64.386629] systemd-udevd[88]: seq 905 '/devices/virtual/block/dm-5' is > taking a long time After some further investigation it appears the problem stems from having an LVM PV inside another LV. Here is how I am able to recreate the problem on a fresh stretch install: > # lvcreate -n pvtest -L 1g $vg > # pvcreate /dev/$vg/pvtest (where $vg is some existing volume group name) Now rebooting the system will result in the long boot delay. This sort of setup did not cause any problems in jessie. An obvious workaround is not to put PVs inside other LVs, but this is sometimes a convenient arrangement and I would hope it could be made to work without an excessive boot delay. Thanks. -- Package-specific info: -- System Information: Debian Release: 9.0 APT prefers stable APT policy: (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 4.9.0-3-amd64 (SMP w/4 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages udev depends on: ii adduser 3.115 ii dpkg 1.18.24 ii libacl1 2.2.52-3+b1 ii libblkid12.29.2-1 ii libc62.24-11+deb9u1 ii libkmod2 23-2 ii libselinux1 2.6-3+b1 ii libudev1 232-25 ii lsb-base 9.20161125 ii procps 2:3.3.12-3 ii util-linux 2.29.2-1 udev recommends no packages. udev suggests no packages. Versions of packages udev is related to: ii systemd 232-25 -- no debconf information