Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used
On 3.9.2016 at 05:17, james harvey wrote:
> On Tue, Aug 16, 2016 at 5:57 AM, Zdenek Kabelac wrote:
>> On 6.8.2016 at 04:08, james harvey wrote:
>>> Same problem and question about if an immediate SIGKILL is OK for
>>> dmeventd.
>>>
>>> On Thu, Aug 4, 2016 at 11:20 PM, james harvey wrote:
>>>> Does it matter at all if lvmetad shuts down gracefully?
>>>>
>>>> Can I safely just have systemd right off the bat send a SIGKILL?
>>>>
>>>> Most things I wouldn't ask about, but I'm wondering if this is PURELY
>>>> a caching daemon where gracefully shutting down doesn't really do
>>>> anything.
>>
>> SIGTERM/SIGINT is ignored by dmeventd while a device is monitored.
>> Before stopping dmeventd, devices shall be unmonitored (vg/lvchange).
>>
>> Killing 'dmeventd' in the middle of e.g. a recovery operation might
>> leave your system in a dizzy state (suspended devices), essentially
>> useless.
>>
>> Something similar ATM applies to lvmetad - an lvm2 command will not
>> like the death of lvmetad in the middle of an operation, and this may
>> result in operation failure (though here the situation might get
>> somewhat improved over time...) - but ATM don't kill - just stop
>> services.
>>
>> Fedora should be doing it properly on reboot - switching to ramdisk
>> and continuing with the shutdown sequence from there.
>> Unsure how other OSes solve this.
>>
>> Using 'kill -9' (SIGKILL) is in general unsupported and any reported
>> problems caused by this usage are ignored...
>>
>> Regards
>>
>> Zdenek
>
> Got it. Fedora defaults to having lvm2-monitor.service enabled, Arch
> doesn't. (I've asked for that to be fixed.) Arch also uses a shutdown
> ramdisk.

Using some device types WITHOUT monitoring is quite a 'crazy' idea...
Unless you are well aware of what you are doing, thin, raid, mirror and
snapshot devices should always be monitored...

So IMHO a thing to fix in Arch.

> 1) Should the lvm2-lvmetad, dm-event, and lvm2-monitor unit files be
> modified so they are never given a SIGKILL?
>
> Even with lvm2-monitor.service enabled, even on Fedora, if systemd
> sees they don't SIGTERM/SIGINT within 90 seconds (systemd v231 is 90
> seconds, was 10 seconds before), it's sending them a SIGKILL.
>
> I think adding "SendSIGKILL=no" to the Service and Socket sections
> will do this, if I understand it correctly.

That's a different story here - if something is 'deadlocked' and can't
move forward, killing things after 90 seconds is unlikely to make the
situation any worse - especially if you are doing a shutdown...

So no - there is no plan to use such an option (SendSIGKILL=no) ATM.
(The state machine is pretty complex, and when some devices are
'forgotten' in suspend it's quite hard to fix.)

> 2) Should lvm2-lvmetad and dm-event systemd unit files want
> lvm2-monitor.service?

lvm2-lvmetad is unrelated to the monitoring service (dmeventd).

> 3) Could all LVM programs be changed so if they receive a
> SIGTERM/SIGINT and choose to ignore it, they give a warn/info/debug
> message? Not doing so invites thinking a SIGKILL is the proper thing
> to do.

SIGINT should be handled with logging - (at least I've taken care of
that in dmeventd - it should log this to syslog).

Both daemons should be able to shut down gracefully if they are not in
use (i.e. no connection to lvmetad, no monitored device in dmeventd).

An lvm2 command usually blocks signal processing while it's holding the
VG lock, but it should be breakable (SIGINT) in those 'process_each_lv'
loops or if the command prompts - support for SIGTERM is planned, but
low-prio - so it will happen - it's a known issue - but there are bigger
fish to hunt ATM... :)

The best is to open a BZ if you find something breaking common logic.
(So it's not lost in mailing list noise.)

Regards

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
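The ordering described in the reply above (unmonitor the devices first, then stop the daemon instead of killing it) could be sketched roughly as follows. This is only an illustration, not a tested shutdown procedure: the volume group name `disk1` is taken from the original report, `dm-event.service` is assumed to be the unit that runs dmeventd, and the `run` and `stop_dmeventd_cleanly` helpers are hypothetical names. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: unmonitor a volume group, then stop dmeventd through systemd.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# to actually execute them as root.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop dmeventd without resorting to SIGKILL.
stop_dmeventd_cleanly() {
    vg=$1
    # Turn off dmeventd monitoring for the LVs in this VG first, so that
    # dmeventd no longer has monitored devices when it is asked to stop.
    run vgchange --monitor n "$vg" || return 1
    # Only then ask systemd to stop the daemon (assumed unit name).
    run systemctl stop dm-event.service
}

stop_dmeventd_cleanly disk1
```

Monitoring would normally be switched back on afterwards (`vgchange --monitor y`), since the reply stresses that thin, raid, mirror and snapshot devices should not be left unmonitored.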
Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used
Same problem and question about if an immediate SIGKILL is OK for dmeventd.

On Thu, Aug 4, 2016 at 11:20 PM, james harvey wrote:
> Does it matter at all if lvmetad shuts down gracefully?
>
> Can I safely just have systemd right off the bat send a SIGKILL?
>
> Most things I wouldn't ask about, but I'm wondering if this is PURELY
> a caching daemon where gracefully shutting down doesn't really do
> anything.
>
> On Wed, Aug 3, 2016 at 10:51 PM, james harvey wrote:
>> After upgrading to systemd v231, my shutdowns/reboots have a 90 second
>> delay at the very end. Linux kvm 4.6.4-1.
>>
>> After I looked into it, I found it's due to lvmetad never terminating
>> when receiving a SIGTERM, and after 90 seconds, systemd performs a
>> SIGKILL.
>>
>> systemd 231 (commit d4506129) changed the timeout on sending a SIGKILL
>> after a SIGTERM from 10 seconds to 90 seconds. I think this bug has
>> been around for quite a while, because I've noticed shutdowns had
>> about a 10 second delay at the same spot that now has a 90 second
>> delay.
>>
>> With lvmetad running with "-l all", a systemd debug dmesg log through
>> shutdown is attached, after running it through "grep -i lvm2". The
>> full (4MB) version is here:
>> http://45.63.106.241/share/lvm2-lvmetad.shutdown-log2.txt
>>
>> This also happens if I attempt stopping lvm2-lvmetad. Attached is
>> information showing that.
>>
>> Also attached is the minimal steps I used to cause the problem, using
>> one disk and EXT4.
>>
>> If during the install I combine the 2 lvcreate commands into a single
>> one without using thin pools, then lvmetad terminates pretty much
>> immediately with SIGTERM.
Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used
Does it matter at all if lvmetad shuts down gracefully?

Can I safely just have systemd right off the bat send a SIGKILL?

Most things I wouldn't ask about, but I'm wondering if this is PURELY
a caching daemon where gracefully shutting down doesn't really do
anything.

On Wed, Aug 3, 2016 at 10:51 PM, james harvey wrote:
> After upgrading to systemd v231, my shutdowns/reboots have a 90 second
> delay at the very end. Linux kvm 4.6.4-1.
>
> After I looked into it, I found it's due to lvmetad never terminating
> when receiving a SIGTERM, and after 90 seconds, systemd performs a
> SIGKILL.
>
> systemd 231 (commit d4506129) changed the timeout on sending a SIGKILL
> after a SIGTERM from 10 seconds to 90 seconds. I think this bug has
> been around for quite a while, because I've noticed shutdowns had
> about a 10 second delay at the same spot that now has a 90 second
> delay.
>
> With lvmetad running with "-l all", a systemd debug dmesg log through
> shutdown is attached, after running it through "grep -i lvm2". The
> full (4MB) version is here:
> http://45.63.106.241/share/lvm2-lvmetad.shutdown-log2.txt
>
> This also happens if I attempt stopping lvm2-lvmetad. Attached is
> information showing that.
>
> Also attached is the minimal steps I used to cause the problem, using
> one disk and EXT4.
>
> If during the install I combine the 2 lvcreate commands into a single
> one without using thin pools, then lvmetad terminates pretty much
> immediately with SIGTERM.
Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used
On Wed, Aug 3, 2016 at 10:51 PM, james harvey wrote:
> After upgrading to systemd v231, my shutdowns/reboots have a 90 second
> delay at the very end. Linux kvm 4.6.4-1.

Ignore the "kvm" here, must have typed that into the wrong window.
This is on a physical system, not running kvm at this point.
[dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used
After upgrading to systemd v231, my shutdowns/reboots have a 90 second
delay at the very end. Linux kvm 4.6.4-1.

After I looked into it, I found it's due to lvmetad never terminating
when receiving a SIGTERM, and after 90 seconds, systemd performs a
SIGKILL.

systemd 231 (commit d4506129) changed the timeout on sending a SIGKILL
after a SIGTERM from 10 seconds to 90 seconds. I think this bug has
been around for quite a while, because I've noticed shutdowns had
about a 10 second delay at the same spot that now has a 90 second
delay.

With lvmetad running with "-l all", a systemd debug dmesg log through
shutdown is attached, after running it through "grep -i lvm2". The
full (4MB) version is here:
http://45.63.106.241/share/lvm2-lvmetad.shutdown-log2.txt

This also happens if I attempt stopping lvm2-lvmetad. Attached is
information showing that.

Also attached is the minimal steps I used to cause the problem, using
one disk and EXT4.

If during the install I combine the 2 lvcreate commands into a single
one without using thin pools, then lvmetad terminates pretty much
immediately with SIGTERM.
==

# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-08-03 21:36:54 EDT; 1min 14s ago
     Docs: man:lvmetad(8)
 Main PID: 398 (lvmetad)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/lvm2-lvmetad.service
           └─398 /usr/bin/lvmetad -f

# systemctl stop lvm2-lvmetad
{{{ after 90 seconds }}}
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
  lvm2-lvmetad.socket
# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2016-08-03 21:40:33 EDT; 12s ago
     Docs: man:lvmetad(8)
  Process: 398 ExecStart=/usr/bin/lvmetad -f (code=killed, signal=KILL)
 Main PID: 398 (code=killed, signal=KILL)

Aug 03 21:36:54 terra systemd[1]: Started LVM2 metadata daemon.
Aug 03 21:39:03 terra systemd[1]: Stopping LVM2 metadata daemon...
Aug 03 21:39:03 terra lvmetad[398]: Failed to accept connection.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: State 'stop-sigterm' timed out. Killing.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Killing process 398 (lvmetad) with signal SIGKILL.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Main process exited, code=killed, status=9/KILL
Aug 03 21:40:33 terra systemd[1]: Stopped LVM2 metadata daemon.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Unit entered failed state.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Failed with result 'signal'.
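The 90 second figure can be read straight off the journal timestamps above: "Stopping" is logged at 21:39:03 and the 'stop-sigterm' timeout at 21:40:33. A small sketch of that arithmetic follows; the helper names `to_seconds` and `elapsed` are made up for illustration, and the code assumes both timestamps are two-digit HH:MM:SS values from the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    # Split the argument on ':' into the positional parameters.
    set -- $1
    IFS=$old_ifs
    # Strip one leading zero from each field so that values such as
    # "08" are not parsed as (invalid) octal numbers.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Seconds elapsed between two timestamps on the same day.
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# "Stopping LVM2 metadata daemon..." vs. "State 'stop-sigterm' timed out."
elapsed 21:39:03 21:40:33
```

On a live system the configured values could presumably be checked with `systemctl show lvm2-lvmetad.service -p TimeoutStopUSec -p KillSignal -p SendSIGKILL`, though that output is not part of the original report.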
/dev/sda1 3.5G Linux filesystem
/dev/sda2 4.5TB Linux LVM

{ Setup LVM and filesystems }
# mkfs.ext4 -L boot /dev/sda1
# pvcreate /dev/sda2
# vgcreate disk1 /dev/sda2
{ Merging these 2 lvcreates, removing the thin volume usage makes lvm2-lvmetad properly terminate on SIGTERM }
# lvcreate --size 500G --thinpool disk1thin disk1
# lvcreate --virtualsize 100G --name root disk1/disk1thin
# mkfs.ext4 -L /mnt /dev/disk1/main
# mount /dev/disk1/main /mnt
# mkdir /mnt/boot
# mount /dev/sda1 /mnt/boot

{ Install Arch Linux }
# vi /etc/pacman.d/mirrorlist
# pacstrap -i /mnt base syslinux gptfdisk lvm2
# arch-chroot /mnt
# vi /etc/locale.gen
# locale-gen
# locale > /etc/locale.conf
# vi /etc/nsswitch.conf
# systemctl enable systemd-resolved systemd-networkd
# ln -s /usr/share/zoneinfo/America/Detroit /etc/localtime
# hwclock --utc --systohc
# passwd
{ Add lvm2 between block and filesystems }
# vi /etc/mkinitcpio.conf
# mkinitcpio -p linux
# echo hostname > /etc/hostname
# vi /etc/systemd/network/enp31s0.network
# syslinux-install_update -i -a -m
# vi /boot/syslinux/syslinux.cfg

{ After Reboot }
# vi /etc/fstab

[1.329731] systemd-udevd[102]: Reading rules file: /usr/lib/udev/rules.d/11-dm-lvm.rules
[1.330295] systemd-udevd[102]: Reading rules file: /usr/lib/udev/rules.d/69-dm-lvm-metad.rules
[2.784667] systemd-udevd[171]: LINK 'disk/by-id/lvm-pv-uuid-OtSgxO-WvMd-mzfn-FbPw-aXBv-TaT6-xHP2bM' /usr/lib/udev/rules.d/69-dm-lvm-metad.rules:38
[2.784686] systemd-udevd[171]: RUN '/usr/bin/lvm pvscan --background --cache --activate ay --major $major --minor $minor' /usr/lib/udev/rules.d/69-dm-lvm-metad.rules:91
[2.784778] systemd-udevd[171]: creating link '/dev/disk/by-id/lvm-pv-uuid-OtSgxO-WvMd-mzfn-FbPw-aXBv-TaT6-xHP2bM' to '/dev/sda2'
[2.784786] systemd-udevd[171]: creating symlink '/dev/disk/by-id/lvm-pv-uuid-OtSgxO-WvMd-mzfn-FbPw-aXBv-TaT6-xHP2bM' to '../../sda2'
[2.785414] systemd-udevd[228]: starting '/usr/bin/lvm pvscan --background --cache --activate ay --major 8 --minor 2'
[2.787879] systemd-udevd[171]: '/usr/bin/lvm pvscan --background --cache --activate ay