Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used

2016-09-05 Thread Zdenek Kabelac

On 3.9.2016 at 05:17, james harvey wrote:
> On Tue, Aug 16, 2016 at 5:57 AM, Zdenek Kabelac wrote:
>> On 6.8.2016 at 04:08, james harvey wrote:
>>> Same problem and question about if an immediate SIGKILL is OK for
>>> dmeventd.
>>>
>>> On Thu, Aug 4, 2016 at 11:20 PM, james harvey wrote:
>>>> Does it matter at all if lvmetad shuts down gracefully?
>>>>
>>>> Can I safely just have systemd right off the bat send a SIGKILL?
>>>>
>>>> Most things I wouldn't ask about, but I'm wondering if this is PURELY
>>>> a caching daemon where gracefully shutting down doesn't really do
>>>> anything.
>>
>> SIGTERM/SIGINT is ignored by dmeventd while a device is monitored.
>>
>> Before stopping dmeventd, devices should be unmonitored
>> (vgchange/lvchange).
>>
>> Killing dmeventd in the middle of, e.g., a recovery operation might leave
>> your system in a dizzy state (suspended devices), essentially useless.
>>
>> Something similar currently applies to lvmetad: an lvm2 command will not
>> like the death of lvmetad in the middle of an operation, and this may
>> result in operation failure (though the situation here might improve
>> somewhat over time). But ATM don't kill, just stop the services.
>>
>> Fedora should be doing it properly on reboot: switching to a ramdisk and
>> continuing with the shutdown sequence from there.  Unsure how other OSes
>> solve this.
>>
>> Using 'kill -9' (SIGKILL) is in general unsupported, and any reported
>> problems caused by this usage are ignored...
>>
>> Regards
>>
>> Zdenek
>
> Got it.  Fedora defaults to having lvm2-monitor.service enabled, Arch
> doesn't.  (I've asked for that to be fixed.)  Arch also uses a
> shutdown ramdisk.


Using some device types WITHOUT monitoring is quite a 'crazy' idea...
Unless you are well aware of what you are doing, thin, raid, mirror,
and snapshot devices should always be monitored...

So IMHO a thing to fix in Arch.
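
A minimal sketch of checking and enabling monitoring, for readers following
along. It assumes the volume group is named disk1 as in the reproduction
steps, and that the installed lvs supports the seg_monitor report field. The
run wrapper and the DRY_RUN variable are hypothetical helpers: commands are
only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Report whether each LV segment in the given VG is monitored by dmeventd.
show_monitoring() {
    run lvs -o lv_name,seg_monitor "$1"
}

# Enable monitoring at boot and turn it on for the given VG right away.
enable_monitoring() {
    run systemctl enable --now lvm2-monitor.service
    run vgchange --monitor y "$1"
}

show_monitoring disk1
enable_monitoring disk1
```

Run as-is, this prints the three commands so they can be reviewed first;
with DRY_RUN=0 and root privileges it would execute them.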



> 1) Should the lvm2-lvmetad, dm-event, and lvm2-monitor unit files be
> modified so they are never given a SIGKILL?  Even with
> lvm2-monitor.service enabled, even on Fedora, if systemd sees they
> don't terminate on SIGTERM/SIGINT within 90 seconds (systemd v231 is
> 90 seconds, was 10 seconds before), it's sending them a SIGKILL.  I
> think adding "SendSIGKILL=no" to the Service and Socket sections will
> do this, if I understand it correctly.


That's a different story: if something is 'deadlocked' and can't move
forward, killing things after 90 seconds likely can't make the situation
any worse, especially if you are shutting down...

So no, there is no plan to use such an option (SendSIGKILL=no) ATM.
(The state machine is pretty complex, and when some devices are 'forgotten'
in suspend, it's quite hard to fix.)
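
For reference, the setting asked about in (1) would be expressed as a systemd
drop-in roughly as sketched here. This only illustrates the proposal, which
the reply above declines. The helper name and the file name no-sigkill.conf
are hypothetical, and the sketch writes into a scratch directory rather than
/etc/systemd/system.

```shell
#!/usr/bin/env bash
# Sketch only: write a drop-in that sets SendSIGKILL=no for one service unit.
# The root directory is a parameter so the result can be inspected before
# anything is placed in the live systemd configuration.
write_no_sigkill_dropin() {
    local root="$1" unit="$2"
    local dir="$root/$unit.d"
    mkdir -p "$dir"
    printf '[Service]\nSendSIGKILL=no\n' > "$dir/no-sigkill.conf"
    # Print the path of the file that was written.
    echo "$dir/no-sigkill.conf"
}

# Write into a temporary directory and show the resulting fragment.
scratch="$(mktemp -d)"
cat "$(write_no_sigkill_dropin "$scratch" lvm2-lvmetad.service)"
```

A drop-in placed under /etc/systemd/system would only take effect after
"systemctl daemon-reload". For a socket unit the section header would be
[Socket] instead of [Service].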




> 2) Should lvm2-lvmetad and dm-event systemd unit files want
> lvm2-monitor.service?


lvm2-lvmetad is unrelated to the monitoring service (dmeventd).



> 3) Could all LVM programs be changed so if they receive a
> SIGTERM/SIGINT and choose to ignore it, they give a warn/info/debug
> message?  Not doing so invites thinking a SIGKILL is the proper thing
> to do.



SIGINT should be handled with logging (at least I've taken care of that in
dmeventd; it should log this to syslog).


Both daemons should be able to shut down gracefully if they are not in use
(i.e. no connection to lvmetad, no monitored device in dmeventd).
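
The ordering described in this thread (unmonitor first, then stop the daemon)
could be sketched as follows. The unit names dm-event.socket and
dm-event.service are assumptions based on the unit files mentioned in the
thread, the VG name disk1 comes from the reproduction steps, and the run
wrapper with DRY_RUN is a hypothetical helper that only prints commands
unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch only: print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Unmonitor the VG so dmeventd has nothing left to watch, then stop it.
# The socket is stopped too so the service is not re-activated on demand.
stop_dmeventd_gracefully() {
    local vg="$1"
    run vgchange --monitor n "$vg"
    run systemctl stop dm-event.socket dm-event.service
}

stop_dmeventd_gracefully disk1
```

With nothing monitored, dmeventd should then be able to exit on the normal
stop signal instead of waiting for the SIGKILL timeout.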

An lvm2 command usually blocks signal processing while it's holding the VG
lock, but it should be breakable (SIGINT) in those 'process_each_lv' loops
or when the command prompts. Support for SIGTERM is planned but low-prio, so
it will happen. It's a known issue, but there are bigger fish to hunt ATM... :)


The best thing is to open a BZ if you find something breaking common logic
(so it's not lost in mailing list noise).

Regards

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used

2016-08-05 Thread james harvey
Same problem and question about if an immediate SIGKILL is OK for dmeventd.

On Thu, Aug 4, 2016 at 11:20 PM, james harvey  wrote:
> Does it matter at all if lvmetad shuts down gracefully?
>
> Can I safely just have systemd right off the bat send a SIGKILL?
>
> Most things I wouldn't ask about, but I'm wondering if this is PURELY
> a caching daemon where gracefully shutting down doesn't really do
> anything.


Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used

2016-08-04 Thread james harvey
Does it matter at all if lvmetad shuts down gracefully?

Can I safely just have systemd right off the bat send a SIGKILL?

Most things I wouldn't ask about, but I'm wondering if this is PURELY
a caching daemon where gracefully shutting down doesn't really do
anything.



Re: [dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used

2016-08-03 Thread james harvey
On Wed, Aug 3, 2016 at 10:51 PM, james harvey  wrote:
> After upgrading to systemd v231, my shutdowns/reboots have a 90 second
> delay at the very end.  Linux kvm 4.6.4-1.

Ignore the "kvm" here, must have typed that into the wrong window.
This is on a physical system, not running kvm at this point.



[dm-devel] lvmetad doesn't terminate with SIGTERM if thin volume used

2016-08-03 Thread james harvey
After upgrading to systemd v231, my shutdowns/reboots have a 90 second
delay at the very end.  Linux kvm 4.6.4-1.

After I looked into it, I found it's due to lvmetad never terminating
when receiving a SIGTERM, and after 90 seconds, systemd performs a
SIGKILL.

systemd 231 (commit d4506129) changed the timeout on sending a SIGKILL
after a SIGTERM from 10 seconds to 90 seconds.  I think this bug has
been around for quite a while, because I've noticed shutdowns had
about a 10 second delay at the same spot that now has a 90 second
delay.

With lvmetad running with "-l all", a systemd debug dmesg log through
shutdown is attached, after running it through "grep -i lvm2".  The
full (4MB) version is here:
http://45.63.106.241/share/lvm2-lvmetad.shutdown-log2.txt

This also happens if I attempt stopping lvm2-lvmetad.  Attached is
information showing that.

Also attached are the minimal steps I used to cause the problem, using
one disk and EXT4.

If during the install I combine the 2 lvcreate commands into a single
one without using thin pools, then lvmetad terminates pretty much
immediately with SIGTERM.
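
To make the two layouts concrete, here is a small sketch that prints the
lvcreate invocations for each. The thin variant mirrors the steps later in
this message; the linear variant is an assumption about what the combined
command looks like (the actual size used is not stated), and lvcreate_plan
is a hypothetical helper that prints commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: print the lvcreate commands for a layout instead of running them.
lvcreate_plan() {
    local layout="$1" vg="$2" lv="$3" size="$4"
    case "$layout" in
        thin)
            # Pool first, then a thin volume inside it (500G pool as in the steps).
            echo "lvcreate --size 500G --thinpool ${vg}thin $vg"
            echo "lvcreate --virtualsize $size --name $lv $vg/${vg}thin"
            ;;
        linear)
            # Single command, no thin pool involved.
            echo "lvcreate --size $size --name $lv $vg"
            ;;
        *)
            echo "unknown layout: $layout" >&2
            return 1
            ;;
    esac
}

lvcreate_plan thin disk1 root 100G
lvcreate_plan linear disk1 root 100G
```

Only the thin layout leads to the lvmetad shutdown delay described here.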

==

# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-08-03 21:36:54 EDT; 1min 14s ago
     Docs: man:lvmetad(8)
 Main PID: 398 (lvmetad)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/lvm2-lvmetad.service
           └─398 /usr/bin/lvmetad -f

# systemctl stop lvm2-lvmetad
{{{ after 90 seconds }}}
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
  lvm2-lvmetad.socket
# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2016-08-03 21:40:33 EDT; 12s ago
     Docs: man:lvmetad(8)
  Process: 398 ExecStart=/usr/bin/lvmetad -f (code=killed, signal=KILL)
 Main PID: 398 (code=killed, signal=KILL)

Aug 03 21:36:54 terra systemd[1]: Started LVM2 metadata daemon.
Aug 03 21:39:03 terra systemd[1]: Stopping LVM2 metadata daemon...
Aug 03 21:39:03 terra lvmetad[398]: Failed to accept connection.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: State 'stop-sigterm' timed out. Killing.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Killing process 398 (lvmetad) with signal SIGKILL.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Main process exited, code=killed, status=9/KILL
Aug 03 21:40:33 terra systemd[1]: Stopped LVM2 metadata daemon.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Unit entered failed state.
Aug 03 21:40:33 terra systemd[1]: lvm2-lvmetad.service: Failed with result 'signal'.
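
As a quick check of the timing, the gap between the "Stopping" entry
(21:39:03) and the "'stop-sigterm' timed out" entry (21:40:33) in the journal
excerpt above can be computed directly. The helper names to_seconds and
stop_delay are hypothetical, and the sketch assumes both timestamps fall on
the same day.

```shell
#!/usr/bin/env bash
# Convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10 so values like "08" are not parsed as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds elapsed between two same-day timestamps (start, end).
stop_delay() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

stop_delay 21:39:03 21:40:33   # prints 90
```

The result matches the 90 second stop timeout described at the top of this
message.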



/dev/sda1 3.5G Linux filesystem
/dev/sda2 4.5TB Linux LVM

{ Setup LVM and filesystems }
# mkfs.ext4 -L boot /dev/sda1
# pvcreate /dev/sda2
# vgcreate disk1 /dev/sda2
{ Merging these 2 lvcreates, removing the thin volume usage makes
lvm2-lvmetad properly terminate on SIGTERM }
# lvcreate --size 500G --thinpool disk1thin disk1
# lvcreate --virtualsize 100G --name root disk1/disk1thin
# mkfs.ext4 -L /mnt /dev/disk1/main
# mount /dev/disk1/main /mnt
# mkdir /mnt/boot
# mount /dev/sda1 /mnt/boot

{ Install Arch Linux }
# vi /etc/pacman.d/mirrorlist
# pacstrap -i /mnt base syslinux gptfdisk lvm2
# arch-chroot /mnt
# vi /etc/locale.gen
# locale-gen
# locale > /etc/locale.conf
# vi /etc/nsswitch.conf
# systemctl enable systemd-resolved systemd-networkd
# ln -s /usr/share/zoneinfo/America/Detroit /etc/localtime
# hwclock --utc --systohc
# passwd
{ Add lvm2 between block and filesystems }
# vi /etc/mkinitcpio.conf
# mkinitcpio -p linux
# echo hostname > /etc/hostname
# vi /etc/systemd/network/enp31s0.network
# syslinux-install_update -i -a -m
# vi /boot/syslinux/syslinux.cfg

{ After Reboot }
# vi /etc/fstab
[1.329731] systemd-udevd[102]: Reading rules file: /usr/lib/udev/rules.d/11-dm-lvm.rules
[1.330295] systemd-udevd[102]: Reading rules file: /usr/lib/udev/rules.d/69-dm-lvm-metad.rules
[2.784667] systemd-udevd[171]: LINK 'disk/by-id/lvm-pv-uuid-OtSgxO-WvMd-mzfn-FbPw-aXBv-TaT6-xHP2bM' /usr/lib/udev/rules.d/69-dm-lvm-metad.rules:38
[2.784686] systemd-udevd[171]: RUN '/usr/bin/lvm pvscan --background --cache --activate ay --major $major --minor $minor' /usr/lib/udev/rules.d/69-dm-lvm-metad.rules:91
[2.784778] systemd-udevd[171]: creating link '/dev/disk/by-id/lvm-pv-uuid-OtSgxO-WvMd-mzfn-FbPw-aXBv-TaT6-xHP2bM' to '/dev/sda2'
[2.784786] systemd-udevd[171]: creating symlink '/dev/disk/by-id/lvm-pv-uuid-OtSgxO-WvMd-mzfn-FbPw-aXBv-TaT6-xHP2bM' to '../../sda2'
[2.785414] systemd-udevd[228]: starting '/usr/bin/lvm pvscan --background --cache --activate ay --major 8 --minor 2'
[2.787879] systemd-udevd[171]: '/usr/bin/lvm pvscan --background --cache --activate ay