Re: [linux-lvm] thin: pool target too small
On Mon, Sep 21, 2020 at 5:23 AM Zdenek Kabelac wrote:
> Dne 21. 09. 20 v 1:48 Duncan Townsend napsal(a):
> > Hello!
> >
> > I think the problem I'm having is related to this thread:
> > https://www.redhat.com/archives/linux-lvm/2016-May/msg00092.html
> > (continuation: https://www.redhat.com/archives/linux-lvm/2016-June/msg0.html).
> > In the previous thread, Zdenek Kabelac fixed the problem manually,
> > but there was no information about exactly what was fixed or how.
> > I have also posted about this problem on #lvm on freenode and on
> > Stack Exchange
> > (https://superuser.com/questions/1587224/lvm2-thin-pool-pool-target-too-small),
> > so my apologies to those of you who are seeing this again.
>
> Hi
>
> At first it's worth mentioning which versions of the kernel, lvm2, and
> thin-tools (the device-mapper-persistent-data package on RHEL/Fedora;
> check with thin_check -V) are involved.

Ahh, thank you for the reminder. My apologies for not including this in my
original message. I use Void Linux on aarch64-musl:

# uname -a
Linux (none) 5.7.0_1 #1 SMP Thu Aug 6 20:19:56 UTC 2020 aarch64 GNU/Linux

# lvm version
  LVM version:     2.02.187(2) (2020-03-24)
  Library version: 1.02.170 (2020-03-24)
  Driver version:  4.42.0
  Configuration:   ./configure --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --bindir=/usr/bin --mandir=/usr/share/man --infodir=/usr/share/info --localstatedir=/var --disable-selinux --enable-readline --enable-pkgconfig --enable-fsadm --enable-applib --enable-dmeventd --enable-cmdlib --enable-udev_sync --enable-udev_rules --enable-lvmetad --with-udevdir=/usr/lib/udev/rules.d --with-default-pid-dir=/run --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --enable-static_link --host=x86_64-unknown-linux-musl --build=x86_64-unknown-linux-musl --host=aarch64-linux-musl --with-sysroot=/usr/aarch64-linux-musl --with-libtool-sysroot=/usr/aarch64-linux-musl

# thin_check -V
0.8.5

> > I had a problem with a runit script that caused my dmeventd to be
> > killed and restarted every 5 seconds. The script has been fixed, but
>
> Killing dmeventd is always a BAD plan. Either you do not want monitoring
> (set it to 0 in lvm.conf), or you leave dmeventd to its job - killing it
> in the middle of its work isn't going to end well.

Thank you for reinforcing this. My runit script was fighting with dracut in
my initramfs: the script saw a dmeventd that was not under its control, and
so tried to kill the one started by dracut. I've disabled the runit script
and replaced it with a stub that simply tries to kill the dracut-started
dmeventd when it receives a signal.

> > device-mapper: thin: 253:10: reached low water mark for data device:
> > sending event.
> > lvm[1221]: WARNING: Sum of all thin volume sizes (2.81 TiB) exceeds
> > the size of thin pools and the size of whole volume group (1.86 TiB).
> > lvm[1221]: Size of logical volume nellodee-nvme/nellodee-nvme-thin_tdata
> > changed from 212.64 GiB (13609 extents) to <233.91 GiB (14970 extents).
> > device-mapper: thin: 253:10: growing the data device from 13609 to
> > 14970 blocks
> > lvm[1221]: Logical volume nellodee-nvme/nellodee-nvme-thin_tdata
> > successfully resized.
>
> So here was a successful resize.
>
> > lvm[1221]: dmeventd received break, scheduling exit.
> > lvm[1221]: dmeventd received break, scheduling exit.
> > lvm[1221]: WARNING: Thin pool nellodee--nvme-nellodee--nvme--thin-tpool
> > data is now 81.88% full.
> > (lots of repeats of "lvm[1221]: dmeventd received break, scheduling exit.")
> > lvm[1221]: No longer monitoring thin pool
> > nellodee--nvme-nellodee--nvme--thin-tpool.
> > device-mapper: thin: 253:10: pool target (13609 blocks) too small:
> > expected 14970
>
> And now we can see the problem - the thin-pool was already upsized to a
> bigger size (13609 -> 14970, as seen above) - yet something has tried to
> activate the thin-pool with the smaller metadata volume.

I think what happened here is that the dmeventd started by dracut finally
exited, and then the dmeventd started by runit took over. Then the
runit-started dmeventd tried to activate the thin-pool, which was in the
process of being resized?

> > device-mapper: table: 253:10: thin-pool: preresume failed, error = -22
>
> This is correct - it's preventing further damage to the thin-pool.
>
> > lvm[1221]: dmeventd received break, scheduling exit.
> > (previous message repeats many times)
> >
> > After this, the system became unresponsive, so I power cycled it. Upon
> > boot up, the following message was printed and I was dropped into an
> > emergency shell:
> >
> > device-mapper: thin: 253:10: pool target (13609 blocks) too small:
> > expected 14970
> > device-mapper: table: 253:10: thin-pool: preresume failed, error = -22
>
> So the primary question is - how could LVM have got the 'smaller'
> metadata back - have you played with 'vgcfgrestore'?
>
> So when you submit the version of tools - also provide /etc/lvm/archive
> (eventually an lvmdump archive)
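For anyone trying to spot the same mismatch on their own system, the data-device size the kernel is actually using can be read out of `dmsetup status` and compared against what the lvm2 metadata says. A minimal sketch, using a hypothetical status line modeled on the sizes in this thread (the device name and the line itself are made up; if I read the dm thin-pool status format correctly, the used/total data-block field is the sixth whitespace-separated field, after the offset/length/target prefix):

```shell
# Hypothetical `dmsetup status vg-pool-tpool` output for a thin-pool.
# Fields after "thin-pool": transaction id, used/total metadata blocks,
# used/total data blocks, held metadata root, mode, ...
status="0 490733568 thin-pool 1 1305/489472 11186/13609 - rw no_discard_passdown queue_if_no_space - 1024"

# Extract used/total data blocks - the total (13609 here) is what the
# kernel compares against the pool size recorded in the LVM metadata.
data=$(echo "$status" | awk '{print $6}')
echo "data blocks (used/total): $data"
```

If the total here is smaller than what lvm2 believes the _tdata LV should be (e.g. via something like `lvs -a -o+seg_size` on the pool's sub-LVs), you are looking at the same "pool target too small" condition.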
Re: [linux-lvm] Why isn't issue_discards enabled by default?
On Mon, Sep 21, 2020 at 10:14 AM nl6720 wrote:
> I wanted to know why the "issue_discards" setting isn't enabled by
> default. Are there any dangers in enabling it, or if not, is there a
> chance of getting the default changed?
>
> Also, it's not entirely clear to me if/how "issue_discards" affects thin
> pool discard passdown.

Historically, there have been dangers. Even today, there might still be
dangers - although I believe Linux (and other OSes) may disable the feature
for hardware that is known to behave improperly. If you do your research and
make sure you are using a good storage drive, there should not be any
problems. I enable issue_discards on all systems I work with at home and at
work, and have not encountered any problems. But then, I also don't buy
cheap drives from questionable brands.

It's pretty common for settings such as these to default to the conservative
choice: users who are willing to accept the risk (no matter how small) can
turn the feature on as an option, while users who are unaware of the risk,
or have not evaluated it, cannot blame the software vendors for losing their
data. In the case of LVM, it's not LVM's fault that some drives might lose
your data when a discard is sent - but users of LVM might blame LVM anyway.

--
Mark Mielke

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
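One practical step in that research is to confirm whether a drive actually advertises discard support before enabling issue_discards on top of it. A minimal sketch (the device name "sda" is a placeholder; on Linux, a non-zero discard_max_bytes in sysfs means the kernel will pass discards down to the device):

```shell
# Placeholder device name - substitute the disk underneath your PV.
dev=sda
f="/sys/block/$dev/queue/discard_max_bytes"

# A readable sysfs file with a value > 0 indicates discard support.
if [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ] 2>/dev/null; then
    echo "$dev: discard supported"
else
    echo "$dev: no discard support reported (or device not present)"
fi
```

`lsblk -D` gives a similar per-device overview of discard granularity and maximum size.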
Re: [linux-lvm] Why isn't issue_discards enabled by default?
Dne 21. 09. 20 v 16:14 nl6720 napsal(a):
> Hi,
>
> I wanted to know why the "issue_discards" setting isn't enabled by
> default. Are there any dangers in enabling it, or if not, is there a
> chance of getting the default changed?
>
> Also, it's not entirely clear to me if/how "issue_discards" affects thin
> pool discard passdown.

Hi

Have you checked the enclosed documentation within /etc/lvm/lvm.conf?

issue_discards is PURELY & ONLY related to sending discards for removed
disk extents/areas after 'lvremove'. It is not in ANY way related to the
actual discard handling of the LV itself. So if you have an LV on an SSD,
it is automatically processing discards. For the same reason it's unrelated
to the discard processing of thin-pools.

And finally, why we prefer issue_discards to be disabled (=0) by default:
it's very simple - with lvm2 we try (when we can) to support
one-command-back restore - so if you do 'lvremove', you can use
vgcfgrestore to restore the previous metadata and you have your LV back
with all the data inside.

When you have issue_discards=1, the device gets a TRIM - all the data are
discarded at the device level - so when you try to restore your previous
metadata, that's nice, but the content is gone forever.

If a user can live with this 'risk' and prefers immediate discard -
perfectly fine - but it should be (IMHO) the admin's decision.

Regards

Zdenek
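For admins who do decide to accept that trade-off, the knob lives in the devices section of /etc/lvm/lvm.conf. A minimal fragment (the comment text is mine, not from the shipped file):

```
devices {
    # Send discards (TRIM) to a PV's freed extents on lvremove/lvreduce.
    # WARNING: with this set to 1, 'vgcfgrestore' can still bring the LV
    # back, but the data inside it has already been discarded.
    issue_discards = 1
}
```

lvm2 reads its configuration per command, so the change applies to subsequent lvremove/lvreduce invocations.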
Re: [linux-lvm] Removing lvmcache fails
> Please can you open an upstream BZ here:
>
> https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper
>
> List all the info - i.e. package versions (yep, even .debs), kernel
> versions, and an lvmdump archive.
>
> I can probably easily 'hand-make' the lvm2 metadata for you for
> 'vgcfgrestore' - but I'd like to track this case and create some
> reproducer for this issue, so we can handle cases where the cache cannot
> be cleared.

https://bugzilla.redhat.com/show_bug.cgi?id=1881056

Here you go. Thanks for any help :)

PS: I love languages, I just wonder how you pronounce Zdenek ;)

Vennlig hilsen

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita.
[linux-lvm] Why isn't issue_discards enabled by default?
Hi,

I wanted to know why the "issue_discards" setting isn't enabled by default.
Are there any dangers in enabling it, or if not, is there a chance of
getting the default changed?

Also, it's not entirely clear to me if/how "issue_discards" affects thin
pool discard passdown.
Re: [linux-lvm] Removing lvmcache fails
> Dne 21. 09. 20 v 10:51 Roy Sigurd Karlsbakk napsal(a):
>> Hi all
>>
>> I have an SSD hooked up to my system to do some caching of the raid-6
>> array. Now, that SSD isn't very new and has started developing some
>> SMART errors, so I thought I'd just remove it and find a nice recycle
>> bin for it. However, this fails.
>
> Hmm - cache isn't yet very easy to use once you start to have faulty
> devices in your device stack.

Well, that's why I want to replace it ;)

>> This is Debian Buster (latest) with kernel 5.4 and lvm 2.03.02(2)
>>
>> Any idea how to get rid of this?
>>
>> # lvconvert --uncache data/data
>>   Unknown feature in status: 8 2488/262144 128 819192/819200 57263288 35277148 19955332 9916661 0 1 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
>
> However, since you have 'writethrough' cache mode, it should be possible
> to use 'lvconvert --uncache --force'.

# lvconvert --uncache --force data/data
  Unknown feature in status: 8 2484/262144 128 819198/819200 57470386 35323243 19987362 9940326 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 cleaner 0 rw -
(et cetera, et cetera, ad infinitum)

> But I'm somehow confused how you can have any dirty blocks in this case ??

So am I

> The lvm2 2.03.02 version was a somewhat experimental release - so I'd
> recommend something newer - which will also properly parse the newer
> status output from the newer kernel dm-cache module.
Well, that's weird - Debian is usually rather conservative about the
packages in a stable release. Are you sure? I also updated to a backported
kernel 5.7 without any change.

Vennlig hilsen

roy
[linux-lvm] Removing lvmcache fails
Hi all

I have an SSD hooked up to my system to do some caching of the raid-6
array. Now, that SSD isn't very new and has started developing some SMART
errors, so I thought I'd just remove it and find a nice recycle bin for it.
However, this fails.

This is Debian Buster (latest) with kernel 5.4 and lvm 2.03.02(2).

Any idea how to get rid of this?

# lvconvert --uncache data/data
  Unknown feature in status: 8 2488/262144 128 819192/819200 57263288 35277148 19955332 9916661 0 1 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819192/819200 56391295 33200598 19684834 6317925 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819192/819200 56391295 33200598 19684835 6317925 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
(and so on)

# lvs -o+cache_mode data/data
  Unknown feature in status: 8 2484/262144 128 819195/819200 57263401 35277148 19955449 9916671 0 3 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
  LV   VG   Attr       LSize  Pool     Origin       Data%  Meta%  Move Log Cpy%Sync Convert CacheMode
  data data Cwi-aoC--- 13,67t [_cache] [data_corig] 99,99  0,95            0,01             writethrough

Vennlig hilsen

roy
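The "Unknown feature in status" noise makes those lines hard to read, but the dirty-block count that lvm2 keeps "flushing" is in there. If I read the dm-cache status format correctly, the fields after the metadata and cache-block usage are read hits/misses, write hits/misses, demotions, promotions, and then the dirty count. A sketch that pulls it out of one of the status lines above (the leading "start length cache" prefix is one I added to mimic raw `dmsetup status` output; lvs strips it):

```shell
# One of the status lines from this thread, with a made-up
# "<start> <length> cache" prefix as `dmsetup status` would print it.
status="0 29302219776 cache 8 2484/262144 128 819198/819200 57439248 35307947 19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 cleaner 0 rw -"

# Field 14 = the dirty-block count: the single block that
# "Flushing 1 blocks for cache data/data." never manages to clean.
dirty=$(echo "$status" | awk '{print $14}')
echo "dirty cache blocks: $dirty"
```

Watching this field while the cleaner policy runs shows whether the flush is actually making progress or is stuck, as it is here.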
Re: [linux-lvm] Removing lvmcache fails
Dne 21. 09. 20 v 12:01 Roy Sigurd Karlsbakk napsal(a):
>> However, since you have 'writethrough' cache mode, it should be possible
>> to use 'lvconvert --uncache --force'.
>
> # lvconvert --uncache --force data/data
>   Unknown feature in status: 8 2484/262144 128 819198/819200 57470386 35323243 19987362 9940326 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
>   Flushing 1 blocks for cache data/data.
>   Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 cleaner 0 rw -
>   (repeated "Flushing 1 blocks" / status lines snipped)
> (et cetera, et cetera, ad infinitum)
>
>> But I'm somehow confused how you can have any dirty blocks in this case ??
>
> So am I
>
>> The lvm2 2.03.02 version was a somewhat experimental release - so I'd
>> recommend something newer - which will also properly parse the newer
>> status output from the newer kernel dm-cache module.
>
> Well, that's weird - Debian is usually rather conservative about the
> packages in a stable release. Are you sure? I also updated to a
> backported kernel 5.7 without any change.

Hi

Please can you open an upstream BZ here:

https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

List all the info - i.e. package versions (yep, even .debs), kernel
versions, and an lvmdump archive.

I can probably easily 'hand-make' the lvm2 metadata for you for
'vgcfgrestore' - but I'd like to track this case and create some reproducer
for this issue, so we can handle cases where the cache cannot be cleared.

Regards

Zdenek
Re: [linux-lvm] Removing lvmcache fails
Dne 21. 09. 20 v 10:51 Roy Sigurd Karlsbakk napsal(a):
> Hi all
>
> I have an SSD hooked up to my system to do some caching of the raid-6
> array. Now, that SSD isn't very new and has started developing some SMART
> errors, so I thought I'd just remove it and find a nice recycle bin for
> it. However, this fails.

Hmm - cache isn't yet very easy to use once you start to have faulty
devices in your device stack.

> This is Debian Buster (latest) with kernel 5.4 and lvm 2.03.02(2)
>
> Any idea how to get rid of this?
>
> # lvconvert --uncache data/data
>   Unknown feature in status: 8 2488/262144 128 819192/819200 57263288 35277148 19955332 9916661 0 1 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -

However, since you have 'writethrough' cache mode, it should be possible to
use 'lvconvert --uncache --force'.

But I'm somehow confused how you can have any dirty blocks in this case ??

The lvm2 2.03.02 version was a somewhat experimental release - so I'd
recommend something newer - which will also properly parse the newer status
output from the newer kernel dm-cache module.

Regards

Zdenek
Re: [linux-lvm] thin: pool target too small
Dne 21. 09. 20 v 1:48 Duncan Townsend napsal(a):
> Hello!
>
> I think the problem I'm having is related to this thread:
> https://www.redhat.com/archives/linux-lvm/2016-May/msg00092.html
> (continuation: https://www.redhat.com/archives/linux-lvm/2016-June/msg0.html).
> In the previous thread, Zdenek Kabelac fixed the problem manually, but
> there was no information about exactly what was fixed or how. I have also
> posted about this problem on #lvm on freenode and on Stack Exchange
> (https://superuser.com/questions/1587224/lvm2-thin-pool-pool-target-too-small),
> so my apologies to those of you who are seeing this again.

Hi

At first it's worth mentioning which versions of the kernel, lvm2, and
thin-tools (the device-mapper-persistent-data package on RHEL/Fedora; check
with thin_check -V) are involved.

> I had a problem with a runit script that caused my dmeventd to be killed
> and restarted every 5 seconds. The script has been fixed, but

Killing dmeventd is always a BAD plan. Either you do not want monitoring
(set it to 0 in lvm.conf), or you leave dmeventd to its job - killing it in
the middle of its work isn't going to end well.

> device-mapper: thin: 253:10: reached low water mark for data device:
> sending event.
> lvm[1221]: WARNING: Sum of all thin volume sizes (2.81 TiB) exceeds the
> size of thin pools and the size of whole volume group (1.86 TiB).
> lvm[1221]: Size of logical volume nellodee-nvme/nellodee-nvme-thin_tdata
> changed from 212.64 GiB (13609 extents) to <233.91 GiB (14970 extents).
> device-mapper: thin: 253:10: growing the data device from 13609 to 14970
> blocks
> lvm[1221]: Logical volume nellodee-nvme/nellodee-nvme-thin_tdata
> successfully resized.

So here was a successful resize.

> lvm[1221]: dmeventd received break, scheduling exit.
> lvm[1221]: dmeventd received break, scheduling exit.
> lvm[1221]: WARNING: Thin pool nellodee--nvme-nellodee--nvme--thin-tpool
> data is now 81.88% full.
> (lots of repeats of "lvm[1221]: dmeventd received break, scheduling exit.")
> lvm[1221]: No longer monitoring thin pool
> nellodee--nvme-nellodee--nvme--thin-tpool.
> device-mapper: thin: 253:10: pool target (13609 blocks) too small:
> expected 14970

And now we can see the problem - the thin-pool was already upsized to a
bigger size (13609 -> 14970, as seen above) - yet something has tried to
activate the thin-pool with the smaller metadata volume.

> device-mapper: table: 253:10: thin-pool: preresume failed, error = -22

This is correct - it's preventing further damage to the thin-pool.

> lvm[1221]: dmeventd received break, scheduling exit.
> (previous message repeats many times)
>
> After this, the system became unresponsive, so I power cycled it. Upon
> boot up, the following message was printed and I was dropped into an
> emergency shell:
>
> device-mapper: thin: 253:10: pool target (13609 blocks) too small:
> expected 14970
> device-mapper: table: 253:10: thin-pool: preresume failed, error = -22

So the primary question is - how could LVM have got the 'smaller' metadata
back - have you played with 'vgcfgrestore'?

So when you submit the version of tools - also provide /etc/lvm/archive
(eventually an lvmdump archive).

> I have tried using thin_repair, which reported success and didn't solve
> the problem. I tried vgcfgrestore (using metadata backups going back
> quite a ways), which also reported success and did not solve the problem.
> I tried lvchange --repair. I tried lvextending the thin

'lvconvert --repair' can solve only very basic issues - it's not able to
resolve a badly sized metadata device ATM. For all other cases you need to
use manual repair steps.

> I am at a loss here about how to proceed with fixing this problem. Is
> there some flag I've missed or some tool I don't know about that I can
> apply to fixing this problem? Thank you very much for your attention,

I'd expect that in your /etc/lvm/archive (or in the first 1MiB of your
device header) there can be seen a history of changes to your lvm2
metadata, and you should be able to find when the _tmeta LV was matching
your new metadata size, and maybe see when it got its previous size.
Without knowing more detail it's hard to give a precise answer - but before
you try the next steps of your recovery, be sure you know what you are
doing - it's better to ask here than be sorry later.

Regards

Zdenek
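As a concrete illustration of scanning that history: the archive files under /etc/lvm/archive are plain text, so the extent counts can simply be grepped across revisions. A minimal sketch - the two fake archive files below stand in for real /etc/lvm/archive/*.vg files, and their contents are greatly simplified from real lvm2 metadata:

```shell
# Build two fake archive snapshots showing a _tdata LV before and after
# a resize like the one in this thread (13609 -> 14970 extents).
dir=$(mktemp -d)

cat > "$dir/nellodee-nvme_00001.vg" <<'EOF'
nellodee-nvme-thin_tdata {
    extent_count = 13609
}
EOF

cat > "$dir/nellodee-nvme_00002.vg" <<'EOF'
nellodee-nvme-thin_tdata {
    extent_count = 14970
}
EOF

# Grep the extent counts across revisions; the sequence numbers in the
# file names give the order, so the output shows when the size changed.
grep -H 'extent_count' "$dir"/*.vg

rm -rf "$dir"
```

On a real system, grepping for the pool's _tdata and _tmeta sub-LV names in /etc/lvm/archive/*.vg (the archive files also record a creation_time) narrows down exactly which operation shrank the recorded size.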