Re: [linux-lvm] thin: pool target too small

2020-09-21 Thread Duncan Townsend
On Mon, Sep 21, 2020 at 5:23 AM Zdenek Kabelac  wrote:
>
> On 21. 09. 20 at 1:48, Duncan Townsend wrote:
> > Hello!
> >
> > I think the problem I'm having is a related problem to this thread:
> > https://www.redhat.com/archives/linux-lvm/2016-May/msg00092.html
> > continuation 
> > https://www.redhat.com/archives/linux-lvm/2016-June/msg0.html
> > . In the previous thread, Zdenek Kabelac fixed the problem manually,
> > but there was no information about exactly what or how the problem was
> > fixed. I have also posted about this problem on the #lvm on freenode
> > and on Stack Exchange
> > (https://superuser.com/questions/1587224/lvm2-thin-pool-pool-target-too-small),
> > so my apologies to those of you who are seeing this again.
>
>
> Hi
>
> At first it's worth mentioning which versions of the kernel, lvm2, and thin-tools
> (the d-m-p-d package on RHEL/Fedora - aka thin_check -V) this is.

Ahh, thank you for the reminder. My apologies for not including this
in my original message. I use Void Linux on aarch64-musl:

# uname -a
Linux (none) 5.7.0_1 #1 SMP Thu Aug 6 20:19:56 UTC 2020 aarch64 GNU/Linux

# lvm version
  LVM version: 2.02.187(2) (2020-03-24)
  Library version: 1.02.170 (2020-03-24)
  Driver version:  4.42.0
  Configuration:   ./configure --prefix=/usr --sysconfdir=/etc
--sbindir=/usr/bin --bindir=/usr/bin --mandir=/usr/share/man
--infodir=/usr/share/info --localstatedir=/var --disable-selinux
--enable-readline --enable-pkgconfig --enable-fsadm --enable-applib
--enable-dmeventd --enable-cmdlib --enable-udev_sync
--enable-udev_rules --enable-lvmetad
--with-udevdir=/usr/lib/udev/rules.d --with-default-pid-dir=/run
--with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm
--with-default-locking-dir=/run/lock/lvm --enable-static_link
--host=x86_64-unknown-linux-musl --build=x86_64-unknown-linux-musl
--host=aarch64-linux-musl --with-sysroot=/usr/aarch64-linux-musl
--with-libtool-sysroot=/usr/aarch64-linux-musl

# thin_check -V
0.8.5

> > I had a problem with a runit script that caused my dmeventd to be
> > killed and restarted every 5 seconds. The script has been fixed, but
>
> Killing dmeventd is always a BAD plan.
> Either you do not want monitoring (set it to 0 in lvm.conf) - or
> leave it to do its job - killing dmeventd in the middle of its work
> isn't going to end well...

Thank you for reinforcing this. My runit script was fighting with
dracut in my initramfs: it saw that there was a dmeventd not under its
control, and so tried to kill the one started by dracut. I've disabled
the runit script and replaced it with a stub that simply tries to kill
the dracut-started dmeventd when it receives a signal.

> > device-mapper: thin: 253:10: reached low water mark for data device:
> > sending event.
> > lvm[1221]: WARNING: Sum of all thin volume sizes (2.81 TiB) exceeds
> > the size of thin pools and the size of whole volume group (1.86 TiB).
> > lvm[1221]: Size of logical volume
> > nellodee-nvme/nellodee-nvme-thin_tdata changed from 212.64 GiB (13609
> > extents) to <233.91 GiB (14970 extents).
> > device-mapper: thin: 253:10: growing the data device from 13609 to 14970 
> > blocks
> > lvm[1221]: Logical volume nellodee-nvme/nellodee-nvme-thin_tdata
> > successfully resized.
>
> So here the resize was successful -
>
> > lvm[1221]: dmeventd received break, scheduling exit.
> > lvm[1221]: dmeventd received break, scheduling exit.
> > lvm[1221]: WARNING: Thin pool
> > nellodee--nvme-nellodee--nvme--thin-tpool data is now 81.88% full.
> >  (lots of repeats of "lvm[1221]: dmeventd received break,
> > scheduling exit.")
> > lvm[1221]: No longer monitoring thin pool
> > nellodee--nvme-nellodee--nvme--thin-tpool.
> > device-mapper: thin: 253:10: pool target (13609 blocks) too small:
> > expected 14970
>
> And now we can see the problem - the thin-pool had already been upsized to the
> bigger size (13609 -> 14970 as seen above) - yet something has tried to activate
> the thin-pool with a smaller metadata volume.

I think what happened here is that the dmeventd started by dracut
finally exited, and then the dmeventd started by runit took over.
Then the runit-started dmeventd tried to activate the thin-pool,
which was in the process of being resized?

> > device-mapper: table: 253:10: thin-pool: preresume failed, error = -22
>
> This is correct - it's preventing further damage to the thin-pool.
>
> > lvm[1221]: dmeventd received break, scheduling exit.
> > (previous message repeats many times)
> >
> > After this, the system became unresponsive, so I power cycled it. Upon
> > boot up, the following message was printed and I was dropped into an
> > emergency shell:
> >
> > device-mapper: thin: 253:10: pool target (13609 blocks) too small:
> > expected 14970
> > device-mapper: table: 253:10: thin-pool: preresume failed, error = -22
>
>
> So the primary question is - how could LVM have got the 'smaller' metadata
> back - have you played with 'vgcfgrestore'?
>
> So when you submit the versions of the tools - also provide /etc/lvm/archive
> (eventually an lvmdump archive)

Re: [linux-lvm] Why isn't issue_discards enabled by default?

2020-09-21 Thread Mark Mielke
On Mon, Sep 21, 2020 at 10:14 AM nl6720  wrote:

> I wanted to know why the "issue_discards" setting isn't enabled by
> default. Are there any dangers in enabling it or if not is there a
> chance of getting the default changed?
>
> Also it's not entirely clear to me if/how "issue_discards" affects thin
> pool discard passdown.
>

Historically, there have been dangers. Even today, there might still be
dangers - although I believe Linux (and other OSes) may disable the feature
on hardware that is known to behave improperly.

If you do your research and ensure you are using a good storage drive, there
should not be any problems. I enable issue_discards on all systems I work
with at home and at work, and have not encountered any problems. But I also
don't buy cheap drives from questionable brands.

It's pretty common for settings such as these to default to the conservative
choice: users who are willing to accept the risk (no matter how small) can
turn the option on, while users who are unaware of the risk or have not
evaluated it cannot blame the software vendor for losing their data. In the
case of LVM, it's not LVM's fault that some drives might lose your data when
a discard is sent - but users of LVM might still blame LVM.

-- 
Mark Mielke 

Re: [linux-lvm] Why isn't issue_discards enabled by default?

2020-09-21 Thread Zdenek Kabelac

On 21. 09. 20 at 16:14, nl6720 wrote:

Hi,

I wanted to know why the "issue_discards" setting isn't enabled by
default. Are there any dangers in enabling it or if not is there a
chance of getting the default changed?

Also it's not entirely clear to me if/how "issue_discards" affects thin
pool discard passdown.


Hi

Have you checked the enclosed documentation within /etc/lvm/lvm.conf?

issue_discards is PURELY & ONLY related to sending discard to removed disk 
extents/areas after 'lvremove'.


It is NOT in any way related to the actual discard handling of the LV itself. So
if you have an LV on an SSD, discards are processed automatically. For the same
reason it's unrelated to the discard processing of thin-pools.
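For example (just a sketch - the mount point and names below are made up),
discards on a filesystem sitting on an LV are driven by the filesystem layer,
independently of issue_discards:

# mount -o discard /dev/vg0/home /home   # online discard handled by the fs + dm stack
# fstrim -v /home                        # or a periodic trim of the mounted filesystem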


And finally, why we prefer issue_discards to be disabled (=0) by default. It's
very simple - with lvm2 we try (when we can) to support a one-command-back
restore - so if you do 'lvremove', you can use vgcfgrestore to restore the
previous metadata and you have your LV back with all the data inside.
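A minimal sketch of that one-command-back restore (the VG/LV names and the
archive file name are made up for illustration):

# lvremove vg0/scratch                    # oops - removed the wrong LV
# vgcfgrestore --list vg0                 # find the archive written just before the lvremove
# vgcfgrestore -f /etc/lvm/archive/vg0_00042-1234567890.vg vg0
# lvchange -ay vg0/scratch                # the LV is back - and with issue_discards=0, so is its data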


When you have issue_discards=1, the device gets a TRIM - so all the data
is discarded at the device level - and when you try to restore your
previous metadata, well, it's nice - but the content is gone forever.

If a user can live with this 'risk' and prefers immediate discards - perfectly
fine - but it should be (IMHO) the admin's decision.
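For an admin who does make that decision, it is a single lvm.conf setting
(a sketch - check the current value with lvmconfig first):

# lvmconfig devices/issue_discards
issue_discards=0

# then set, in /etc/lvm/lvm.conf:
devices {
        issue_discards = 1
}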


Regards

Zdenek




Re: [linux-lvm] Removing lvmcache fails

2020-09-21 Thread Roy Sigurd Karlsbakk
> Please can you open upstream BZ here:
> 
> https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper
> 
> List all the info - i.e. package versions (yep, even .debs), kernel versions,
> and an lvmdump archive.
> 
> I can probably easily 'hand-make' the lvm2 metadata for you for 'vgcfgrestore' -
> but I'd like to track this case and create a reproducer for this issue so
> we can handle the case where the cache cannot be cleared.

https://bugzilla.redhat.com/show_bug.cgi?id=1881056

Here you go. Thanks for any help :)

PS: I love languages, I just wonder how you pronounce Zdenek ;)

Kind regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Carve the good in stone, write the bad in snow.



[linux-lvm] Why isn't issue_discards enabled by default?

2020-09-21 Thread nl6720
Hi,

I wanted to know why the "issue_discards" setting isn't enabled by 
default. Are there any dangers in enabling it or if not is there a 
chance of getting the default changed?

Also it's not entirely clear to me if/how "issue_discards" affects thin 
pool discard passdown.



Re: [linux-lvm] Removing lvmcache fails

2020-09-21 Thread Roy Sigurd Karlsbakk
> On 21. 09. 20 at 10:51, Roy Sigurd Karlsbakk wrote:
>> Hi all
>> 
>> I have an SSD hooked up to my system to do some caching of the RAID-6 array.
>> Now, that SSD isn't very new and has started developing some SMART errors,
>> so I thought I'd just remove it and find a nice recycle bin for it. However,
>> this fails.
> 
> Hmm - cache isn't yet very easy to use once you start to have faulty devices
> in your device stack

Well, that's why I want to replace it ;)

>> This is Debian Buster (latest) with kernel 5.4 and lvm 2.03.02(2)
>> 
>> Any idea how to get rid of this?
>> 
>> # lvconvert --uncache data/data
>>Unknown feature in status: 8 2488/262144 128 819192/819200 57263288 
>> 35277148
>>19955332 9916661 0 1 1 3 metadata2 writethrough no_discard_passdown 2
>>migration_threshold 2048 smq 0 rw -
> 
> 
> However since you have 'writethrough' cache mode - it should be possible to
> use  'lvconvert --uncache --force'

# lvconvert --uncache --force data/data
  Unknown feature in status: 8 2484/262144 128 819198/819200 57470386 35323243 
19987362 9940326 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 smq 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 
19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 
19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
(et cetera et cetera ad infinitum)

> But I'm somewhat confused about how you can have any dirty blocks in this case??

So am I

> The lvm2 2.03.02 version was a somewhat experimental release - so I'd recommend
> something newer - which will also properly parse the newer status output from
> the newer kernel dm-cache module.

Well, that's weird - Debian is usually rather conservative about its packages in a 
stable release. Are you sure?

I also updated to a backported kernel 5.7 without any change.

Kind regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Carve the good in stone, write the bad in snow.



[linux-lvm] Removing lvmcache fails

2020-09-21 Thread Roy Sigurd Karlsbakk
Hi all

I have an SSD hooked up to my system to do some caching of the RAID-6 array.
Now, that SSD isn't very new and has started developing some SMART errors, so I
thought I'd just remove it and find a nice recycle bin for it. However, this
fails.

This is Debian Buster (latest) with kernel 5.4 and lvm 2.03.02(2)

Any idea how to get rid of this?

# lvconvert --uncache data/data
  Unknown feature in status: 8 2488/262144 128 819192/819200 57263288 35277148 
19955332 9916661 0 1 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 smq 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819192/819200 56391295 33200598 
19684834 6317925 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819192/819200 56391295 33200598 
19684835 6317925 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819192/819200 56391295 33200598 
19684835 6317925 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
  Flushing 1 blocks for cache data/data.
  Unknown feature in status: 8 2484/262144 128 819192/819200 56391295 33200598 
19684835 6317925 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 
(and so on)

# lvs -o+cache_mode data/data
  Unknown feature in status: 8 2484/262144 128 819195/819200 57263401 35277148 
19955449 9916671 0 3 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 smq 0 rw -
  LV   VG   Attr   LSize  Pool Origin   Data%  Meta%  Move Log 
Cpy%Sync Convert CacheMode
  data data Cwi-aoC--- 13,67t [_cache] [data_corig] 99,99  0,950,01 
writethrough

Kind regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Carve the good in stone, write the bad in snow.



Re: [linux-lvm] Removing lvmcache fails

2020-09-21 Thread Zdenek Kabelac

On 21. 09. 20 at 12:01, Roy Sigurd Karlsbakk wrote:

On 21. 09. 20 at 10:51, Roy Sigurd Karlsbakk wrote:

Hi all




However since you have 'writethrough' cache mode - it should be possible to
use  'lvconvert --uncache --force'


# lvconvert --uncache --force data/data
   Unknown feature in status: 8 2484/262144 128 819198/819200 57470386 35323243 
19987362 9940326 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 smq 0 rw -
   Flushing 1 blocks for cache data/data.
   Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 
19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
   Flushing 1 blocks for cache data/data.
   Unknown feature in status: 8 2484/262144 128 819198/819200 57439248 35307947 
19981793 9938992 0 0 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 cleaner 0 rw -
(et cetera et cetera ad infinitum)


But I'm somewhat confused about how you can have any dirty blocks in this case??


So am I


The lvm2 2.03.02 version was a somewhat experimental release - so I'd recommend
something newer - which will also properly parse the newer status output from
the newer kernel dm-cache module.


Well, that's weird - Debian is usually rather conservative about its packages in a 
stable release. Are you sure?

I also updated to a backported kernel 5.7 without any change.



Hi

Please can you open upstream BZ here:

https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

List all the info - i.e. package versions (yep, even .debs), kernel versions,
and an lvmdump archive.
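A rough sketch of collecting that on a Debian/Ubuntu box (the package names are
my assumption for Buster):

# uname -r                                      # kernel version
# dpkg -l lvm2 dmsetup thin-provisioning-tools  # lvm2, device-mapper and cache/thin tools versions
# lvmdump                                       # collects a debugging tarball to attach to the BZ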

I can probably easily 'hand-make' the lvm2 metadata for you for 'vgcfgrestore' -
but I'd like to track this case and create a reproducer for this issue so
we can handle the case where the cache cannot be cleared.

Regards

Zdenek




Re: [linux-lvm] Removing lvmcache fails

2020-09-21 Thread Zdenek Kabelac

On 21. 09. 20 at 10:51, Roy Sigurd Karlsbakk wrote:

Hi all

I have an SSD hooked up to my system to do some caching of the RAID-6 array.
Now, that SSD isn't very new and has started developing some SMART errors, so I
thought I'd just remove it and find a nice recycle bin for it. However, this
fails.


Hmm - cache isn't yet very easy to use once you start to have faulty devices 
in your device stack




This is Debian Buster (latest) with kernel 5.4 and lvm 2.03.02(2)

Any idea how to get rid of this?

# lvconvert --uncache data/data
   Unknown feature in status: 8 2488/262144 128 819192/819200 57263288 35277148 
19955332 9916661 0 1 1 3 metadata2 writethrough no_discard_passdown 2 
migration_threshold 2048 smq 0 rw -



However since you have 'writethrough' cache mode - it should be possible to 
use  'lvconvert --uncache --force'


But I'm somewhat confused about how you can have any dirty blocks in this case??
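One way to look at the dirty counter directly (a sketch - field names as listed
by 'lvs -o help', and dmsetup prints the same raw kernel status quoted above):

# lvs -o lv_name,cache_mode,cache_dirty_blocks,cache_used_blocks data/data
# dmsetup status data-data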

The lvm2 2.03.02 version was a somewhat experimental release - so I'd recommend
something newer - which will also properly parse the newer status output from
the newer kernel dm-cache module.

Regards

Zdenek




Re: [linux-lvm] thin: pool target too small

2020-09-21 Thread Zdenek Kabelac

On 21. 09. 20 at 1:48, Duncan Townsend wrote:

Hello!

I think the problem I'm having is a related problem to this thread:
https://www.redhat.com/archives/linux-lvm/2016-May/msg00092.html
continuation https://www.redhat.com/archives/linux-lvm/2016-June/msg0.html
. In the previous thread, Zdenek Kabelac fixed the problem manually,
but there was no information about exactly what or how the problem was
fixed. I have also posted about this problem on the #lvm on freenode
and on Stack Exchange
(https://superuser.com/questions/1587224/lvm2-thin-pool-pool-target-too-small),
so my apologies to those of you who are seeing this again.



Hi

At first it's worth mentioning which versions of the kernel, lvm2, and thin-tools
(the d-m-p-d package on RHEL/Fedora - aka thin_check -V) this is.




I had a problem with a runit script that caused my dmeventd to be
killed and restarted every 5 seconds. The script has been fixed, but


Killing dmeventd is always a BAD plan.
Either you do not want monitoring (set it to 0 in lvm.conf) - or
leave it to do its job - killing dmeventd in the middle of its work
isn't going to end well...
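A minimal sketch of those two options (the VG/LV names are placeholders):

in /etc/lvm/lvm.conf:
activation {
        monitoring = 0
}

or at runtime:
# vgchange --monitor n vg0              # stop dmeventd monitoring for the whole VG
# lvchange --monitor n vg0/thinpool     # or for a single pool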




device-mapper: thin: 253:10: reached low water mark for data device:
sending event.
lvm[1221]: WARNING: Sum of all thin volume sizes (2.81 TiB) exceeds
the size of thin pools and the size of whole volume group (1.86 TiB).
lvm[1221]: Size of logical volume
nellodee-nvme/nellodee-nvme-thin_tdata changed from 212.64 GiB (13609
extents) to <233.91 GiB (14970 extents).
device-mapper: thin: 253:10: growing the data device from 13609 to 14970 blocks
lvm[1221]: Logical volume nellodee-nvme/nellodee-nvme-thin_tdata
successfully resized.


So here the resize was successful -


lvm[1221]: dmeventd received break, scheduling exit.
lvm[1221]: dmeventd received break, scheduling exit.
lvm[1221]: WARNING: Thin pool
nellodee--nvme-nellodee--nvme--thin-tpool data is now 81.88% full.
 (lots of repeats of "lvm[1221]: dmeventd received break,
scheduling exit.")
lvm[1221]: No longer monitoring thin pool
nellodee--nvme-nellodee--nvme--thin-tpool.
device-mapper: thin: 253:10: pool target (13609 blocks) too small:
expected 14970


And now we can see the problem - the thin-pool had already been upsized to the
bigger size (13609 -> 14970 as seen above) - yet something has tried to activate
the thin-pool with a smaller metadata volume.




device-mapper: table: 253:10: thin-pool: preresume failed, error = -22


This is correct - it's preventing further damage to the thin-pool.


lvm[1221]: dmeventd received break, scheduling exit.
(previous message repeats many times)

After this, the system became unresponsive, so I power cycled it. Upon
boot up, the following message was printed and I was dropped into an
emergency shell:

device-mapper: thin: 253:10: pool target (13609 blocks) too small:
expected 14970
device-mapper: table: 253:10: thin-pool: preresume failed, error = -22



So the primary question is - how could LVM have got the 'smaller' metadata
back - have you played with 'vgcfgrestore'?

So when you submit the versions of the tools - also provide /etc/lvm/archive
(eventually an lvmdump archive).
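A sketch of how to gather that (assuming the VG is named nellodee-nvme, as the
LV names above suggest):

# vgcfgrestore --list nellodee-nvme       # lists the archived metadata versions with time + description
# tar czf lvm-archive.tgz /etc/lvm/archive /etc/lvm/backup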


I have tried using thin_repair, which reported success and didn't
solve the problem. I tried vgcfgrestore (using metadata backups going
back quite a ways), which also reported success and did not solve the
problem. I tried lvchange --repair. I tried lvextending the thin



'lvconvert --repair' can solve only very basic issues - it's not
able to resolve a badly sized metadata device ATM.

For all other cases you need to use manual repair steps.
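For the record, a sketch of what those manual steps typically involve, using the
thin-provisioning-tools on the inactive pool's metadata - the device names are
placeholders, and the hidden _tmeta sub-LV has to be made accessible first (e.g.
by swapping it out to a regular LV), so treat this as an outline, not a recipe:

# thin_check /dev/mapper/vg-pool_tmeta                                 # read-only check of the current metadata
# thin_repair -i /dev/mapper/vg-pool_tmeta -o /dev/mapper/vg-newmeta   # write repaired metadata to a spare LV
# thin_check /dev/mapper/vg-newmeta                                    # verify before swapping it back into the pool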



I am at a loss here about how to proceed with fixing this problem. Is
there some flag I've missed or some tool I don't know about that I can
apply to fixing this problem? Thank you very much for your attention,


I'd expect that in your /etc/lvm/archive (or in the first 1MiB of your device
header) you can see a history of changes to your lvm2 metadata, and you should
be able to find when the _tmeta LV was matching your new metadata size - and
maybe see when it got its previous size.
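A sketch of how to eyeball that history - each archive file records the command
it was taken before, and the extent_count values show when the sizes changed,
e.g. the 13609 -> 14970 jump above (the VG name in the glob is my guess from the
LV names in the log):

# ls -rt /etc/lvm/archive/                     # archive files in chronological order
# grep -H -e description -e extent_count /etc/lvm/archive/nellodee-nvme_*.vg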

Without knowing more detail it's hard to give a precise answer - but before you
try the next steps of your recovery, be sure you know what you are doing - it's
better to ask here than to be sorry later.

Regards

Zdenek


