Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?

2023-09-29 Thread Zdenek Kabelac

On 28. 09. 23 at 14:23, Jean-Marc Saffroy wrote:

On Wed, Sep 27, 2023 at 5:41 PM Zdenek Kabelac  wrote:



What is the role of "dmsetup suspend"? I am having trouble finding
decent documentation about its purpose and how it's related to
snapshots. I did not need it in my experiments, so I am curious.




Suspend freezes the device's I/O queue (together with freezing the FS layer), so
the snapshot should be easily mountable without requiring an extensive fsck
operation, which would otherwise be needed because some important metadata had not yet been written to disk.
So the goal of a suspend is to capture a 'good point in time' where the content
of the snapshot has all 'committed' transactions on disk in a valid state.


Is this still required or useful with a journaling FS like ext4? It is
robust to pulling the plug at any time, so any point in time should be
good, no?


I wonder where you got the idea that a journaling FS can rescue such a 
scenario flawlessly.  Sure, the 'FS' should not completely break itself if you 
skip this suspension & fsfreeze - but on the other hand the internal 
inconsistency within a snapshot would require some repair operation to 
happen - with a potential risk of losing valid data, since even ext4 by default 
journals only its metadata, and 'data' is journaled only in 'data=journal' 
mode - which is, however, used only by a very small group of users who are willing 
to give up performance for this feature.


In all other cases you want to get the FS into a frozen state before taking its 
snapshot - so there is maximal consistency.




That said, I am curious about what can be achieved with dmsetup
commands. By any chance, do you have pointers to documentation besides
what's in the kernel (Documentation/admin-guide/device-mapper/)?


There are some DM talks available on the net describing the target logic in 
greater detail, with diagrams of the I/O flow - but other than that 
I'm not sure what other kind of help would be needed here?


Regards

Zdenek



Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?

2023-09-27 Thread Zdenek Kabelac

On 27. 09. 23 at 15:45, Jean-Marc Saffroy wrote:

On Wed, Sep 27, 2023 at 11:58 AM Zdenek Kabelac wrote:


On 27. 09. 23 at 1:10, Jean-Marc Saffroy wrote:

Hi,

On Tue, Sep 26, 2023 at 10:00 PM Zdenek Kabelac wrote:

Yep, typical usage is to encrypt the underlying PV - and then create LVs and their
snapshots on the encrypted device.


Sure, I'd do that in other circumstances.

But in my case it would just be a waste: I am replacing several disks
on a desktop computer with a single 2TB NVME SSD for everything. Only
/home needs to be encrypted, and it's tiny, like 100-200GB. Going
through encryption for most application I/Os would use CPU time and
increase latency with no benefit.

So I prefer to manage available raw (un-encrypted) space with LVM.

Now, I also need to do backups of /home, and that's why I want
snapshots. But that first layer of LVM would only show a snapshot of
an encrypted volume, and the backup job shouldn't have the passphrase
to decrypt the volume.

Which is why I'm trying to find a way of doing snapshots of an "opened"
LUKS volume: this way, the backup job can do its job without requiring
a passphrase.


Well, that's where you will considerably 'complicate' your life :)
as you would need to 'orchestrate' this yourself with 'dmsetup' usage.


If you mean that I'd have to script a few things, then I am perfectly
fine with that.


running 'dmsetup suspend' on your home device,
then taking a snapshot of your underlying LV.


What is the role of "dmsetup suspend"? I am having trouble finding
decent documentation about its purpose and how it's related to
snapshots. I did not need it in my experiments, so I am curious.




Suspend freezes the device's I/O queue (together with freezing the FS layer), so
the snapshot should be easily mountable without requiring an extensive fsck
operation, which would otherwise be needed because some important metadata had not yet been written to disk.
So the goal of a suspend is to capture a 'good point in time' where the content
of the snapshot has all 'committed' transactions on disk in a valid state.


Clearly that needs to happen on the top-level device - which would be the crypto DM 
device in your case - and it propagates via the device 'tree' down to the PV level.


lvm2 clearly does this itself while taking a snapshot - and you can easily observe 
that 'magic' if you read carefully through a verbose trace of the command.


Then your script needs to replicate this at the script level.
The fun would begin once you start to resolve all the possible error paths...



If I were okay with giving the passphrase to my backup script, then I
could simply have the backup script create its snapshot from the
encrypted LV, and I wouldn't have started this thread in this case.
:-)


Maybe you could drop your whole-disk encryption idea then and just use some 
encrypted tarballs - since if you tend to place passwords into scripts, it's 
kind of a big security hole.


If all you want to have is encrypted files on disk - there are probably 
easier approaches using encrypted filesystems...



But the level of complexity here is rather high - thus it might actually be
way easier to just 'partition' your device into 'encrypted' and 'unencrypted'
parts and use 2 PVs for 2 VGs.


But then I can't resize the encrypted volume/partition.


Not sure how often you need to do that - surely the 'split' between those 2 
partitions is a decisive point...


But then whatever you do within those VGs is fully resizable as before.



It seems LVM cannot do it directly, but it becomes possible (at least
in my simple tests) if I use a bunch of dmsetup commands, or if I use
the decrypted device as the PV for a new VG.

But I don't know if these approaches are safe to use, and that is what
drove me here.

In the mean time, I found this page:
https://access.redhat.com/articles/2106521

Apparently, LVM on LUKS on LVM would be a case of "LVM recursion", and
so not entirely unheard of.

Does anyone here have experience with "LVM recursion"?


lvm2 does not advise the use of 'stacking' - it's very complicated and in some 
ways inefficient - and it's always true that fewer layers -> better 
performance (especially with modern NVMe devices...).


And lvm2 itself does NOT support/count on recursion - so while it may appear 
to be working, there will be corner cases with unresolvable problems - 
although this is mostly an issue if you are running your system from such a 
'stacked' solution (causing deadlocks...).


Anyway - placing a VG on top of another VG is always best avoided, and should be 
used only as a last resort if no better approach exists.


Regards

Zdenek



Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?

2023-09-27 Thread Zdenek Kabelac

On 27. 09. 23 at 1:10, Jean-Marc Saffroy wrote:

Hi,

On Tue, Sep 26, 2023 at 10:00 PM Zdenek Kabelac wrote:

Yep, typical usage is to encrypt the underlying PV - and then create LVs and their
snapshots on the encrypted device.


Sure, I'd do that in other circumstances.

But in my case it would just be a waste: I am replacing several disks
on a desktop computer with a single 2TB NVME SSD for everything. Only
/home needs to be encrypted, and it's tiny, like 100-200GB. Going
through encryption for most application I/Os would use CPU time and
increase latency with no benefit.

So I prefer to manage available raw (un-encrypted) space with LVM.

Now, I also need to do backups of /home, and that's why I want
snapshots. But that first layer of LVM would only show a snapshot of
an encrypted volume, and the backup job shouldn't have the passphrase
to decrypt the volume.

Which is why I'm trying to find a way of doing snapshots of an "opened"
LUKS volume: this way, the backup job can do its job without requiring
a passphrase.


Well, that's where you will considerably 'complicate' your life :)
as you would need to 'orchestrate' this yourself with 'dmsetup' usage.

running 'dmsetup suspend' on your home device,
then taking a snapshot of your underlying LV.

Here the usage of a 'thin-pool' would possibly help a little bit - as you get 
control over when a snapshot LV appears in your system.


Once you have the snapshot created you 'resume'  the top-level
decrypted volume.

Then if you want to access your snapshot - you create another 'crypto' device 
- unlock it again with your key - and it should work.
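
A rough sketch of that sequence, assuming the open LUKS mapping is named
'home_crypt', the origin LV is vg/home, and a classic COW snapshot is used
(all names are made up and error handling is omitted):

  # Freeze I/O (and the mounted filesystem) on the decrypted top-level device.
  dmsetup suspend home_crypt

  # Take the snapshot of the underlying encrypted LV while it is quiesced.
  lvcreate -s -n home_snap -L 10G vg/home

  # Resume normal I/O on the decrypted volume as soon as possible.
  dmsetup resume home_crypt

  # Later, for the backup: unlock the snapshot under a different name.
  cryptsetup open /dev/vg/home_snap home_snap_crypt
  mount -o ro /dev/mapper/home_snap_crypt /mnt/backup

  # ... run the backup, then tear everything down.
  umount /mnt/backup
  cryptsetup close home_snap_crypt
  lvremove -y vg/home_snap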


But the level of complexity here is rather high - thus it might actually be
way easier to just 'partition' your device into 'encrypted' and 'unencrypted'
parts and use 2 PVs for 2 VGs.



But my tests don't tell me if there are other people doing similar
things on production systems, or if they are happy with the results.
Unusual setups tend to exhibit unusual bugs, and I am not super fond
of bugs in my storage systems. :-)


Yep - people prefer simple rock-solid solutions.
That's why the scenario described above is not really used,
as solving all the individual errors that may appear is far from simple.



Just the one /home in my case, so no worse than prompting for the
passphrase for an entire disk.


Every access to a snapshot then needs a new, separate 'unlock'...


Speaking about snapshots - you should consider switching to 'thin-pools'  for
far better performance...


I only need snapshots for backups: once a day, create a snapshot,
mount it, do a file-level incremental backup, unmount it, delete it.

Would the thin-pools make a difference in this case?


Well, there are many ways to skin a cat...
E.g. check blk-archive:  https://github.com/jthornber/blk-archive

Regards

Zdenek



Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?

2023-09-26 Thread Zdenek Kabelac

On 25. 09. 23 at 0:09, Jean-Marc Saffroy wrote:

Hello LVM experts,

I am trying to create a volume with the following properties:
- the volume can be resized
- the volume is encrypted
- the volume can be snapshotted (for online backups)

So I thought I'd create the volume with LVM, encrypt it with LUKS, and 
snapshot it with LVM. However, LVM doesn't want to snapshot the unencrypted 
LUKS volume, as it is not an actual logical volume known to LVM (and I am not 
keen on snapshotting the encrypted volume, as that means the backup process 
would need the passphrase to mount the encrypted snapshot).


Is there a good way to achieve this with LUKS and LVM, or should I look 
elsewhere?

I have two ideas but I don't know if they are safe or practical:
- I could try running LVM (snapshots) on top of LUKS (encryption), itself on top 
of LVM (resize)


Hi


Yep, typical usage is to encrypt the underlying PV - and then create LVs and their 
snapshots on the encrypted device.




- or I could try working with dmsetup to fill the gap between LUKS and LVM

I did simple tests with dmsetup, and that *seems* to work, however I am not 
sure at all if that would be robust. An outline of my test:

- create an LVM volume (lvcreate) from a larger volume group
- make it a LUKS volume (cryptsetup luksFormat)
- "open" the LUKS volume (cryptsetup open)
- create a snapshot-origin volume from the open LUKS volume (dmsetup create)
- mount that as my active volume
- every time I want to do a backup:
   create a temporary snapshot volume from the origin, mount it, run the 
backup, unmount it, delete it


Usually those who are 'into encryption' want to have everything encrypted - thus 
even the layout of the whole storage.


Encrypting 'individual' LVs - while certainly 'doable' - would e.g. create a 
considerably larger number of volumes that would need individual 'unlocking' 
with each activation.


Speaking about snapshots - you should consider switching to 'thin-pools'  for 
far better performance...


Regards

Zdenek




Re: [linux-lvm] Trouble Booting Custom Kernel with QEMU: dracut-initqueue timeout waiting for /dev/sysvg/root

2023-08-31 Thread Zdenek Kabelac

On 31. 08. 23 at 0:40, Vishal Chourasia wrote:

On 8/30/23 20:01, Zdenek Kabelac wrote:

On 30. 08. 23 at 8:40, Vishal Chourasia wrote:

Hi All,


# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.4.7-200.fc38.x86_64
root=/dev/mapper/sysvg-root ro console=tty1 console=ttyS0,115200n8
rd.lvm.lv=sysvg/root

I have tried passing "root=/dev/mapper/sysvg-root rd.lvm.lv=sysvg/root"
with the -append option and it hasn't worked either.


It's dracut that needs to include the lvm2 code and be able to activate such an LV
prior to the switch to rootfs (having somewhere inside  lvchange -ay
vgname/lvrootname)

1. How may I verify that dracut has included lvm2 code or not?
2. Which file in the dracut would contain
`lvchange -ay vgname/lvrootname` code?


Hi

If you are making the ramdisk image from the system itself, dracut should autodetect 
such a case and build the image with lvm2 support inside (unless you've 
instructed it in dracut.conf not to do so).


You can check whether your dracut ramdisk image contains /usr/sbin/lvm,
and also whether the dirs in /usr/lib/dracut/hooks have the lvm2 scripts inside, 
as well as /etc/lvm/lvm.conf.
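
One way to check this, assuming a standard Fedora initramfs path for the running 
kernel (adjust the image path if you build a custom one; lsinitrd is part of dracut):

  # List the initramfs contents and look for the lvm binary, config and hook scripts.
  lsinitrd /boot/initramfs-$(uname -r).img | grep -E 'usr/sbin/lvm|etc/lvm/lvm.conf'
  lsinitrd /boot/initramfs-$(uname -r).img | grep 'dracut/hooks.*lvm'

  # If lvm2 support is missing, rebuild the image with the lvm dracut module added.
  dracut --force --add lvm /boot/initramfs-$(uname -r).img $(uname -r)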


I'm not really sure how you are trying to build this yourself, since the Fedora 
anaconda installer should be doing all of this for you automatically - so how 
did you get to the point where you don't have this from your installation time?


Regards

Zdenek



Re: [linux-lvm] Trouble Booting Custom Kernel with QEMU: dracut-initqueue timeout waiting for /dev/sysvg/root

2023-08-30 Thread Zdenek Kabelac

On 30. 08. 23 at 8:40, Vishal Chourasia wrote:

Hi All,

I hope this email finds you well. I am currently facing an issue with
booting a VM using a custom-compiled kernel and would appreciate your
expertise on the matter.



Hi

Not really sure how this relates to 'lvm2' yet - your kernel getting 'stuck' seems to 
be caused by the inability to switch to 'rootfs' - which should be located on 
/dev/vda3 (according to your qemu exec line) - but the kernel panic
reports that the device does not have a recognizable filesystem.

There seems to be no lvm2 involved at all so far.



### Problem Description
I have downloaded the `Fedora-Server-KVM-38-1.6.x86_64.qcow2` image and
successfully booted it using `qemu-system-x86_64`. However, when I try
to boot this VM with a custom-compiled kernel using the `-kernel` flag,
it fails to boot. The root filesystem is on LVM, and it seems the kernel
needs to activate volume groups before mounting the root filesystem.


If you believe the filesystem is really on LVM and your /dev/vda3 is just a PV,
then your boot line is wrong and you need to use different naming -

possibly something like:
root=/dev/vgname/lvrootname rd.lvm.lv=vgname/lvrootname

It's dracut that needs to include the lvm2 code and be able to activate such an LV 
prior to the switch to rootfs (having somewhere inside  lvchange -ay vgname/lvrootname)


But it's not really clear how you have moved from your /dev/vda3 to
something on top of lvm2...

Regards

Zdenek




Re: [linux-lvm] Problems encountered in compiling lvm packages using the musl Toolchain

2023-08-09 Thread Zdenek Kabelac

On 26. 07. 23 at 4:02, 程智星 wrote:

Hi, friend:

I encountered the following problems when compiling the lvm packages using the 
musl toolchain:


Question 1 :

| vgimportdevices.c: In function 'vgimportdevices':
| vgimportdevices.c:148:30: error: 'LOCK_EX' undeclared (first use in this 
function); did you mean 'LOCKED'?

|   148 |  if (!lock_devices_file(cmd, LOCK_EX)) {
|       |                              ^~~
|       |                              LOCKED
| vgimportdevices.c:148:30: note: each undeclared identifier is reported only 
once for each function it appears in


The LOCK_EX definition is in file.h; I can compile this package by adding 
this header file in vgimportdevices.c:


#include <sys/file.h>

Could you add this header file to the source code to fix this problem, or do 
you have a better solution?



Looking forward to your reply!



Hi

Commit a1a1439215f56335a06ae5ac6ca73b5e0d734760 was added upstream on Feb 7.
So please make sure you are compiling the latest upstream.

Note: the git HEAD has recently moved to https://gitlab.com/lvmteam/lvm2,
although the sourceware git mirror should also work.

Regards

Zdenek



Re: [linux-lvm] ThinPool performance problem with NVMe

2023-07-17 Thread Zdenek Kabelac

On 24. 06. 23 at 1:22, Anton Kulshenko wrote:

Hello.

Please help me figure out what my problem is. No matter how I configure the 
system, I can't get high performance, especially on writes.


OS: Oracle Linux 8.6, 5.4.17-2136.311.6.el8uek.x86_64
Platform: Gigabyte R282-Z94 with 2x 7702 64cores AMD EPYC and 2 TB of RAM
Disks: NVMe Samsung PM1733 7.68 TB

What I do:
vgcreate vg1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
lvcreate -n thin_pool_1 -L 20T vg1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 
/dev/nvme3n1 -i 4 -I 4


-i4 for striping between all disks, -I4 for the stripe size. I also tried 8, 16, 32... 
In my setup I can't find a big difference.




The stripe size needs to be aligned with some 'hw' properties.

In the case of 'NVMe', where the write unit for optimal performance is usually 
0.5 MiB or more, using a 4K block is basically massively destroying your 
performance, since you generate a huge amount of splits for each large write.



I only get 40k iops, while one drive at the same load easily gives 130k iops.
I have tried different block sizes, strip sizes, etc. with no result. When I 
look in iostat I see the load on the disk where the metadata is:

80 WMB/s, 12500 wrqm/s, 68 %wrqm

I don't understand what I'm missing when configuring the system.



As mentioned by Mathew, you likely should start with some 'initial' thin-pool 
size - maybe sitting fully on a single NVMe - and possibly deploy the metadata on 
a 2nd NVMe for better bus utilization.


For striping you would need to go with 512K units at least - then the 
question is how it fits your workload...


Anyway, now you have way more things to experiment with and benchmark, to figure 
out what is best on your particular hw.


One more thing - increasing the chunk size to 256K or 512K may also significantly 
raise the performance - but at the price of reduced sharing when taking 
a snapshot of a thin volume...
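
A sketch of such a layout - data striped across four NVMe devices with a 512K 
stripe size, metadata on a separate NVMe, and a larger pool chunk size (names 
and sizes are illustrative, not a tuned recommendation for this hardware):

  # Data LV striped over four devices with a 512K stripe size.
  lvcreate -n thin_data -L 20T -i 4 -I 512k vg1 \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

  # Metadata LV on a separate NVMe to spread the load.
  lvcreate -n thin_meta -L 16G vg1 /dev/nvme4n1

  # Combine them into a thin pool with a 256K chunk size.
  lvconvert --type thin-pool --poolmetadata vg1/thin_meta \
      --chunksize 256k vg1/thin_data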


Regards

Zdenek



Re: [linux-lvm] lvm2-testsuite stability

2023-06-20 Thread Zdenek Kabelac

On 19. 06. 23 at 20:22, Scott Moser wrote:

Hi, thanks for your response.


Yep - some tests are failing


expected-fail  api/dbustest.sh


We do have them even split into individual tests:
api/dbus_test_cache_lv_create.sh
api/dbus_test_log_file_option.sh

That is not available upstream, right?
I just saw the single 'dbustest.sh' in
[main/test](https://github.com/lvmteam/lvm2/tree/master/test/api).
Is there another branch I should be looking at?


Correct - that's a local 'mod' for some test machines - but I'd like to get it
merged upstream, although done in a different way.


I'd likely need to get access to / see the logs of such machines
(or you would need to provide some downloadable image of your QEMU machine
installation).


The gist at https://gist.github.com/smoser/3107dafec490c0f4d9bf9faf02327f04
describes how I am doing this.  It is "standard" package build and autopkgtest
on debian/ubuntu.  The autopkgtest VM does not use LVM for the system
so we don't have to worry about interaction with that.

I could provide a vm image if you were interested.



The tricky part with lvm2 is its dependency on proper 'udev' rule 
processing. Unfortunately the Debian distro somewhat changes those rules in its 
package without deeper consultation with upstream, and there were a few 
more differences that upstream lvm2 doesn't consider valid modifications 
(though I haven't checked the recent state).



Do others run this test-suite in automation and get reliable results?


Yes, our VM machines do give reliable results for properly configured boxes.
Although, as said before, there are some 'failing' tests we know about.



Identifying the set of tests that were allowed to fail in git
and gating pull requests on successful pass would be wonderful.  Without
some expected-working list, it is hard for me as a downstream user to
separate signal from noise.


There are no 'tests' allowed to fail.

There are either 'broken' tests or broken lvm2 code - but it's just not always 
easy to fix some bugs, and there are not enough hands to fix all issues quickly.
So all failing tests do represent some real problem from class a) or b) and 
should be fixed - it may just have lower priority than other tasks.




Would upstream be open to pull requests that added test suite running
  via github actions?  is there some other preferred mechanism for such a thing?

The test suite is really well done. I was surprised how well it insulates
itself from the system and how easy it was to use.  Running it in a
distro would give the distro developer a *huge* boost in confidence when
attempting to integrate a new LVM release into the distro.


Basically we are at a decision point whether to move to GitHub or GitLab and add
these CI capabilities - but definitely some extra hands here might be helpful.



We would need to think much harder about whether the tests should be running with
some daemons or autoactivation on the system that could see and
interact with the devices generated during the test run (one of the
reasons the test machines need some local modification) - we may provide
some Ansible-like testing script eventually.


Autopkgtest will
  * start a new vm for each run of the tests
  * install the packages listed as dependencies of the test.
  * run the test "entrypoint" (debian/test/testsuite).

I think that I have debian/test/testsuite correctly shutting
down/masking the necessary system services before invoking the tests. As
suggested in TESTING.


I'm not sure what the state of the current udev rules is - these may impact 
some tests and possibly add some unexpected randomness.


Another aspect of our test suite is the 'try-out' of various 'race' moments,
which may eventually need further tuning on even faster hardware to hit the 
race - but that might possibly be harder to 'set up' if the VMs are without
'ssh' access for a developer to enhance the testing (it might be somewhat annoying 
trying to fix this with individual git commits).



If you are willing to help, I can post a vm image somewhere. I suspect


For at least initial diagnostics it should be sufficient to just expose the 
results from the failing tests somewhere (basically the content of the failing tests' subdirs).



you're not working with debian or ubuntu on a daily basis.  If you had
access to a debian or ubuntu system it would probably be easiest to
just let autopkgtest do the running. autopkgtest does provide a
`--shell` and `--shell-fail` parameter to put you into a root shell
after the tests.

My ultimate goal is to provide a distro with confidence that the lvm2
package they're integrating is working correctly.  I'm ok to skip
tests that provide noisy results.  In this case, having *some*
reliable test is a huge improvement.


We were kind of trying to get some 'strange' deviations of the Debian package fixed 
in the past - however it seemed to lead nowhere...
(Ideally all the 'needed' changes should only be set via configure options and 
there should be no need of any extra patch on 
Re: [linux-lvm] lvm2-testsuite stability

2023-06-19 Thread Zdenek Kabelac

On 15. 06. 23 at 20:02, Scott Moser wrote:

Hi,
[sorry for duplicate post, re-sending from a subscribed address]

I'm looking to enable the lvm2 testsuite as an autopkgtest [1] to run
in debian and ubuntu. I have a merge request up at [2].  The general
idea is just to a.) package 'lvm2-testsuite' as an installable package
b.) run the testsuite as part of the autopkgtest.

The version I'm testing on Ubuntu 22.04 is 2.03.16-3 from debian
(rebuilt for 22.04). I'm running udev-vanilla  in a 2 cpu/4GB VM, and
stopping/masking  the following services: dm-event lvm2-lvmpolld
lvm2-monitor lvm2-lvmdbusd .

I'm seeing some failures when running the test.  Some seem expected
due to size limitations, some seem to fail every time, and some see
transient failures.

Here is the list of tests that I'm seeing fail and my initial
categorization.  I've seen this across say half a dozen runs:



Yep - some tests are failing


expected-fail  api/dbustest.sh


We do have them even split into individual tests:

api/dbus_test_cache_lv_create.sh
api/dbus_test_copy_signature.sh
api/dbus_test_external_event.sh
api/dbus_test_log_file_option.sh
api/dbus_test_wipefs.sh
api/dbus_test_z_sigint.sh

these need to be fixed and resolved.


expected-fail  shell/lvconvert-repair-thin.sh





space-req  shell/lvcreate-large-raid.sh
space-req  shell/lvcreate-thin-limits.sh
expected-fail  shell/lvm-conf-error.sh
expected-fail  shell/lvresize-full.sh
timeoutshell/pvmove-abort-all.sh
space-req  shell/pvmove-basic.sh
expected-fail  shell/pvscan-autoactivation-polling.sh
expected-fail  shell/snapshot-merge.sh
space-req  shell/thin-large.sh
racy   shell/writecache-cache-blocksize.sh


These are individual issues - we have some of those tests running on some machines.
They may need some 'extra' care.



expected-fail fails most every time. timeout seems to work sometimes,
space-req i think is just space requirement issue (i'll just skip
those tests).



I'd likely need to get access to / see the logs of such machines
(or you would need to provide some downloadable image of your QEMU machine 
installation).




The full output from the test run can be seen at [3] in the
testsuite-stdout.txt and testsuite-stderr.txt files.

Do others run this test-suite in automation and get reliable results ?



We surely do run these tests on a regular basis on VMs - those are usually 
slightly modified to avoid collisions with the tests.
There is also no strict rule not to break some 'tests' - so occasionally some 
tests can be failing for a while if they are seen as 'less important' than some 
other bugs...


We would need to think much harder about whether the tests should be running with some 
daemons or autoactivation on the system that could see and interact with 
the devices generated during the test run (one of the reasons the test machines 
need some local modification) - we may provide some Ansible-like testing 
script eventually.


But anyway - the easiest thing is to give us access to your test results so we can 
see whether there is something wrong with the test environment, an lvm2 bug, or 
the system setup - it's not always trivial to guess...



Regards

Zdenek





Re: [linux-lvm] bug? shrink lv by specifying pv extent to be removed does not behave as expected

2023-04-12 Thread Zdenek Kabelac

On 12. 04. 23 at 14:37, Roland wrote:

 >Really silly plan - been there years back in time when drives were FAR
more expensive per GiB.
 >Today - just throw the bad drive into the recycle bin - it's not worth doing
this silliness.

ok, i understand your point of view. and thank you for the input.

but this applies to a world with endless resources and where people can
afford the new hardware.



Hi

It's really not about the 'endless' resource - it's just about 'practical' 
thinking.



i think, with the same logic, you can designate some guy to be silly ,
if he puts a patch on his bicycle inner tube instead


To use your bicycle comparison - you likely wouldn't ride one 
where the 'next' hole in the tube would appear randomly & unexpectedly
during your future rides - you would simply buy a new tube to be sure you 
could get somewhere.


Bad drives are highly unpredictable - so as long as you simply don't care 
about the data and you store there just something you could easily download again 
from some other place - that could be the only use case I could imagine,

but I'd never put there your only copy of your family album.


but shouldn't we perhaps leave it up to the end user / owner of the
hardware,  to decide when it's ready for the recycle bin ?


Yeah - if the hardware cost more :) than the time you spend trying 
to analyze and use bad drives - then there would be a whole 'recycling' industry 
for these drives - thankfully we are not heading towards this ATM :)
What I can observe is that HDDs of 'small' sizes are being totally obsoleted 
by SSDs/NVMes.



or should we perhaps wait for the next harddrive supply crisis (like in
2011)?  then people start to get more creative in using
what they have, because they have no other option...


Assuming you are already preparing horses in the barn ?


yes, i have already come to the conclusion, that it's always better to
start from scratch like this. i dismissed the idea of
excluding or relocating bad sectors.


The key is that you know how the drive is built, how many disk platters are 
affected, and also how quickly the errors are spreading to surrounding sectors.


My best effort was always to leave a not-so-small amount of 'free' 
space around bad disk areas - but once disks started to 'relocate' bad sectors 
on their own, this all became a game that is hard to win.



 > But good advice from me - whenever  'smartctl' starts to show
relocation block errors - it's the right moment to  'dd_rescue'
 > any LV to your new drive...

yes, i'm totally aware that we walk on very thin ice here.

but i'd really like to collect some real world data/information on how
good such disk "recycling" can probably work.  i don't have


Maybe ask Google people :) they are the most experienced ones with trashing 
storage




i guess such "broken" disks being used with zfs in a redundant setup,
they could probable still serve a purpose. maybe not
for production data, but probably good enough for "not so important"
application.

it's a little bit academic project. for my own fun. i like to fiddle
with disks, lvm, zfs and that stuff


Another idea to 'deploy' (which I've even used myself) is to just mkfs.ext2 the 
bad drive and write large (10MiB-100MiB) files of 'zeroes' there.


Then simply md5-checksum them - remove the 'correct' ones, and keep (and mark 
immutable) those where you fail to read them.


With some amount of luck, the ext2 metadata will not hit the 'bad' parts of the drive
(otherwise you would really need to use parted or lvm2 to skip those areas).

This way you end up with somewhat 'usable' storage - where the bad sectors are 
hidden in those broken 'zero' files you just keep in the filesystem.
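
A rough sketch of that trick, assuming the bad drive is the hypothetical /dev/sdX
and 100MiB files are used:

  DEV=/dev/sdX
  mkfs.ext2 "$DEV"
  mkdir -p /mnt/baddisk
  mount "$DEV" /mnt/baddisk

  # Fill the filesystem with 100MiB zero files until it runs out of space.
  i=0
  while dd if=/dev/zero of=/mnt/baddisk/zero.$i bs=1M count=100 2>/dev/null; do
      i=$((i + 1))
  done

  # A file that reads back with the expected checksum sits on good sectors and
  # can be removed; one that fails to read (or differs) covers bad sectors, so
  # keep it and mark it immutable.  (The last, partial file is also kept; harmless.)
  REF=$(dd if=/dev/zero bs=1M count=100 2>/dev/null | md5sum | cut -d' ' -f1)
  for f in /mnt/baddisk/zero.*; do
      if [ "$(md5sum < "$f" | cut -d' ' -f1)" = "$REF" ]; then
          rm -f "$f"
      else
          chattr +i "$f"
      fi
  done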


The next time the 'errors' spread, you reapply the same strategy.

And you don't even need lvm2 for this.

This is the easiest way to keep 'some' not fatally broken drives in use 
for a while - but just don't put anything you depend on there, as the drive can 
be gone anytime very easily.


Regards

Zdenek



Re: [linux-lvm] bug? shrink lv by specifying pv extent to be removed does not behave as expected

2023-04-12 Thread Zdenek Kabelac

On 09. 04. 23 at 20:21, Roland wrote:

Well, if the LV is being used for anything real, then I don't know of
anything where you could remove a block in the middle and still have a
working fs.   You can only reduce fs'es (the ones that you can reduce)


my plan is to scan a disk for usable sectors and map the logical volume
around the broken sectors.

whenever more sectors get broken, i'd like to remove the broken ones to have
a usable lv without broken sectors.



Really silly plan - been there years back in time when drives were FAR more 
expensive per GiB.

Today - just throw the bad drive into the recycle bin - it's not worth doing this 
silliness.


HDD bad sectors spread - and slowly the surface gets destroyed.

So if you make large 'head-room' around the bad disk areas - if they are 
concentrated in some disk area, and you know the topology of your disk drive,
i.e. leave 1% free disk space before and after the bad area - you could possibly 
use the disk for a little while longer - but only to store data of no value.




since you need to rebuild your data anyway for that disk, you can also
recreate the whole logical volume.

my question and my project is a little bit academic. i'd simply want to try
out how much use you can have from some dead disks which are trash otherwise...


You could always take a 'vgcfgbackup' of the lvm2 metadata and make some crazy 
transformation of it with even AWK/python/perl - but we really tend to support
just some useful features - as there is already 'too much' and users often 
get lost.


One very simple & naive implementation could go along this path:

whenever you want to create a new arrangement for your disk with 'bad' areas,
you can always start from 'scratch' - since after all, lvm2 ONLY manipulates 
metadata at the front of the disk - so if you need to create new 'holes',

just 'pvcreate -f', vgcreate, and 'lvcreate -Zn -Wn',
and then 'lvextend' with normal or 'lvextend --type error | --type zero' 
segment types around the bad areas with specific sizes.
Once you are finished and your LV precisely matches your 'previous' LV of your 
past VG - you can start to use this LV again with the new arrangement of 'broken 
zeroed/errored' areas.
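
A minimal sketch of that path (the device, VG and LV names plus the extent counts 
are purely illustrative; the '--type error' / '--type zero' segments follow the 
description above):

  # Recreate the VG metadata at the front of the disk; the data area is untouched.
  pvcreate -f /dev/sdX
  vgcreate vg_bad /dev/sdX

  # Start the LV without zeroing/wiping so the existing data stays in place.
  lvcreate -Zn -Wn -n data -l 100 vg_bad     # first good region

  # Insert an 'error' (or 'zero') segment covering a bad area...
  lvextend --type error -l +8 vg_bad/data

  # ...and continue with a normal segment for the next good region.
  lvextend -l +200 vg_bad/data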


I have some serious doubts about the usability of this with any filesystem :) but if 
you think this has some added value - feel free to use it.
If the drive you play with is 'discardable' (SSD/NVMe), then one must 
take extra care that there is no 'discard/TRIM' anywhere in the process - as that 
would lose all data irrecoverably.


But good advice from me - whenever  'smartctl' starts to show relocation block 
errors - it's the right moment to  'dd_rescue' any LV to your new drive...


yes, pvmove is the other approach for that.

but will pvmove continue/finish by all means when moving extents located on a
bad sector ?


pvmove  CANNOT be used with bad drives - it cannot deal with erroring sectors 
and basically gets stuck there trying to mirror unrecoverable disk areas...


Regards

Zdenek





Re: [linux-lvm] bad checksum in superblock, wanted

2023-03-30 Thread Zdenek Kabelac

On 30. 03. 23 at 5:34, Delarians wrote:
Good afternoon, I'm asking for help in recovering damaged metadata after a 220V 
power drop on the server. When the power was restored and the server booted up, 
I got an error in the Proxmox panel:


Check of pool pve/data failed (status:1). Manual repair required!

then I booted from a Debian LiveCD, mounted the corrupted metadata, and tried to repair it



Hi

You could try  'lvconvert --repair  vg/data'

However it depends on how new a version of the 'thin_repair' tool is present on your 
system. The newer the tool - the more error cases it's able to recover.

I.e. I'd highly recommend using version 0.9 or better.

In case the repair is NOT successful (aka you cannot activate your thin 
volumes), you will have the 'original' bad metadata in the device   data_meta0 
- you could compress and upload the content of this metadata device for further 
analysis.
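
A rough sketch of those two steps (the pool name pve/data comes from the error 
message above, and data_meta0 is the backup LV name mentioned in the previous 
paragraph; adjust to what 'lvs' actually shows):

  # Attempt the automated repair (uses thin_repair from thin-provisioning-tools).
  lvconvert --repair pve/data

  # If the thin volumes still cannot be activated, grab the preserved original
  # metadata for analysis.
  lvchange -ay pve/data_meta0
  dd if=/dev/pve/data_meta0 bs=1M | gzip > data_meta0.img.gz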


Regards

Zdenek




Re: [linux-lvm] LVM and RO device/partition(s)

2023-03-22 Thread Zdenek Kabelac

On 20. 03. 23 at 17:37, lacsaP Patatetom wrote:

hi,

I come back to you with the memo mentioned : 
https://github.com/patatetom/lvm-on-readonly-block-device 

I hope that it will allow you to better understand this problem of alteration 
of the disk.


as I mentioned, LVM should normally/theoretically not touch the disk as long 
as it is read-only, but what bothers me the most is the fact that I can't 
"catch up" by correcting the new 6.1.15 kernel as I did before.


regards, lacsaP.

On Mon, 20 March 2023 at 15:15, lacsaP Patatetom 

Hi

So I'm possibly finally starting to understand your problem here.

You are using your own patched kernel, where you reverted the Linux kernel 
commit a32e236eb93e62a0f692e79b7c3c9636689559b9, likely without understanding 
the consequences.

With kernel 6.X there is commit bdb7d420c6f6d2618d4c907cd7742c3195c425e2
modifying bio_check_ro() to return void - so your 'reverting' patch
is no longer usable in the way it was created.


From your github report it seems you are creating  'raid' across 3 sdb drives.

So with a normal kernel, it happens to be that 'dm' devices are allowed to 
bypass any 'read-only' protection set on a device.


So when you actually create a raid LV on loop0 & loop1, deactivate it, then 
make loop0 & loop1 read-only and activate the raid LV again - then you can easily call 
'mkfs' and it will work normally.


A raid device consists of an '_rimage' & '_rmeta' LV per leg - where _rmeta is the 
metadata device, updated upon activation of the raid LV.


So when your local 'revert' patch for the 6.X kernel no longer works, there is no 
surprise that your 'sdbX' drives are actually being modified - since ATM dm 
targets are allowed to bypass the read-only protection.


Since the reason for the 'bypass' (snapshot read-only activation) was fixed 
5 years ago, we should probably build some better way to restore the 
'read-only' protection - and allow disabling it only when the user requests such 
behavior due to the use of old user-space tooling.


Regards

Zdenek



Re: [linux-lvm] LVM and RO device/partition(s)

2023-03-20 Thread Zdenek Kabelac

On 19. 03. 23 at 11:27, Pascal wrote:

hi,

the bio_check_ro function of the blk-core.c source file of the Linux kernel 
refers to LVM :

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/block/blk-core.c?h=v6.2.7#n500
 


how does LVM currently behave when faced with a device marked as readonly ?
does it automatically switch itself in readonly mode?

according to some tests carried out in a virtual machine, it seems that it 
doesn't and that LVM modifies the disk/partition(s) even though they are all 
readonly (chmod 444 && blockdev --setro).



Hi

There is no extra logic around RO devices in lvm2.  When lvm2 succeeds in opening 
a device in write mode, it'll use it for writing.


Also note - when you 'activate' an LV in read-write mode, someone opens such an 
LV/device, and you later 'lvchange' that active LV to read-only mode - all 
existing writers will keep writing to the device.


It's not quite clear which kind of problem you are actually hitting - so maybe 
adding some more descriptive  environment +  logs  might give more info about 
your individual case.


Note: root admin typically can overwrite any 'mild' protections...

Regards

Zdenek




Re: [linux-lvm] The feasibility of implementing an alternative snapshot approach

2023-01-09 Thread Zdenek Kabelac

On 09. 01. 23 at 7:21, Zhiyong Ye wrote:

Hi Zdenek,

Thank you for your detailed answer.

For the thin snapshot I will use the latest version of kernel and lvm for 
further testing. I want to use both snapshot methods (thin and thick) in the 
production environment. But if the thick snapshot is only still in the 
maintenance phase, then for thick lv I have to see if there is any other way 
to accomplish the snapshot function.


FYI - there are still some delays with up-streaming of the latest improvement 
patches - so stay tuned for further speedup gains & IO throughput with thin 
provisioning.


By the maintenance phase for the old thick snapshot I mean that the development of 
the existing thick snapshot target is basically done - the format is very 
ancient and cannot be changed without a major rewrite of the whole snapshot 
target as such - and that's what we've done with the newly introduced 
thin-provisioning target, which addressed many shortcomings of the old 
dm-snapshot target.


I use lvm mainly for virtualized environments. Each lv acts as a block device 
of the virtual machine. So I also consider using qemu's own snapshot feature. 
When qemu creates a snapshot, the original image used by the virtual machine 
becomes read-only, and all write changes are stored in the new snapshot. But 
currently qemu's snapshots only support files, not block devices.


Depending on the use-case, it might matter to pick the best-fitting chunk size,
i.e. if the changes are 'localized' in filesystem areas that match the thin-pool 
chunks (the selection of the filesystem itself might also be part of the equation 
here) - even if you use snapshots a lot, you may eventually get better results 
with bigger chunks like 128K or even 256K instead of the default 64K.


Regards

Zdenek




Re: [linux-lvm] The feasibility of implementing an alternative snapshot approach

2023-01-06 Thread Zdenek Kabelac

On 04. 01. 23 at 17:12, Zhiyong Ye wrote:

Hi Zdenek,

Thank you for your reply.

Snapshots of thinlv are indeed more efficient compared to standard lv, this is 
because data blocks can be shared between snapshot and original thinlv. But 
there is also a performance loss after thinlv creates a snapshot. This is 
because the first write to the snapshotted thinlv requires not only allocating 
a new chunk but also copying the old data.


Here are some performance data and a discussion of the thinlv snapshot:

https://listman.redhat.com/archives/linux-lvm/2022-June/026200.html



Well that's our current  'state-of-the-art' solution.

Make sure you are using the latest kernels for your performance testing - there 
have been several improvements around the locking (6+ kernels) - but if this is 
still not good enough for your case you might need to seek some other 
solution (although it would be nice to know who handles this task better).


Definitely the old 'thick' snapshot is mostly in the maintenance phase and its 
usability (and its design) is limited to some short-lived temporary 
snapshotting (i.e. you are making a backup, and after completing your backup of 
the filesystem you remove the temporary snapshot) - it was never designed 
to be used for multi-level, multi-GiB snapshots - this will not fly...


When you use thin snapshots - make sure your metadata LV is located on your 
fast device and you use the best-fitting chunk size.


Regards

Zdenek




Re: [linux-lvm] The feasibility of implementing an alternative snapshot approach

2023-01-04 Thread Zdenek Kabelac

On 04. 01. 23 at 9:00, Zhiyong Ye wrote:

Dear all,

The current standard lv implementation of snapshots is COW (Copy-on-write), 
which creates snapshots very quickly. However, the first write performance of 
the original lv will be poor after creating a snapshot because of COW. 
Moreover, the more snapshots there are, the worse the performance of the 
original lv will be.


I tested the random read/write performance when the original lv was created 
with different number of snapshots. The data is shown below:

Number of snapshots  Randread(iops)  Randwrite(iops)
     0    21989   22034
     1    10048   10041
     2    6770    6773
     3    5375    5378

There are scenarios where the performance of the original lv is more 
demanding, and the speed of snapshot creation is not as strong a requirement. 
Because it is the original lv that will actually be used, and the snapshot is 
only a secondary function. Therefore snapshots using the COW approach will not 
meet the needs of this scenario.


Therefore, is it feasible to implement another way of taking snapshots? Let's 
say the first snapshot is created as a full snapshot, and all subsequent 
snapshots are based on incremental data from the previous snapshot.


Hi

Have you played with thin provisioning - as that's the answer to the slow 
snapshots.
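
A minimal sketch of what that looks like (the VG name, sizes and LV names are 
only examples):

  # Create a thin pool and a thin volume inside it.
  lvcreate --type thin-pool -L 100G -n tpool vg
  lvcreate --thin -V 50G -n data vg/tpool

  # A thin snapshot is a metadata-only operation; no data is copied up front,
  # so the origin does not pay the thick-snapshot COW penalty on first writes.
  lvcreate -s -n data_snap vg/data

  # Thin snapshots carry the 'activation skip' flag by default; use -K to activate.
  lvchange -ay -K vg/data_snap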


Regards

Zdenek



Re: [linux-lvm] lvm-vdo, snapshots and cache

2022-11-16 Thread Zdenek Kabelac

On 16. 11. 22 at 15:38, Gionatan Danti wrote:

On 2022-11-16 at 11:50, Zdenek Kabelac wrote:

Well - as said - a VG on a VG is basically the equivalent of the original LV
on top of the VDO manager.


Hi Zdenek,
it seems clunkier to manage two nested VG/LV if you ask me.


IMHO it's not different from lvm on top of a vdo-managed volume.
It should be mostly equal (and in a few cases possibly even easier, or at least 
more OS-friendly).





But still, these fast snapshots do not solve the problem of a double
out-of-space fault.


Yeah, but available space requires constant monitor even when dealing with 
lvmthin.



It's good for experimenting - but I'd not suggest to use this setup.


In RHEL7 it was specifically supported - see here[1]:

"As a general rule, you should place certain storage layers under VDO and 
others on top of VDO:

Under VDO: DM-Multipath, DM-Crypt, and software RAID (LVM or mdraid).
On top of VDO: LVM cache, LVM snapshots, and LVM Thin Provisioning"

For a practical example: in the past I experimented with a storage stack 
composed of raid -> vdo -> lvmthin -> xfs serving qemu/kvm virtual disk files. 
The lvmthin layer enabled rolling snapshots, and was a key feature of the test 
setup.


Now the only method to replicate that is to have two nested LVM instances, one 
running vdo and one lvmthin - right?




Well - it's not difficult 'coding' to enable using a 'VDO LV' as the thin-pool data 
LV; however, we currently lack a lot of support for 'VDO LV' recovery alone in 
lvm2 - so until this gets polished, we try to avoid shooting ourselves in the 
foot by enabling these complicated stacks ATM :)


We also have some ideas for different snapshotting inside VDO - so we will see 
how that goes - it looks more supportable than combining VDO+THIN together.


Regards

Zdenek




Re: [linux-lvm] lvm-vdo, snapshots and cache

2022-11-16 Thread Zdenek Kabelac

On 15. 11. 22 at 23:41, Gionatan Danti wrote:

On 2022-11-15 at 22:56, Zdenek Kabelac wrote:

You could try a 'vg' on top of another 'vg' - however I'd not recommend
using it this way (and it's unsupported by lvm2 in
general)


Hi Zdenek,
yeah, I would strongly avoid that outside lab testing.


IMHO I'd not recommend combining 2 provisioning technologies - it's
already hard to resolve 'out-of-space' troubles with just one
technology...


The issue is that fast snapshots are only provided by dm-thin, meaning that 
lvmthin is almost mandatory for implementing a rolling/continuous snapshot 
scheme. And, on a more basic level, if we can not stack LVM on top of a VDO LV 
(in a supported manner), how can we take a snapshot of user data residing on 
that specific volume?


Well - as said - a VG on a VG is basically the equivalent of the original LV on top of 
the VDO manager.


But still, these fast snapshots do not solve the problem of a double 
out-of-space fault.



This seems a clear regression vs a separate VDO device. Am I missing something?


It's good for experimenting - but I'd not suggest to use this setup.

Regards

Zdenek



Re: [linux-lvm] lvm-vdo, snapshots and cache

2022-11-15 Thread Zdenek Kabelac

On 15. 11. 22 at 18:42, Gionatan Danti wrote:

Dear all,
as previous vdo utils are gone in RHEL9, VDO volumes must be created via 
lvm-vdo and associated lvm commands.


What is not clear to me is how to combine lvm-vdo with lvmthin (for fast CoW 
snapshots) and/or lvmcache (for SSD caching of an HDD pool). For example, with 
the old /dev/mapper/vdo style volumes (ie: the one created via vdo create) one 
can do something as physical_dev -> vdo -> lvmthin -> lvmcache.


How to do the same with lvm-vdo?



Hi

You could try a 'vg' on top of another 'vg' - however I'd not recommend using 
it this way (and it's unsupported by lvm2 in general)


We aim to support multiple VDO LVs within a single VDOPOOL - this will hopefully 
happen relatively soon (so it should work like with thin pools).


IMHO I'd not recommend combining 2 provisioning technologies - it's already 
hard to resolve 'out-of-space' troubles with just one technology...


We will see if the 'VDO thick snapshot' will also come with multiple VDO LV 
support - likely not - but it will be added later on.


Caching should already be supported - on a VDO LV as well as on a VDOPOOL LV (the two 
work somewhat differently).


Regards

Zdenek




Re: [linux-lvm] [EXTERNAL] Re: LVM2 : performance drop even after deleting the snapshot

2022-10-20 Thread Zdenek Kabelac

On 17. 10. 22 at 15:41, Erwin van Londen wrote:
 From the looks of it the disk, as provisioned out of an Azure pool, is likely 
backed by an enterprise raid array. When you provision the pools with 
  discard_passdown the removal of the snapshot will also be pushed down to the 
underlying hypervisor or disk array. You would need to wait till that process 
is completed in order to make any comparisons.


ThinVolGrp-ThinDataLV-tpool: 0 1006632960 thin-pool 1 4878/4145152 
8325/7864320 - rw discard_passdown queue_if_no_space - 1024


As per man page

--discards passdown|nopassdown|ignore
Specifies how the device-mapper thin pool layer in the kernel should handle 
discards. ignore causes the thin pool to ignore discards. nopassdown causes the
thin pool to process discards itself to allow reuse of unneeded extents in the 
thin pool. passdown causes the thin pool to process discards itself (like

nopassdown) and pass the discards to the underlying device.

Try the same operation after changing the thin volume

lvchange --discards nopassdown VG/ThinPoolLV


Discard here is likely irrelevant - since there will likely be no blocks for 
discarding.


When the user removes a thin LV (which happens to be sharing its blocks with 
some other thin LV (origin -> snapshot)), there is just a metadata update 
reducing the sharing of blocks with the origin thin LV - so there is nothing to 
discard for the data (since the snapshot is removed right after its creation 
without any use - only if the origin were meanwhile changed dramatically in this 
short period of time would the exclusively-owned parts of such a snapshot be discarded).


Regards

Zdenek



Re: [linux-lvm] [EXTERNAL] Re: LVM2 : performance drop even after deleting the snapshot

2022-10-18 Thread Zdenek Kabelac

On 18. 10. 22 at 5:33, Pawan Sharma wrote:

Hi Zdenek,

I would like to highlight one point here is that we are creating and then 
deleting the snapshot immediately without writing anything anywhere. In this 
case, we are expecting the performance to go back to what it was before taking 
the thin snapshot. Here we are not getting the original performance after 
deleting the snapshot. Do you know any reason why that would be happening.


As explained in my previous post - with thin provisioning you are getting 
metadata updates of btrees - thus there is no 'revert' to a previous 'metadata 
state' - there is a rolling update of the btrees, which is by design 
'seek-unfriendly' - so for performance-hunting users the use of SSD/NVMe-type 
storage for these metadata volumes is basically a must (and it's been designed 
for that).


The old 'thick' snapshot, where you allocate explicit COW LV storage, is going 
to give you the expected behavior here - however you will (of course) lose all 
the benefits you get with thin-pools.


With a thin-pool (as also mentioned in my previous post), if you can't afford 
dedicated low-latency storage, you need to scale up the chunk size so the 
amount of metadata updates is reduced (lowering the seeking). I'm afraid 
you can't expect much more in the near future.


FYI, this patch set is to be merged in the upcoming kernel:

https://listman.redhat.com/archives/dm-devel/2022-October/052367.html

It should also help a lot with multithreaded load on thin-pools.

There is also some new metadata format being experimented with - but whether 
this will also make anything in this logic seek-friendlier is hard to tell...


Regards

Zdenek




Re: [linux-lvm] [EXTERNAL] Re: LVM2 : performance drop even after deleting the snapshot

2022-10-17 Thread Zdenek Kabelac

On 14. 10. 22 at 21:31, Mitta Sai Chaithanya wrote:

Hi Zdenek Kabelac,
   Thanks for your quick reply and suggestions.

We conducted couple of tests on Ubuntu 22.04 and observed similar performance 
behavior post thin snapshot deletion without writing any data anywhere.


*Commands used to create Thin LVM volume*:
- lvcreate  -L 480G --poolmetadataspare n --poolmetadatasize 16G 
--chunksize=64K --thinpool  ThinDataLV ThinVolGrp

- lvcreate -n ext4.ThinLV -V 100G --thinpool ThinDataLV ThinVolGrp



Hi

So now it's clear you are talking about thin snapshots - this is a very 
different story (as we normally use the term "COW" volumes for the old thick 
snapshots).

I'll consult more with the thinp author - however it does look to me like you are 
using the same device to store data & metadata.


This is always a highly sub-optimal solution - the metadata is 
best stored on fast (low-latency) devices.


So my wild guess is that you are possibly using a rotational device backend to store 
your thin-pool's metadata volume, and then your setup gets very sensitive to 
the metadata fragmentation.

The thin-pool was designed to be used with SSD/NVMe for metadata, which is way less 
sensitive to seeking.


So when you 'create' a snapshot, the metadata gets updated - and when you remove the 
thin snapshot, the metadata again gets a lot of changes (especially when your origin 
volume is already populated) - so fragmentation is inevitable, and you are 
getting a high penalty for holding the metadata device on the same drive as your 
data device.


So while there are some plans to improve the metadata logistics - I'd not 
expect miracles on your particular setup - I'd highly recommend plugging in some 
SSD/NVMe storage for storing your thin-pool metadata - this is the way to go 
to get better 'benchmarking' numbers here.


For an improvement on your setup, try to seek larger chunk-size values where 
your data 'sharing' is still reasonably valuable - this depends on the data-type 
usage - but a chunk size of 256K might possibly be a good compromise (with zeroing 
disabled, if you hunt for the best performance).
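
For illustration, a variant of the pool creation shown earlier in this thread, 
with a 256K chunk size and zeroing of newly provisioned chunks disabled (values 
are only indicative, not tuned for this workload):

  lvcreate -L 480G --poolmetadatasize 16G --chunksize 256k -Zn \
      --thinpool ThinDataLV ThinVolGrp

  # Zeroing can also be switched off later on an existing pool:
  lvchange -Zn ThinVolGrp/ThinDataLV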



Regards

Zdenek

PS: later mails suggest you are using some 'MS Azure' devices? - so please 
redo your testing with your local hardware/storage, where you have precise 
guarantees of storage drive performance - testing in the cloud is random by 
design.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM2 : performance drop even after deleting the snapshot

2022-10-13 Thread Zdenek Kabelac

Dne 13. 10. 22 v 8:53 Pawan Sharma napsal(a):

adding this to lvm-devel mailing list also.

Regards,
Pawan
--
*From:* Pawan Sharma
*Sent:* Wednesday, October 12, 2022 10:42 PM
*To:* linux-lvm@redhat.com 
*Cc:* Mitta Sai Chaithanya ; Kapil Upadhayay 


*Subject:* LVM2 : performance drop even after deleting the snapshot
Hi Everyone,


We are evaluating lvm2 snapshots and doing performance testing on it. This is 
what we are doing :


 1. dump some data to lvm2 volume (using fio)
 2. take the snapshot
 3. delete the snapshot (no IOs anywhere after creating the snapshot)
 4. run the fio on lvm2 volume

Here, as you can see, we are just creating the snapshot and immediately 
deleting it. There are no IOs to the main volume or anywhere else. When we run 
fio after this (step 4) we see around a 50% drop in performance relative to 
the number we get in step 1.


It is expected to see a performance drop if there is a snapshot, because of the 
COW. But here we deleted the snapshot, and it no longer refers to any data. 
We should not see any performance drop here.


Could someone please help me understand this behavior? Why are we seeing the 
performance drop in this case? It seems as if the snapshot we deleted is still 
not really gone, and we are paying the COW penalty.


System Info:

OS : ubuntu 18.04
Kernel : 5.4.0

# lvm version
LVM version:2.02.176(2) (2017-11-03)
Library version: 1.02.145 (2017-11-03)
Driver version:4.41.0

We also tried on latest ubuntu with newer version of LVM. We got the same 
behavior.





Hi

Debugging 5-year-old software is unlikely to get a lot of attention 
from upstream.


So please:

a) reproduce the issue with some recent  kernel & lvm2
b) take 'dmsetup table && dmsetup status' before you run every 'fio' test 
and present your results here in some form (see the illustrative sequence 
below) - otherwise we can hardly see what the problem is.
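For example (an illustrative sequence only - the fio parameters and the LV
path are placeholders, not taken from your setup):

   dmsetup table
   dmsetup status
   fio --name=step1 --filename=/dev/VG/LV --rw=randwrite --bs=4k \
       --direct=1 --runtime=60 --time_based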



What should be expected: if you use old/thick snapshots - when you 'drop' the 
snapshot you have your original LV intact - so results should mostly match the 
results from before you took the snapshot - but you clearly have to take into 
account whether you use 'SSD/NVMe' discarding and other things - so always run 
a series of tests and average your results.


If you use a thin snapshot - then you can get various results depending on 
your thin chunk settings and discard usage.


Also maybe try your benchmark with different filesystems...

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions

2022-09-29 Thread Zdenek Kabelac

Dne 29. 09. 22 v 13:15 Roberto Fastec napsal(a):

Hello Zdenek
Thank you for the explanation

May I kindly ask you what/which is the command line API to access and 
manipulate those metadata?




'command line API' in the sense of:

To create LV --   'lvcreate'
To remove LV --   'lvremove'


Note - many commands can actually work without physical interaction with the 
DM layer (--driverloaded n) - however in some cases some targets require the 
presence of DM.


lvm2 commands are the proper way to change your metadata.


And when you say vi editor, do you kindly mean direct edit of HEX values on 
the raw metadata?


No way - you can't change metadata on disk - unless you were basically copying 
precisely what the lvm2 command does - so what would be the point??


Simply use the lvm2 commands to do the job. Unless I'm missing some important 
point - why would you need to work with lvm2 metadata but without lvm2??





Thank you

If you happen to have a link to some documentation, thank you even more

Though here it is not the configuration that got lost


Well yeah - it will take some time - but e.g. the RHEL storage documentation 
might be a good place to go through it.




Also, as additional info: we now see that all the cases do have thin 
provisioning active, and it looks like these are additional/different metadata 
tables.


Thin provisioning is handled by LVM2 only to provide the metadata and data 
LVs - and then the thinLVs to a user.


The physical block layout for thin provisioning is fully stored inside the 
thin-pool's metadata device.


To explore those mappings you need to use tools like 'thin_dump', 'thin_ls'
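Illustrative usage (device paths are placeholders; these tools read the
pool's metadata device, so run them against an inactive pool or a metadata
snapshot):

   thin_dump /dev/mapper/vg-pool_tmeta > pool_metadata.xml
   thin_ls /dev/mapper/vg-pool_tmeta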



So if these got messed/corrupted...



If these thin-pool metadata get corrupted, there is tool: 'thin_repair'.
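A hedged sketch of its usage (paths and names are placeholders; the pool must
not be active while repairing):

   thin_repair -i /dev/mapper/vg-pool_tmeta -o /dev/mapper/vg-repair_lv
   # or let lvm2 drive the repair for you:
   lvconvert --repair vg/pool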

Note: corruption of some high-level bTree nodes may result in severe damage to 
the whole metadata structure -> i.e. lots of thinLVs being lost.


It's a good idea to keep such metadata on some resilient type of storage 
(raid), and of course rule #1 - create regular backups of your thin 
volumes... (a snapshot of a thinLV is not a backup!).



It looks like QNAP have made some customization, so the thin-provisioning LVM 
metadata are on a dedicated partition.


We observed the hex content in there and partially worked out the logic.

About thin provisioning, again - is any "fsck"-like tool available? (I suppose 
not, but just as confirmation.)


This tool is called  'thin_check'

(and this tool is in fact executed with every thin-pool activation & 
deactivation by default by lvm2)
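For completeness, an illustrative manual invocation (the device path is a
placeholder):

   thin_check /dev/mapper/vg-pool_tmeta
   # lvm2 invokes it automatically according to the thin_check_executable
   # and thin_check_options settings in the global section of lvm.conf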


Note: just like the lvm2 metadata, the thin-pool's kernel metadata are also 
check-summed (protected against on-disk bit corruption), so again there is 
zero chance of manipulating them with any 'hex editor' - unless you were to 
re-create the thin-pool engine...



Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] How to change default system dir

2022-09-29 Thread Zdenek Kabelac

Dne 28. 09. 22 v 17:16 Bartłomiej Błachut napsal(a):

Hi,
I have a question for you: what I need is to change, on my Ubuntu (22.04), the 
location of default-system-dir from /etc/lvm to a place where I have write 
access, e.g. /home/my_user/new_lvm_dir. So far I've tried passing the 
./configure option --with-default-system-dir=/home/my_user/new_lvm_dir, and 
later replacing the pattern /etc/lvm with /home/my_user/new_lvm_dir everywhere 
in the code, but without any success. Can you give me any idea how I can do it?


I tried it on main/stable branches git://sourceware.org/git/lvm2.git 





Hi


lvm2 basically always requires 'root' access for any 'really usable' lvm2 
command - so saying you don't have write access to /etc looks very strange to 
start with.


We also have the configure option  --with-confdir=[/etc]  (see the sketch 
below), but I still think you should start by explaining what you are actually 
trying to do - maybe there is some design issue??
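A minimal sketch of such a build, assuming a relocated config directory is
really what is needed (the paths are the ones from your mail):

   ./configure --with-default-system-dir=/home/my_user/new_lvm_dir \
               --with-confdir=/home/my_user/new_lvm_dir
   make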



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions

2022-09-29 Thread Zdenek Kabelac

Dne 27. 09. 22 v 12:10 Roberto Fastec napsal(a):

Dear friends of the LVM mailing list

I suppose this question is for some real LVM2 guru or even developer

Here I kindly make three question with three premises

premises
1. I'm a total noob about LVM2 low-level logic, so I'm sorry if the questions 
sound silly :-)
2. The following applies to a whole md RAID (in my example it will be a RAID5 
made of 4 drives 1TB each so useful available space more or less 2.7TB)
3. I assign whole those 2.7TB to one single PV and one single VG and one 
single LV.


questions
1. Given premise 3: are the corresponding LVM2 metadata/tables just (allow me 
the term) a "grid" "mapping that space" in an ordered sequence, which during 
the subsequent use (and filling) of the RAID space "just marks" which cells 
are used and which are free? Or will/could those grid cells end up in a 
messed-up order?
Explicitly I mean: in case of metadata corruption (always with respect to 
premise 3), could we just generate a dummy metadata table with all the extents 
marked as "used", in such a way that we can access them anyway?

And can we expect to have them ordered?


lvm2  'metadata handling'  is purely internal to the lvm2 codebase - you can't 
rely on any 'witnessed/observed' logic.


There is a cmdline API to access and manipulate metadata in most cases.

Temporarily you can e.g. update/modify your current metadata with the 'vi' 
editor and vgcfgrestore them (a sketch of that flow is below) - however this 
is not a 'guaranteed' operational mode - rather a workaround for when the 
'cmdline' interface does not handle some error case well - and such a case 
should be filed as an RFE to enhance lvm2.
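A hedged sketch of that workaround (the VG name and file path are
placeholders, and this is only for cases the cmdline interface cannot
handle):

   vgcfgbackup -f /tmp/myvg.txt myvg     # dump current metadata as text
   vi /tmp/myvg.txt                      # carefully edit the text copy
   vgcfgrestore -f /tmp/myvg.txt myvg    # write the edited metadata back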




2. Does a sort of "fsck" for the LVM2 metadata exist? We do technical 
assistance, and recently, specifically with those NAS devices that make use of 

In general - the lvm2 metadata on disk always has a CRC32 checksum - when it 
is invalid -> the metadata is garbage.


Each loaded metadata copy with a correct CRC32 is then always fully validated 
- yep, that can sometimes be a bit costly in the case of a very large metadata 
size - but so far no big problems - CPUs are mostly getting faster as well... 
so bigger setups tend to have powerful hw too.


LVM2, we have experienced metadata corruption really easily, occurring after 
just nothing or because of an electric power interruption (which is really 
astonishing). We mean no drive failures, no bad SMART data. Just corruption 
from "nowhere" with "no cause".



Corrupted metadata is always considered unusable - the user has to restore a 
previous valid version (and here some combinations of errors might eventually 
require 'vi editor' assistance - but again, only in very, very unusual 
circumstances).


Metadata is archived in /etc/lvm/archive and is also present in a ring buffer 
on all PVs in a VG. If there are too many PVs, the user can 'opt out' and have 
only a subset of the PVs hold metadata - e.g. 200 PVs with only 20 of them 
holding metadata - but these are highly unusual configurations (see the 
illustration below)...
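For illustration (the VG name and numbers are placeholders):

   vgcfgrestore --list myvg           # show archived metadata versions
   vgchange --metadatacopies 10 myvg  # keep metadata areas on only ~10 PVs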


Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvcreate hangs forever during snapshot creation when suspending volume

2022-08-29 Thread Zdenek Kabelac

Dne 29. 08. 22 v 0:38 Thomas Deutschmann napsal(a):

Hi,

just for the records: My problem is now resolved.

As expected, my problem was caused by a bug in kernel. A patch was already 
merged into mainline: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e053aaf4da56cbf0afb33a0fda4a62188e2c0637 



Backport for 5.15 and 5.19 is available but pending, 
https://www.spinics.net/lists/stable/msg588671.html.


Read https://lore.kernel.org/all/000401d8a746$3eaca200$bc05e600$@whissi.de/ 
for all details.


Thank you all for your help!




Hi

Great thanks for the follow-up and good investigation work on this issue.

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Zdenek Kabelac

Dne 17. 08. 22 v 19:35 Gionatan Danti napsal(a):

Il 2022-08-17 17:26 Zdenek Kabelac ha scritto:

I like the general idea of the udev watch. It is the magic that causes
newly created partitions to magically appear in the system, which is


Would disabling the watch rule be a reasonable approach in this case? If the 
user wants to scan a new device, they only need to issue partprobe or kpartx - 
or am I missing something?


Before diving into these 'deep waters' - I'd really like to first see whether 
the problem is still such an issue with our upstream code base.


There have been a lot of minor optimizations committed over time - so the 
number of fired 'watch' rules should be considerably smaller than in the 
version mentioned in the customer issue.



There is an ongoing 'SID' project that might push the logic somewhat
further, but the existing 'device' support logic as it is today is an
unfortunate 'trace' of how the design should not have been made - and
since all the 'original' programmers left the project a long time ago - it's
non-trivial to push things forward.


Well, this is not good news. Just for my education, is it possible to run a 
modern linux distro without udev at all? I still remember when the new cool 
thing for device autodiscovery was devfs (with some distros - like gentoo - 
taking the alternative approach of simply tarring & untarring much of the 
entire /dev/ tree to prepopulate the major+minor numbers...)

We just hope the SID will make some progress (although probably small
one at the beginning).


Any info on the project?


https://github.com/prajnoha/sid


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Zdenek Kabelac

Dne 17. 08. 22 v 15:41 Martin Wilck napsal(a):

On Wed, 2022-08-17 at 14:54 +0200, Zdenek Kabelac wrote:

Dne 17. 08. 22 v 14:39 Martin Wilck napsal(a):


Let's make clear that we are very well aware of all the constraints
associated with
the udev rule logic  (and we tried quite hard to minimize the impact -
however the udevd
developers kind of 'misunderstood'  how badly they would impact the
system's
performance with the existing watch rule logic - and the story kind
of
'continues' with  'systemd's' & dBus services, unfortunately...


I dimly remember you dislike udev ;-)


Well, it's not 'a dislike' from my side - but the architecture is simply 
lacking in many areas...


Dave is a complete disliker of udev & systemd all together :)




I like the general idea of the udev watch. It is the magic that causes
newly created partitions to magically appear in the system, which is


The tragedy of the design comes from the plain fact that there are only 'very 
occasional' consumers of all this 'collected' data - yet gathering all the 
info and keeping all of it 'up-to-date' gets very, very expensive and can 
basically 'neutralize' a lot of your CPU if you have too many resources to 
watch and keep updated.




very convenient for users and wouldn't work otherwise. I can see that
it might be inappropriate for LVM PVs. We can discuss changing the
rules such that the watch is disabled for LVM devices (both PV and LV).


It's really not fixable as is - because of the complete lack of 'error' 
handling for devices in the udev DB (i.e. duplicate devices, various frozen 
devices...).


There is an ongoing 'SID' project that might push the logic somewhat further, 
but the existing 'device' support logic as it is today is an unfortunate 
'trace' of how the design should not have been made - and since all the 
'original' programmers left the project a long time ago, it's non-trivial to 
push things forward.



I don't claim to overlook all possible side effects, but it might be
worth a try. It would mean that newly created LVs, LV size changes etc.
would not be visible in the system immediately. I suppose you could
work around that in the LVM tools by triggering change events after
operations like lvcreate.


We just hope the SID will make some progress (although probably a small one at 
the beginning).




However let's focus on 'pvmove' as it is potentially very lengthy
operation -
so it's not feasible to keep the  VG locked/blocked  across an
operation which
might take even days with slower storage and big moved sizes (write
access/lock disables all readers...)


So these close-after-write operations are caused by locking/unlocking
the PVs?

Note: We were observing that watch events were triggered every 30s, for
every PV, simultaneously. (@Heming correct me if I'm wrong here.)


That's why we would like to see the 'metadata', and also to check whether the 
issue still appears with the latest version of lvm2.



Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Zdenek Kabelac

Dne 17. 08. 22 v 14:39 Martin Wilck napsal(a):

On Wed, 2022-08-17 at 18:47 +0800, Heming Zhao wrote:

On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:




ATM I'm not even sure whether you are complaining about the CPU usage of
lvmpolld
or just about the huge udev rules processing overhead.


The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE
action
which is the trigger.


Let's be clear here: every close-after-write operation triggers udev's
"watch" mechanism for block devices, which causes the udev rules to be
executed for the device. That is not a cheap operation. In the case at
hand, the customer was observing a lot of "multipath -U" commands. So
apparently a significant part of the udev rule processing was spent in
"multipath -U". Running "multipath -U" is important, because the rule
could have been triggered by a change of the number of available paths
devices, and later commands run from udev rules might hang indefinitely
if the multipath device had no usable paths any more. "multipath -U" is
already quite well optimized, but it needs to do some I/O to complete
its work, thus it takes a few milliseconds to run.

IOW, it would be misleading to point at multipath. close-after-write
operations on block devices should be avoided if possible. As you
probably know, the purpose udev's "watch" operation is to be able to
determine changes on layered devices, e.g. newly created LVs or the
like. "pvmove" is special, because by definition it will usually not
cause any changes in higher layers. Therefore it might make sense to
disable the udev watch on the affected PVs while pvmove is running, and
trigger a single change event (re-enabling the watch) after the pvmove
has finished. If that is impossible, lvmpolld and other lvm tools that
are involved in the pvmove operation should avoid calling close() on
the PVs, IOW keep the fds open until the operation is finished.


Hi

Let's make clear that we are very well aware of all the constraints associated 
with the udev rule logic (and we tried quite hard to minimize the impact - 
however the udevd developers kind of 'misunderstood' how badly they would 
impact the system's performance with the existing watch rule logic - and the 
story kind of 'continues' with 'systemd's' & dBus services, unfortunately...)


However, let's focus on 'pvmove', as it is a potentially very lengthy 
operation - so it's not feasible to keep the VG locked/blocked across an 
operation which might take even days with slower storage and big moved sizes 
(a write access/lock disables all readers...).


So lvm2 does try to minimize the locking time. We will re-validate that only 
the necessary 'vg updating' operations use 'write' access - since 
occasionally, due to some unrelated code changes, it might sometimes result 
in an unwanted 'write' VG open - but we can't keep the operation blocking 
a whole VG because of slow udev rule processing.


In normal circumstances a udev rule should be processed very fast - unless 
something is mis-designed and causes CPU overloading.


But as mentioned a few times already - without more knowledge about the case 
we can hardly guess the exact reason. We already provided a useful suggestion 
for reducing the number of devices 'processed' by udev, by reducing the number 
of 'lvm2 metadata PVs' - the other big reason for frequent metadata updates 
would be heavy segmentation of the LV - but we will not know this without 
seeing the user's VG 'metadata' in this case...



Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Zdenek Kabelac

Dne 17. 08. 22 v 12:47 Heming Zhao napsal(a):

On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:

Dne 17. 08. 22 v 10:43 Heming Zhao napsal(a):

On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:

Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):

On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:

Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):

Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
is triggered by pvmove.


The machine connecting disks are more than 250. The VG has 103 PVs & 79 LVs.

# /sbin/vgs
   VG   #PV #LV #SN Attr   VSize   VFree
103  79   0 wz--n-  52t  17t


Ok - so the main issue could be too many PVs with a relatively high latency of 
the mpath devices (which could actually all be simulated easily in the lvm2 
test suite).



The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE action
which is the trigger.



I'll check whether lvmpolld is using correct locking while checking the 
operational state - you may possibly extend the polling interval (although 
that's where the mentioned patchset has been enhancing a couple of things).





If you have too many disks in the VG (again, it is unclear how many paths
there are and how many distinct PVs) - the user may *significantly* reduce the
burden associated with metadata updating by reducing the number of 'actively'
maintained metadata areas in the VG - i.e. if you have 100 PVs in a VG, you may
keep metadata on only 5-10 PVs to have 'enough' duplicate copies of the lvm2
metadata within the VG (vgchange --metadatacopies X) - clearly it depends on
the use case and how many PVs are added/removed from the VG over its
lifetime


Thanks for the important info. I also found the related VG config in
/etc/lvm/backup/; this file shows 'metadata_copies = 0'.

This could be another solution. But why doesn't lvm2 take this behavior by
default, or give a notification when the PV number goes beyond a threshold
while the user is executing pvs/vgs/lvs or pvmove?
There are too many magic switches; users don't know how to adjust them for
better performance.


The problem is always the same - selecting the right 'default' :) - what suits 
user A is sometimes a 'no go' for user B. So ATM it's more 'secure/safe' to 
keep metadata with each PV - so when a PV is discovered, it's known what the 
VG using that PV looks like. When only a fraction of the PVs have the info, 
the VG is much more fragile against damage when disks are lost - i.e. there is 
no 'smart' mechanism to pick disks in different racks.


So this option is there for administrators that are 'clever' enough to deal 
with a new set of problems it may create for them.


Yes - lvm2 has a lot of options - but that's usually necessary when we want to 
be able to provide an optimal solution for a really wide variety of setups - 
so I think spending a couple of minutes reading the man pages pays off - 
especially if you had to spend 'days' building your disk racks ;)


And yes, we may add a few more hints - but then we are asked by the 'second' 
group of users ('skilled admins') - why do we print so many dumb messages 
every time they do some simple operation :)



I'm busy with many bugs and still can't find a time slot to set up an env.
This performance issue relates to mpath, and I can't find an easy way to set
up such an env. (I suspect the issue may be triggered by setting up 300 fake
PVs without mpath and then doing a pvmove cmd.)



'Fragmented' LVs with small segment sizes may significantly raise the number 
of metadata updates needed during a pvmove operation, as each single LV 
segment will be mirrored by an individual mirror.



Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Zdenek Kabelac

Dne 17. 08. 22 v 10:43 Heming Zhao napsal(a):

On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:

Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):

On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:

Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):

Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
is triggered by pvmove.

On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:

Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):

Hello maintainers & list,

I bring a story:
One SUSE customer suffered an lvmpolld issue, which caused a dramatic IO
performance decrease.

How to trigger:
When a machine connects a large number of LUNs (e.g. 80~200), a pvmove (e.g.
moving a single disk to a new one, with a cmd like: pvmove disk1 disk2) makes
the system suffer a high cpu load. But when the system connects ~10 LUNs, the
performance is fine.

We found two workarounds:
1. set lvm.conf 'activation/polling_interval=120'.
2. write a special udev rule, which makes udev ignore the event for mpath 
devices.
   echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
/etc/udev/rules.d/90-dm-watch.rules

Running either of the two makes the performance issue disappear.

** the root cause **

lvmpolld periodically requests info to update the pvmove status.

At every polling_interval, lvm2 updates the vg metadata. The update job
calls sys_close, which triggers a systemd-udevd IN_CLOSE_WRITE event, e.g.:
  2022--xxx  systemd-udevd[pid]: dm-179: Inotify event: 8 
for /dev/dm-179
(8 is IN_CLOSE_WRITE.)

These VGs' underlying devices are multipath devices. So when lvm2 updates the
metadata, even if pvmove writes only a little data, the sys_close action
triggers udev's "watch" mechanism, which gets notified frequently about a
process that has written to the device and closed it. This causes frequent,
pointless re-evaluation of the udev rules for these devices.

My question: Does LVM2 maintainers have any idea to fix this bug?

In my view, does lvm2 could drop VGs devices fds until pvmove finish?


Hi

Please provide more info about lvm2  metadata and also some  'lvs -av'
trace so we can get better picture about the layout - also version of
lvm2,systemd,kernel in use.

pvmove is progressing by mirroring each segment of an LV - so if there would
be a lot of segments - then each such update may trigger udev watch rule
event.

But ATM I could hardly imagine how this could cause some 'dramatic'
performance decrease -  maybe there is something wrong with udev rules on
the system ?

What is the actual impact ?

Note - pvmove was never designed as a high performance operation (in fact it
tries to not eat all the disk bandwidth as such)

Regards
Zdenek


My mistake, I write here again:
The subject is wrong, not IO performance but cpu high load is triggered by 
pvmove.

There is no IO performance issue.

When the system connects 80~200 LUNs, the cpu load increases by 15~20 and the
cpu usage by ~20%, which corresponds to about 5-6 cores and led at
times to those cores being fully utilized.
In other words: a single pvmove process costs 5-6 (sometimes 10) cores of
utilization. That's abnormal & unacceptable.

The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.

BTW:
I change this mail subject from:  lvmpolld causes IO performance issue
to: lvmpolld causes high cpu load issue
Please use this mail for later discussing.



Hi

Could you please retest with recent version of lvm2. There have been
certainly some improvements in scanning - which might have caused in the
older releases some higher CPU usage with longer set of devices.

Regards

Zdenek


The highest lvm2 version in SUSE products is lvm2-2.03.15, does this
version include the improvements change?
Could you mind to point out which commits related with the improvements?
I don't have the reproducible env, I need to get a little detail before
asking customer to try new version.




Please try to reproduce your customer's problem and see if the newer version
solves the issue.   Otherwise we could waste hours on theoretical
discussions what might or might not have helped with this problem. Having a
reproducer is a starting point for fixing it, if the problem is still there.

Here is one commit that may possibly affect CPU load:

d2522f4a05aa027bcc911ecb832450bc19b7fb57


Regards

Zdenek


I gave a bit of an explanation of the root cause in the previous mail, and
workaround <2> also matches my analysis.

The machine connects lots of LUNs. A pvmove of one disk triggers lvm2 to
update all underlying mpath devices (80~200). I guess the update job is
vg_commit(), which writes the latest metadata info, and the metadata is
located on all PVs. The update job finishes with close(2), which triggers an
udevd IN_CLOSE_WRITE event on hundreds of devices. Every IN_CLOSE_WRITE
triggers the multipath udev rules (11-dm-mpath.rules) to start scanning
devices. So the real world gets flooded with hundreds of multipath processes,
and the cpu load becomes high.



Your 'guess e

Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Zdenek Kabelac

Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):

On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:

Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):

Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
is triggered by pvmove.

On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:

Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):

Hello maintainers & list,

I bring a story:
One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
decrease.

How to trigger:
When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a 
single
disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
cpu load. But when system connects ~10 LUNs, the performance is fine.

We found two work arounds:
1. set lvm.conf 'activation/polling_interval=120'.
2. write a speical udev rule, which make udev ignore the event for mpath 
devices.
  echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
   /etc/udev/rules.d/90-dm-watch.rules

Run above any one of two can make the performance issue disappear.

** the root cause **

lvmpolld will do interval requeset info job for updating the pvmove status

On every polling_interval time, lvm2 will update vg metadata. The update job 
will
call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
 2022--xxx  systemd-udevd[pid]: dm-179: Inotify event: 8 
for /dev/dm-179
(8 is IN_CLOSE_WRITE.)

These VGs underlying devices are multipath devices. So when lvm2 update 
metatdata,
even if pvmove write a few data, the sys_close action trigger udev's "watch"
mechanism to gets notified frequently about a process that has written to the
device and closed it. This causes frequent, pointless re-evaluation of the udev
rules for these devices.

My question: Does LVM2 maintainers have any idea to fix this bug?

In my view, does lvm2 could drop VGs devices fds until pvmove finish?


Hi

Please provide more info about lvm2  metadata and also some  'lvs -av'
trace so we can get better picture about the layout - also version of
lvm2,systemd,kernel in use.

pvmove is progressing by mirroring each segment of an LV - so if there would
be a lot of segments - then each such update may trigger udev watch rule
event.

But ATM I could hardly imagine how this could cause some 'dramatic'
performance decrease -  maybe there is something wrong with udev rules on
the system ?

What is the actual impact ?

Note - pvmove was never designed as a high performance operation (in fact it
tries to not eat all the disk bandwidth as such)

Regards
Zdenek


My mistake, I write here again:
The subject is wrong, not IO performance but cpu high load is triggered by 
pvmove.

There is no IO performance issue.

When system is connecting 80~200, the cpu load increase by 15~20, the
cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
times to the cores fully utilized.
In another word: a single pvmove process cost 5-6 (sometime 10) cores
utilization. It's abnormal & unaccepted.

The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.

BTW:
I change this mail subject from:  lvmpolld causes IO performance issue
to: lvmpolld causes high cpu load issue
Please use this mail for later discussing.



Hi

Could you please retest with recent version of lvm2. There have been
certainly some improvements in scanning - which might have caused in the
older releases some higher CPU usage with longer set of devices.

Regards

Zdenek


The highest lvm2 version in SUSE products is lvm2-2.03.15; does this
version include the improvement changes?
Would you mind pointing out which commits relate to the improvements?
I don't have a reproducible env; I need a few more details before
asking the customer to try a new version.




Please try to reproduce your customer's problem and see if the newer version 
solves the issue. Otherwise we could waste hours on theoretical discussions 
about what might or might not have helped with this problem. Having a 
reproducer is the starting point for fixing it, if the problem is still there.


Here is one commit that may possibly affect CPU load:

d2522f4a05aa027bcc911ecb832450bc19b7fb57


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-16 Thread Zdenek Kabelac

Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):

Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
is triggered by pvmove.

On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:

Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):

Hello maintainers & list,

I bring a story:
One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
decrease.

How to trigger:
When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a 
single
disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
cpu load. But when system connects ~10 LUNs, the performance is fine.

We found two work arounds:
1. set lvm.conf 'activation/polling_interval=120'.
2. write a speical udev rule, which make udev ignore the event for mpath 
devices.
 echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
  /etc/udev/rules.d/90-dm-watch.rules

Run above any one of two can make the performance issue disappear.

** the root cause **

lvmpolld will do interval requeset info job for updating the pvmove status

On every polling_interval time, lvm2 will update vg metadata. The update job 
will
call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
2022--xxx  systemd-udevd[pid]: dm-179: Inotify event: 8 for 
/dev/dm-179
(8 is IN_CLOSE_WRITE.)

These VGs underlying devices are multipath devices. So when lvm2 update 
metatdata,
even if pvmove write a few data, the sys_close action trigger udev's "watch"
mechanism to gets notified frequently about a process that has written to the
device and closed it. This causes frequent, pointless re-evaluation of the udev
rules for these devices.

My question: Does LVM2 maintainers have any idea to fix this bug?

In my view, does lvm2 could drop VGs devices fds until pvmove finish?


Hi

Please provide more info about lvm2  metadata and also some  'lvs -av'
trace so we can get better picture about the layout - also version of
lvm2,systemd,kernel in use.

pvmove is progressing by mirroring each segment of an LV - so if there would
be a lot of segments - then each such update may trigger udev watch rule
event.

But ATM I could hardly imagine how this could cause some 'dramatic'
performance decrease -  maybe there is something wrong with udev rules on
the system ?

What is the actual impact ?

Note - pvmove was never designed as a high performance operation (in fact it
tries to not eat all the disk bandwidth as such)

Regards
Zdenek


My mistake, I write here again:
The subject is wrong, not IO performance but cpu high load is triggered by 
pvmove.

There is no IO performance issue.

When system is connecting 80~200, the cpu load increase by 15~20, the
cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
times to the cores fully utilized.
In another word: a single pvmove process cost 5-6 (sometime 10) cores
utilization. It's abnormal & unaccepted.

The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.

BTW:
I change this mail subject from:  lvmpolld causes IO performance issue
to: lvmpolld causes high cpu load issue
Please use this mail for later discussing.



Hi

Could you please retest with a recent version of lvm2? There have certainly 
been some improvements in scanning - the older releases might have shown 
somewhat higher CPU usage with a longer set of devices.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvmpolld causes IO performance issue

2022-08-16 Thread Zdenek Kabelac

Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):

Hello maintainers & list,

I bring a story:
One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
decrease.

How to trigger:
When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a 
single
disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
cpu load. But when system connects ~10 LUNs, the performance is fine.

We found two work arounds:
1. set lvm.conf 'activation/polling_interval=120'.
2. write a speical udev rule, which make udev ignore the event for mpath 
devices.
echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
 /etc/udev/rules.d/90-dm-watch.rules

Run above any one of two can make the performance issue disappear.

** the root cause **

lvmpolld will do interval requeset info job for updating the pvmove status

On every polling_interval time, lvm2 will update vg metadata. The update job 
will
call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
   2022--xxx  systemd-udevd[pid]: dm-179: Inotify event: 8 for 
/dev/dm-179
(8 is IN_CLOSE_WRITE.)

These VGs underlying devices are multipath devices. So when lvm2 update 
metatdata,
even if pvmove write a few data, the sys_close action trigger udev's "watch"
mechanism to gets notified frequently about a process that has written to the
device and closed it. This causes frequent, pointless re-evaluation of the udev
rules for these devices.

My question: Does LVM2 maintainers have any idea to fix this bug?

In my view, does lvm2 could drop VGs devices fds until pvmove finish?


Hi

Please provide more info about the lvm2 metadata and also some 'lvs -av' 
trace, so we can get a better picture of the layout - also the versions of 
lvm2, systemd and the kernel in use.


pvmove progresses by mirroring each segment of an LV - so if there are a lot 
of segments, then each such update may trigger a udev watch rule event.


But ATM I could hardly imagine how this could cause some 'dramatic' 
performance decrease -  maybe there is something wrong with udev rules on the 
system ?


What is the actual impact ?

Note - pvmove was never designed as a high performance operation (in fact it 
tries to not eat all the disk bandwidth as such)



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Creating/restoring snapshots in early userspace

2022-08-08 Thread Zdenek Kabelac

Dne 08. 08. 22 v 20:14 cd napsal(a):



Dne 07. 08. 22 v 22:38 cd napsal(a):


Hello,

I have created some scripts which runs in the initramfs during the boot 
process. Very specifically, it's an initcpio runtime hook 
https://man.archlinux.org/man/mkinitcpio.8#ABOUT_RUNTIME_HOOKS
Question: Is this a supported environment which to create/restore snapshots?

When my script runs lvm lvconvert --merge 
testvg/lvmautosnap-root-1659902622-good
it appears to succeed (exit code is 0, and the restore appears to work 
properly). However, the following warnings appear in stderr as part of the 
restore process:

/usr/bin/dmeventd: stat failed: No such file or directory
WARNING: Failed to unmonitor testvg/lvmautosnap-root-1659902622-good.
/usr/bin/dmeventd: stat failed: No such file or directory



Hi

Your initramfs likely needs to contain 'modified' version of your system's
lvm.conf where 'monitoring' will be disabled (set to 0) - as you do not want
to start your monitoring while you are operating in your ramdisk.

Once you flip to your rootfs with your regular /etc/lvm/lvm.conf - you need
to start monitoring of you activated LVs (vgchange --monitor y)


Merging of volume testvg/lvmautosnap-root-1659902622-good started.
/run/lvm/lvmpolld.socket: connect failed: No such file or directory



Again a thing you do not want to run in your ramdisk - lvmpolld is another
service/daemon you should run while you are in your rootfs.

fully removed.


And I get similar errors when trying to create new volumes with lvm lvcreate 
--permission=r --snapshot --monitor n --name my_snapshot
/usr/bin/dmeventd: stat failed: No such file or directory

In summary, I'm happy to just ignore the warning messages. I just want to make 
sure I'm not risking the integrity of the lvm volumes by modifying them during 
this part of the boot process.



It looks like you are trying to do something in your ramdisk you really should
be doing once you flip to your rootfs - ramdisk is purely meant to be used
to get things 'booting' and flip to rootfs ASAP - doing things in your
ramdisk which is really not a 'working environment' sounds like you are
asking yourself for some big troubles with resolving error paths (i.e. using
unmonitored devices like 'snapshot/mirror/raids/thin...' for longer period of
time is simply 'bad design/plan' - switch to rootfs should happen quickly
after you initiate things in your initramdfs...

Regards

Zdenek


Thanks for the insightful response. Indeed, setting monitoring = 0 in lvm.conf 
makes the warning messages go away. Interestingly, on arch, the initcpio hook 
for lvm2 _does_ attempt to set this setting:
with sed -i '/^\smonitoring =/s/1/0/' "${BUILDROOT}/etc/lvm/lvm.conf"
https://github.com/archlinux/svntogit-packages/blob/packages/lvm2/trunk/lvm2_install#L38

However, the sed pattern fails to match because the line is commented out in 
lvm.conf.
I've filed a bug with arch to address this: 
https://bugs.archlinux.org/task/75552?project=1=lvm2



Yep - I think there was a similar issue with Dracut.
It's a side effect of making most of the default settings 'commented out' - 
these scripts then stopped working in that case.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Creating/restoring snapshots in early userspace

2022-08-08 Thread Zdenek Kabelac

Dne 07. 08. 22 v 22:38 cd napsal(a):

Hello,

I have created some scripts which runs in the initramfs during the boot 
process. Very specifically, it's an initcpio runtime hook 
https://man.archlinux.org/man/mkinitcpio.8#ABOUT_RUNTIME_HOOKS
Question: Is this a supported environment which to create/restore snapshots?

When my script runs lvm lvconvert --merge 
testvg/lvmautosnap-root-1659902622-good
it appears to succeed (exit code is 0, and the restore appears to work 
properly). However, the following warnings appear in stderr as part of the 
restore process:

/usr/bin/dmeventd: stat failed: No such file or directory
WARNING: Failed to unmonitor testvg/lvmautosnap-root-1659902622-good.
/usr/bin/dmeventd: stat failed: No such file or directory


Hi

Your initramfs likely needs to contain a 'modified' version of your system's 
lvm.conf where 'monitoring' is disabled (set to 0) - as you do not want 
to start monitoring while you are operating in your ramdisk.


Once you flip to your rootfs with your regular /etc/lvm/lvm.conf - you need 
to start monitoring of your activated LVs (vgchange --monitor y); a minimal 
sketch of both pieces follows.
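A minimal sketch of both pieces (the VG name is only a placeholder):

   # lvm.conf copied into the initramfs:
   activation {
       monitoring = 0
   }

   # after switching to the real rootfs:
   vgchange --monitor y testvg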



Merging of volume testvg/lvmautosnap-root-1659902622-good started.
/run/lvm/lvmpolld.socket: connect failed: No such file or directory


Again, a thing you do not want to run in your ramdisk - lvmpolld is another 
service/daemon that should only run once you are in your rootfs.


fully removed.


And I get similar errors when trying to create new volumes with lvm lvcreate 
--permission=r --snapshot --monitor n --name my_snapshot
  /usr/bin/dmeventd: stat failed: No such file or directory

In summary, I'm happy to just ignore the warning messages. I just want to make 
sure I'm not risking the integrity of the lvm volumes by modifying them during 
this part of the boot process.


It looks like you are trying to do something in your ramdisk that you really 
should be doing once you flip to your rootfs - the ramdisk is purely meant to 
get things 'booting' and flip to the rootfs ASAP. Doing things in your 
ramdisk, which is really not a 'working environment', sounds like you are 
asking for big trouble when resolving error paths (i.e. using unmonitored 
devices like 'snapshot/mirror/raids/thin...' for a longer period of time is 
simply a 'bad design/plan') - the switch to the rootfs should happen quickly 
after you initiate things in your initramfs...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Problem with partially activate logical volume

2022-08-05 Thread Zdenek Kabelac

Dne 03. 08. 22 v 23:31 Ken Bass napsal(a):


That's pretty much it. Whenever any app attempts to read a block from the 
missing drive, I get the "Buffer I/O error" message. So, even though my 
recovery apps can scan the LV, marking blocks on the last drive as 
missing/unknown/etc., they can't display any recovered data - which I know 
does exist. Looking at raw data from the apps' scans, I can see directory 
entries, as well as files. I'm sure the inodes and bitmaps are still there for 
some of these, I just can't really reverse engineer and follow them through. 
But isn't that what the apps are supposed to do?


As mentioned in my previous email, you shall *NOT* fix the partially activated 
device in place - this will not lead to a good result.


You should copy the content to some valid storage device of the same size 
as the one you are trying to recover.


You can 'partially' activate the device with a "zero" filler instead of 
"error" (see the lvm.conf setting missing_stripe_filler="...") - this way you 
will just 'read' zeros for the missing parts; an illustrative sketch follows.
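An illustrative sketch (the VG name is a placeholder; -P/--partial is the
option you already used):

   # lvm.conf:
   activation {
       missing_stripe_filler = "zero"
   }

   vgchange -ay -P vg00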


Your 2nd option is to 'correct' the VG by replacing the missing PV with a new 
one, preferably with zeroed content - so you will not read 'random' garbage in 
the places where this new PV fills the space of your missing PV.
Although even in this case I'd still run 'fsck' on a snapshot created on top 
of such an LV, to give you another chance of recovery if you pick a wrong 
answer (since fsck might be 'quite' interactive when doing such a large-scale 
repair).



Sorry I haven't replied sooner, but it takes a long time (days) to clone, then 
scan 16Tb...


So, please any suggestions are greatly appreciated, as well as needed.

ken

(I know: No backup; got burned; it hurts; and I will now always have backups. 
'Nuf said.)


Before you run your 'fsck', create a snapshot of your newly created 'backup' 
and do all the repair actions in the snapshot.


Once you are 'satisfied' with the 'repaired' filesystem, you can then 'merge' 
the snapshot back to its origin and use it; a hedged sketch of this flow 
follows.
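A hedged sketch of that flow (the VG/LV names and the snapshot size are
placeholders, not your actual layout):

   lvcreate -s -L 200G -n backup_snap vg00/backup_lv   # snapshot of the copied LV
   fsck.ext4 /dev/vg00/backup_snap                     # do all repairs here
   # only once the repaired filesystem looks good:
   lvconvert --merge vg00/backup_snap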


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvcreate hangs forever during snapshot creation when suspending volume

2022-08-02 Thread Zdenek Kabelac

Dne 02. 08. 22 v 5:01 Thomas Deutschmann napsal(a):

Hi,

Zdenek Kabelac wrote:

Now - you suggests you are able to reproduce this issue also on your bare
metal hw - in this case run these 3 commands  before  'lvcreate'



Zdenek Kabelac wrote:
Note: you could always 'experiment' without lvm2 in the picture -
you can ran   'fsfreeze --freeze|--unfreeze'  yourself - to see whether

even

this command is able to finish  ?


fsfreeze caused the same problem :/

I also changed filesystem from xfs to ext4 just in case... same issue.

For testing I stopped the MDRAID and removed one NVMe disk which I
cleared and where I created a new ext4 partition. Running

   $ fsfreeze --freeze /mnt/test

returned within seconds and I was unable do any I/O against /mnt/test
as expected.

I unfroze the filesystem and started to copy ~50GB to the volume.
After waiting 5 minutes and verifying that /proc/meminfo didn't list
any 'dirty' pages, I re-ran the fsfreeze command, which caused the
same issue -- the system hangs. :/

I will repeat the test with the other NVMe later...

So probably a kernel/driver issue or hardware problem.



Hi

So as guessed earlier - unrelated to lvm2.

You likely need to discover what is wrong with your 'raid' device ?
Was your raid array fully synchronized ?

Do you only have the problem with one particular MD 'raid' on your system - or 
does any other 'raid' you attach/create suffer the same problem?


Is it 'nvme' related on your system ?

Are the 'individual' nvme devices running fine - and you only get these 
'fsfreeze' troubles when they are mixed together into a single array?


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvcreate hangs forever during snapshot creation when suspending volume

2022-08-01 Thread Zdenek Kabelac

Dne 01. 08. 22 v 19:29 Zdenek Kabelac napsal(a):

Dne 30. 07. 22 v 18:33 Thomas Deutschmann napsal(a):

Hi,

while trying to backup a Dell R7525 system running
Debian bookworm/testing using LVM snapshots I noticed that the system
will 'freeze' sometimes (not all the times) when creating the snapshot.
To recover from this, a power cycle is required.

Is this a problem caused by LVM or a kernel issue?

The command I run:

   /usr/sbin/lvcreate \
   -v \
   --size 100G \
   --snapshot /dev/mapper/devDataStore1-volMachines \
   --name volMachines_snap

The last 4 lines:
[Sat Jul 30 16:31:34 2022] debugfs: Directory 'dm-4' with parent 'block' 
already present!
[Sat Jul 30 16:31:34 2022] debugfs: Directory 'dm-7' with parent 'block' 
already present!
[Sat Jul 30 16:34:55 2022] INFO: task mariadbd:1607 blocked for more than 
120 seconds.

[Sat Jul 30 16:34:55 2022]   Not tainted 5.18.0-2-amd64 #1 Debian 5.18.5-1
[Sat Jul 30 16:34:55 2022] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Jul 30 16:34:55 2022] task:mariadbd    state:D stack:    0 pid: 
1607 ppid:  1289 flags:0x

[Sat Jul 30 16:34:55 2022] Call Trace:
[Sat Jul 30 16:34:55 2022]  
[Sat Jul 30 16:34:55 2022]  __schedule+0x30b/0x9e0
[Sat Jul 30 16:34:55 2022]  schedule+0x4e/0xb0
[Sat Jul 30 16:34:55 2022]  percpu_rwsem_wait+0x112/0x130
[Sat Jul 30 16:34:55 2022]  ? __percpu_rwsem_trylock.part.0+0x70/0x70
[Sat Jul 30 16:34:55 2022]  __percpu_down_read+0x5e/0x80
[Sat Jul 30 16:34:55 2022]  io_write+0x2e9/0x300
[Sat Jul 30 16:34:55 2022]  ? _raw_spin_lock+0x13/0x30
[Sat Jul 30 16:34:55 2022]  ? newidle_balance+0x26a/0x400
[Sat Jul 30 16:34:55 2022]  ? fget+0x7c/0xb0
[Sat Jul 30 16:34:55 2022]  io_issue_sqe+0x47c/0x2550
[Sat Jul 30 16:34:55 2022]  ? select_task_rq_fair+0x174/0x1240
[Sat Jul 30 16:34:55 2022]  ? hrtimer_try_to_cancel+0x78/0x110
[Sat Jul 30 16:34:55 2022]  io_submit_sqes+0x3ce/0x1aa0
[Sat Jul 30 16:34:55 2022]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[Sat Jul 30 16:34:55 2022]  ? wake_up_q+0x4a/0x90
[Sat Jul 30 16:34:55 2022]  ? __do_sys_io_uring_enter+0x565/0xa60
[Sat Jul 30 16:34:55 2022]  __do_sys_io_uring_enter+0x565/0xa60
[Sat Jul 30 16:34:55 2022]  do_syscall_64+0x3b/0xc0
[Sat Jul 30 16:34:55 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[Sat Jul 30 16:34:55 2022] RIP: 0033:0x7f05b90229b9
[Sat Jul 30 16:34:55 2022] RSP: 002b:7eff8e9efa38 EFLAGS: 0216 
ORIG_RAX: 01aa
[Sat Jul 30 16:34:55 2022] RAX: ffda RBX: 561c424f1d18 RCX: 
7f05b90229b9
[Sat Jul 30 16:34:55 2022] RDX:  RSI: 0001 RDI: 
0009
[Sat Jul 30 16:34:55 2022] RBP: 7eff8e9efa90 R08:  R09: 
0008
[Sat Jul 30 16:34:55 2022] R10:  R11: 0216 R12: 
561c42500938
[Sat Jul 30 16:34:55 2022] R13: 7f05b9821c00 R14: 561c425009e0 R15: 
561c424f1d18

[Sat Jul 30 16:34:55 2022]  
[Sat Jul 30 16:34:55 2022] INFO: task mariadbd:9955 blocked for more than 
120 seconds.

[Sat Jul 30 16:34:55 2022]   Not tainted 5.18.0-2-amd64 #1 Debian 5.18.5-1
[Sat Jul 30 16:34:55 2022] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Jul 30 16:34:55 2022] task:mariadbd    state:D stack:    0 pid: 
9955 ppid:  1289 flags:0x

[Sat Jul 30 16:34:55 2022] Call Trace:
[Sat Jul 30 16:34:55 2022]  
[...]


The message "mariadbd:1607 blocked for more than 120 seconds" will repeat.

MariaDB itself is running in a systemd-nspawn container. The container
storage is located on the volume for which snapshot creation will hang.



Hi


Lvm2 is *NOT* supported to be used within containers!

This requires some very specific 'system' modification and its in overal very 
problematic - so rule #1 is - always run lvm2 command on your hosting machine.



Now - you suggests you are able to reproduce this issue also on your bare 
metal hw - in this case run these 3 commands  before  'lvcreate'



# dmsetup table
# dmsetup info -c
# dmsetup ls --tree

# lvcreate 

If it blocks take again these:
# dmsetup table
# dmsetup info -c
# dmsetup ls --tree


You 'lvcreate -'  & 'dmesg' trace simply suggest that system is waiting to 
fulfill  'fsfreeze'  operation - it's unclear why it cannot be finished - 
maybe some problem with your 'raid' array ??


So far I do not see any bug on lvm2 side - all works from lvm2 side as 
expected - however it's unclear why your 'raid' is so slow ?


Note: you could always 'experiment' without lvm2 in the picture -
you can ran   'fsfreeze --freeze|--unfreeze'  yourself - to see whether even 
this command is able to finish  ?


Note2: if you system has lots of 'dirty' pages - it may potentially take a lot 
of time to 'fsfreeze' operation of a filesystem since all 'dirty' pages needs 
to be written to your disk..





Forgot to mention - a typical problem with containers is a 'missing udevd' - 
resulting in endless 

Re: [linux-lvm] lvcreate hangs forever during snapshot creation when suspending volume

2022-08-01 Thread Zdenek Kabelac

Dne 30. 07. 22 v 18:33 Thomas Deutschmann napsal(a):

Hi,

while trying to backup a Dell R7525 system running
Debian bookworm/testing using LVM snapshots I noticed that the system
will 'freeze' sometimes (not all the times) when creating the snapshot.
To recover from this, a power cycle is required.

Is this a problem caused by LVM or a kernel issue?

The command I run:

   /usr/sbin/lvcreate \
   -v \
   --size 100G \
   --snapshot /dev/mapper/devDataStore1-volMachines \
   --name volMachines_snap

The last 4 lines:

[Sat Jul 30 16:31:34 2022] debugfs: Directory 'dm-4' with parent 'block' 
already present!
[Sat Jul 30 16:31:34 2022] debugfs: Directory 'dm-7' with parent 'block' 
already present!
[Sat Jul 30 16:34:55 2022] INFO: task mariadbd:1607 blocked for more than 120 
seconds.
[Sat Jul 30 16:34:55 2022]   Not tainted 5.18.0-2-amd64 #1 Debian 5.18.5-1
[Sat Jul 30 16:34:55 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Sat Jul 30 16:34:55 2022] task:mariadbdstate:D stack:0 pid: 1607 
ppid:  1289 flags:0x
[Sat Jul 30 16:34:55 2022] Call Trace:
[Sat Jul 30 16:34:55 2022]  
[Sat Jul 30 16:34:55 2022]  __schedule+0x30b/0x9e0
[Sat Jul 30 16:34:55 2022]  schedule+0x4e/0xb0
[Sat Jul 30 16:34:55 2022]  percpu_rwsem_wait+0x112/0x130
[Sat Jul 30 16:34:55 2022]  ? __percpu_rwsem_trylock.part.0+0x70/0x70
[Sat Jul 30 16:34:55 2022]  __percpu_down_read+0x5e/0x80
[Sat Jul 30 16:34:55 2022]  io_write+0x2e9/0x300
[Sat Jul 30 16:34:55 2022]  ? _raw_spin_lock+0x13/0x30
[Sat Jul 30 16:34:55 2022]  ? newidle_balance+0x26a/0x400
[Sat Jul 30 16:34:55 2022]  ? fget+0x7c/0xb0
[Sat Jul 30 16:34:55 2022]  io_issue_sqe+0x47c/0x2550
[Sat Jul 30 16:34:55 2022]  ? select_task_rq_fair+0x174/0x1240
[Sat Jul 30 16:34:55 2022]  ? hrtimer_try_to_cancel+0x78/0x110
[Sat Jul 30 16:34:55 2022]  io_submit_sqes+0x3ce/0x1aa0
[Sat Jul 30 16:34:55 2022]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[Sat Jul 30 16:34:55 2022]  ? wake_up_q+0x4a/0x90
[Sat Jul 30 16:34:55 2022]  ? __do_sys_io_uring_enter+0x565/0xa60
[Sat Jul 30 16:34:55 2022]  __do_sys_io_uring_enter+0x565/0xa60
[Sat Jul 30 16:34:55 2022]  do_syscall_64+0x3b/0xc0
[Sat Jul 30 16:34:55 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[Sat Jul 30 16:34:55 2022] RIP: 0033:0x7f05b90229b9
[Sat Jul 30 16:34:55 2022] RSP: 002b:7eff8e9efa38 EFLAGS: 0216 
ORIG_RAX: 01aa
[Sat Jul 30 16:34:55 2022] RAX: ffda RBX: 561c424f1d18 RCX: 
7f05b90229b9
[Sat Jul 30 16:34:55 2022] RDX:  RSI: 0001 RDI: 
0009
[Sat Jul 30 16:34:55 2022] RBP: 7eff8e9efa90 R08:  R09: 
0008
[Sat Jul 30 16:34:55 2022] R10:  R11: 0216 R12: 
561c42500938
[Sat Jul 30 16:34:55 2022] R13: 7f05b9821c00 R14: 561c425009e0 R15: 
561c424f1d18
[Sat Jul 30 16:34:55 2022]  
[Sat Jul 30 16:34:55 2022] INFO: task mariadbd:9955 blocked for more than 120 
seconds.
[Sat Jul 30 16:34:55 2022]   Not tainted 5.18.0-2-amd64 #1 Debian 5.18.5-1
[Sat Jul 30 16:34:55 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Sat Jul 30 16:34:55 2022] task:mariadbdstate:D stack:0 pid: 9955 
ppid:  1289 flags:0x
[Sat Jul 30 16:34:55 2022] Call Trace:
[Sat Jul 30 16:34:55 2022]  
[...]


The message "mariadbd:1607 blocked for more than 120 seconds" will repeat.

MariaDB itself is running in a systemd-nspawn container. The container
storage is located on the volume for which snapshot creation will hang.



Hi


Lvm2 is *NOT* supported to be used within containers!

This requires some very specific 'system' modifications and is overall very 
problematic - so rule #1 is: always run lvm2 commands on your hosting machine.



Now - you suggest you are able to reproduce this issue on your bare metal hw 
as well - in that case run these 3 commands before 'lvcreate':



# dmsetup table
# dmsetup info -c
# dmsetup ls --tree

# lvcreate 

If it blocks take again these:
# dmsetup table
# dmsetup info -c
# dmsetup ls --tree


Your 'lvcreate -' & 'dmesg' traces simply suggest that the system is waiting 
for the 'fsfreeze' operation to complete - it's unclear why it cannot finish - 
maybe some problem with your 'raid' array??


So far I do not see any bug on the lvm2 side - everything works from the lvm2 
side as expected - however it's unclear why your 'raid' is so slow?


Note: you could always 'experiment' without lvm2 in the picture -
you can run 'fsfreeze --freeze|--unfreeze' yourself - to see whether even 
this command is able to finish.
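
A minimal sketch of that experiment, assuming the filesystem in question is 
mounted at the hypothetical path /mnt/data:

time fsfreeze --freeze /mnt/data
time fsfreeze --unfreeze /mnt/data

If the freeze alone takes minutes (or never returns), the problem sits below 
lvm2 - in the filesystem or block layer - not in the snapshot code.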


Note2: if your system has lots of 'dirty' pages - the 'fsfreeze' operation on a 
filesystem may potentially take a lot of time, since all 'dirty' pages need to 
be written to your disk first.
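
A rough way to see how much dirty data is pending, and to flush it before 
taking the snapshot (both are standard tools, values are reported in kB):

grep -E 'Dirty|Writeback' /proc/meminfo
sync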



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at 

Re: [linux-lvm] Problem with partially activate logical volume

2022-07-27 Thread Zdenek Kabelac

Dne 25. 07. 22 v 16:48 Ken Bass napsal(a):

(fwiw: I am new to this list, so please bear with me.)

Background: I have a very large (20TB) logical volume consisting of 3 drives. 
One of those drives unexpectedly died (isn't that always the case :-)). The 
drive that failed happened to be the last PV. So I am assuming that there is 
still 2/3 of the data still intact and, to some extent, recoverable. Although, 
apparently the ext4 fs is not recognised.


I activated the LV partially (via -P). But running any utility on that (eg: 
dumpe2fs, e2fsck, ...) I get many of these  in dmesg:


"Buffer I/O error on dev dm-0, logical block xxx, async page read."  The 
thing is, the xxx block is on the missing drive/pv.


I have also tried some recovery software, but eventually get these same 
messages, and the data recovered is not really useful.


Please help! How can I get past that dmesg error, and move on. 14TB 
recovered is better than 0.



Losing such a large portion of a device is always going to be a BIG problem.
A filesystem spreads metadata all over the place - ExtX is somewhat better than 
BTree-based FS like XFS/BTRFS and may give you a lot of your data back.


But that's why people should never underestimate how important it is to keep 
reasonably fresh backups of their data - otherwise sooner or later there comes 
a lesson like this one.


What you could try is to 'add' a new PV to the VG and use its space for taking 
a snapshot of the LV you want to repair - but this is somewhat complicated, as 
you need to 'fix' the VG first - which would ideally need storage of about the 
size you just lost - but it then gives you a fairly easy way forward.
(One way to do this is to use 'virtual' storage over a loopback device - but 
that's rather for a skilled admin; a rough sketch follows.)
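
A very rough sketch of that loopback variant, with hypothetical paths and 
sizes - the UUID must match the one recorded for the lost PV in the VG backup 
(usually under /etc/lvm/backup/):

truncate -s 16T /srv/fake-missing-pv.img        # sparse file, uses almost no real space
losetup -f --show /srv/fake-missing-pv.img      # prints e.g. /dev/loop0
pvcreate --uuid <uuid-of-lost-PV> --restorefile /etc/lvm/backup/vg /dev/loop0
vgcfgrestore -f /etc/lvm/backup/vg vg
lvchange -ay vg/biglv         # extents that lived on the lost PV now read as zeros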


Lvm2 partial activation is designed to be used this way: ACTIVATE the LV, copy 
the content to a better/safer/secure location, and start recovering the storage 
there.


Repairing storage in-place is usually a straight road to hell - there could be 
numerous recovery approaches to try - but if your 1st try actually destroys the 
data even more, you can't retry with a different strategy.


So depending on how much money and time you want to put into the recovery of 
your data there are several different strategies possible - considering storage 
space is relatively 'cheap' if your data are really important.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvm commands hanging when run from inside a kubernetes pod

2022-05-30 Thread Zdenek Kabelac

Dne 27. 05. 22 v 9:02 Abhishek Agarwal napsal(a):
When a kubernetes pod is scheduled on the node having lvm2 libraries already 
installed and trying to run lvm commands using those node binaries from inside 
the pod container, the commands hang and are waiting on something to complete. 
Although when ctrl+c is pressed the terminal session resumes and checking the 
final code for the execution returns a "0" error code and the commands 
operation is also carried out successfully.





lvm2 is *NOT* designed to be executed in/from a container.

It cannot work properly as it directly communicates with system's udevd - 
which you likely don't have running in your container.


You could kind of 'fake it' by running lvm2 without udev synchronization - but 
this will just open another cave of other problems (missing synchronization).


So your lvm2 commands should always be executed on your hosting machine
(since lvm2 controls resources that have no containerization support - like 
devices) and then you should 'pass' the created LV to your container in some way.
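
One hedged way of doing that 'passing', with hypothetical names (VG 'vg', 
LV 'data', a Docker-managed container) - create the LV on the host and hand 
only the block device node to the container:

lvcreate -L 10G -n data vg
docker run --device=/dev/vg/data:/dev/data myimage

With kubernetes the same idea is normally expressed through a CSI driver or a 
block-mode volume, rather than by running lvm2 binaries inside the pod.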


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] The lvm partiton size is not accurate

2022-05-20 Thread Zdenek Kabelac

Dne 18. 05. 22 v 12:35 Yu, Mingli napsal(a):

Hi Experts,

When use lvcreate to claim 72M partition, but it turns out to be 69M, more 
details as below, any hints?




lvm2 partition size *IS* 72M

DF reports the amount of space in your filesystem - which will always be 
smaller than the actual size of the partition itself.


I.e. try to do the exact same steps with e.g. a plain GPT partition of the same 
size - you will get very much the same results.


So simply said:

lvm2 partition size  >>  user-usable filesystem size
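
A quick way to see where the 'missing' megabytes go, with hypothetical names 
vg/test mounted at /mnt/test - the device keeps its full size, df only accounts 
for what is left after the filesystem reserves its own metadata (superblocks, 
inode tables, journal):

lvs --units m vg/test
blockdev --getsize64 /dev/vg/test
df -BM /mnt/test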

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Safety of DM_DISABLE_UDEV=1 and --noudevsync

2022-04-04 Thread Zdenek Kabelac

Dne 02. 04. 22 v 1:19 Demi Marie Obenour napsal(a):

Under what circumstances are DM_DISABLE_UDEV=1 and --noudevsync safe?
In Qubes OS, for example, I am considering using one or both of these,
but only for operations performed by qubesd.  systemd-udevd will still
be running, but it will be told to create no symlinks for the devices
these commands create or destroy.  systemd-udevd will still be in charge
of other devices, however, and other lvm2 commands may run that use
neither of these.


This env variable is not meant to be used on a system with running udev.

So it's never safe - and its only purpose is to be able to make some progress 
in some critical system condition.
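
For illustration only - the kind of emergency invocation this is meant for, 
i.e. a rescue environment where udevd is not running at all (VG name 
hypothetical):

DM_DISABLE_UDEV=1 vgchange -ay --noudevsync vg

On a normal system with udevd running this combination leaves udev's view of 
the devices out of sync, so don't use it there.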


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Need information

2022-02-22 Thread Zdenek Kabelac

Dne 21. 02. 22 v 12:00 Gk Gk napsal(a):

Hi,

I work on cloud platforms and linux. I need some guidance with an issue 
observed with LVM.


How to forcefully kill a running lvm command like lvcreate ? I tried using 
kill -9 but it is not killing it. Only a reboot seems to does the trick. How 
to forcefully kill the lvcreate process or for that matter any running lvm 
commands ?


Also how to check the progress of an lvm command like lvcreate or lvremove ?


Hi

1. lvm2 should not freeze - it would be some horrible bug.

2. You can't kill any userspace app which is blocked in an uninterruptible 
kernel function - the kill only takes effect once the process gets back to its 
signal handler (i.e. opening a suspended device might be one such operation; 
see the commands after this list for a way to check).


3. You need to tell us more about what you are doing - as without this there 
is no way to give meaningful help.
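
For point 2, a rough way to confirm the command really is stuck in an 
uninterruptible wait, and whether some device-mapper device was left suspended 
(command/PID names are just examples):

ps -o pid,stat,wchan:32,cmd -C lvcreate     # 'D' in STAT means uninterruptible sleep
cat /proc/<pid>/stack                       # as root - the kernel-side wait channel
dmsetup info | grep -B1 SUSPENDED           # any device left suspended is a red flag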



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Thin pool performance when allocating lots of blocks

2022-02-08 Thread Zdenek Kabelac

Dne 08. 02. 22 v 22:02 Demi Marie Obenour napsal(a):

On 2/8/22 15:37, Zdenek Kabelac wrote:

Dne 08. 02. 22 v 20:00 Demi Marie Obenour napsal(a):

Are thin volumes (which start as snapshots of a blank volume) efficient
for building virtual machine images?  Given the nature of this workload
(writing to lots of new, possibly-small files, then copying data from
them to a huge disk image), I expect that this will cause sharing to be
broken many, many times, and the kernel code that breaks sharing appears
to be rather heavyweight.  Furthermore, since zeroing is enabled, this
might cause substantial write amplification.  Turning zeroing off is not
an option for security reasons.

Is there a way to determine if breaking sharing is the cause of
performance problems?  If it is, are there any better solutions?


Hi

Usually the smaller the thin chunks size is the smaller the problem gets.
With current released version of thin-provisioning minimal chunk size is
64KiB. So you can't use smaller value to further reduce this impact.

Note - even if you do a lot of tiny 4KiB writes  - only the 'first' such write
into 64K area breaks sharing all following writes to same location no longer
have this penalty (also zeroing with 64K is less impactful...)

But it's clear thin-provisioning comes with some price - so if it's not good
enough from time constrains some other solutions might need to be explored.
(i.e. caching, better hw, splitting  FS into multiple partitions with
'read-only sections,)


Are the code paths that break sharing as heavyweight as I was worried
about?  Would a hypothetical dm-thin2 that used dm-bio-prison-v2 be
faster?



The biggest problem is the size of chunks - the smaller the chunk you can use, 
the less amplification you get. On the other hand, the amount of metadata 
handling increases. Then there is a lot about parallelization, locking and 
disk synchronization.


If you are more interested in this topic, dive into the kernel code.
Also I'd suggest doing some good benchmarking; one possible sketch follows.
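
One possible (hedged) benchmarking sketch, using fio on a throw-away thin 
snapshot - device names are hypothetical; fio repeats the same pseudo-random 
offsets by default, so running the identical job twice shows the cost of 
breaking sharing (first run) versus rewriting already-unshared chunks (second 
run):

lvcreate -s vg/thinlv -n snap
lvchange -ay -K vg/snap      # thin snapshots carry the 'activation skip' flag by default
fio --name=cow --filename=/dev/vg/snap --rw=randwrite --bs=4k --size=1G \
    --direct=1 --ioengine=libaio --iodepth=16
# run the exact same fio command a second time and compare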

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Thin pool performance when allocating lots of blocks

2022-02-08 Thread Zdenek Kabelac

Dne 08. 02. 22 v 20:00 Demi Marie Obenour napsal(a):

Are thin volumes (which start as snapshots of a blank volume) efficient
for building virtual machine images?  Given the nature of this workload
(writing to lots of new, possibly-small files, then copying data from
them to a huge disk image), I expect that this will cause sharing to be
broken many, many times, and the kernel code that breaks sharing appears
to be rather heavyweight.  Furthermore, since zeroing is enabled, this
might cause substantial write amplification.  Turning zeroing off is not
an option for security reasons.

Is there a way to determine if breaking sharing is the cause of
performance problems?  If it is, are there any better solutions?


Hi

Usually, the smaller the thin chunk size is, the smaller the problem gets.
With the currently released version of thin-provisioning the minimal chunk size 
is 64KiB, so you can't use a smaller value to further reduce this impact.
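
The chunk size is fixed when the thin-pool is created, so that is where it gets 
chosen - a hedged example with hypothetical names and sizes:

lvcreate --type thin-pool --chunksize 64k --poolmetadatasize 1G -L 100G -n tpool vg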


Note - even if you do a lot of tiny 4KiB writes - only the 'first' such write 
into a 64K area breaks sharing; all following writes to the same location no 
longer have this penalty (also zeroing with 64K chunks is less impactful...).


But it's clear thin-provisioning comes at some price - so if it's not good 
enough for your timing constraints, some other solutions might need to be 
explored (i.e. caching, better hw, splitting the FS into multiple partitions 
with read-only sections, ...).


For analysis of device hot spots you could check the 'dmstats' tool for DM devices.
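
Roughly like this (device path hypothetical) - split the device into regions, 
let the workload run for a while, then report which regions take the I/O:

dmstats create --areas 64 /dev/vg/thinlv
dmstats report /dev/vg/thinlv
dmstats delete --allregions /dev/vg/thinlv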

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-04 Thread Zdenek Kabelac

Dne 04. 02. 22 v 1:01 Demi Marie Obenour napsal(a):

On Thu, Feb 03, 2022 at 01:28:37PM +0100, Zdenek Kabelac wrote:

Dne 03. 02. 22 v 5:48 Demi Marie Obenour napsal(a):

On Mon, Jan 31, 2022 at 10:29:04PM +0100, Marian Csontos wrote:

On Sun, Jan 30, 2022 at 11:17 PM Demi Marie Obenour <
d...@invisiblethingslab.com> wrote:


On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:



If they need to use containerized software they should use containers like
i.e. Docker - if they need full virtual secure machine - it certainly has
it's price (mainly way higher memory consumption)
I've some doubts there is some real good reason to have quickly created VMs
as they surely are supposed to be a long time living entities
(hours/days...)


Simply put, Qubes OS literally does not have a choice.  Qubes OS is
intended to protect against very high-level attackers who are likely to


I'd say you are putting your effort into the wrong place then.
AKA the effort placed in optimizing this given change is nowhere near as 
valuable as using the right tool properly...



VMs and containers have its strength and weaknesses..
Not sure why some many people try to pretend VMs can be as efficient as
containers or containers as secure as VMs. Just always pick the right
tool...


Qubes OS needs secure *and* fast.  To quote the seL4 microkernel’s
mantra, “Security is no excuse for poor performance!”.


And whoever tells you they can get the same performance from a VM as from a 
container has no idea how the OS works...


Security simply *IS* expensive (especially with Intel CPUs ;))

An educated user needs to pick the level they want to pay for.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-03 Thread Zdenek Kabelac

Dne 03. 02. 22 v 5:48 Demi Marie Obenour napsal(a):

On Mon, Jan 31, 2022 at 10:29:04PM +0100, Marian Csontos wrote:

On Sun, Jan 30, 2022 at 11:17 PM Demi Marie Obenour <
d...@invisiblethingslab.com> wrote:


On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:

Your VM usage is different from ours - you seem to need to clone and
activate a VM quickly (like a vps provider might need to do).  We
generally have to buy more RAM to add a new VM :-), so performance of
creating a new LV is the least of our worries.


To put it mildly, yes :).  Ideally we could get VM boot time down to
100ms or lower.



Out of curiosity, is snapshot creation the main culprit to boot a VM in
under 100ms? Does Qubes OS use tweaked linux distributions, to achieve the
desired boot time?


The goal is 100ms from user action until PID 1 starts in the guest.
After that, it’s the job of whatever distro the guest is running.
Storage management is one area that needs to be optimized to achieve
this, though it is not the only one.


I'm wondering where those 100ms come from?

Users often mistakenly target the wrong technologies for their tasks.

If they need to use containerized software they should use containers like e.g. 
Docker - if they need a fully virtual secure machine - that certainly has its 
price (mainly much higher memory consumption).
I have some doubts there is a really good reason to have quickly created VMs, as 
they are surely supposed to be long-living entities (hours/days...).


So unless you want to create something for marketing purposes, aka 'my table is 
bigger than yours', I don't see the point.


For quick instances of software apps I'd always recommend containers - which 
are vastly more efficient and scalable.


VMs and containers have their strengths and weaknesses.
Not sure why so many people try to pretend VMs can be as efficient as 
containers, or containers as secure as VMs. Just always pick the right tool...




Back to business. Perhaps I missed an answer to this question: Are the
Qubes OS VMs throw away?  Throw away in the sense like many containers are
- it's just a runtime which can be "easily" reconstructed. If so, you can
ignore the safety belts and try to squeeze more performance by sacrificing
(meta)data integrity.


Why does a trade-off need to be made here?  More specifically, why is it
not possible to be reasonably fast (a few ms) AND safe?


Security, safety and determinism always take away efficiency.

The higher the amount of randomness you can live with, the faster processing 
you can achieve - you just need to cross your fingers :)

(i.e. drop transaction synchronisation :))

Quite frankly - if you are orchestrating mostly the same VMs, it would be more 
efficient to just snapshot them with an already running memory environment - so 
instead of always booting a VM from 'scratch', you restore/resume those VMs at 
some already-running point, from which they can start to deviate.

Why waste CPU on processing the same boot over and over?
That is where you should hunt your milliseconds...
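
For illustration (outside of lvm2) - with libvirt-managed guests that 
resume-instead-of-boot idea looks roughly like this; domain and file names are 
hypothetical, and restoring the same image for many parallel clones 
additionally needs cloned disks and adjusted domain XML (virsh restore --xml):

virsh save template-vm /var/lib/libvirt/save/template.sav    # once, after the guest booted
virsh restore /var/lib/libvirt/save/template.sav             # per start, resumes in seconds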


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-02 Thread Zdenek Kabelac

Dne 02. 02. 22 v 3:09 Demi Marie Obenour napsal(a):

On Sun, Jan 30, 2022 at 06:43:13PM +0100, Zdenek Kabelac wrote:

Dne 30. 01. 22 v 17:45 Demi Marie Obenour napsal(a):

On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:

Dne 30. 01. 22 v 1:32 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):

My biased advice would be to stay with lvm2. There is lot of work, many
things are not well documented and getting everything running correctly will
take a lot of effort  (Docker in fact did not managed to do it well and was
incapable to provide any recoverability)


What did Docker do wrong?  Would it be possible for a future version of
lvm2 to be able to automatically recover from off-by-one thin pool
transaction IDs?


Ensuring all steps in the state machine are always correct is not exactly 
simple. But since I've not heard about the off-by-one problem for a long while - 
I believe we've managed to close all the holes and bugs in the double-commit 
system and metadata handling of thin-pool and lvm2 (for recent lvm2 & kernel).


It's difficult - if you would be distributing lvm2 with exact kernel version
& udev & systemd with a single linux distro - it reduces huge set of
troubles...


Qubes OS comes close to this in practice.  systemd and udev versions are
known and fixed, and Qubes OS ships its own kernels.


Systemd/udev evolve - so fixed today doesn't really mean the same version will 
be there tomorrow. And unfortunately systemd is known to introduce 
backward-incompatible changes from time to time...



I'm not familiar with QubesOS - but in many cases in real-life world we
can't push to our users latest - so we need to live with bugs and
add workarounds...


Qubes OS is more than capable of shipping fixes for kernel bugs.  Is
that what you are referring to?

not going to start discussing this topic ;)


Chain filesystem->block_layer->filesystem->block_layer is something you most
likely do not want to use for any well performing solution...
But it's ok for testing...


How much of this is due to the slow loop driver?  How much of it could
be mitigated if btrfs supported an equivalent of zvols?


Here you are missing the core of the problem from the kernel POV, aka how 
memory allocation works and what approximations the kernel makes with buffer 
handling and so on.
So whoever is using 'loop' devices in production systems in the way described 
above has never really tested any corner-case logic.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-31 Thread Zdenek Kabelac

Dne 31. 01. 22 v 12:02 Gionatan Danti napsal(a):

Il 2022-01-29 18:45 Demi Marie Obenour ha scritto:

Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?


I think that, if anything, it would be worse: a long discard during boot can 
be problematic, even leading to timeout on starting other services.

After all, blkdiscard should be faster than something done at a higher level.

That said, I seem to remember that deleting a fat volume should automatically 
trim/discard it if issue_discard=1. Is this not true for thin volumes?



'issue_discards' relates only to the internal lvm2 logic when some extents 
become free for reuse (i.e. after 'lvremove/lvreduce/vgremove...').


However, since with thin volumes no physical extents of the VG are released 
(the thin volume releases its chunks back to the thin-pool) - there is no 
discard issued by lvm.
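
To spell out the two layers (the setting and option names are the standard 
ones, the VG/LV names are hypothetical):

# lvm.conf - discard freed PV extents when *regular* LVs are removed/reduced:
devices { issue_discards = 1 }

# a *thin* LV only returns chunks to its pool, so discarding its contents
# has to be requested explicitly, e.g. just before removal:
blkdiscard /dev/vg/thinvol
lvremove vg/thinvol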


Regards

Zdenek



___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Zdenek Kabelac

Dne 30. 01. 22 v 19:01 Demi Marie Obenour napsal(a):

On Sun, Jan 30, 2022 at 06:56:43PM +0100, Zdenek Kabelac wrote:

Dne 30. 01. 22 v 18:30 Demi Marie Obenour napsal(a):

On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:

They are always landing in the upstream kernel once they are all validated &
tested (recent kernels already have many speed enhancements).


Thanks!  Which mailing list should I be watching?


lkml


You could easily run in parallel individual blkdiscards for your thin LVs
For most modern drives though it's somewhat a waste of time...

Those trimming tools should be used when they are solving some real
problems, running them just for fun is just energy & performance waste


My understanding (which could be wrong) is that periodic trim is
necessary for SSDs.


This was useful for archaic SSDs. Modern SSD/NVMe drives are much smarter...

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Zdenek Kabelac

Dne 30. 01. 22 v 18:30 Demi Marie Obenour napsal(a):

On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:

Dne 30. 01. 22 v 2:20 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 21:09 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):



Discard of thins itself is AFAIC pretty fast - unless you have massively
sized thin devices with many GiB of metadata - obviously you cannot process
this amount of metadata in nanoseconds (and there are prepared kernel
patches to make it even faster)


Would you be willing and able to share those patches?


They are always landing in the upstream kernel once they are all validated & 
tested (recent kernels already have many speed enhancements).





What is the problem is the speed of discard of physical devices.
You could actually try to feel difference with:
lvchange --discards passdown|nopassdown thinpool


In Qubes OS I believe we do need the discards to be passed down
eventually, but I doubt it needs to be synchronous.  Being able to run
the equivalent of `fstrim -av` periodically would be amazing.  I’m
CC’ing Marek Marczykowski-Górecki (Qubes OS project lead) in case he
has something to say.


You could easily run individual blkdiscards for your thin LVs in parallel.
For most modern drives though it's somewhat a waste of time...

Those trimming tools should be used when they are solving some real problems; 
running them just for fun is just an energy & performance waste.
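
A hedged sketch of the parallel variant, for thin LVs that are about to be 
removed anyway (blkdiscard destroys their contents; pool/VG names are 
hypothetical and -S needs a reasonably recent lvm2):

lvs --noheadings -o lv_path -S 'pool_lv=tpool' vg | xargs -n1 -P4 blkdiscard

-P4 keeps four blkdiscards in flight at a time.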





Also it's very important to keep metadata on fast storage device (SSD/NVMe)!
Keeping metadata on same hdd spindle as data is always going to feel slow
(in fact it's quite pointless to talk about performance and use hdd...)


That explains why I had such a horrible experience with my initial
(split between NVMe and HDD) install.  I would not be surprised if some
or all of the metadata volume wound up on the spinning disk.


With lvm2 the user can always 'pvmove' any LV to any desired PV.
There is not yet any 'smart' logic to do it automatically.
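
For the metadata-on-fast-storage case that means something like the following - 
the pool name, the hidden _tmeta sub-LV name and the device paths are 
assumptions about a typical setup ('lvs -a -o name,devices vg' shows where the 
sub-LVs actually live):

pvmove -n tpool_tmeta /dev/slow_hdd /dev/fast_nvme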


add support for efficient snapshots of data stored on a VDO volume, and
to have multiple volumes on top of a single VDO volume.  Furthermore,


We hope we will add some direct 'snapshot' support to VDO so users will not
need to combine both technologies together.


Does that include support for splitting a VDO volume into multiple,
individually-snapshottable volumes, the way thin works?


Yes - that's the plan - to have multiple VDO LV in a single VDOPool.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Zdenek Kabelac

Dne 30. 01. 22 v 17:45 Demi Marie Obenour napsal(a):

On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:

Dne 30. 01. 22 v 1:32 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):

How much slower are operations on an LVM2 thin pool compared to manually
managing a dm-thin target via ioctls?  I am mostly concerned about
volume snapshot, creation, and destruction.  Data integrity is very
important, so taking shortcuts that risk data loss is out of the
question.  However, the application may have some additional information
that LVM2 does not have.  For instance, it may know that the volume that
it is snapshotting is not in use, or that a certain volume it is
creating will never be used after power-off.




So brave developers may always write their own management tools for their
constrained environment requirements that will by significantly faster in
terms of how many thins you could create per minute (btw you will need to
also consider dropping usage of udev on such system)


What kind of constraints are you referring to?  Is it possible and safe
to have udev running, but told to ignore the thins in question?


Lvm2 is oriented more towards managing set of different disks,
where user is adding/removing/replacing them.  So it's more about
recoverability, good support for manual repair  (ascii metadata),
tracking history of changes,  backward compatibility, support
of conversion to different volume types (i.e. caching of thins, pvmove...)
Support for no/udev & no/systemd, clusters and nearly every linux distro
available... So there is a lot - and this all adds quite complexity.


I am certain it does, and that makes a lot of sense.  Thanks for the
hard work!  Those features are all useful for Qubes OS, too — just not
in the VM startup/shutdown path.


So once you scratch all this - and you say you only care about single disc
then you are able to use more efficient metadata formats which you could
even keep permanently in memory during the lifetime - this all adds great
performance.

But it all depends how you could constrain your environment.

It's worth to mention there is lvm2 support for 'external' 'thin volume'
creators - so lvm2 only maintains 'thin-pool' data & metadata LV - but thin
volume creation, activation, deactivation of thins is left to external tool.
This has been used by docker for a while - later on they switched to
overlayFs I believe..


That indeeds sounds like a good choice for Qubes OS.  It would allow the
data and metadata LVs to be any volume type that lvm2 supports, and
managed using all of lvm2’s features.  So one could still put the
metadata on a RAID-10 volume while everything else is RAID-6, or set up
a dm-cache volume to store the data (please correct me if I am wrong).
Qubes OS has already moved to using a separate thin pool for virtual
machines, as it prevents dom0 (privileged management VM) from being run
out of disk space (by accident or malice).  That means that the thin
pool use for guests is managed only by Qubes OS, and so the standard
lvm2 tools do not need to touch it.

Is this a setup that you would recommend, and would be comfortable using
in production?  As far as metadata is concerned, Qubes OS has its own
XML file containing metadata about all qubes, which should suffice for
this purpose.  To prevent races during updates and ensure automatic
crash recovery, is it sufficient to store metadata for both new and old
transaction IDs, and pick the correct one based on the device-mapper
status line?  I have seen lvm2 get in an inconsistent state (transaction
ID off by one) that required manual repair before, which is quite
unnerving for a desktop OS.


My biased advice would be to stay with lvm2. There is a lot of work, many 
things are not well documented, and getting everything running correctly will 
take a lot of effort (Docker in fact did not manage to do it well and was 
incapable of providing any recoverability).



One feature that would be nice is to be able to import an
externally-provided mapping of thin pool device numbers to LV names, so
that lvm2 could provide a (read-only, and not guaranteed fresh) view of
system state for reporting purposes.


Once you have evidence that it's lvm2 causing a major issue - you could 
consider whether it's worth stepping into a separate project.




It's worth to mention - the more bullet-proof you will want to make your
project - the more closer to the extra processing made by lvm2 you will get.


Why is this?  How does lvm2 compare to stratis, for example?


Stratis is yet another volume manager written in Rust combined with XFS for
easier user experience. That's all I'd probably say about it...


That’s fine.  I guess my question is why making lvm2 bullet-proof needs
so much overhead.


It's difficult - if you would be distributing lvm2 with an exact kernel version 
& udev & systemd within a single linux distro - it reduces a huge set of 
troubles...

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Zdenek Kabelac

Dne 30. 01. 22 v 2:20 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 21:09 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):

Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?


Hi


Nope there is currently no support from lvm2 side for this.
Feel free to open RFE.


Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160




Thanks

Although your use-case Thinpool on top of VDO is not really a good plan and
there is a good reason behind why lvm2 does not support this device stack
directly (aka thin-pool data LV as VDO LV).
I'd say you are stepping on very very thin ice...


Thin pool on VDO is not my actual use-case.  The actual reason for the
ticket is slow discards of thin devices that are about to be deleted;


Hi

Discard of thins itself is AFAIC pretty fast - unless you have massively sized 
thin devices with many GiB of metadata - obviously you cannot process this 
amount of metadata in nanoseconds (and there are prepared kernel patches to 
make it even faster)


The real problem is the speed of discard on the physical devices.
You could actually try to feel the difference with:
lvchange --discards passdown|nopassdown thinpool

Also it's very important to keep the metadata on a fast storage device 
(SSD/NVMe)! Keeping metadata on the same hdd spindle as the data is always 
going to feel slow (in fact it's quite pointless to talk about performance and 
use hdd...).
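
One hedged way to arrange that at creation time (names and sizes are 
hypothetical): allocate the data LV on the hdd PV and the metadata LV on the 
ssd PV, then combine them into the pool:

lvcreate -L 1T -n tpool      vg /dev/slow_hdd
lvcreate -L 8G -n tpool_meta vg /dev/fast_ssd
lvconvert --type thin-pool --poolmetadata vg/tpool_meta vg/tpool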


you can find more details in the linked GitHub issue.  That said, now I
am curious why you state that dm-thin on top of dm-vdo (that is,
userspace/filesystem/VM/etc ⇒ dm-thin data (*not* metadata) ⇒ dm-vdo ⇒
hardware/dm-crypt/etc) is a bad idea.  It seems to be a decent way to


Out-of-space recoveries are ATM much harder than we want them to be.

So as long as the user can maintain free space in the VDO and the thin-pool 
it's ok. Once the user runs out of space - recovery is a pretty hard task (and 
there is a reason we have support...).



add support for efficient snapshots of data stored on a VDO volume, and
to have multiple volumes on top of a single VDO volume.  Furthermore,


We hope we will add some direct 'snapshot' support to VDO so users will not 
need to combine both technologies together.


Thin is more oriented towards extreme speed.
VDO is more about 'compression & deduplication' - so space efficiency.

Combining both together is kind of harming their advantages.


https://access.redhat.com/articles/2106521#vdo recommends exactly this
use-case.  Or am I misunderstanding you?


There are many paths to Rome...
So as mentioned above - you need to pick performance vs. space efficiency.
And since you want to write your own thin volume managing software, I'm 
guessing you care about performance a lot (so do we - but with our given 
constraints limiting us to some level)...



Also I assume you have already checked performance of discard on VDO, but I
would not want to run this operation frequently on any larger volume...


I have never actually used VDO myself, although the documentation does
warn about this.


It's purely related to the initial BZ description, which cares a lot about thin 
discard performance, and the following comment adds VDO discard into the same 
equation... :)


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Zdenek Kabelac

Dne 30. 01. 22 v 1:32 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):

How much slower are operations on an LVM2 thin pool compared to manually
managing a dm-thin target via ioctls?  I am mostly concerned about
volume snapshot, creation, and destruction.  Data integrity is very
important, so taking shortcuts that risk data loss is out of the
question.  However, the application may have some additional information
that LVM2 does not have.  For instance, it may know that the volume that
it is snapshotting is not in use, or that a certain volume it is
creating will never be used after power-off.




So brave developers may always write their own management tools for their
constrained environment requirements that will by significantly faster in
terms of how many thins you could create per minute (btw you will need to
also consider dropping usage of udev on such system)


What kind of constraints are you referring to?  Is it possible and safe
to have udev running, but told to ignore the thins in question?


Lvm2 is oriented more towards managing a set of different disks, where the user 
is adding/removing/replacing them. So it's more about recoverability, good 
support for manual repair (ascii metadata), tracking history of changes, 
backward compatibility, support for conversion to different volume types (i.e. 
caching of thins, pvmove...), support for no/udev & no/systemd, clusters and 
nearly every linux distro available... So there is a lot - and this all adds 
quite some complexity.


So once you scratch all this - say you only care about a single disk - then you 
are able to use more efficient metadata formats, which you could even keep 
permanently in memory for the whole lifetime - this all adds great performance.


But it all depends on how much you can constrain your environment.

It's worth mentioning there is lvm2 support for 'external' thin volume 
creators - lvm2 then only maintains the thin-pool data & metadata LVs, while 
thin volume creation, activation and deactivation is left to an external tool 
(a sketch of that interface follows). This has been used by Docker for a 
while - later on they switched to overlayfs I believe...
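
For illustration, this is the kernel-level interface such an external tool 
drives (documented in the kernel's thin-provisioning admin guide; device names, 
ids and sizes here are hypothetical, and the pool must already be active):

dmsetup message /dev/mapper/vg-tpool-tpool 0 "create_thin 0"
dmsetup create thin0 --table "0 2097152 thin /dev/mapper/vg-tpool-tpool 0"
# a snapshot: the origin thin device must be suspended while the message is sent
dmsetup suspend /dev/mapper/thin0
dmsetup message /dev/mapper/vg-tpool-tpool 0 "create_snap 1 0"
dmsetup resume /dev/mapper/thin0
dmsetup create snap0 --table "0 2097152 thin /dev/mapper/vg-tpool-tpool 1"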





It's worth to mention - the more bullet-proof you will want to make your
project - the more closer to the extra processing made by lvm2 you will get.


Why is this?  How does lvm2 compare to stratis, for example?


Stratis is yet another volume manager written in Rust combined with XFS for 
easier user experience. That's all I'd probably say about it...



However before you will step into these waters - you should probably
evaluate whether thin-pool actually meet your needs if you have that high
expectation for number of supported volumes - so you will not end up with
hyper fast snapshot creation while the actual usage then is not meeting your
needs...


What needs are you thinking of specifically?  Qubes OS needs block
devices, so filesystem-backed storage would require the use of loop
devices unless I use ZFS zvols.  Do you have any specific
recommendations?


As long as you live in a world without crashes, buggy kernels, buggy apps and 
failing hard drives, everything looks very simple.

And every development costs quite some time & money.

Since you mentioned ZFS - you might want to focus on using a 'ZFS-only' 
solution. Combining ZFS or Btrfs with lvm2 is always going to be painful, as 
those filesystems have their own volume management.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-29 Thread Zdenek Kabelac

Dne 29. 01. 22 v 21:09 Demi Marie Obenour napsal(a):

On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:

Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):

Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?


Hi


Nope there is currently no support from lvm2 side for this.
Feel free to open RFE.


Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160




Thanks

Although your use-case of a thin-pool on top of VDO is not really a good plan - 
and there is a good reason why lvm2 does not support this device stack directly 
(aka thin-pool data LV as a VDO LV).

I'd say you are stepping on very very thin ice...

Also I assume you have already checked the performance of discard on VDO - I 
would not want to run this operation frequently on any larger volume...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-29 Thread Zdenek Kabelac

Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):

How much slower are operations on an LVM2 thin pool compared to manually
managing a dm-thin target via ioctls?  I am mostly concerned about
volume snapshot, creation, and destruction.  Data integrity is very
important, so taking shortcuts that risk data loss is out of the
question.  However, the application may have some additional information
that LVM2 does not have.  For instance, it may know that the volume that
it is snapshotting is not in use, or that a certain volume it is
creating will never be used after power-off.



Hi

Short answer: it depends ;)

Longer story:
If you want to create a few thins per hour - then it doesn't really matter.
If you want to create a few thins per second - then the cost of lvm2 management 
is very high - as lvm2 does far more work than just sending a simple ioctl 
(it's called logical volume management for a reason).


So brave developers may always write their own management tools for their 
constrained environment requirements, which will be significantly faster in 
terms of how many thins you can create per minute (btw you will also need to 
consider dropping the usage of udev on such a system).


It's worth mentioning - the more bullet-proof you want to make your project, 
the closer you will get to the extra processing done by lvm2.


However, before you step into these waters - you should probably evaluate 
whether thin-pool actually meets your needs, given such high expectations for 
the number of supported volumes - so you do not end up with hyper-fast snapshot 
creation while the actual usage then does not meet your needs...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-29 Thread Zdenek Kabelac

Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):

Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?


Hi


Nope there is currently no support from lvm2 side for this.
Feel free to open RFE.

I guess this would possibly justify some form of support for 'writable' 
component activation.
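
For context, the tool itself is invoked roughly like this (paths are 
hypothetical; thin_trim comes from thin-provisioning-tools, the pool must not 
be active, and the data device has to be writable - which is exactly the 
component-activation gap mentioned above):

thin_trim --metadata-dev /dev/mapper/vg-tpool_tmeta --data-dev /dev/mapper/vg-tpool_tdata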


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?

2021-12-22 Thread Zdenek Kabelac

Dne 21. 12. 21 v 17:12 Tomas Dalebjörk napsal(a):
Thanks for explaining all that details about how a snapshot is formatted on 
the COW device.

I already know that part.



Well your messaging is then somewhat confusing.

I am more interested in how the disk containing the COW data can be merged 
back to an LV volume.
The second part only mentioned that it is possible, but not which steps are 
involved.


As documented in the manual.
To split a snapshot from its origin (our words detach) one can use:
lvconvert --splitsnapshot vg/s1
Right?

To reverse that process, according to the manual; one can use:
lvconvert -s vg/s1
Right?

But as I mentioned before, this requires that the vg/s1 exists as an object in 
the LVM metadata.

What if you are on a new server, that does not have vg/s1?
How to create that object or whatever you like to call this on the server?
The only way I got it is to use the
vgextend
lvcreate
lvconvert --splitsnapshot

And now reattach it, so that the actual merge can happen.
The object should exist now, so that the command: lvconvert -s vg/s1 can work



There is no problem with reattaching an existing COW to any other LV -
it's plain 'lvconvert -s -Zn -c xxx vg/origin vg/snapcow'.

This will rejoin the former origin with the former COW volume and avoids 
'zeroing' the metadata stored on the COW.



But to have this usable - there must have been NO write access to the origin 
between the moment your snapshot was 'split' and 'reattached' - and it's 
documented in 'man lvconvert'.


Or how can the object vg/s1 be created so that it can be referenced to by the 
lvconvert command?

The disk is formated as a COW device, and contains all of the data.
So how hard can it be to just reattach that volume to an empty or existing LV 
volume on the server?


There is no problem with this - however your provided case just showed that you 
have a very small amount of data stored in the snapshot.

So while the snapshot has a large size - it's been mostly unoccupied with 
data - so any merge into a 'clean' origin is kind of meaningless in this 
particular case.



If it works on same server, why can't it work on any other new servers, as the 
COW device contains ALL the data needed (we make sure it contains all the data)


If you want to give it a try, just create a snapshot on a specific device
And change all the blocks on the origin, there you are, you now have a cow 
device containing all data needed.
How to move this snapshot device to another server, reattach it to an empty lv 
volume as a snapshot.


I'm aware how this works - being an lvm2 developer...

I'm just trying to explain there is a better and simpler way to reach your goal.


Simply take a 'time consistent' snapshot of your origin
(so it does not change while you take a 'dd' copy).

Take a 'dd' copy of such a stable snapshot to a new LV.
And restore this 'copied' LV on your 2nd server - a sketch of the whole 
sequence follows.
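
A minimal sketch of that workflow with hypothetical names and sizes (run on the 
source server; the copy LV must be at least as large as the origin):

lvcreate -s -L 20G -n s1 vg/origin           # time-consistent view of the origin
lvcreate -L 100G -n origin_copy vg           # plain LV, same size as the origin
dd if=/dev/vg/s1 of=/dev/vg/origin_copy bs=512K conv=fsync
lvremove vg/s1                               # drop the snapshot once copied

The 'origin_copy' LV (or an image taken of it) is then what gets transported to 
and restored on the second server.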

I simply don't follow why you would want to complicate all of this with 
snapshot merging - it makes no sense and just causes a massive slowdown.



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?

2021-12-21 Thread Zdenek Kabelac

Dne 21. 12. 21 v 15:44 Tomas Dalebjörk napsal(a):

hi

I think I didn’t explain this clear enough
Allthe lvm data is present in the snapshot that I provision from our backup 
system
I can guarantee that!

If I just mount that snapshot from our backup system, it works perfectly well

so we don’t need the origin volumes in other way than copying back to it
we just need to reanimate it as a cow volume
mentioning that all data has been changed
the cow is just referencing to the origin location, so no problem there
All our data is in the cow volume, not just the changes

just to compare
if you change just 1 byte on every chunksize in the origin volume, than the 
snapshot will contain all data, plus some meta data etc.
That is what I talk about here.
So how do I retach this volume to a new server?

as the only argument acceptable argument by the lvconvert is vg/s1 ?

That assumes that vg/s1 is present
so how to make it present?


Hi

As said in my previous post - the 'format' of data stored on the COW storage 
(which is the 'real' content of a snapshot LV) does NOT in any way resemble a 
'normal' LV.


So the COW LV can really ONLY be used together with the 'snapshot' target.

The easiest way to 'copy' this snapshot to a normal LV is like this:


lvcreate -L size  -n newLV  vg

dd if=/dev/vg/snapshotLV  of=/dev/vg/newLV  bs=512K


(so with 'DD' you copy data in 'correct' format)

You cannot convert a snapshot LV to a 'normal' LV in any other way than to 
merge this snapshot LV into your origin LV (so the origin is gone)

(lvconvert --merge)

You can also 'split' the snapshot COW LV and 'reattach' such a snapshot to 
another LV - but this requires rather good knowledge of how this snapshotting 
works - so you know what you can do and what you can expect. But I'd likely 
recommend 'dd'.
You cannot use a 'split' COW LV for e.g. a filesystem - as it contains 'mixed' 
snapshot metadata and snapshot blocks.


The old snapshot's purpose was to take a 'time consistent' snapshot of an LV, 
which you can then use e.g. for taking a backup.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?

2021-12-21 Thread Zdenek Kabelac

Dne 19. 12. 21 v 17:43 Tomas Dalebjörk napsal(a):

Hi,

I am trying to understand how to convert a disk containing snapshot data.
This is how I tested this:
1. locate the snapshot testlv.211218.232255
root@debian10:/dev/mapper# lvs
   LV                   VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
   home                 debian10-vg -wi-ao       1.00g
   root                 debian10-vg -wi-ao      <3.81g
   swap_1               debian10-vg -wi-ao     976.00m
   testlv               debian10-vg owi-aos--- 100.00m
   testlv.211218.232255 debian10-vg swi-a-s--- 104.00m      testlv 1.44
root@debian10:/dev/mapper#



Hi

It looks like there is some sort of misunderstanding of HOW these lvm2 
snapshots work.


The snapshot LV does NOT contain a whole copy of the origin LV (the one you've 
taken the snapshot of).


So while 'lvs' is presenting you the size of the snapshot - it's not a 
'regular' volume behind the scenes. Instead it's a set of differing blocks from 
your origin, stored in the order they came into use.


So while your snapshot looks like it has a size of 104MiB and the origin 
100MiB - it means there is just allocated the maximum possible snapshot size to 
store all differing chunks + some extra metadata. But ATM only 1.44% of this 
space is actually used to hold all these blocks (AKA ~1.5MiB of real storage 
has been used so far).


There is basically NO WAY to use this 'COW' storage area without its original 
volume.


If you want to use such a 'snapshot' elsewhere you simply need to *dd* such an 
LV - which is presented (via the magic world of DM targets) to user-space so it 
appears like your original LV at the moment of the snapshot.


But the actual 'raw' content stored on disk is in an 'internal' form - not 
really usable outside (although the format is not really a complicated one).


Note - these so-called 'thick' or 'old' snapshots are now better handled via 
thin-provisioning - giving much better performance - especially if you plan to 
keep a snapshot long-term or at a bigger size.
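
A hedged sketch of the thin-provisioned equivalent (names and sizes 
hypothetical) - here the snapshot shares chunks with its origin instead of 
copying changed chunks out:

lvcreate --type thin-pool -L 10G -n tpool vg
lvcreate -V 5G -T vg/tpool -n testlv
lvcreate -s vg/testlv -n testlv_snap
lvchange -ay -K vg/testlv_snap     # thin snapshots are skipped from activation by default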


Hopefully it now makes it way more clear.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-10-18 Thread Zdenek Kabelac

Dne 18. 10. 21 v 17:04 David Teigland napsal(a):

On Mon, Oct 18, 2021 at 06:24:49AM +, Martin Wilck wrote:

I'd like to second Peter here, "RUN" is in general less fragile than
"IMPORT{PROGRAM}". You should use IMPORT{PROGRAM}" if and only if

  - the invoked program can work with incomplete udev state of a device
(the progrem should not try to access the device via
libudev, it should rather get properties either from sysfs or the
uevent's environment variables)
  - you need the result or the output of the program in order to proceed
with rules processing.


Those are both true in this case.  I can't say I like it either, but udev
rules force hacky solutions on us.  I began trying to use RUN several
months ago and I think I gave up trying to find a way to pass values from
the RUN program back into the udev rule (possibly by writing values to a
temp file and then doing IMPORT{file}).  The udev rule needs the name of
the VG to activate, and that name comes from the pvscan.  For an even
uglier form of this, see the equivalent I wrote for dracut:
https://github.com/dracutdevs/dracut/pull/1567/files

The latest version of the hybrid service+event activation is here
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-7

I've made it simple to edit lvm.conf to switch between:
- activation from fixed services only
- activation from events only
- activation from fixes services first, then from events

There are sure to be tradeoffs, we know that many concurrent activations
from events are slow, and fixed services which are more serialized could
be delayed from slow devices.  I'm still undecided on the best default
setting, i.e. which will work best for most people, and would welcome any
thoughts or relevant experience.



I've done some testing for these issues - we are trimming some 'easy to fix' 
issues away (so git HEAD should actually already be somewhat faster now).


The more generic solution with auto-activation should likely try to 'activate' 
as many found complete VGs as it can at any given moment in time.


ATM lvm2 suffers when it's being run massively in parallel - this has not yet 
been fully analyzed - but there is certainly much better throughput with a 
limited amount of lvm2 commands executed in parallel.


Our goal ATM is to accelerate 'pvscan'.

We could think about whether there is some easy mechanism to 'accumulate' 
complete VGs and activate all of them in a single 'vgchange' command - and run 
the next one after the running one has finished - this currently gives 
reasonably good 'throughput' and should work without the 'exceptional' case of 
being fast only once.


Another point worth thinking about is 'limiting' the set of PVs for this 
activation command - so we avoid repetitive validation of the whole system - 
the --devices|--devicesfile options should be usable for this - but it needs 
some thinking about how to use them in a smart way with the 'collected' 
activation.
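
For reference, the shape such a 'collected' activation could take (VG names and 
the device list are hypothetical; --devices/--devicesfile need a recent lvm2, 
roughly 2.03.12+):

vgchange -ay vg1 vg2 vg3                           # several complete VGs in one command
vgchange -ay --devices /dev/sda1,/dev/sdb1 vg1     # restrict scanning to the listed PVs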




Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Help restoring a corrupted PV partition ( 18th )

2021-10-18 Thread Zdenek Kabelac

Dne 18. 10. 21 v 20:08 Brian McCullough napsal(a):


I have had a disk go bad on me, causing me to lose one PV.

If I am not providing sufficient, or the proper, information, feel free
to ask for more.


I seem to have retrieved the partition using ddrescue and put it on to
a new drive, but it seems to be missing some label information, because
pvscan doesn't recognize it as a PV partition.

Using hexdump, I see the string " LVM2 " at 0x1004, but nothing before
that.  The whole 16 bytes is:

0x01000  16 d6 8e db 20 4c 56 4d  32 20 78 5b 35 41 25 72
 L  V  M  2



I find what appears to be an LVM2 vgconfig block starting at 0x01200,
extracted that to a file and was able to read the UUID that this PV
should have.  It is one of about a dozen that make up this VG.


On another machine, I dumped a PV partition, and find "LABLEONE" at
0x200, with the same " LVM2 " at 0x01000.

I was concerned that my dump was offset, but the comparison to the
"good" one suggests that that isn't the problem, but just the missing
"LABLEONE" and related information at 0x0200.


How to fix?

If I do a "pvcreate --uuid " would this fix that recovered partition
so that pvscan and friends can work properly, and I can finally boot
that machine?


Hi

It's quite important to be aware of how the disk corruption happened.
Was this a plain disk hw error - or some crash of a raid setup?

Normally you could restore the PV with this:

pvcreate --uuid <uuid-of-the-PV> --restorefile file_with_vg_backup /dev/d
vgcfgrestore --file file_with_vg_backup vgname


But if the content of the device was scrambled by some 'raid' bug - you might 
have problems retrieving any usable data afterwards.
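
Before recreating anything it may also be worth checking what is (and isn't) 
left of the label and metadata headers on the rescued copy - a hedged example, 
device path hypothetical ('pvck --dump' needs a fairly recent lvm2):

pvck /dev/sdX1
pvck --dump headers /dev/sdX1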



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] LVM cachepool inconsistency after power event

2021-10-05 Thread Zdenek Kabelac

Dne 05. 10. 21 v 13:34 Krzysztof Chojnowski napsal(a):

On Tue, Oct 5, 2021 at 06:35 AM Ming-Hung Tsai  wrote:

Could you help run cache_check independently, to see the error messages? Thanks.

# sudo lvchange -ay -f vg0/wdata_cachepool_cpool_cmeta
# sudo cache_check /dev/mapper/vg0-wdata_cachepool_cpool_cmeta
# sudo lvchange -an -f vg0/wdata_cachepool_cpool_cmeta


Thanks for your help, this is what I get:
$ sudo lvchange -ay -f vg0/wdata_cachepool_cpool_cmeta
Do you want to activate component LV in read-only mode? [y/n]: y
   Allowing activation of component LV.
$ sudo cache_check /dev/mapper/vg0-wdata_cachepool_cpool_cmeta
examining superblock
examining mapping array
   missing mappings [1, 0]:
 missing blocks
examining hint array
   missing mappings [1, 0]:
 missing blocks
examining discard bitset
$ echo $?
1
$ sudo lvchange -an -f vg0/wdata_cachepool_cpool_cmeta
$ sudo lvchange -ay   vg0/tpg1-wdata
   Check of pool vg0/wdata_cachepool_cpool failed (status:1). Manual
repair required!


Hello Krzysztof

You need to repair your cache-pool metadata.

But before continuing with advice - what are the versions of your kernel, lvm2 and
device-mapper-persistent-data package (aka 'cache_check -V') ?

Component activation allows activation of your _cmeta LV - but only in 
read-only mode - so the repair must go into a new LV.


Since cache_check reported exit code '1' (an error) - your metadata does 
require a fix.


lvconvert --repair should be able to handle this case - although likely 
without 'smart' placement of the fixed metadata (a pvmove may be needed after the metadata fix).


Alternatively, you can allocate new metadata space yourself and easily cache_repair into it.
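
A hedged sketch of that manual route (the new LV name and size are assumptions):

lvcreate -L 64M -n cmeta_repaired vg0                    # space for the repaired metadata
lvchange -ay -f vg0/wdata_cachepool_cpool_cmeta          # read-only component activation
cache_repair -i /dev/mapper/vg0-wdata_cachepool_cpool_cmeta -o /dev/vg0/cmeta_repaired
lvchange -an -f vg0/wdata_cachepool_cpool_cmeta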


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] " Failed to find physical volume

2021-09-24 Thread Zdenek Kabelac

Dne 23. 09. 21 v 20:03 alessandro macuz napsal(a):

Thanks Zdenek,

so it might be that metadata is corrupted somehow and hence the pvs program 
doesn't recognize that partition as physical volume?
That may explain why lvmdiskscan reports physical disks (by just looking at 
the partition type 8e) and pvs completely ignores it.

Am I correct?



Hi


Yes - if your disk header has lost/damaged its content - the disk will not be 
recognized as a PV - and thus completely ignored.


Note - the easiest thing is to check the verbose output of 'pvs -vvv' - where you 
can follow what the command is doing in a relatively 'readable' form - if you 
can't follow it - just attach it to an email so we can look over why your disk 
might be ignored.
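
A couple of hedged commands that usually help with this kind of inspection (the 
device name is an assumption):

pvs -vvvv /dev/sda5 2>&1 | grep -i filter    # shows the filtering decisions for the device
pvck /dev/sda5                               # checks the on-disk PV label and metadata area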


Note - the other reason could be that the device got filtered out by some filter - but 
I assume you've not changed the filter configuration on your system ?



Zdenek



Le jeu. 23 sept. 2021 à 15:52, Zdenek Kabelac  a écrit :

Dne 22. 09. 21 v 18:48 alessandro macuz napsal(a):
> fdisk correctly identifies the extended partition as 8e.
> I wonder which kind of data lvmdiskscan and pvs use in order to list LVM
> physical volumes.
> Does PVS check some specific metadata within the partition without just
> relying on the type of partition displayed by fdisk?
>
>

Hi

Yes - PVs do have a header signature keeping information about PV attributes
and also a storage area to keep lvm2 metadata.

Partition flags known to fdisk are irrelevant.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] " Failed to find physical volume

2021-09-23 Thread Zdenek Kabelac

Dne 22. 09. 21 v 18:48 alessandro macuz napsal(a):

fdisk correctly identifies the extended partition as 8e.
I wonder which kind of data lvmdiskscan and pvs use in order to list LVM 
physical volumes.
Does PVS check some specific metadata within the partition without just 
relying on the type of partition displayed by fdisk?





Hi

Yes - PVs do have a header signature keeping information about PV attributes
and also a storage area to keep lvm2 metadata.

Partition flags known to fdisk are irrelevant.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] vgchange -a n --sysinit hangs without udevd

2021-09-23 Thread Zdenek Kabelac

Dne 22. 09. 21 v 9:19 Arkadiusz Miśkiewicz napsal(a):


Hello.

Linux 5.10.67, glibc 2.34, lvm 2.03.13, udevd 246, simple init script
run on shutdown which does

vgchange -a n --sysinit

no other processes are running (just init, my script and vgchange),
vgstorage is vg on md raid 10 on 4 hdd disks.

it hangs with

+ /sbin/vgchange -a n --sysinit --verbose --debug
   Failed to find sysfs mount point
   No proc filesystem found: skipping sysfs filter
   No proc filesystem found: skipping multipath filter
   File locking initialisation failed.
   Deactivating logical volume vgstorage/lvhome.
   Removing vgstorage-lvhome (253:0)
   Deactivated 1 logical volumes in volume group vgstorage.

Note that running
vgchange -a n --sysinit --verbose --debug
on fully running system just works fine:

# vgchange -a n --sysinit --verbose --debug
   Deactivating logical volume vgstorage/lvhome.
   Removing vgstorage-lvhome (253:0)
   Deactivated 1 logical volumes in volume group vgstorage.
   0 logical volume(s) in volume group "vgstorage" now active
#


so I've restarted udevd just before vgchange call in my script and it
works.

Other test:

fully running system
# udevd --version
246
# killall udevd
# vgchange -a n --sysinit --verbose --debug
   Deactivating logical volume vgstorage/lvhome.
   Removing vgstorage-lvhome (253:0)
   Deactivated 1 logical volumes in volume group vgstorage.
and hangs


Why is udevd needed for vgchange there? It wasn't needed to deactivate
vg before AFAIK.


strace


Hi

It does look like the initial detection of a running udev within the lvm2 code is 
doing something wrong.


It should recognize there is no udev running and behave as if --noudevsync 
had been given.
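
Until that is fixed, a possible interim workaround for such a shutdown script (an 
untested sketch, not a recommendation) would be to pass the option explicitly:

vgchange -an --sysinit --noudevsync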


I'll check what has changed.


Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM tools segfault since 2.03.12

2021-09-06 Thread Zdenek Kabelac

On 02. 09. 21 21:10, Jean-Michel Pollion wrote:

Le jeudi 02 septembre 2021 à 19:21 +0200, Zdenek Kabelac a écrit :

Dne 02. 09. 21 v 17:49 Jean-Michel Pollion napsal(a):

Le jeudi 02 septembre 2021 à 16:11 +0200, Zdenek Kabelac a écrit :

Dne 02. 09. 21 v 9:34 Jean-Michel Pollion napsal(a):

Hello,



Yes, it works also with LC_ALL=C alone (without either LANG_C or
LC_COLLATE=C), I replaced LANG=C by LC_ALL=C in the Makefile and it
worked perfectly, as expected, as LC_ALL is supposed to supersede LANG.


Upstreamed with:

https://listman.redhat.com/archives/lvm-devel/2021-September/msg1.html

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM tools segfault since 2.03.12

2021-09-02 Thread Zdenek Kabelac

Dne 02. 09. 21 v 21:10 Jean-Michel Pollion napsal(a):

Le jeudi 02 septembre 2021 à 19:21 +0200, Zdenek Kabelac a écrit :

Dne 02. 09. 21 v 17:49 Jean-Michel Pollion napsal(a):

Le jeudi 02 septembre 2021 à 16:11 +0200, Zdenek Kabelac a écrit :

Dne 02. 09. 21 v 9:34 Jean-Michel Pollion napsal(a):

Hello,

I have the lvm2 tools segfaulting since 2.03.12 with a message
of
unsorted commands in cmds.h.
It turns out that in my locale and on my setup, the LANG=C
setting
before "sort -u" in tools/Makefile.in is not enough, I had to
patch
and
add LC_COLLATE=C too, or the rules will not count the
underscore
while
sorting, causing the segfault in the code (command.c IIRC).
This broke the boot on some of my servers, so I think it's a
rather
big
problem that perhaps can't be caught in reproducible builds.
Can this be corrected upstream or should I just modify my build
environment for LVM2?


Hi

Interesting - can you send a patch to Makefile  to include
LC_COLLATE=C
in case it does fixes your problem

I fixed it with a sed, the resulting patch is attached:
--- LVM2.2.03.13-orig/tools/Makefile.in 2021-08-11 17:37:43.0 +0200
+++ LVM2.2.03.13/tools/Makefile.in  2021-09-02 17:41:42.113702990 +0200
@@ -181,7 +181,7 @@
   ( cat $(srcdir)/license.inc && \
     echo "/* Do not edit. This file is generated by the Makefile. */" && \
     echo "cmd(CMD_NONE, none)" && \
-    $(GREP) '^ID:' $(srcdir)/command-lines.in | LANG=C $(SORT) -u | $(AWK) '{print "cmd(" $$2 "_CMD, " $$2 ")"}' && \
+    $(GREP) '^ID:' $(srcdir)/command-lines.in | LANG=C LC_COLLATE=C $(SORT) -u | $(AWK) '{print "cmd(" $$2 "_CMD, " $$2 ")"}' && \
     echo "cmd(CMD_COUNT, count)" \
   ) > $@


And does it also work if you set  LC_ALL=C   (instead of LANG=C
LC_COLLATE=C) ?

Yes, it works also with LC_ALL=C alone (without either LANG_C or
LC_COLLATE=C), I replaced LANG=C by LC_ALL=C in the Makefile and it
worked perfectly, as expected, as LC_ALL is supposed to supersede LANG.



So thanks for checking,  I'll push this fix then.


Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] The size specified by "lvcreate -L" is not accurate

2021-09-02 Thread Zdenek Kabelac

Dne 02. 09. 21 v 15:24 Gionatan Danti napsal(a):

Il 2021-09-02 05:26 Yu, Mingli ha scritto:

Per
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/doc/RelNotes/v1.46.4.txt 


[1], after e2fsprogs upgrades to 1.46.4, the defaults for mke2fs now
call for 256 byte inodes for all file systems (with the exception of
file systems for the GNU Hurd, which only supports 128 byte inodes)
and use "lvcreate -L 50 -n lv_test1 vg_test && mke2fs
/dev/vg_test/lv_test1" and then continue to check the partition as
below (lvm2 2.03.11 is used for the test)
 # df -h | grep dev/mapper/vg_test-lv_test1

 /dev/mapper/vg_test-lv_test1   48M   14K   46M   1% /mnt/lv-test

 Though claim 50M as above, but it turns out to be only 48M.


I think that allocations are done in multiples of the physical extent size which, by 
default, is 4 MB.
50 is not a multiple of 4 while 48 is, so "lvcreate" probably rounded down the 
required size.

Using one or more "-v" should bring progressively more details.




Hi

As correctly pointed out by Gionatan - lvm2 allocates LVs in 'extent_size' 
allocation units.


So with default 4M - you could have either 48M or 52M.

In your case - you likely get 52M (as that's how lvm2 behaves - you get at 
least the specified amount of space, rounded up to the nearest extent - see the output of 
the 'lvs' command).
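
A hedged illustration with the names from this thread (the reported size is what lvm2 
would show, not a verbatim capture):

lvcreate -L 50M -n lv_test1 vg_test
lvs -o lv_name,lv_size vg_test      # shows 52.00m: 50 MiB rounded up to 13 x 4 MiB extents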


However your formatted extX volume also has some internal 'filesystem' 
metadata - so an LV of size 52M gives a reduced ext4 space of 48M in your case,
as most likely the 'ext4' header 'cuts' away possibly a 4M alignment
used by ext4.


Anyway - if you care about maximal usage of every single byte on your storage 
- you could further tune/lower the lvm2 metadata size + alignment + extent_size, 
and you could try similar tuning with the ext4 settings.


This way you could take back some space wasted on alignments - which is normally 
a negligible amount of disk space on modern high-capacity drives.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] LVM tools segfault since 2.03.12

2021-09-02 Thread Zdenek Kabelac

Dne 02. 09. 21 v 9:34 Jean-Michel Pollion napsal(a):

Hello,

I have the lvm2 tools segfaulting since 2.03.12 with a message of
unsorted commands in cmds.h.
It turns out that in my locale and on my setup, the LANG=C setting
before "sort -u" in tools/Makefile.in is not enough, I had to patch and
add LC_COLLATE=C too, or the rules will not count the underscore while
sorting, causing the segfault in the code (command.c IIRC).
This broke the boot on some of my servers, so I think it's a rather big
problem that perhaps can't be caught in reproducible builds.
Can this be corrected upstream or should I just modify my build
environment for LVM2?




Hi

Interesting - can you send a patch for the Makefile to include LC_COLLATE=C
in case it does fix your problem.

Although LANG=C should IMHO set all LC_XXX settings - so it could also be a bug 
in the glibc process ??


What locales are used in your environment ?

Can you attach the resulting sorted cmds.h from your environment ?

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Does LVM have any plan/schedule to support btrfs in fsadm

2021-06-25 Thread Zdenek Kabelac

Dne 25. 06. 21 v 7:28 heming.z...@suse.com napsal(a):

Hello Zdenek & David,

From URL: https://fedoraproject.org/wiki/Btrfs
The btrfs becomes default filesystem for desktops.

Do we have any plan to add btrfs code for scripts/fsadm.sh?

If the answer is yes. I could share a suse special patch, this patch
had been run about 4 years in suse products.



Hi

If you have some patches provided with some good usable testing (lvm2 test 
suite) - they could possibly be merged.


On the other hand, helping/suggesting users to use Btrfs on top of lvm2 also has 
its logical 'question' marks.  Btrfs users should probably avoid placing 'another' 
layer between the real hw and the filesystem - since btrfs should be mostly capable 
of handling lvm2 features in its 'very own' way. Each layer has its own 
measurable costs. And yeah, we do not want to be involved in btrfs-related 
recovery cases, as we simply never understood its handling of attached disks.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-15 Thread Zdenek Kabelac

Dne 15. 06. 21 v 19:03 David Teigland napsal(a):

On Tue, Jun 08, 2021 at 10:39:37AM -0500, David Teigland wrote:

I think it would be an improvement to:

. Make obtain_device_list_from_udev only control how we get the device
   list. Then we can easily default to 0 and readdir /dev if it's better.

. Use both native md/mpath detection *and* udev info when it's readily
   available (don't wait for it), instead of limiting ourselves to one
   source of info.  If either source indicates an md/mpath component,
   then we consider it true.

The second point means we are free to change obtain_device_list_from_udev
as we wish, without affecting md/mpath detection.  It may also improve
md/mpath detection overall.


Here are the initial patches I'm testing (libmpathvalid not yet added)
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-device-info-1

devices: rework native and udev device info

. Make the obtain_device_list_from_udev setting
   affect only the choice of readdir /dev vs libudev
   for a device name listing.  The setting no longer
   controls if udev is used for device type checks.




While in local testing it may appear that devices on laptops are always fast,
in some cases it may actually be more expensive to check the physical
device instead of checking the content of the udev DB.

So for some users this may result in a performance regression, as the
udev DB is in a ramdisk - while there are devices whose opening
may take seconds depending on operating status (disk suspend,
disk firmware upgrade).

(one of the lvmetad aspects should have been to avoid waking suspended devices)


. Change obtain_device_list_from_udev default to 0.
   A list of device names is obtained from readdir /dev
   by default, which is faster than libudev (especially
   with many devices.)


So we need at least a backward compatible setting for those users
where the performance impact would cause a regression.
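
For reference, a hedged way to check what a given host currently uses and what the 
built-in default is:

lvmconfig devices/obtain_device_list_from_udev                        # current effective value
lvmconfig --typeconfig default devices/obtain_device_list_from_udev   # compiled-in default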


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Zdenek Kabelac

Dne 08. 06. 21 v 17:47 Martin Wilck napsal(a):

On Di, 2021-06-08 at 10:39 -0500, David Teigland wrote:

. Use both native md/mpath detection *and* udev info when it's
readily
   available (don't wait for it), instead of limiting ourselves to one
   source of info.  If either source indicates an md/mpath component,
   then we consider it true.

Hm. You can boot with "multipath=off" which udev would take into
account. What would you do in that case? Native mpath detection would
probably not figure it out.

multipath-tools itself follows the "try udev and fall back to native if
it fails" approach, which isn't always perfect, either.


A third related improvement that could follow is to add stronger
native
mpath detection, in which lvm uses uses /etc/multipath/wwids,
directly or
through a multipath library, to identify mpath components.  This
would
supplement the existing sysfs and udev sources, and address the
difficult
case where the mpath device is not yet set up.


Please don't. Use libmpathvalid if you want to improve in this area.
That's what it was made for.


The problem is the addition of another dependency here.

We could probably think about using 'dlopen' and using the library if it is present, 
but IMHO libmpathvalid should be integrated into libblkid in some way - 
linking another library into the many other projects that need to detect MP devices 
really complicates this a lot.  libblkid should be able to decode this and 
make things much cleaner.


I'd also vote for an lvm2 plugin for blkid, as forking thousands of processes simply 
always takes a lot of time (but this would require quite some code 
shuffling within the lvm codebase).


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Zdenek Kabelac

Dne 08. 06. 21 v 15:56 Peter Rajnoha napsal(a):

On Tue 08 Jun 2021 15:46, Zdenek Kabelac wrote:

Dne 08. 06. 21 v 15:41 Peter Rajnoha napsal(a):

On Tue 08 Jun 2021 13:23, Martin Wilck wrote:

On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:

On Mon 07 Jun 2021 16:48, David Teigland wrote:

If there are say 1000 PVs already present on the system, there
could be
real savings in having one lvm command process all 1000, and then
switch
over to processing uevents for any further devices afterward.  The
switch
over would be delicate because of the obvious races involved with
new devs
appearing, but probably feasible.

Maybe to avoid the race, we could possibly write the proposed
"/run/lvm2/boot-finished" right before we initiate scanning in
"vgchange
-aay" that is a part of the lvm2-activation-net.service (the last
service to do the direct activation).

A few event-based pvscans could fire during the window between
"scan initiated phase" in lvm2-activation-net.service's
"ExecStart=vgchange -aay..."
and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
finished",
but I think still better than missing important uevents completely in
this window.

That sounds reasonable. I was thinking along similar lines. Note that
in the case where we had problems lately, all actual activation (and
slowness) happened in lvm2-activation-early.service.


Yes, I think most of the activations are covered with the first service
where most of the devices are already present, then the rest is covered
by the other two services.

Anyway, I'd still like to know why exactly
obtain_device_list_from_udev=1 is so slow. The only thing that it does
is that it calls libudev's enumeration for "block" subsystem devs. We
don't even check if the device is initialized in udev in this case if I
remember correctly, so if there's any udev processing happening in parallel,
it shouldn't be slowing down. BUT we're waiting for udev records to
get initialized for filtering reasons, like mpath and MD component detection.
We should probably inspect this in detail and see where the time is really
taken underneath before we do any further changes...


This reminds me - did we already fix the annoying problem of a 'repeated' sleep
for every 'unfinished' udev initialization?

I believe there should be exactly one sleep to try to wait for udev, and if it
doesn't work - go on without it.

But I've seen some trace where the sleep happened repeatedly for each device where
udev was 'uninitialized'.

Clearly this doesn't fix the problem of 'uninitialized udev' but at least it
avoids an extremely long sleeping lvm command.

The sleep + iteration is still there!

The issue is that we're relying now on udev db records that contain
info about mpath and MD components - without this, the detection (and
hence filtering) could fail in certain cases. So if we go without checking the
udev db, that'll be a step back. As an alternative, we'd need to call
out mpath and MD directly from LVM2 if we really wanted to avoid
checking udev db (but then, we're checking the same thing that is
already checked by udev means).



A few things here: I've already seen traces where we've been waiting for udev 
basically 'endlessly' - as if sleeping actually does not help at all.


So either our command holds some lock - preventing the 'udev' rule from finishing - or 
some other trouble is blocking it.


My point about why we should wait 'just once' is that if the 1st sleep didn't 
help - likely all the next sleeps for other devices won't help either.


So we may then report some 'garbage' if we don't have all the info from udev 
we need - but at least it won't take so many minutes, and in some cases the 
device isn't actually needed for successful command completion.


But of course we should figure out why udev isn't initialized in-time.


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Zdenek Kabelac

Dne 08. 06. 21 v 15:41 Peter Rajnoha napsal(a):

On Tue 08 Jun 2021 13:23, Martin Wilck wrote:

On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:

On Mon 07 Jun 2021 16:48, David Teigland wrote:

If there are say 1000 PVs already present on the system, there
could be
real savings in having one lvm command process all 1000, and then
switch
over to processing uevents for any further devices afterward.  The
switch
over would be delicate because of the obvious races involved with
new devs
appearing, but probably feasible.

Maybe to avoid the race, we could possibly write the proposed
"/run/lvm2/boot-finished" right before we initiate scanning in
"vgchange
-aay" that is a part of the lvm2-activation-net.service (the last
service to do the direct activation).

A few event-based pvscans could fire during the window between
"scan initiated phase" in lvm2-activation-net.service's
"ExecStart=vgchange -aay..."
and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
finished",
but I think still better than missing important uevents completely in
this window.

That sounds reasonable. I was thinking along similar lines. Note that
in the case where we had problems lately, all actual activation (and
slowness) happened in lvm2-activation-early.service.


Yes, I think most of the activations are covered with the first service
where most of the devices are already present, then the rest is covered
by the other two services.

Anyway, I'd still like to know why exactly
obtain_device_list_from_udev=1 is so slow. The only thing that it does
is that it calls libudev's enumeration for "block" subsystem devs. We
don't even check if the device is initialized in udev in this case if I
remember correctly, so if there's any udev processing happening in parallel,
it shouldn't be slowing down. BUT we're waiting for udev records to
get initialized for filtering reasons, like mpath and MD component detection.
We should probably inspect this in detail and see where the time is really
taken underneath before we do any further changes...



This reminds me - did we already fix the annoying problem of a 'repeated' sleep 
for every 'unfinished' udev initialization?


I believe there should be exactly one sleep to try to wait for udev, and if it 
doesn't work - go on without it.


But I've seen some trace where the sleep happened repeatedly for each device where 
udev was 'uninitialized'.


Clearly this doesn't fix the problem of 'uninitialized udev' but at least it avoids 
an extremely long sleeping lvm command.


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-07 Thread Zdenek Kabelac

Dne 07. 06. 21 v 17:48 Martin Wilck napsal(a):

On So, 2021-06-06 at 14:15 +0800, heming.z...@suse.com wrote:

1. During boot phase, lvm2 automatically swithes to direct activation
mode
("event_activation = 0"). After booted, switch back to the event
activation mode.

Booting phase is a speical stage. *During boot*, we could "pretend"
that direct
activation (event_activation=0) is set, and rely on lvm2-activation-
*.service
for PV detection. Once lvm2-activation-net.service has finished, we
could
"switch on" event activation.

I like this idea. Alternatively, we could discuss disabling event
activation only in the "coldplug" phase after switching root (i.e.
between start of systemd-udev-trigger.service and lvm2-
activation.service), because that's the critical time span during which
1000s of events can happen simultaneously.



Hello


In lvm2 we never actually suggested using 'autoactivation' during boot - this 
case doesn't make much sense - as it is already known ahead of time which 
device ID needs to be activated. So whoever started to use autoactivation 
during boot - did it their own distro way.  Neither Fedora nor RHEL uses this.  On a 
second note, David is currently trying to optimize and rework Dracut's booting, 
as it hasn't aged quite well and there are several weak points to be fixed.




Regards


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] [PATCH] [PATCH] stable-2.02 - lvresize: deny operation on swap dev without force option

2021-03-24 Thread Zdenek Kabelac



- Original Message -
From: "Zhao Heming" 
To: linux-lvm@redhat.com, zkabe...@redhat.com, teigl...@redhat.com
Cc: "Zhao Heming" 
Sent: Wednesday, March 24, 2021 6:09:09 AM
Subject: [PATCH] [PATCH] stable-2.02 - lvresize: deny operation on swap dev 
without force option

When lvmetad is active on the system and some memory pages of lvmetad
are swapped out, a user may issue lvextend/lvresize on a swap LV. The resize
operation will suspend & resume the swap device (by reloading the dm table).
After the swap device is suspended, lvmetad will be in UN status, waiting
to swap in pages from the suspended swap dev. lvmetad will hang, then
lvresize will fail to connect to lvmetad, then the resize operation
will fail; lvresize leaves the swap suspended, and a logical
deadlock happens.

 
Hi

It seems there is something wrong elsewhere.
There should be no contact with lvmetad while ANY devices are in a suspended state.
Maybe something got reshuffled into the wrong place.

Can you please provide the output of 'lvextend -v' when it hangs ?

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Removing bash as a dependency

2021-02-22 Thread Zdenek Kabelac

Dne 22. 02. 21 v 2:25 Drew Westrick napsal(a):

All,

I'm working on a project with a fairly strict no GPLv3 policy. While lvm2 is 
fine since it is GPLv2, the lvm2 dependency on bash, which is GPLv3, is 
causing a problem. It appears that bash is used mostly for test scripts as 
well as lvmdump.sh  and fsadm.sh . If we 
don't need to use those two utilities, would there be any other major issues 
with removing bash a a dependency if I were to go down that path?




Hi

Distributing lvm2 without those tools IMHO doesn't make much sense.

If anyone cares to identify the pieces that are not common across shells
and writes alternative code for them for non-bash shells - I'd be fine with that.

We could eventually use configure to adapt them for other shell types.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-22 Thread Zdenek Kabelac

Dne 21. 02. 21 v 21:23 Martin Wilck napsal(a):

On Fri, 2021-02-19 at 23:47 +0100, Zdenek Kabelac wrote:


Right time is when switch is finished and we have rootfs with /usr
available - should be ensured by  lvm2-monitor.service and it
dependencies.


While we're at it - I'm wondering why dmeventd is started so early. dm-
event.service on recent installments has only "Requires=dm-
event.socket", so it'll be started almost immediately after switching
root. In particular, it doesn't wait for any sort of device
initialization or udev initialization.


Hi

Dmeventd alone does not depend on lvm2 in any way - it's the lvm2 plugin which 
then does all the 'scanning' for VGs/LVs and gets loaded when lvm2 connects to 
the monitoring socket. That's also why dmeventd belongs to the dm subsystem.


Dmeventd is nothing else than a process to check DM devices periodically - and 
can be used by e.g. dmraid or others...


So as such it doesn't need any devices - but it needs to be initialized early 
so it can accept connections from tools like lvm2 and start to monitor a 
device without delaying the command (as lvm2 waits for confirmation that the device is 
monitored).



I've gone through the various tasks that dmeventd is responsible for,
and I couldn't see anything that'd be strictly necessary during early
boot. I may be overlooking something of course. Couldn't the monitoring


As said - during the ramdisk boot - the monitor shall not be used (AFAIK - dracut is 
supposed to have monitoring disabled in its modified copy of lvm.conf within the 
ramdisk).


But we want to switch to monitoring ASAP when we switch to the rootfs - so the 
'unmonitored' window is as small as possible - there are still some 'grey' 
areas in the correct logic though...
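
For illustration, a hedged sketch of that switch-over step and one way to verify it 
afterwards:

vgchange --monitor y                 # enable monitoring for everything already active
lvs -a -o name,seg_monitor           # 'monitored' should show up for pool/raid/mirror LVs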


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-19 Thread Zdenek Kabelac

Dne 19. 02. 21 v 17:37 David Teigland napsal(a):

On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:

Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] VG sys run
autoactivation.
Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
No such file or directory


What's going on here? pvscan trying to start dmeventd ? Why ? There's a
dedicated service for starting dmeventd (lvm2-monitor.service). I can
see that running dmeventd makes sense as you have thin pools, but I'm
at a loss why it has to be started at that early stage during boot
already.

This is a curious message, it looks as if pvscan was running from an
environment (initramfs??) where dmeventd wasn't available. The message
is repeated, and after that, pvscan appears to hang...


I've found that when pvscan activates a VG, there's a bit of code that
attempts to monitor any LVs that are already active in the VG.  Monitoring
means interacting with dmeventd.  I don't know why it's doing that, it
seems strange, but the logic around monitoring in lvm seems ad hoc and in
need of serious reworking.  In this case I'm guessing there's already an
LV active in "sys", perhaps from direct activation in initrd, and when
pvscan activates that VG it attempts to monitor the already active LV.


The existing design for lvm2 rootfs using was like:

Activate the 'LV' within the ramdisk by dracut - which discovers the rootfs VG/LV
and activates it (by a rather 'brute-force', naive approach).

Such activation is WITHOUT monitoring - as the ramdisk is without 'dmeventd'
and we do not want to 'lock' the binary from the ramdisk into memory.

So once the system switches to the rootfs - 'vgchange --monitor y' enables 
monitoring for all LVs activated from the ramdisk and the process continues.


Event based activation within the ramdisk is a 3rd-party initiative by Arch Linux 
and thus needs to be 'reinvented' there, with its own problems that arise from this.


So far - in lvm2 the current dracut method is more maintainable.


Another missing piece in lvm monitoring is that we don't have a way to
start lvm2-monitor/dmeventd at the right time (I'm not sure anyone even
knows when the right time is), so we get random behavior depending on if
it's running or not at a given point.  In this case, it looks like it
happens to not be running yet.  I sometimes suggest disabling lvm2-monitor
and starting it manually once the system is up, to avoid having it
interfere during startup.


The right time is when the switch is finished and we have the rootfs with /usr
available - this should be ensured by lvm2-monitor.service and its dependencies.

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] discuss: about master branch vgcreate parameter "--clustered"

2020-12-29 Thread Zdenek Kabelac

Dne 28. 12. 20 v 9:35 heming.z...@suse.com napsal(a):

Hello,

On master branch, clvmd had been removed, the "-c | --clustered" is deprecated.

```
~> sudo vgcreate -c n vg1 /dev/sda
The clustered option is deprecated, see --shared.
Run `vgcreate --help' for more information.
~> echo $?
3
```

It looks the -c is useless, it only shows deprecated info and exit.
Is it possible to remove the -c from lvm code?




The command keeps this option for backward compatibility.
So when the option is no longer supported - it should print that the 'old usage'
was replaced with something else.

I don't quite understand why you would want to remove this logic,
since we would report 'Invalid argument' anyway?

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] cache_check --clear-needs-check-flag does not clear needs_check flag?

2020-12-09 Thread Zdenek Kabelac

Dne 06. 12. 20 v 22:01 Dennis Schridde napsal(a):

Hello!

A cached logical volume of mine cannot be activated anymore:

$ sudo vgchange -ay
device-mapper: reload ioctl on   (253:6) failed: Invalid argument
0 logical volume(s) in volume group "vg_ernie" now active


dmesg logs:

device-mapper: cache: 253:6: unable to switch cache to write mode until
repaired.
device-mapper: cache: 253:6: switching cache to read-only mode
device-mapper: table: 253:6: cache: Unable to get write access to metadata,
please check/repair metadata.
device-mapper: ioctl: error adding target to table


The code in question seems to be: https://github.com/torvalds/linux/blob/v5.8/
drivers/md/dm-cache-target.c#L957-L964


Hence I set out to check the cache and, if it is clean, clear the needs_check
flag:

$ sudo lvchange -ay vg_ernie/lv_cache
Do you want to activate component LV in read-only mode? [y/n]: y
Allowing activation of component LV.


As said - with component activation you will get a 'read-only' volume,
thus you cannot do 'in-place' changes this way.




A bit puzzling is that the status of the needs_check flag appears to be
"unknown":

$ sudo lvs -a -o +lv_check_needed
LV                VG       Attr       LSize   Pool       Origin            Data%  Meta%  Move Log Cpy%Sync Convert CheckNeeded
[lv_cache]        vg_ernie CRi-a-C--- 232.88g                                                                       unknown
lv_system         vg_ernie Cwi---C---  <1.82t [lv_cache] [lv_system_corig]                                          unknown
[lv_system_corig] vg_ernie owi---C---  <1.82t                                                                       unknown


The live system I am running these commands from is a Fedora 33:

$ uname -a
Linux localhost-live 5.8.15-301.fc33.x86_64 #1 SMP Thu Oct 15 16:58:06 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux

$ sudo lvm version
LVM version:   2.03.10(2) (2020-08-09)
Library version: 1.02.173 (2020-08-09)
Driver version:   4.42.0


It seems you are using the 'cvol' rather than the 'cpool' solution - it has been 
evolving and you will need a newer lvm2 version.


If you want to use caching with the version of lvm2 you have - you will
need to use cpools (which are a bit faster anyway).
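
A hedged sketch of the cache-pool based layout (the fast device, size and names are 
assumptions):

lvcreate --type cache-pool -L 200G -n lv_cache vg_ernie /dev/nvme0n1p2
lvconvert --type cache --cachepool vg_ernie/lv_cache vg_ernie/lv_system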

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Is TRIM and DISCARD needed for normal HDD ?

2020-11-26 Thread Zdenek Kabelac

Dne 25. 11. 20 v 16:37 Sreyan Chakravarty napsal(a):

Hi,

I am using thin LVM pools, but I have a normal hard disk and not a SSD.

Is there any reason to enable TRIM and/or DISCARD for my HDD ?

I have heard it is only useful for a SSD. Will it offer any advantages in my 
case ?




Hi

Thin-pool is created by default with 'passdown' TRIM/discard support.

This means a discard to a thin LV (i.e. fstrim of ext4 on a thinLV) gets 
propagated to the thin-pool, where it may deallocate a full chunk when possible 
(i.e. if you use a 256K chunk, the WHOLE chunk must be free to have an effective 
discard).


With "passdown" mode - such released chunks are also then passed through to 
origin _tdata device - where again some 'alignment rules' of discardable 
regions applie (i.e. lot of SSD need 512KiB blocks).


If you have an HDD - then clearly such a discard stops at the thin-pool level 
(automatically) and just releases chunks in the thin-pool for future reuse.


"ignore" discard mode is usefull in the case you want to keep already 
'allocated' chunks for thin LV always there - and also in some case it may 
make timing more predictible - as discard requires processing - so it ma 
slowdown few things - but at the expense of more filled thin-pool
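
A hedged illustration of both knobs (pool/LV names and the mountpoint are assumptions):

lvchange --discards nopassdown vg/thinpool   # keep discards inside the pool, nothing goes to _tdata
fstrim -v /mnt/thinlv                        # returns unused chunks from the thin LV to the pool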


Hopefully this makes it clear.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] [PATCH] man: document that 2 times poolmetadatasize is allocated for lvmthin

2020-10-28 Thread Zdenek Kabelac

Dne 20. 10. 20 v 21:19 Scott Moser napsal(a):

diff --git a/man/lvmthin.7_main b/man/lvmthin.7_main
index ce2343183..e21e516c9 100644
--- a/man/lvmthin.7_main
+++ b/man/lvmthin.7_main
@@ -1101,6 +1101,11 @@ specified with the --poolmetadatasize option.
When this option is not
  given, LVM automatically chooses a size based on the data size and chunk
  size.

+The space allocated by creation of a thinpool will include
+a spare metadata LV by default (see "Spare metadata LV").  As a result,
+the VG must have enough space for the --size option plus twice
+the specified (or calculated) --poolmetadata value.




Hi

The patch is unfortunately not really precise.

This applies only when you allocate the first pool-type volume in a VG.

IMHO it is better to refer directly to the 'lvmthin' man page for closer detail
if the user is interested.

The majority of users simply do not care about the negligible size of the metadata
volume, so the information is rather for very experienced users and
should be in some dedicated section for them.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Why does thinpool take 2*poolmetadatasize space?

2020-10-20 Thread Zdenek Kabelac

Dne 20. 10. 20 v 18:33 Scott Moser napsal(a):

When I create an lvmthinpool with size S and poolmetadatasize P,
it reduces the available freespace by S+2P. I expected that to
be S+P. Where did the extra poolmetadatasize get used?

See below for example.
before lvcreate we had 255868 free, after we had 254588.
The difference is 1280.  (1024 + 2*128).




lvm2 preallocates a hidden _pmspare volume which has the size of the
biggest metadata LV in the VG.

Such an LV is used for automated 'lvconvert --repair'.

If you don't want this LV to be created you can add

--poolmetadataspare y|n

but in this case you are fully on your own to figure out
where to take the space for repaired metadata from.

You can also lvremove the _pmspare LV at any time if necessary.
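
A hedged illustration of the accounting described above (the VG name is an assumption; 
the hidden spare is typically named [lvol0_pmspare]):

lvcreate --type thin-pool -L 1024M --poolmetadatasize 128M -n pool vg_test
lvs -a -o lv_name,lv_size vg_test   # pool 1g, [pool_tmeta] 128m, [lvol0_pmspare] 128m -> 1280m total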

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] LVM PV UUID problem

2020-10-09 Thread Zdenek Kabelac

Dne 09. 10. 20 v 15:12 Digimer napsal(a):

Hi all,

   I'm storing LVM information in a postgres database, and wanted to use
the UUID from the PVs / VGs / LVs as the UUIDs in the database. I
noticed when I tried to do this that postgres complained that the UUID
was not valid. I checked with an online UUID validator
(https://www.freecodeformat.com/validate-uuid-guid.php) and it also
reported as invalid.

Example;


# pvdisplay | grep UUID
   PV UUID   jLkli2-dEXx-5Y8n-pYlw-nCcy-9dFL-3B6jU3


   Is this a known issue?



Hi

At the time of lvm2 development I believe the UUID was just a unique identifier;
some effort to standardize it came in later.

But you really should NOT be using what are basically internal unique identifiers
in your DB - these are internal to how DM/LVM works and might be changed at any time 
to something else.


The user is supposed to use 'vgname' & 'lvname' - so there you can put those
valid UUID sequences - although human readable strings are always nicer ;)

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] thin: pool target too small

2020-09-29 Thread Zdenek Kabelac

Dne 29. 09. 20 v 16:33 Duncan Townsend napsal(a):
On Sat, Sep 26, 2020, 8:30 AM Duncan Townsend wrote:



 > > There were further error messages as further snapshots were attempted,

 > > but I was unable to capture them as my system went down. Upon reboot,
 > > the "transaction_id" message that I referred to in my previous message
 > > was repeated (but with increased transaction IDs).
 >
 > For better fix it would need to be better understood what has happened
 > in parallel while 'lvm' inside dmeventd was resizing pool data.



So lvm2 has been fixed upstream to report more educational messages to
the user - although it still does require some experience in managing
thin-pool kernel metadata and lvm2 metadata.


To the best of my knowledge, no other LVM operations were in flight at
the time. The script that I use issues LVM commands strictly


In your case - dmeventd did an 'unlocked' resize - while another command
was taking a snapshot - and it happened that the 'snapshot' sequence
won - so until the reload of the thin-pool - lvm2 did not spot the difference
(which is simply a bad race caused by badly working locking on your system).



Would it be reasonable to use vgcfgrestore again on the
manually-repaired metadata I used before? I'm not entirely sure what


You will need to vgcfgrestore - but I think you've misused the recovered 
piece I passed you, where I specifically asked to only replace the specific segments of 
the resized thin-pool within your latest VG metadata - since those likely have
all the proper mappings to thin LVs.

While you have taken the metadata from the 'resize' moment - you've lost all
the thinLV lvm2 metadata for the ones created later.

I'll try to make one for you.


to look for while editing the XML from thin_dump, and I would very
much like to avoid causing further damage to my system. (Also, FWIW,
thin_dump appears to segfault when run with musl-libc instead of


Well - lvm2 is a glibc-oriented project - so users of those 'esoteric'
distributions need to be experts on their own.

If you can provide a coredump, or even better a patch for the crash - we might
replace the code with something more usable - but there is zero testing
with anything other than glibc...


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvresize cannot refresh LV size on on other hosts when extending LV with a shared lock

2020-09-29 Thread Zdenek Kabelac

Dne 29. 09. 20 v 10:46 Gang He napsal(a):

Hello List,

I am using lvm2 v2.03.10(or v2.03.05), I setup a lvm2-lockd based (three nodes) 
cluster.
I created PV, VG and LV, formated LV with a cluster file system (e.g. ocfs2).
So far, all the things work well, I can write the files from each node.
Next, I extended the online LV from node1, e.g.
ghe-tw-nd1# lvresize -L+1024M vg1/lv1
   WARNING: extending LV with a shared lock, other hosts may require LV refresh.
   Size of logical volume vg1/lv1 changed from 13.00 GiB (3328 extents) to 
14.00 GiB (3584 extents).
   Logical volume vg1/lv1 successfully resized.
   Refreshing LV /dev//vg1/lv1 on other hosts...

But, the other nodes cannot aware this LV size was changed, e.g.



lvmlockd does not care about the state of the LV.

This used to be achieved through the clvmd code - but such code is no longer 
available in the 2.03 branch.


The assumed solution is that the user is supposed to write an engine on top of 
lvm2 using some 'clustering' solution and orchestrate this work himself

(i.e. in this case run 'lvchange --refresh' on the other hosts himself,
somehow in sync).
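
A hedged sketch of that orchestration step (node names other than nd1 are assumptions):

ssh ghe-tw-nd2 lvchange --refresh vg1/lv1
ssh ghe-tw-nd3 lvchange --refresh vg1/lv1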


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] thin: pool target too small

2020-09-24 Thread Zdenek Kabelac

Dne 23. 09. 20 v 21:54 Duncan Townsend napsal(a):

On Wed, Sep 23, 2020 at 2:49 PM Zdenek Kabelac  wrote:


Dne 23. 09. 20 v 20:13 Duncan Townsend napsal(a):

On Tue, Sep 22, 2020, 5:02 PM Zdenek Kabelac wrote:
Thin pool -tpool transaction_id (MAJOR:MINOR)
transaction_id is XXX, while expected YYY.

Set the transaction_id to the right number in the ASCII lvm2 metadata file.


I apologize, but I am back with a related, similar problem. After
editing the metadata file and replacing the transaction number, my
system became serviceable again. After making absolutely sure that
dmeventd was running correctly, my next order of business was to
finish backing up before any other tragedy happens. Unfortunately,
taking a snapshot as part of the backup process has once again brought
my system to its knees. The first error message I saw was:


Hi

And now you've hit an interesting bug inside the lvm2 code - I've opened a new BZ

https://bugzilla.redhat.com/show_bug.cgi?id=1882483

This actually explains a few so-far not well understood problems I've
seen before without a good explanation of how to hit them.


   WARNING: Sum of all thin volume sizes (XXX TiB) exceeds the size of
thin pool / and the size of whole volume group (YYY
TiB).
   device-mapper: message ioctl on  (MAJOR:MINOR) failed: File exists
   Failed to process thin pool message "create_snap 11 4".
   Failed to suspend thin snapshot origin /.
   Internal error: Writing metadata in critical section.
   Releasing activation in critical section.
   libdevmapper exiting with 1 device(s) still suspended.


So I now have quite a simple reproducer for an unhandled error case.
It's basically exposing a mismatch between the kernel (_tmeta) and lvm2
metadata content.  And lvm2 can handle this discovery better
than what you see now.


There were further error messages as further snapshots were attempted,
but I was unable to capture them as my system went down. Upon reboot,
the "transaction_id" message that I referred to in my previous message
was repeated (but with increased transaction IDs).


For a better fix it would need to be better understood what happened
in parallel while the 'lvm' inside dmeventd was resizing the pool data.

It looks like the 'other' lvm managed to create another snapshot
(and thus the DeviceID appeared to already exist - while it should not
according to the lvm2 metadata) before it hit the problem with the mismatch of
transaction_id.


I will reply privately with my lvm metadata archive and with my
header. My profuse thanks, again, for assisting me getting my system
back up and running.


So the valid fix would be to take a 'thin_dump' of the kernel metadata
(aka the content of the _tmeta device).
Then check what you have in the lvm2 metadata - likely you will
find some device in the kernel for which you don't have a match
in the lvm2 metadata - these devices would need to be copied
from your other sequence of lvm2 metadata.

Another, maybe simpler, way could be to just remove devices
from the thin_dump xml and thin_restore those metadata so they
now match lvm2.
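
A hedged sketch of that route (VG, pool and new LV names are assumptions):

lvchange -ay -f vg/pool_tmeta                        # read-only component activation
thin_dump /dev/mapper/vg-pool_tmeta > tmeta.xml      # XML view of the kernel metadata
# edit tmeta.xml: drop the <device> entries that have no matching thin LV in the lvm2 metadata
lvcreate -L 128M -n tmeta_new vg
thin_restore -i tmeta.xml -o /dev/vg/tmeta_new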

The last issue is then to match the 'transaction_id' with the number
stored in the kernel metadata.

So I'm not sure which way you want to go and how important those
snapshots (that could be dropped) are ?

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] thin: pool target too small

2020-09-23 Thread Zdenek Kabelac

Dne 23. 09. 20 v 20:13 Duncan Townsend napsal(a):
On Tue, Sep 22, 2020, 5:02 PM Zdenek Kabelac wrote:
dmeventd does write its PID file into the correct directory in the 
post-initramfs root, so whatever's happening is some weird hybrid. I'll debug 
this further with my distro.


So I think to prevent repeated occurrence of this problem - you'll need
to ensure your system-booting will follow the pattern from distros
like Fedora.


I think for now, the easiest solution may be to try to stop dmeventd from 
being started by dracut.


Basically all you need to do for dracut (with regards to dmeventd) is to 
set 'monitoring=0' in /etc/lvm/lvm.conf inside the dracut environment.

(so when the system's lvm.conf is copied there - replace the value with sed/awk...)
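
As an illustration only - assuming the copied lvm.conf carries an explicit 
'monitoring =' line - the dracut-side tweak could look something like:

sed -i 's/^\([[:space:]]*monitoring[[:space:]]*=\).*/\1 0/' /etc/lvm/lvm.conf   # run inside the initramfs image only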

Also there is the 'metadata_read_only=1' setting that can be useful for
the dracut environment.

Dracut needs some bigger fixing on its own - but ATM we simply can't
provide the set of features we would like to have.

I have encountered a further problem in the process of restoring my thin pool 
to a working state. After using vgcfgrestore to fix the mismatching metadata 
using the file Zdenek kindly provided privately, when I try to activate my 
thin LVs, I'm now getting the error message:


Thin pool -tpool transaction_id (MAJOR:MINOR) 
transaction_id is XXX, while expected YYY.

Set the transaction_id to the right number in the ASCII lvm2 metadata file.

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] thin: pool target too small

2020-09-22 Thread Zdenek Kabelac

Dne 21. 09. 20 v 15:47 Duncan Townsend napsal(a):

On Mon, Sep 21, 2020 at 5:23 AM Zdenek Kabelac  wrote:


Dne 21. 09. 20 v 1:48 Duncan Townsend napsal(a):

Hello!


Ahh, thank you for the reminder. My apologies for not including this
in my original message. I use Void Linux on aarch64-musl:


I had a problem with a runit script that caused my dmeventd to be
killed and restarted every 5 seconds. The script has been fixed, but


Killing dmeventd is always a BAD plan.
Either you do not want monitoring (set it to 0 in lvm.conf) - or
leave it to do its job - killing dmeventd in the middle of its work
isn't going to end well...


Thank you for reinforcing this. My runit script was fighting with
dracut in my initramfs. My runit script saw that there was a dmeventd
not under its control, and so tried to kill the one started by dracut.
I've gone and disabled the runit script and replaced it with a stub
that simply tried to kill the dracut-started dmeventd when it receives
a signal.


Ok - so from looking at the resulting 'mixture' of metadata you
have in your archive and what is physically present in the PV header,
and now noticing this is some 'not-so-standard' distro - I'm starting to 
suspect the reason for all these troubles.


Within 'dracut' you shouldn't be firing dmeventd at all - monitoring
should be enabled (by vgchange --monitor y) once you switch to your rootfs,
so dmeventd is executed from your rootfs!

By letting 'dmeventd' run in your 'dracut world' - it lives in its
own environment and likely its very own locking dir.

That makes it very easy for your dmeventd to run in parallel with your other lvm2 
commands - and this way it is completely unprotected
(as file-locking is what it is) - and since the resize is an operation taking several
seconds, it happened that your 'admin' command replaced whatever dmeventd was
doing.

So I think to prevent repeated occurrence of this problem - you'll need
to ensure your system-booting will follow the pattern from distros
like Fedora.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Why isn't issue_discards enabled by default?

2020-09-22 Thread Zdenek Kabelac

Dne 22. 09. 20 v 12:38 nl6720 napsal(a):

On Tuesday, 22 September 2020 13:15:19 EEST Zdenek Kabelac wrote:

Dne 22. 09. 20 v 11:14 nl6720 napsal(a):

On Monday, 21 September 2020 21:51:39 EEST Zdenek Kabelac wrote:

Dne 21. 09. 20 v 16:14 nl6720 napsal(a):

Hi,

thin-pool is using LVs  - so this is again about handling the discard
on a _tdata LV and it is completely unrelated to issue_discards
setting.


from lvmthin(7):
"passdown: Process discards in the thin pool (as with nopassdown), and
pass the discards down the the underlying device.  This is the default
mode."

It's the "underlying device" that's confusing me.


All it means is that the _tdata LV is placed on some PV/device.

So passdown means the TRIM goes through the thinLV to the thin-pool and then through
its _tdata LV it lands on the 'underlying device', aka the PV used for this LV.

If there is a better and less confusing way to describe this case,
feel free to submit better wording.

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Why isn't issue_discards enabled by default?

2020-09-22 Thread Zdenek Kabelac

Dne 22. 09. 20 v 11:14 nl6720 napsal(a):

On Monday, 21 September 2020 21:51:39 EEST Zdenek Kabelac wrote:

Dne 21. 09. 20 v 16:14 nl6720 napsal(a):

Hi,

I wanted to know why the "issue_discards" setting isn't enabled by
default. Are there any dangers in enabling it or if not is there a
chance of getting the default changed?

Also it's not entirely clear to me if/how "issue_discards" affects
thin pool discard passdown.


Hi

Have you checked its enclosed documentation within
/etc/lvm/lvm.conf ?

issue_discards is PURELY & ONLY related to sending a discard to removed
disk extents/areas after 'lvremove'.

It isn't in ANY way related to the actual discard handling of the LV
itself. So if you have an LV on an SSD it is automatically processing
discards. For the same reason it's unrelated to the discard processing
of thin-pools.

And finally, why we prefer issue_discards to be disabled (=0) by
default. It's very simple - with lvm2 we try (when we can) to support
one-command-back restore - so if you do 'lvremove' - you can use
vgcfgrestore to restore the previous metadata and you have your LV back
with all the data inside.

When you have issue_discards=1 - the device gets a TRIM - so all the
data is discarded at the device level - so when you try to restore your
previous metadata - well, it's nice - but the content is gone forever.
If a user can live with this 'risk' and prefers immediate discard -
perfectly fine - but it should be (IMHO) the admin's decision.

Regards

Zdenek



Thanks for your answer, so the reason it's not enabled by default is to
allow vgcfgrestore to function.

I have read /etc/lvm/lvm.conf and understand that issue_discards affects
things like lvremove. But I'd like to know, is it only for lvremove or
also lvreduce and lvconvert (--merge/--uncache)? And what is its


There is currently one known exception - pvmove - which is not trivial to 
resolve.  All other 'removals' go through standard extent release and

can be discarded when wanted (unless we missed some other use-case).


relation to thin_pool_discards; with issue_discards = 0 and
thin_pool_discards = passdown (both the defaults) how far down are the
discards passed?


The thin-pool is using LVs - so this is again about handling discard on
the _tdata LV, and it is completely unrelated to the issue_discards setting.




Lastly, there's no fstrim equivalent for trimming unused space in a PV,
right? To do that I'd need to lvcreate an LV occupying all free space and
then use `lvremove --config devices/issue_discards = 1`.


Well, there is an easily 'scriptable' one:

You can simply allocate all the free space in your VG (lvcreate -l100%FREE)
and then use 'blkdiscard /dev/vg/my_discardable_lv'.
Once finished, release the LV.
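
Put together, a rough sketch (assuming a VG simply called 'vg'):

  # grab all currently free extents of the VG into a temporary LV
  lvcreate -l100%FREE -n my_discardable_lv vg

  # discard the whole temporary LV (destroys only that LV's content)
  blkdiscard /dev/vg/my_discardable_lv

  # give the extents back to the VG
  lvremove -y vg/my_discardable_lv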

We may eventually introduce some 'pollable' support, as some discards can take
an extremely long time depending on the type of device.

However at this moment this is not really seen as priority...

Regards

Zdenek








Re: [linux-lvm] Why isn't issue_discards enabled by default?

2020-09-22 Thread Zdenek Kabelac

On 21. 09. 20 at 16:26 Mark Mielke wrote:
On Mon, Sep 21, 2020 at 10:14 AM nl6720 wrote:


I wanted to know why the "issue_discards" setting isn't enabled by
default. Are there any dangers in enabling it or if not is there a
chance of getting the default changed?

Also it's not entirely clear to me if/how "issue_discards" affects thin
pool discard passdown.


Historically, there have been dangers. Even today, there might still be
dangers - although I believe Linux (and other OSes) may disable the feature on
hardware which is known to behave improperly.


If you do your research and ensure you are using a good storage drive, there
should not be any problems. I enable issue_discards on all systems I work with
at home and at work, and have not encountered any problems. But I also don't
buy cheap drives from questionable brands.
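
As a quick sanity check (not from the original mail - just a common way to
see whether the kernel thinks a drive supports discard, using a hypothetical
/dev/sda):

  # non-zero DISC-GRAN / DISC-MAX means the device advertises discard support
  lsblk --discard /dev/sda

  # the same information via sysfs
  cat /sys/block/sda/queue/discard_granularity
  cat /sys/block/sda/queue/discard_max_bytes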


It's pretty common for settings such as these to be more conservative, to 
ensure that the users who are willing to accept the risk (no matter how small) 
can turn it on as an option, and the users who are unaware or might not have 
evaluated the risk, cannot blame the software vendors for losing their data. 
In the case of LVM - it's not LVM's fault that some drives might lose your 
data when discard is sent. But, users of LVM might blame LVM.


Hi

So let's repeat again -  issue_discards  has nothing to do with discard 
handling of the LV itself.


If the LV is sitting on top of an SSD, it will receive discard/TRIM no matter
what is set in the issue_discards setting - it can be 0 or 1 and the LV will
still be discarded.


Anyone can very easily try it - create some small simple LV (or thinLV) on
top of an SSD, then run e.g. mkfs.ext4 on it and use 'fstrim -v'.

With a discardable device you will get a nice printout of the discarded region -
then play with the issue_discards setting - there will be no change...
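
Spelled out, the experiment is roughly (a sketch with hypothetical names,
assuming /dev/sdb is an SSD you can scratch):

  pvcreate /dev/sdb
  vgcreate vgtest /dev/sdb
  lvcreate -L1G -n lvtest vgtest

  mkfs.ext4 /dev/vgtest/lvtest
  mount /dev/vgtest/lvtest /mnt

  # prints how many bytes were discarded - the result is the same whether
  # issue_discards in lvm.conf is 0 or 1
  fstrim -v /mnt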

The issue_discards setting only matters when you run 'lvremove' and your VG is
perhaps on some 'provisioned' storage where you pay for used space - in this
case the released extents in the VG will receive TRIM, so the VG will eat less
physical space on such storage.
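
And if someone wants that behaviour only for one particular removal, it can
also be enabled per command instead of globally (sketch, hypothetical names):

  # discard the released extents just for this invocation,
  # leaving lvm.conf at its default issue_discards=0
  lvremove --config 'devices { issue_discards = 1 }' vgtest/lvtest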

So keeping this setting at 0 is perfectly fine and allows users to revert many
lvm2 operations if they've made a mistake.


When the setting is 1, most lvm2 commands have NO way back - once done, all
the data is gone - which can be quite fatal if you have e.g. very similar LV names...


No distribution should be setting issue_discards to '1' by default - it should
always be changed by the admin, so they are aware of the consequences.


Regards

Zdenek




Re: [linux-lvm] Why isn't issue_discards enabled by default?

2020-09-21 Thread Zdenek Kabelac

On 21. 09. 20 at 16:14 nl6720 wrote:

Hi,

I wanted to know why the "issue_discards" setting isn't enabled by
default. Are there any dangers in enabling it or if not is there a
chance of getting the default changed?

Also it's not entirely clear to me if/how "issue_discards" affects thin
pool discard passdown.


Hi

Have you checked the documentation enclosed within /etc/lvm/lvm.conf?

issue_discards is PURELY & ONLY related to sending discard to removed disk 
extents/areas after 'lvremove'.


It is NOT in ANY way related to the actual discard handling of the LV itself.
So if you have an LV on an SSD, it automatically processes discards. For the
same reason it's unrelated to the discard processing of thin-pools.


And finally, why we prefer issue_discards to be disabled (=0) by default. It's
very simple - with lvm2 we try (when we can) to support a one-command-back
restore - so if you do 'lvremove', you can use vgcfgrestore to restore the
previous metadata and you have your LV back with all the data inside.


When you have issue_discards=1, the device gets TRIM - so all the data
are discarded at the device level - and when you then try to restore
your previous metadata, the restore itself still works, but the content is gone forever.

If a user can live with this 'risk' and prefers immediate discard - perfectly
fine - but it should be (IMHO) the admin's decision.


Regards

Zdenek



