Re: [linux-lvm] Running lvm2 regression tests

2016-06-10 Thread Zdenek Kabelac

Dne 10.6.2016 v 05:08 Bruce Dubbs napsal(a):

I am having a problem with building LVM2.2.02.155 from source.

Actually, the build is OK:

./configure --prefix=/usr   \
--exec-prefix=  \
--enable-applib \
--enable-cmdlib \
--enable-pkgconfig  \
--enable-udev_sync

make
sudo make install

My problem comes when I try to run some checks:

sudo make -k check_local

In this case every test times out.  I killed the tests after two hours. The
journal shows things like:

ndev-vanilla:api/dbustest.sh started
ndev-vanilla:api/dbustest.sh timeout



dbus support is highly experimental and likely full of bugs.
It's not even supported yet - it's purely a proof of concept.

For now, use  'make check_local S=dbustest'  to skip this one...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] corruption on reattaching cache

2016-06-10 Thread Zdenek Kabelac

Dne 9.6.2016 v 22:24 Markus Mikkolainen napsal(a):

I seem to have hit the same snag as Mark describes in his post.

https://www.redhat.com/archives/linux-lvm/2015-April/msg00025.html

with kernel 4.4.6 I detached (--splitcache) a writeback cache from a mounted
lv, which was then synchronized and detached. Then I reattached it and shortly
afterwards detached it again. What was interesting is that after the second detach it
synchronized AGAIN, starting from 100%, and then I started getting filesystem
errors. I immediately shut down and forced an fsck, and didn't lose that much
data, but still had some stuff to correct.

It looked to me like a detached cache, when reattached, retains all the cached
data on it, even though it was supposed to have been written to the backing disk, and
then, instead of being marked clean on attach, it continues serving old
data from the cache.




Yes - known issue,  --splitcache is rather for 'debugging' purposes.
Use --uncache and create a new cache when needed.

A split cache needs to be cleared on reattachment - but that needs further
code rework.


The idea behind it is that we want to support 'offline' writeback of data, as ATM
the cache target doesn't work well if there is any disk error - i.e. if the cache is
in writeback mode and has an 'error' sector, you can't clean such a cache...
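
For example, a minimal sketch (device and LV names are hypothetical):

  # Flush and drop the cache instead of splitting it off:
  lvconvert --uncache vg/cached_lv
  # Later, build a fresh cache pool and attach it again:
  lvcreate --type cache-pool -L 10G -n cpool vg /dev/fast_ssd
  lvconvert --type cache --cachepool vg/cpool vg/cached_lv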


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM says physical volumes are missing, but they are not

2016-05-31 Thread Zdenek Kabelac

Dne 30.5.2016 v 21:53 Phillip Susi napsal(a):

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 05/30/2016 04:15 AM, Zdenek Kabelac wrote:

Hi

Please provide a full 'vgchange -ay -' trace of such an activation
command.

Also specify which version of lvm2 is in use by your distro.


2.02.133-1ubuntu10.  I ended up fixing it by doing a vgcfgbackup,
manually editing the text file to remove the MISSING flags, and then


Hi

Instead of doing a manual 'edit' you should have used:

vgextend --restoremissing  /dev/your_restored_pv

If you do a manual edit, there is quite a large probability that the restored
device will cause data inconsistencies for raids/mirrors.


(And the complementary command, if you had wanted to drop a missing PV instead:
vgreduce --removemissing)
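
For example, a minimal sketch with hypothetical VG/PV names:

  # Re-add a PV that reappeared after being flagged MISSING:
  vgextend --restoremissing vg1 /dev/sdb1
  # Or, if the PV is really gone, drop it together with any LVs still using it:
  vgreduce --removemissing --force vg1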


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] thin: 235:2 pool target (735616 blocks) too small: expected 809216

2016-05-31 Thread Zdenek Kabelac

Dne 31.5.2016 v 12:57 Brian J. Murrell napsal(a):

I have a Fedora 23 system running (presumably, since I can't boot it to
be 100% sure) 2.02.132 of LVM.  It has ceased to boot and reports:

device-mapper: resume ioctl on (253:2) failed: Invalid argument
Unable to resume laptop-pool00-tpool (253:2)
thin: Data device (dm-1) discard unsupported: Disabling discard passdown.
thin: 235:2 pool target (735616 blocks) too small: expected 809216
table: 253:2: thin-pool preresume failed, error = -22

Any ideas what the problem could be?  I can't imagine why all of a
sudden the pool target would be smaller than it should be.

What further information can I provide to help debug (and very
hopefully repair) this system?


Hi

Well - could you post somewhere for download the 1st MB of your
PV device?

dd if=/dev/sdX of=/tmp/upload_me bs=1M count=1

Have you already tried to restore it in some way?

Regards


Zdenek



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] what creates the symlinks in /dev/ ?

2016-06-24 Thread Zdenek Kabelac

Dne 23.6.2016 v 20:02 Chris Friesen napsal(a):

On 06/23/2016 11:21 AM, Zdenek Kabelac wrote:

Dne 23.6.2016 v 18:35 Chris Friesen napsal(a):



[root@centos7 centos]# vgscan --mknodes
  Configuration setting "snapshot_autoextend_percent" invalid. It's not part
of any section.
  Configuration setting "snapshot_autoextend_threshold" invalid. It's not part
of any section.


fix your lvm.conf   (uncomment sections)


  Reading all physical volumes.  This may take a while...
  Found volume group "chris-volumes" using metadata type lvm2
  Found volume group "centos" using metadata type lvm2
  Found volume group "cinder-volumes" using metadata type lvm2
  The link /dev/chris-volumes/chris-volumes-pool should have been created by


Ok - there seems to be an internal bug in lvm2 which incorrectly hints
link creation for this case.

There should not have been a /dev/vg/pool link - this is correctly marked
for udev - but incorrectly for udev validation.

However the bug is actually not that important - the link just points
to a 'wrapper' device - and eventually we will resolve the problem even without
this extra device in the table.


The problem that it causes for me is that when I run "vgchange -an
chris-volumes" it leaves the /dev/chris-volumes with a broken symlink in it
because udev doesn't remove the symlink added by vgscan.


Yep - as said - under normal circumstances you should NOT run 'vgmknodes',
as the created links will not be known/visible to udev - so this behavior is
correct.



This causes the LVM OCF script in the "resource-agents" package to break,
because it is using the existence of the /dev/vg directory as a proxy for
whether the volume group is active (or really, as you said earlier, whether
there are active volumes within the volume group).

I reported this as a bug to the "resource-agents" package developers, and they
said that they can't actually call lvm commands in their "status" routines
because there have been cases where clustered LVM hung when querying status,
causing the OCF script to hang and monitoring to fail.

Ultimately I'll see if I can work around it by not calling "vgscan --mknodes".


yes please,  start with this one...

vgmknodes is really meant to be used for some unique urgent problem - not
executed in a script every hour...


But yes - lvm2 will need to fix link creation for a pool in use...


Originally it was added in to fix some problems, but that was a while back so
things may behave properly now.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] what creates the symlinks in /dev/ ?

2016-06-23 Thread Zdenek Kabelac

Dne 23.6.2016 v 18:35 Chris Friesen napsal(a):

On 06/23/2016 02:34 AM, Zdenek Kabelac wrote:

Dne 22.6.2016 v 16:52 Chris Friesen napsal(a):

On 06/22/2016 03:23 AM, Zdenek Kabelac wrote:

Dne 21.6.2016 v 17:22 Chris Friesen napsal(a):

I'm using the stock CentOS7 version, I think.

  LVM version: 2.02.130(2)-RHEL7 (2015-12-01)
  Library version: 1.02.107-RHEL7 (2015-12-01)
  Driver version:  4.33.0

So are you saying that nobody should run "vgscan --mknodes" on a system
where
udev is managing the symlinks?


Yes - on such a system this command should be used only in case of 'emergency',
when udev doesn't work properly and you need the links.

The links however will not be known to udev, and likely the whole system is
going to crash soon or is misconfigured in a major way.


Okay, I'll see if I can get the call to vgscan removed.  But even so, wouldn't
it make sense to have vgscan use the same logic as udev in terms of what
symlinks to make and where to point them?


It *IS* using the same logic.

If the link is not there - the bug is in your udev rules.

When udev is properly configured, vgscan should not report a missing link.


It doesn't seem to work this way in practice on a stock CentOS system.  Here's
the sequence:

1) create a volume group:
"vgcreate chris-volumes /dev/loop2"
At this point there is no /dev/chris-volumes directory.



ONLY active volumes have links.

There is never supposed to be a directory entry for a VG without any active LV.

There is no such term as an 'active VG' - it always exists only in connection with an
active LV - thus a directory without any active LV inside is purely a bug - if you
see it - report it as a regular BZ.





2) Create a thin pool in the volume group:
"lvcreate -L 1.8GB -T chris-volumes/chris-volumes-pool"

Now udev creates a /dev/chris-volumes directory with a link for the thin pool:
[root@centos7 centos]# ls -l /dev/chris-volumes
total 0
lrwxrwxrwx. 1 root root 7 Jun 23 12:22 chris-volumes-pool -> ../dm-9

3) Create a thin volume in the thin pool:
"lvcreate -V1G -T chris-volumes/chris-volumes-pool -n thinvolume"

Now the link for the thin pool itself has disappeared:
[root@centos7 centos]# ls -l /dev/chris-volumes
total 0
lrwxrwxrwx. 1 root root 8 Jun 23 12:23 thinvolume -> ../dm-11

(At this point /dev/mapper/chris--volumes-chris--volumes--pool-tpool points to
dm-9 and /dev/mapper/chris--volumes-chris--volumes--pool points to dm-10.)


correct


4) If I run "vgscan --mknodes", it re-creates the thin pool link, but pointing
to the /dev/mapper name instead of directly to the /dev/dm-*.  Also, it's
indirectly pointing to /dev/dm-10 where before it was pointing to /dev/dm-9:


Only the 'final' resolved device matters.

Whether the link is  /dev/vg/lv  ->   /dev/mapper/vg-lv   ->  /dev/dm-XXX
or  /dev/vg/lv  ->  /dev/dm-XXX  - it does not matter.

There are some 'tricks' related to thin-pool maintenance.

An unused thin-pool is a 'public' LV - it has a  /dev/vg/lv   link.
A thin-pool in use by lvm2 is a 'private' LV - it doesn't have a  /dev/vg/lv  link.

All devices should always have an entry in  /dev/mapper/ - either links
to the real devices or direct nodes (on older systems).

lvm2 users are always supposed to use ONLY the /dev/vg/lv  devices for access.






[root@centos7 centos]# vgscan --mknodes
  Configuration setting "snapshot_autoextend_percent" invalid. It's not part
of any section.
  Configuration setting "snapshot_autoextend_threshold" invalid. It's not part
of any section.


fix your lvm.conf   (uncomment sections)
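
The two settings belong inside the activation section - a bare setting at the
top level produces exactly the "not part of any section" warning above. A
sketch (the values are only illustrative):

  activation {
      snapshot_autoextend_threshold = 70
      snapshot_autoextend_percent = 20
  }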


  Reading all physical volumes.  This may take a while...
  Found volume group "chris-volumes" using metadata type lvm2
  Found volume group "centos" using metadata type lvm2
  Found volume group "cinder-volumes" using metadata type lvm2
  The link /dev/chris-volumes/chris-volumes-pool should have been created by


Ok - there seems to be an internal bug in lvm2 which incorrectly hints
link creation for this case.

There should not have been a /dev/vg/pool link - this is correctly marked
for udev - but incorrectly for udev validation.

However the bug is actually not that important - the link just points
to a 'wrapper' device - and eventually we will resolve the problem even without
this extra device in the table.



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM and chain of snapshots

2016-01-20 Thread Zdenek Kabelac

Dne 20.1.2016 v 13:54 Марк Коренберг napsal(a):


2016-01-20 17:09 GMT+05:00 Zdenek Kabelac <zkabe...@redhat.com
<mailto:zkabe...@redhat.com>>:

ed sta



Thanks for the response, but I do not understand how thin-provisioning is
related to the question I'm asking.

As far as I understand, if 20 snapshots are created even in thin-provisioning
mode, a write to the origin will be converted
into 21 writes, won't it? My scenario means that no write multiplication
occurs while making "normal" operations in
userspace (i.e. not writing to snapshots, while the origin is under heavy
write load). Also, my scenario easily adds the functionality
of "snapshot of snapshot". The case I'm trying to discuss is something
like the chain of qcow2 files used to make
live snapshots in KVM.



You cannot chain old-snaps this way (you cannot map an old-snap over an old-snap),
and no - it's not easy to add that.
So your proposal would have worked for exactly 1 level,
e.g.  you continue to write to the snap - and you keep the origin intact,
but you cannot map another 'snap' over this snap.

lvm2 is currently incapable of doing this - and it's fairly nontrivial to
support - especially now that we have thin-provisioning, no one is
currently planning to extend the old snapshot with such a complicated feature.



Use case: having such snapshot every day. And after snapshot count exceed 30,
meld first snapshot into it's origin.


So you would need 30 chained snaps.
And after 30 of them - you would actually need to merge them - quite
complicated...


This operation should be possible without any unmounting. After merging, that
snapshot should contain empty diff
and so may be eliminated from chain via replacing dmsetup tables.


It's much better to directly update the 'origin' and just drop the no-longer-needed
snapshot.


The only major issue is - running 30 old-style snapshots is just not usable for
anything...



In other words, my proposal is not connected to low-level things in LVM. Yes,
all the snapshots I describe can be
thin-provisioned. Just minimal logic, CLI and XML would need to be extended.


In other words - you have just described how thin-provisioning works,
and there is no reason to reinvent the wheel again :)

So please, for this use-case, switch to thin-provisioning.
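
A minimal sketch (hypothetical names) of how the same chaining looks with thin
volumes - snapshots of snapshots are cheap and any link can be dropped later:

  lvcreate -L 100G -T vg/pool              # thin pool
  lvcreate -V 50G  -T vg/pool -n base      # thin volume
  lvcreate -s vg/base  -n snap1            # snapshot of the thin volume
  lvcreate -s vg/snap1 -n snap2            # snapshot of a snapshot
  lvremove vg/snap1                        # drop an intermediate snapshot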

Regards

Zdenek



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] problem with lvcreate and redirection

2016-02-19 Thread Zdenek Kabelac

Dne 19.2.2016 v 19:40 Lentes, Bernd napsal(a):

Hi,

I have a script in which I invoke lvremove and lvcreate. With lvremove I don't
have problems, but with lvcreate I do.
I'm redirecting stdout and stderr to a file because the script is executed by
cron and I'd like to have a look afterwards at whether everything went fine.
The command is: lvcreate -v -L 25G -n lv_root_snapshot -s vg1/lv_root >
lvcreate_with_redirection.log 2>&1. The shell does not accept further commands
afterwards, but the host still responds to ping. You can have a look at
lvcreate_with_redirection.log here:
https://hmgubox.helmholtz-muenchen.de:8001/d/b4c7025bac/ .
The system seems to stop while suspending.

last lines of the log:
==

...
 Creating vg1-lv_root_snapshot-cow
 Loading vg1-lv_root_snapshot-cow table (252:3)
 Resuming vg1-lv_root_snapshot-cow (252:3)
 Loading vg1-lv_root_snapshot table (252:1)
 Suspending vg1-lv_root (252:0) with filesystem sync with device flush
==




Unfortunately you can't do that if you log to the SAME volume you are
suspending - i.e. you run your command from your root volume,

which is also being suspended.

We could likely 'buffer' the output while in suspend mode
and flush it out later - but as this is seen as a 'debug' aid, it is
assumed the user takes care and uses a place for logging which doesn't block.


So if you want to see the logs - use some tmpfs location for them.
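
For example, a sketch assuming /dev/shm is a tmpfs mount on your system:

  lvcreate -v -L 25G -n lv_root_snapshot -s vg1/lv_root \
      > /dev/shm/lvcreate_with_redirection.log 2>&1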

Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] problem with lvcreate and redirection

2016-02-20 Thread Zdenek Kabelac

Dne 20.2.2016 v 14:25 Lentes, Bernd napsal(a):



- Am 19. Feb 2016 um 21:31 schrieb Zdenek Kabelac zdenek.kabe...@gmail.com:


Dne 19.2.2016 v 19:40 Lentes, Bernd napsal(a):

Hi,

i have a script in which i invoke lvremove and lvcreate. With lvremove i don't
have proplems but with lvcreate.
I'm redirecting stdout and stderr to a file because the script is executed by
cron and i'd like to have a look afterwards if everything went fine.
The command is: lvcreate -v -L 25G -n lv_root_snapshot -s vg1/lv_root >
lvcreate_with_redirection.log 2>&1. Shell does not accept further commands
afterwards, but host still responds to ping. You can have a look on
lvcreate_with_redirection.log here:
https://hmgubox.helmholtz-muenchen.de:8001/d/b4c7025bac/ .
System seems to stop while suspending.

last lines of the log:
==

...
  Creating vg1-lv_root_snapshot-cow
  Loading vg1-lv_root_snapshot-cow table (252:3)
  Resuming vg1-lv_root_snapshot-cow (252:3)
  Loading vg1-lv_root_snapshot table (252:1)
  Suspending vg1-lv_root (252:0) with filesystem sync with device flush
==




Unfortunately you can't do that if you log to the SAME volume you are
suspending - i.e. you run your command from your root volume
which is also suspended.

We could likely 'buffer' the output while in suspend mode,
and throw out the output later - but as this is seen as 'debug' help it's
assume user takes care and user place for logging which doesn't block.

So if you want to see logs - use something tmpfs location for it.

Regards


Zdenek



Hi Zdenek,

thanks for your answer. I redirected the output already to another partition,
but the system still stopped. Maybe because the partition is on the same disk as
the lv? Is the whole disk suspended? Is it not possible to suspend just a
partition? I will try tmpfs or something else.
But does that mean that redirection is generally not possible when suspending an
lv? Or just in my case because I invoke lvcreate? What about other
programs having redirections during the suspend? And why does it work when I
just redirect stdout? When I redirect stdout and stderr the system stops.




Your 'partition' layout isn't clear - but anyway - debugging a command when you
run it from a partition you are trying to suspend is 'tricky'.

Suspend basically stops any read/write operation on the suspended device.
So you just need to be sure there is nothing on the logging path which might get
blocked.


So, is your command still blocked when you log to a ramdisk?
Are you using a recent version of lvm2?


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] disabling udev_sync and udev_rules

2016-03-16 Thread Zdenek Kabelac

Dne 16.3.2016 v 00:52 Steven Dake (stdake) napsal(a):



On 3/15/16, 3:56 PM, "linux-lvm-boun...@redhat.com on behalf of Zdenek
Kabelac" <linux-lvm-boun...@redhat.com on behalf of
zdenek.kabe...@gmail.com> wrote:


Dne 15.3.2016 v 23:31 Serguei Bezverkhi (sbezverk) napsal(a):

Hello folks,

While trying to make lvm work within a docker container I came across
an issue where all lvcreate/lvremove calls got stuck indefinitely or until
Ctrl-C. When I checked the process I noticed lvm was waiting on one
semaphore. I found that other folks had hit a similar issue and they fixed it
by setting  udev_sync and udev_rules to 0. It also helped in my case.

I would greatly appreciate it if you could share your thoughts on whether this
change can potentially have any negative impact in the future.

Thank you


Hi


To 'unblock' stuck processes waiting on a udev cookie - you can run:

'dmsetup udevcomplete_all'


However the key question is - how you got stuck in the first place.
That may need further debugging.

You would need to disclose your OS version and also the version of lvm2 in use.

Non-working cookies are bad - and disabling udev sync is an even worse
idea...


Zdenek,

To expand on what Serguei is doing, he is working on a patch to add
LVM2+iSCSI in a container for the Cinder (block-storage-as-a-service) project in



Hi

Well - this should have been the 1st sentence in the initial email reporting the
problem.


lvm2 DOES NOT (and CANNOT) work properly inside a container.

Devices are not a 'containerized' resource.
This is a common gap in the 'Docker-land' understanding of the Linux kernel.
That's where hacks like not using 'udev' sync come from.


OpenStack.  He is doing this in the upstream repository here:

http://github.com/openstack/klla

The LVM processes are running within a container.  I suspect if the
process is stuck on a semaphore it has something to do with semaphores not
being shared with the host OS, because containers naturally create a
contained environment.  There are solutions for things like sockets, but
not necessarily for things like semaphores for the container to
communicate with the host OS.

Is there another mechanism besides semaphores to get lvm2 to communicate
with udev?  Turning off udev sync side-steps the problem because then udev
is not in the picture.  Some people in our community think this is a
security risk, although we assume the servers are completely secure.

Your advice on how to solve the problem would be mighty welcome :)

To see the change in full, check out:



The proper way to resolve this is to have some 'system' service creating the
device for you and then handing such a device over to your container.
Some sort of super-controller daemon.

Device creation is controlled by udev - which runs in your core system.
It's this udev which processes the kernel event, completes the cookie and
unblocks the lvm2 command.


But users really should not confuse what a cgrouped process is supposed to be
doing - it really cannot create devices (unlike in a fully virtual VM) - that has
a wide impact over the whole system - so there must be an 'upper-level' process
controlling this in some way and resolving e.g. name conflicts - since in the
system you have just one namespace - not a per-container namespace - and there
are more and more troubles ahead...


Anyway - my first advice is to activate the device as a service and pass the
properly created device back to your container via some protocol.


Regards


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Problems in lvmconfig.8, lvm-dumpconfig.8, lvreduce.8, lvresize.8, vgchange.8

2016-03-01 Thread Zdenek Kabelac

Dne 29.2.2016 v 12:35 e...@thyrsus.com napsal(a):

This is automatically generated email about markup problems in a man
page for which you appear to be responsible.  If you are not the right
person or list, please tell me so I can correct my database.

See http://catb.org/~esr/doclifter/bugs.html for details on how and
why these patches were generated.  Feel free to email me with any
questions.  Note: These patches do not change the modification date of
any manual page.  You may wish to do that by hand.

I apologize if this message seems spammy or impersonal. The volume of
markup bugs I am tracking is over five hundred - there is no real
alternative to generating bugmail from a database and template.

--
  Eric S. Raymond


Hi

Thanks for report.

I've tried to fix them (though differently), since the proposed changes
were not always correct.


--- lvreduce.8-unpatched2016-02-29 05:58:16.401086143 -0500
+++ lvreduce.8  2016-02-29 06:00:39.252717531 -0500
@@ -16,7 +16,7 @@
  .RB [ \-\-noudevsync ]
  .RB { \-l | \-\-extents
  .RI [ \- ] LogicalExtentsNumber [ % { VG | LV | FREE | ORIGIN "}] |"
-.RB [ \-L | \-\-size
+.RB [ \-L | \-\-size ]


Here we use a different logic - otherwise we would have a full man page of []
chars.

So ATM whenever there is a short & long option format - we do not put []
around each of them; we just use [] around the whole option.


e.g.:

[-l|--extents  ExtentsNumber]



Problems with vgchange.8:

Invalid option format - cannot have optional prefix in token,
it confuses anything trying to do syntactic parsing.

--- vgchange.8-unpatched2016-02-29 06:18:37.129937328 -0500
+++ vgchange.8  2016-02-29 06:24:27.177036131 -0500
@@ -40,7 +40,7 @@
  .IR MaxPhysicalVolumes ]
  .RB [ \-\-metadataprofile
  .IR ProfileName ]
-.RB [ \-\- [ vg ] metadatacopies ]
+.RB [ \-\-vgmetadatacopies
  .IR NumberOfCopies | unmanaged | all ]
  .RB [ \-P | \-\-partial ]
  .RB [ \-s | \-\-physicalextentsize
@@ -234,7 +234,7 @@
  in the volume group unless the logical volume itself has its own profile
  attached. See \fBlvm.conf\fP(5) for more information about \fBmetadata 
profiles\fP.
  .TP
-.BR \-\- [ vg ] metadatacopies " " \fINumberOfCopies | \fIunmanaged | \fIall
+.BR \-\- vgmetadatacopies " " \fINumberOfCopies | \fIunmanaged | \fIall



Not really sure what we can do with these - since we have a couple of other similar
ones.


lvm2 does support an 'optional' prefix so the user can use the shorter variant.

--vgmetadatacopies  or just  --metadatacopies
--raidmaxrecoveryrate  or  just --maxrecoveryrate

We can probably just put both options in full form into the list - but this
form  --[vg]metadatacopies just looks easier to read and saves space ;)

So are we violating some basic 'man-page-writing' rule - or is the catb
parser just not capable of reading it?


IMHO I've already checked many, many other man pages and there are a lot of
different styles - so it's quite unclear how to write it properly.


As of now - the page for 'lvcreate.8' is basically the style we are heading towards
(and lots of pages still need to be updated).


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] 2 questions on LVM cache

2016-04-27 Thread Zdenek Kabelac

On 27.4.2016 12:50, Xen wrote:

1. Does LVM cache support discards of the underlying blocks (in the cache)
when the filesystem discards the blocks?


It does in a devel branch - hopefully it will be upstreamed shortly...

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Testing ThinLVM metadata exhaustion

2016-04-25 Thread Zdenek Kabelac

On 25.4.2016 10:59, Gionatan Danti wrote:

Il 23-04-2016 10:40 Gionatan Danti ha scritto:

On 22/04/2016 16:04, Zdenek Kabelac wrote:



I assume you miss newer kernel.
There was originally this bug.

Regards

Zdenek


Hi Zdenek,
I am running CentOS 6.7 fully patched, kernel version
2.6.32-573.22.1.el6.x86_64

Should I open a BZ report, or is RH already aware of the problem on RH/CentOS 6.7?

Thanks.


Hi,
sorry for the bump, but I would really like to understand whether the current RHEL6
and RHEL7 kernels are affected by this serious bug and, if that is the case, whether
Red Hat is aware of it.

I understand this is not the best place to ask about a specific
distribution, but I see many Red Hat people here ;)

If this really is the wrong place, can someone point me to the right one (RH
Bugzilla?).
Thanks.



bugzilla.redhat.com

Anyway - 6.8 will likely be your solution.

Thin-provisioning is NOT supposed to be used in 'corner' cases - we keep improving
them, but older versions simply had more of them; it has always been clearly
communicated: do not over-provision if you can't provide the space.


Running out of pool space is not the same as running out of filesystem space - you
can't expect things to continue working nicely - the cooperation of the block layer
with the filesystem and the metadata resilience are being continually improved.


We have actually even seen users 'targeting' a full pool as part of their
regular workflow - a bad, bad plan...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] auto umount at thin meta 80%

2016-05-23 Thread Zdenek Kabelac

On 22.5.2016 19:07, Xen wrote:

I am not sure if, due to my recent posts ;-) I would still be allowed to write
here. But, perhaps it is important regardless.

I have an embedded LVM. The outer volume is a cached LVM, that is to say, two
volumes are cached from a different PV. One of the cached volumes contains LUKS.

The LUKS, when opened, contains another PV, VG, and a thin pool.

The thin pool contains about 4-5 partitions and is overprovisioned. Only one
volume is in use now.

This thin volume called "Store" is almost the full size of the thin pool, but
not quite:

store  vault  Vwi-aotz--  400.00g  thin         40.37

It currently stores a backup, I was writing some backup scripts.

The volume apparently got umounted when the meta of the thinpool reached 80%.

10:47:21 lvm[4657]: WARNING: Thin pool vault-thin-tpool metadata is now 80.14%
full.
10:47:21 lvm[4657]: Request to lookup VG vault in lvmetad gave response
Connection reset by peer.
10:47:21 lvm[4657]: Volume group "vault" not found
10:47:21 lvm[4657]: Failed to extend thin pool vault-thin-tpool.


At this moment - any failure in the 'lvresize' execution leads to an immediate
umount - the tool has gone ballistic and so takes 'drastic' action to prevent further
damage.

See here again:

https://www.redhat.com/archives/linux-lvm/2016-May/msg00064.html

ATM there is some new code which unconditionally always connects to lvmetad
even when it's not necessary at all - and it's a potential source of many other
troubles - so fixes here are in progress  -  set 'use_lvmetad=0' if it's
still a problem in your case.
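
(In lvm.conf that is the global section - a minimal sketch:)

  global {
      use_lvmetad = 0
  }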


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?

2016-05-24 Thread Zdenek Kabelac

On 24.5.2016 15:45, Gionatan Danti wrote:

Il 18-05-2016 15:47 Gionatan Danti ha scritto:


One question: I did some tests (on another machine), deliberately
killing/stopping the lvmetad service/socket. When the pool was almost
full, the following entry was logged in /var/log/messages

WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

So it appears that when lvmetad is gracefully stopped/not running,
dmeventd correctly resorts to device scanning. On the other hand, in
the previous case, lvmetad was running but returned "Connection
refused". Should/could dmeventd resort to device scanning in this case
too?

...

Very probable. So, after an LVM update, is it best practice to restart the
machine or at least the dmeventd/lvmetad services?

One more, somewhat related thing: when a thin pool goes full, it is a good
idea to remount an ext3/4 filesystem in readonly mode (errors=remount-ro). But
what to do with XFS which, AFAIK, does not support a similar
readonly-on-error policy?

It is my understanding that upstream XFS has some improvements to
auto-shutdown in case of write errors. Did these improvements already
trickle down to production kernels (eg: RHEL6 and 7)?

Thanks.


Sorry for the bump, I would really like to know your opinions on the above



Dmeventd should not talk to lvmetad at all - I've been saying this for years.

There are some very, very hard to fix (IMHO) design issues - and locking
lvmetad in memory would be just one of the wrong (IMHO) ways forward.


Anyway - let's see how it evolves here, as there are further troubles
with lvmetad & dmeventd - see e.g. here:

https://bugzilla.redhat.com/show_bug.cgi?id=1339210

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?

2016-05-17 Thread Zdenek Kabelac

On 17.5.2016 15:09, Gionatan Danti wrote:




Well yeah - ATM we rather take 'early' action and try to stop any user
of an overfilled thin-pool.



It is a very reasonable stance.




Basically, whenever 'lvresize' fails, the dmeventd plugin now tries
to unconditionally umount any thin volume associated with a
thin-pool above the threshold.



For now  -  the plugin 'calls' the tool - lvresize --use-policies.
If this tool FAILs for ANY reason ->  umount will happen.

I'll probably put in an 'extra' test so that the 'umount' happens
only at values >=95%.

dmeventd itself has no idea whether 100 or less is configured - it's
lvresize that sees it - so even if you set 100% - and you have monitoring
enabled - you will get the umount (but no resize)




Ok, so the "failed to resize" error is also raised when no actual resize
happens, but the call to the "dummy" lvresize fails. Right?


Yes - in general - you've witnessed a general tool failure,
and dmeventd is not 'smart' enough to recognize the reason for the failure.

Normally this 'error' should not happen.

And while I'd even say there could have been a 'shortcut'
without even reading the VG 'metadata' - since there is profile support,
it can't be known (the 100% threshold) without actually reading the metadata
(so it's quite a tricky case anyway)



Well, 'lvmetad' shall not crash; ATM this may kill commands - and further
stop processing - as we rather 'stop' further usage than allow
bigger damage to be caused.

So if you have an unusual system/device setup causing the 'lvmetad' crash -
open a BZ,
and meanwhile set 'use_lvmetad=0' in your lvm.conf till the bug is fixed.



My 2 cents are that the last "yum upgrade", which affected the lvm tools,
needed a system reboot or at least the restart of the lvm-related services
(dmeventd and lvmetad). The strange thing is that, even if lvmetad crashed, it
should be restartable via the lvm2-lvmetad.socket systemd unit. Is this a
wrong expectation?



Assuming you've been bitten by this one:

https://bugzilla.redhat.com/1334063

possibly targeted by this commit:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=7ef152c07290c79f47a64b0fc81975ae52554919

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?

2016-05-17 Thread Zdenek Kabelac

On 17.5.2016 19:17, Xen wrote:

Strange, I didn't get my own message.


Zdenek Kabelac schreef op 17-05-2016 11:43:


There is no plan ATM to support boot from thinLV in nearby future.
Just use small boot partition - it's the safest variant - it just hold
kernels and ramdisks...


That's not what I meant. Grub-probe will fail when the root filesystem is on
thin, thereby making it impossible to regenerate your grub config files in
/boot/grub.

It will try to find the device for the mounted /, and not succeed.

Booting a thin root is perfectly possible, ever since Kubuntu 14.10 at least (at
least January 2015).


We aim for a system which boots from a single 'linear' LV with an individual
kernel + ramdisk.

It's simple, efficient and can be easily achieved with existing
tooling, with some 'minor' improvements in dracut to easily allow
selection of the system to be used with a given kernel, as you may prefer to
boot a different thin snapshot of your root volume.


Sure, but that won't happen if grub-update chokes on a thin root.

I'm not sure why we are talking about this now, or what I asked ;-).



The message behind this is - booting from 'linear' LVs, and no msdos partitions...
So right from a PV.
Grub giving you a 'menu' of bootable LVs...
A bootable LV combined with a selected 'root LV'...


The complexity of booting right from thin is very high, with no obvious benefit.


I understand. I had not even been trying to achieve that yet, although it has or
might have a principal benefit, in the way that doing away with partitions entirely
(either msdos or gpt) has a benefit on its own.

But as you indicate, you can place /boot on non-thin LVM just fine, so there is
not really that issue, as you say.


But for me, a frozen volume would be vastly superior to the system locking up.


You are missing some knowledge of how the operating system works.

Your binary is 'mmap'-ed from a device. When the device holding the binary
freezes, your binary may freeze (unless it is mlocked in memory).

So the advice here is simple - if you want to run an unfreezable system -
simply do not run it from a thin volume.


I did not run from a thin-volume, that's the point.

In my test, the thin volumes were created on another harddisk. I created a
small partition, put a thin pool in it, put 3 thin volumes in it, and then
overfilled it to test what would happen.


It's the very same issue as if you had used a 'slow' USB device - you may slow
down the whole Linux system - or, in a similar way, when building a 4G .iso image.


My advice - try lowering  /proc/sys/vm/dirty_ratio  -  I'm using '5'
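
For example (the value is just the one mentioned above - tune for your workload):

  echo 5 > /proc/sys/vm/dirty_ratio                # immediate, lost on reboot
  echo 'vm.dirty_ratio = 5' >> /etc/sysctl.conf    # persistent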



The best advice we have - 'monitor' the fullness - when it goes above the threshold -
stop using such a system and ensure there will be more space - there is
no one else to do this task for you - it's the price you pay for
overprovisioning.


The point is that not only as an admin (for my local systems) but also as a
developer, there is no point in continuing a situation that could be mitigated
by designing tools for this purpose.

There is no point for me if I can make this easier by automating tools for
performing these tasks, instead of doing them by hand. If I can create tools
or processes that do, what I would otherwise have needed to do by hand, then
there is no point in continuing to do it by hand. That is the whole point of
"automation" everywhere.


Policies are hard, and it's not easy to have a universal one
that fits everyone's needs here.

On the other hand it's relatively easy to write some 'tooling' for your
particular needs - if you have a nice 'walled garden' you can easily target
it...


"Monitoring" and "stop using" is a process or mechanism that may very well be
encoded and be made default, at least for my own systems, but by extension, if
it works for me, maybe others can benefit as well.


Yes - this part will be extended and improved over time.
A few BZs already exist...
It just takes time.



I am not clear why a forced lazy umount is better, but I am sure you have your
reason for it. It just seems that in many cases, an unwritable but present
(and accessible) filesystem is preferable to none at all.


Plain simplicity - umount is a simple syscall, while 'mount -o remount,ro' is a
relatively complicated, resource-consuming process.  There are some technical
limitations related to running operations like this from 'dmeventd' - so it
needs some redesigning for these new needs.




I do not mean any form of differentiation or distinction. I mean an overall
forced read only mode on all files, or at least all "growing", for the entire
volume (or filesystem on it) which would pretty much be the equivalent of
remount,ro. The only distinction you could ever possibly want in there is to
block "new growth" writes while allowing writes to existing blocks. That is
the only meaningful distinction I can think of.

Of course, it would be pretty much equivalent to a standard mount -o
remount,ro, and would still depend on thin pool information.



To give some 'light' where is the 'core o

Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?

2016-05-16 Thread Zdenek Kabelac

On 15.5.2016 12:33, Gionatan Danti wrote:

Hi list,
I had an unexptected filesystem unmount on a machine were I am using thin
provisioning.



Hi

Well yeah - ATM we rather take 'early' action and try to stop any user
of an overfilled thin-pool.




It is a CentOS 7.2 box (kernel 3.10.0-327.3.1.el7, lvm2-2.02.130-5.el7_2.1),
with the current volumes situation:
# lvs -a
   LV   VG Attr   LSize  Pool Origin
Data%  Meta%  Move Log Cpy%Sync Convert
   000-ThinPool vg_storage twi-aotz-- 10.85t 74.06  33.36
   [000-ThinPool_tdata] vg_storage Twi-ao 10.85t
   [000-ThinPool_tmeta] vg_storage ewi-ao 88.00m
   Storage  vg_storage Vwi-aotz-- 10.80t 000-ThinPool 74.40
   [lvol0_pmspare]  vg_storage ewi--- 88.00m
   root vg_system  -wi-ao 55.70g
   swap vg_system  -wi-ao  7.81g

As you can see, thin pool/volume is at about 75%.

Today I found the Storage volume unmounted, with the following entries in
/var/log/message:
May 15 09:02:53 storage lvm[43289]: Request to lookup VG vg_storage in lvmetad
gave response Connection reset by peer.
May 15 09:02:53 storage lvm[43289]: Volume group "vg_storage" not found
May 15 09:02:53 storage lvm[43289]: Failed to extend thin
vg_storage-000--ThinPool-tpool.
May 15 09:02:53 storage lvm[43289]: Unmounting thin volume
vg_storage-000--ThinPool-tpool from /opt/storage.


Basically, whenever 'lvresize' fails, the dmeventd plugin now tries
to unconditionally umount any thin volume associated with a
thin-pool above the threshold.



What puzzle me is that both thin_pool_autoextend_threshold and
snap_pool_autoextend_threshold are disabled in the lvm.conf file
(thin_pool_autoextend_threshold = 100 and snap_pool_autoextend_threshold =
100). Moreover, no custom profile/policy is attached to the thin pool/volume.


For now  -  the plugin 'calls' the tool - lvresize --use-policies.
If this tool FAILs for ANY reason ->  umount will happen.

I'll probably put in an 'extra' test so that the 'umount' happens
only at values >=95%.

dmeventd itself has no idea whether 100 or less is configured - it's the
lvresize that sees it - so even if you set 100% - and you have monitoring
enabled - you will get the umount (but no resize)




To me, it seems that the lvmetad crashed/had some problems and the system,
being "blind" about the thin volume utilization, put it offline. But I can not
understand the "Failed to extend thin vg_storage-000--ThinPool-tpool", and I
had *no* autoextend in place.


If you strictly don't care about any tracking of thin-pool fullness,
disable monitoring in lvm.conf.




I rebooted the system and the Storage volume is now mounted without problems.
I also tried to write about 16 GB of raw data to it, and I have no problem.
However, I can not understand why it was put offline in the first place. As a
last piece of information, I noted that kernel & lvm was auto-updated two days
ago. Maybe it is related?

Can you give me some hint of what happened, and how to avoid it in the future?


Well, 'lvmetad' shall not crash; ATM this may kill commands - and further stop
processing - as we rather 'stop' further usage than allow
bigger damage to be caused.

So if you have an unusual system/device setup causing the 'lvmetad' crash - open a BZ,
and meanwhile set 'use_lvmetad=0' in your lvm.conf till the bug is fixed.

Regards

Zdenek




___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM Merge

2016-05-16 Thread Zdenek Kabelac

On 16.5.2016 13:14, Alasdair G Kergon wrote:

On Sun, May 15, 2016 at 04:35:44AM +, Tom Jay wrote:

I've posted a question to the debian-user mailing list, but am yet to receive a 
response.
I am running Debian 7.9 64-bit with kernel version 3.2.0 and would like to use 
the 'lvconvert --merge' feature, but do not have 'snapshot-merge' support in 
the kernel. Does anyone have any idea how to enable this?
The original post is here: 
https://lists.debian.org/debian-user/2016/05/msg00496.html.


So you don't have any snapshot support in your running kernel.
But your kernel is modular, and if you attempt a snapshot operation,
lvm2 will try to load the snapshot module if it is available.

To try this manually, do 'modprobe dm-snapshot' (see 'man modprobe')
then retry 'dmsetup targets'.



Please always include the version of lvm2 and the kernel in use.

The Debian version has some local patches for 'modprobe' usage - so maybe check
whether the 'auto' modprobe works properly in your case?
(rmmod dm-snapshot  -  and check if lvcreate -s  still works)

Did you build your own kernel - or do you use the 'distro' Debian kernel?


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM Merge

2016-05-18 Thread Zdenek Kabelac

On 18.5.2016 10:13, Tom Jay wrote:

Hello, this question was in fact answered by the debian-kernel mailing list,
who informed me that the documentation was probably wrong. The kernel does
actually support merging volumes; the 'dmsetup targets' output only lists
modules that are already loaded, not everything that is supported.
Using 'lvconvert --merge' works without problems. This is the original thread:
https://lists.debian.org/debian-kernel/2016/05/msg00186.html. Thanks! Tom


Hi

The issue is - a properly compiled 'lvm2' does call 'modprobe' internally.
So if 'lvconvert --merge' was failing because the snapshot-merge module
was not loaded, and loading it with your own modprobe call fixed it, then your lvm2
tool is miscompiled (compiled without modprobe support...).


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?

2016-05-18 Thread Zdenek Kabelac

On 18.5.2016 03:34, Xen wrote:

Zdenek Kabelac schreef op 18-05-2016 0:26:

On 17.5.2016 22:43, Xen wrote:

Zdenek Kabelac schreef op 17-05-2016 21:18:

I don't know much about Grub, but I do know its lvm.c by heart now almost :p.


lvm.c by grub is mostly useless...


Then I feel we should take it out and not have grub capable of booting LVM
volumes anymore at all, right.


It's not properly parsing and building lvm2 metadata - it's 'reverse
engineered' code that handles a couple of the 'most common' metadata layouts.


But it happens that most users are happy with it.

So for now, using a 'boot' partition is advised until a proper lvm2 metadata
parser becomes an integral part of Grub.



ATM the user needs to write their own monitoring plugin/tool to switch to
read-only volumes - it's really as easy as running a bash script in a loop.


So you are saying every user of thin LVM must individually, that means if
there are 10,000 users, you now have 10,000 people needing to write the same


Only very few of them will actually write something - and they may propose their
scripts for upstream inclusion...
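
A minimal sketch of such a loop (hypothetical names and threshold - a real script
would add logging, error handling, etc.):

  while sleep 60; do
      used=$(lvs --noheadings -o data_percent vg/pool | tr -d ' ' | cut -d. -f1)
      if [ "${used:-0}" -ge 95 ]; then
          mount -o remount,ro /mnt/thinfs
      fi
  done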



I take it by that loop you mean a sleep loop. It might also be that logtail
thing and then check for the dmeventd error messages in syslog. Right? And


dmeventd is also a 'sleep loop' in this sense (although a smarter one...)



First hit is CentOS. Second link is reddit. Third link is Redhat. Okay it
should be "lvm guide" not "lvm book". Hasn't been updated since 2006 and no
advanced information other than how to compile and install


Damned Google - it knows about you, that you like CentOS and reddit :)
I get quite a different set of links :)


I mean: http://tldp.org/HOWTO/LVM-HOWTO/. So what people are really going to
know this stuff except the ones that are on this list?


We do maintain the man pages - we don't feel responsible for every HOWTO/blog around
the world.


And of course you can learn a lot here as well:
https://access.redhat.com/documentation/en/red-hat-enterprise-linux/



How to find out about vgchange -ay without having internet access.


Now just imagine you needed to configure your network from the command line
with a broken NetworkManager package...



maybe a decade or longer. Not as a developer mostly, as a user. And the thing
is just a cynical place. I mean, LOOK at Jira:

https://issues.apache.org/jira/browse/log4j2/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel



Being cynical myself - I'm unsure what's better about the URL name issues.apache.org
compared to bugzilla.redhat.com...  Obviously we do have all sorts of flags in RHBZ.



Well the question was not asking for your 'technical' proposal, as you
have no real idea how it works and your visions/estimations/guesses
have no use at all (trust me - far deeper thinking was considered so
don't even waste your time to write those sentences...)


Well you can drop the attitude you know. If you were doing so great, you would
not be having a total lack of all useful documentation to begin with. You
would not have a system that can freeze the entire system by default, because
"policy" is apparently not well done.


Yep - and you probably think you are helping us a lot to realize this...

But you might 'calm down' a bit - we really know all the troubles, and even far
more than you can think of - and surprise - we actively work on them.




I think the commands themselves and the way they are used are outstanding;
they are intuitive, they are much better than many other systems out there
(think mdadm). It takes hardly any effort to remember how to use e.g. lvcreate,


Design simply takes time - and many things get tried...

Of course Red Hat could have been cooking something for 10 years secretly
before going public - but the philosophy is: upstream first, release often,
and only released code matters.


So yeah - some people are writing novels on lists while others are writing
useful code.




You are *already* integrating e.g. extfs to more closely honour the extent
boundaries so that it is more efficient. What I am saying is not at all out of


There is a fundamental difference between 'reading' the geometry once at 'mkfs'
time and doing it for each and every write through the whole device stack ;)



When you fail to write an ordinary (non-thin) block device  - this
block is then usually 'unreadable/error' - but in thinLV case - upon
read you get previous 100% valid' content - so you may start to
imagine where it's all heading.


So you mean that "unreadable/error" signifies some form of "bad sector" error.
But if you fail to write to thinLV, doesn't that mean (in our case there) that
the block was not allocated by thinLV? That means you cannot read from it
either. Maybe bad example, I don't know.


I think we are heading towards the big 'reveal' of how thinp works.

You have a thin volume T and its snapshot S.

You write to block 10 of device T.

As there is snapshot S - your write to device T needs to go to a ne

Re: [linux-lvm] LVM feature parity with MD

2016-05-03 Thread Zdenek Kabelac

On 2.5.2016 13:03, Fabian Herschel wrote:

Can LVM doing a delta sync and does LVM have something like a bitamp to show
which parts are in sync and which needs a sync?



Hi

lvm uses a small metadata LV with each raid1 leg, which holds the bitmap of
which region-sized areas are in sync.

So when there is a 'crash', only the out-of-sync areas are synced again after the
array restarts.
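
The bitmap lives in the per-leg *_rmeta_* sub-LVs; the current sync state can be
inspected with lvs, e.g. (hypothetical VG name):

  lvs -a -o name,copy_percent,raid_sync_action vg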

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Not informative error message, when trying to change unallocatable physical volume

2016-05-10 Thread Zdenek Kabelac

On 8.5.2016 17:01, Илья Бока wrote:

#lvcreate -L600M -n test vg-arch
   Insufficient free space: 150 extents needed, but only 0 available

but fixed with
#pvchange --allocatable y /dev/mapper/lvm

maybe the message could be changed to something more useful?



Hi

Thanks for report.

I'd say that making a PV not allocatable should then also be reflected in
vgs/pvs as not being 'free' space (e.g. a VG with only non-allocatable PVs
should show 0 free space).

But this change would surely be up for discussion, since various
backward-compatibility issues need to be thought through...

A recent lvm2 tool shows the 'pvs' attribute 'u' for non-allocatable PVs,
so at least a 'quick' look at the pvs output shows this easily.

(that's why I think it would be quite 'straightforward' to see 0 free space in
the "VG" and then check 'pvs' and see the 'u' flag)



The error message you would like to see changed is however at a different level.
It's deeply embedded in the allocator - so either we would need to provide some
'hint' message checking whether there is 0 free allocatable space but still some
space on a (u)sed but not allocatable PV - or the allocator would need to know
about the extra 'unallocatable' space and reflect this in the error message.



Anyway - please open a BZ, as it's not a simple change and needs some thinking.

https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Thin Pool Performance

2016-04-19 Thread Zdenek Kabelac

Dne 19.4.2016 v 03:05 shankha napsal(a):

Hi,
Please allow me to describe our setup.

1) 8 SSDS with a raid5 on top of it. Let us call the raid device : dev_raid5
2) We create a Volume Group on dev_raid5
3) We create a thin pool occupying 100% of the volume group.

We performed some experiments.

Our random write operations dropped by half, and there was a significant
reduction in
other operations (sequential read, sequential write, random reads) as
well, compared to native raid5.

If you wish I can share the data with you.

We then changed our configuration from one POOL to 4 POOLS and were able to
get back to 80% of the performance (compared to native raid5).

To us it seems that the lvm metadata operations are the bottleneck.

Do you have any suggestions on how to get back the performance with lvm ?

LVM version: 2.02.130(2)-RHEL7 (2015-12-01)
Library version: 1.02.107-RHEL7 (2015-12-01)




Hi


Thanks for playing with thin-pool, however your report is largely incomplete.

We do not see your actual VG setup.

Please attach 'vgs/lvs' output and the relevant settings, i.e. thin-pool zeroing (if
you don't need it, keep it disabled), chunk size (use bigger chunks if you do not
need snapshots), the number of simultaneously active thin volumes in a single
thin-pool (running hundreds of loaded thinLVs is going to lose the battle on
locking), and the size of the thin-pool metadata LV - is this LV located on a
separate device? (you should not use RAID5 for metadata)

And what kind of workload do you run on it?
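
For illustration, a sketch under assumed names/sizes - zeroing disabled and a
larger chunk size, which trade features for throughput as described above:

  lvcreate --type thin-pool -L 1T -Zn -c 512K \
      --poolmetadatasize 2G -n pool vg_raid5
  # current pool settings can then be checked with:
  lvs -o name,chunk_size,zero,data_percent,metadata_percent vg_raid5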

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshots & data security

2016-07-20 Thread Zdenek Kabelac

Dne 19.7.2016 v 17:28 Scott Sullivan napsal(a):

Hello,

Could someone please clarify if there is a legitimate reason to worry about
the data security of an old (removed) LVM snapshot?

For example, when you lvremove an LVM snapshot, is it possible for the data to be
recovered if you create another LV and it happens to land in the same area as
the old snapshot we lvremoved?

If this helps clarify: do we have to worry about securely scrubbing an LVM
snapshot for data security?



lvm2 is a 'volume manager' - not a security tool to obfuscate data on your disk
- that is the admin's task.


So if you do care about the 'data' content you give to your user in an LV - it's
then the admin's job to 'clear up' all the space before the LV is given to the user,

i.e.   'lvcreate  &&  dd if=/dev/zero'
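
For example, a sketch with hypothetical names - the whole LV is overwritten once
before being handed out:

  lvcreate -L 10G -n userlv vg
  dd if=/dev/zero of=/dev/vg/userlv bs=1M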

lvm2 does not care about the 'data' content - it's a metadata management tool.

Now, if you are paranoid and you care purely about 'erasing' data from your
snapshot - you can use 'lvconvert --splitsnapshot' and then again erase with
'dd' the COW volume you get from the split.


Having said all this - you can try to use 'thin-provisioning' instead,
which has a built-in option of zeroing provisioned blocks - so whenever your
provisioned LV gets a 'new block', its unwritten parts are always zeroed - so
there is no 'data leak'.


And finally - if you are using a modern filesystem like ext4 or XFS - they
track the written areas - so an 'fs' user cannot actually read 'unwritten' data.


And a 2nd 'finally' - a paranoid admin should consider 'data' encryption.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM problem after upgrade to Ubuntu 16.04 (Part 2)

2016-06-30 Thread Zdenek Kabelac

Dne 30.6.2016 v 11:10 Scott Hazelhurst napsal(a):


Dear all

I sent the email below about three weeks ago. Today I upgraded another
machine — I had exactly the same thing happen, and I appear to have lost
some LVM volumes.

Hmm. At least I know it's a problem and can back up the LVM volumes before
doing the upgrade, but it's hugely annoying.

Scott




Dear all

I upgraded a server from Ubuntu 15.10 to 16.04. All went smoothly and the
machine appeared in good state. However, there seems to be a problem with
LVM. I have one volume group with several logical volumes. lvs showed all
the logical volumes but only 2 of the 7 volumes were active (and appearing
in /dev/mapper).

I activated the other volumes using lvchange -ay and then I can see all
logical volumes in /dev/mapper.



Hi

Upstream lvm2 has a bad message for you - your complaints should be strictly
targeted at the Ubuntu/Debian lvm2 maintainers - so please open a bug for them.

Solving the booting sequence is a very distro-specific topic, completely unrelated
to lvm2 -

as you say - when you manually use 'lvchange -ay' everything works - thus
your booting sequence missed activating the volumes.




However, there appear to a problem with some of the LVs which are used as
the images for virtual machines (qemu-kvm). The VMs don't boot. Moreover,
previously I could mount the sub-partitions of the LV by doing a kpartx -a
on the logical volume. That no longer works (e.g. kpartx -l
/dev/mapper/cloud-VNAME shows nothing, while it used to show the 3
partitions of the underlying virtual machine).


Again - likely a distro-specific case.
kpartx has nothing in common with lvm2 itself.



One of the LVs didn't have a problem -- this just contained a data disk.

This particular server is not a serious problem for me because  I can clone
the VM from elsewhere, but I am about to start an upgrade of all our other
machines and it will be very time-consuming if I have to do this with each
machine.

Does anyone have any ideas about what went wrong? How the situation could
be remedied



Yep - untuned/untested release

Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Snapshots & data security

2016-08-16 Thread Zdenek Kabelac

Dne 27.7.2016 v 21:17 Stuart Gathman napsal(a):

On 07/19/2016 11:28 AM, Scott Sullivan wrote:


Could someone please clarify if there is a legitimate reason to worry
about data security of a old (removed) LVM snapshot?

For example, when you lvremove a LVM snapshot, is it possible for data
to be recovered if you create another LVM and it happens to go into
the same area as the old snapshot we lvremoved?

If this helps clarify, do we have to worry about security scrubbing a
LVM snapshot for data security ?


Another idea: if your VG is on SSD, and properly aligned, then DISCARD
on the new LV will effectively zero it as far as any guest VMs are
concerned.  (The data is still on the flash until erased by the
firmware, of course.)  If VG and PE size do not align with the SSD erase
block, then you can still zero the "edges" of the new LV, which is much
faster (and less wear on the SSD) than zeroing the whole thing.  You
could always read-verify that the data is actually all zero.


Yes - as already suggested - once you create a new LV -
you can 'blkdiscard /dev/vg/lv'.

Note - SSDs may not always ensure blocks are zeroed - they could just
move the trimmed block onto a reuse list with undefined content.

Anyway - lvm2 is not a tool for data protection and it's up to the system admin
to ensure there are no data leaks.

So pick the solution which best fits your needs - lvm2 provides all the
tooling for it.
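
For example, the blkdiscard approach mentioned above (a sketch; an SSD-backed VG
and placeholder names are assumed):

  lvcreate -L 10G -n new_lv vg
  blkdiscard /dev/vg/new_lv      # or: dd if=/dev/zero of=/dev/vg/new_lv bs=1M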


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] creating DD copies of disks

2016-09-20 Thread Zdenek Kabelac

Dne 17.9.2016 v 20:02 Lars Ellenberg napsal(a):

On Sat, Sep 17, 2016 at 04:40:36PM +0200, Xen wrote:

Lars Ellenberg schreef op 17-09-2016 15:49:

On Sat, Sep 17, 2016 at 09:29:16AM +0200, Xen wrote:

I want to ask again:

What is the proper procedure when duplicating a disk with DD?


depends on what you define as "proper",
what the desired outcome is supposed to look like.

What exactly are you trying to do?

If you intend to "clone" PVs of some LVM2 VG,
and want to be able to activate that on the same system
without first deactivating the "original",
I suggest:

1) create consistent snapshot(s) or clone(s) of all PVs
2) import them with "vgimportclone",
which is a shell script usually in /sbin/vgimportclone,
that will do all the neccessary magic for you,
creating new "uuid"s and renaming the vg(s).


Right so that would mean first duplicating partition tables etc.

I will check that out some day. At this point it is already done, mostly. I
didn't yet know you could do that, or what a "clone" would be, so thank you.


No. You check that out *now*.
It does not matter how you create your "duplicates" "clones" "snapshots"
whatever you name them. If you want, use dd. No one really cares.
What matters is that they are consistent.

Then, if you want to attach them, both original and "duplicate",
you need to change uuids of PV and VG, and the VG name.

And vgimportclone is a script that does all necessary steps for you.

So no, you don't have to write scripts,
or figure out the necessary steps.

Someone else did that for you already.
Just use it.
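
For reference, a minimal invocation looks something like this (device names and
the new VG name are placeholders):

  vgimportclone --basevgname myvg_clone /dev/sdx1 /dev/sdy1
  vgchange -ay myvg_clone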


Hi

It might be worth mentioning here - lvm2 is your LEAST problem when you have
duplicate devices in your system.


Neither udevd nor systemd can cope with those devices AT ALL
(you will actually hardly find a tool which can handle them).

So once a duplicated identifier appears in your system - all related symlinks
in /dev/by* or related attached systemd services become just random,
unpredictable 'garbage'.


In general, user-space tools are unprepared to deal with 'identical' devices.
So IMHO lvm2 is still 'light years' ahead here.

You could prepare device filters to actually make your duplicate device
invisible to lvm2 (I always HIGHLY recommend using a white-list filter

and letting lvm2 process only the expected devices).
Recent versions of lvm2 also try to 'smartly' guess which one of the duplicated
devices is actually in use, to further increase protection of the user...


However lvmetad is not yet fully equipped for this - but at least it will
automatically turn itself off now...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvm2 tarballs vanishing

2016-09-07 Thread Zdenek Kabelac

Dne 7.9.2016 v 11:29 Marcus Meissner napsal(a):

Hi,

Is there a specific reasons that old LVM2 tarballs are removed from
ftp://sources.redhat.com/pub/lvm2/

e.g. there are wide gaps in the series 133-143 144-147 147-152 152-154 154-162 
162-165



Yep, some releases are 'developmental' checkpoints,
others were found to have some rather 'unwanted' bugs/features.

Anyway, if you still really want to get those
'obsoleted'/'unsupportable'/'broken' (pick the word you like) files for
studying purposes - see:


ftp://sources.redhat.com/pub/lvm2/releases/

or use git with the proper tag   v2_02_XXX

But please never-ever try to do any distro release with those tarballs

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] metadata too large for circular buffer - any way to increase PV metadatasize?

2016-09-29 Thread Zdenek Kabelac

Dne 28.9.2016 v 15:51 Charles Koprowski napsal(a):

 On Sun, 10 Aug 2008 18:45:16 +0100, John Leach wrote:

On Thu, 2008-08-07 at 20:41 +0100, Alasdair G Kergon wrote:
> To remove the metadata areas you need to:
>
>get an up-to-date metadata backup (vgcfgbackup)
>
>   pvcreate --restorefile pointing at a copy of that backup file
requesting 0 metadata areas
> and specifying the same uuid as it had before
>
>   vgcfgrestore from the backup file
>
This doesn't look like something you can do with the volume group active
(with cluster lvm anyway):
[root testnode0 ~]# pvcreate --restorefile san-metadata -u
fTLglk-j1C1-02Z7-8k6l-DTAm-2WNj-9ZGT19 --metadatacopies 0 /dev/hdb -ff
Really INITIALIZE physical volume "/dev/hdb" of volume group "san" [y/n]? y
  Can't open /dev/hdb exclusively.  Mounted filesystem?
Is that right, or am I missing something?


Hi John,

I know it's been a long time but, Did you found out a solution for this ?

I'm trying to follow the same scenario to increase the metadatasize and got
stuck at the same point.





Hi

I've no real idea what the original problem is - but here is a clear error on
the user's side. This message:


"Can't open /dev/hdb exclusively.  Mounted filesystem?"

indicates that the device is NOT UNUSED, and it has to be unused for this operation
- i.e. you MAY NOT run this operation while there are active LVs on this PV
(one of the possible reasons why 'hdb' cannot be opened exclusively).




To increase the metadata size - you either have to add some NEW PV with
a much bigger metadata area (such an operation still needs some
extra space to proceed with the existing size), then disable the existing metadata
areas - so you end up storing the bigger metadata only in this new, bigger MDA
on the new PV. Note - it's quite a risky plan to leave the MDA only on a single PV in
a multi-PV VG - so I'd not advise this for any serious use.


Or you may move the PV contents via pvmove to a new, bigger PV
(the advised operation - though it may take its time...).
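
A sketch of that pvmove path (device names and the 32m metadata area size are
just examples):

  pvcreate --metadatasize 32m /dev/sdnew
  vgextend vg /dev/sdnew
  pvmove /dev/sdold /dev/sdnew
  vgreduce vg /dev/sdold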

There is NO native lvm2 support for 'online' resize of the PV MDA size - and
while it's possible to do this operation manually - it's quite a complex task -
so I'd not advise doing this either, unless you have a FULL backup of everything.


So back to your original problem - please describe EXACTLY what your problem is,
attach the outputs of pvs, vgs and lvs, and say what you want to achieve.



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] very slow sequential writes on lvm raid1 (bitmap?)

2016-11-08 Thread Zdenek Kabelac

Dne 7.11.2016 v 16:58 Alexander 'Leo' Bergolth napsal(a):

On 11/07/2016 11:22 AM, Zdenek Kabelac wrote:

Dne 7.11.2016 v 10:30 Alexander 'Leo' Bergolth napsal(a):

I am experiencing a dramatic degradation of the sequential write speed
on a raid1 LV that resides on two USB-3 connected harddisks (UAS
enabled), compared to parallel access to both drives without raid or
compared to MD raid:

- parallel sequential writes LVs on both disks: 140 MB/s per disk
- sequential write to MD raid1 without bitmap: 140 MB/s
- sequential write to MD raid1 with bitmap: 48 MB/s
- sequential write to LVM raid1: 17 MB/s !!

According to the kernel messages, my 30 GB raid1-test-LV gets equipped
with a 61440 bit write-intent bitmap (1 bit per 512 byte data?!) whereas
a default MD raid1 bitmap only has 480 bit size. (1 bit per 64 MB).
Maybe the dramatic slowdown is caused by this much too fine grained
bitmap and its updates, which are random IO?

Is there a way to configure the bitmap size?


Can you please provide some results with  '--regionsize'  changes ?
While '64MB' is quite 'huge' for resync, I guess the currently picked default
region size is likely very, very small in some cases.


Ah - thanks. Didn't know that --regionsize is also valid for --type raid1.

With --regionsize 64 MB, the bitmap has the same size as the default
bitmap created by mdadmin and write performance is also similar:

*** --regionsize 1M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 63,957 s, 16,4 MB/s
*** --regionsize 2M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 39,1517 s, 26,8 MB/s
*** --regionsize 4M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 32,8275 s, 31,9 MB/s
*** --regionsize 16M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 30,2903 s, 34,6 MB/s
*** --regionsize 32M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 30,1452 s, 34,8 MB/s
*** --regionsize 64M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 21,6208 s, 48,5 MB/s
*** --regionsize 128M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 14,2028 s, 73,8 MB/s
*** --regionsize 256M
1048576000 bytes (1,0 GB, 1000 MiB) copied, 11,6581 s, 89,9 MB/s


Is there a way to change the regionsize for an existing LV?



I'm afraid there is not yet support for a runtime 'regionsize' change
other than rebuilding the array.

But your numbers are really something to think about.

lvm2 surely should pick a more sensible default value here.

But md raid still seems to pay too big a price even with 64M - there
is likely some room for improvement here, I'd say...
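
For a newly created raid1 LV the region size can be chosen up front, e.g.
(a sketch; size and names are placeholders):

  lvcreate --type raid1 -m 1 --regionsize 256M -L 30G -n lv_test vg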

Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] auto_activation_volume_list in lvm.conf not honored

2016-11-25 Thread Zdenek Kabelac

Dne 24.11.2016 v 15:02 Stefan Bauer napsal(a):

Hi Peter,

now all make sense. On this ubuntu machine upstartd with udev is taking care of 
vgchange.

After some digging, /lib/udev/rules.d/85-lvm2.rules shows, that vgchange is 
only executed with -a y

We will test this on weekend but I'm certain now, that this was the problem.

We wanted to keep things simple and do not use clvmd. We have master/slave 
setup without concurrent write/read.


Hi

Wondering what you are trying to do when you say 'not use clvmd'.

If you are working with shared storage and manipulating the same VG from
multiple nodes (i.e. activation) - it's not so easy to go without a really good
locking manager.


If you don't like clvmd - maybe you could take a look at lvmlockd & sanlock.

But you should be aware there are NOT many people who are able to ensure
correct locking of lvm2 commands - it's really not just 'master/slave'.

There are a big number of error cases which do not get proper locking around
the whole cluster.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] very slow sequential writes on lvm raid1 (bitmap?)

2016-11-18 Thread Zdenek Kabelac

Dne 8.11.2016 v 16:15 Alexander 'Leo' Bergolth napsal(a):

On 11/08/2016 10:26 AM, Zdenek Kabelac wrote:

Dne 7.11.2016 v 16:58 Alexander 'Leo' Bergolth napsal(a):

On 11/07/2016 11:22 AM, Zdenek Kabelac wrote:
Is there a way to change the regionsize for an existing LV?


I'm afraid there is not yet support for runtime 'regionsize' change
other then rebuilding array.


Unfortunately even rebuilding (converting to linear and back to raid1)
doesn't work.

lvconvert seems to ignore the --regionsize option and use defaults:

lvconvert -m 0 /dev/vg_sys/lv_test
lvconvert --type raid1 -m 1 --regionsize 128M /dev/vg_sys/lv_test

[10881847.012504] mdX: bitmap initialized from disk: read 1 pages, set
4096 of 4096 bits

... which translates to a regionsize of 512k for a 2G volume.




Hi

After doing some simulations here -

What is the actual USB device type used here?

Aren't you trying to deploy some attached SD card/USB flash as your secondary
leg?


Regards

Zdenek



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] deadlock during lvm snapshot

2016-11-29 Thread Zdenek Kabelac

Dne 28.11.2016 v 12:58 Tomaz Beltram napsal(a):

Hi,

I'm doing backup of a running mongodb using LVM snapshot. Sometimes I
run into a deadlock situation and kernel reports blocked tasks for jbd2,
mongod, dmeventd and my tar doing backup.

This happens very rarely (one in a thousand) but the effect is rather
severe as mongodb stops working. I also can't remove and unmount the
snapshot. I have attached syslog of two occurrences. The stack traces of
the blocked tasks are very similar so I suspect a potential race
condition between the affected tasks.

Sep 15 17:06:53 dss2 kernel: [719277.567904] INFO: task jbd2/dm-2-8:9048
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568130] INFO: task mongod:23239
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568267] INFO: task mongod:23242
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568350] INFO: task mongod:23243
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568397] INFO: task dmeventd:12427
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568523] INFO: task
kworker/u16:2:31890 blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568713] INFO: task tar:12446
blocked for more than 120 seconds.
Sep 15 17:08:53 dss2 kernel: [719397.567614] INFO: task jbd2/dm-2-8:9048
blocked for more than 120 seconds.
Sep 15 17:08:53 dss2 kernel: [719397.567731] INFO: task mongod:23239
blocked for more than 120 seconds.
Sep 15 17:08:53 dss2 kernel: [719397.567870] INFO: task mongod:23240
blocked for more than 120 seconds.

Nov 25 17:10:56 dss2 kernel: [282360.865020] INFO: task jbd2/dm-4-8:878
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.865624] INFO: task mongod:1652
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.866279] INFO: task mongod:1655
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.866771] INFO: task mongod:1656
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.867294] INFO: task dmeventd:3504
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.867783] INFO: task
kworker/u16:2:8016 blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.868351] INFO: task tar:3560 blocked
for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.868865] INFO: task
kworker/u16:1:5561 blocked for more than 120 seconds.
Nov 25 17:12:56 dss2 kernel: [282480.868656] INFO: task jbd2/dm-4-8:878
blocked for more than 120 seconds.
Nov 25 17:12:56 dss2 kernel: [282480.869015] INFO: task mongod:1652
blocked for more than 120 seconds.

Full syslogs of two occurrences are attached.

I have Ubuntu 16.04.1 (lvm2 2.02.133-1ubuntu10) with mongod 3.2.9 on a
64bit system.




Please switch to a newer version of lvm2.

The sequence of snapshot activation has been reworked to minimize the possibility
of hitting this kernel race - the race is still there even with the latest kernel,

but in the real world you should not have much of a chance to hit it.
If you still do - please report again - we will take a closer look at your
workflow.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Again: duplicate PVs - filter rules

2017-01-14 Thread Zdenek Kabelac

Dne 14.1.2017 v 06:41 kn...@knebb.de napsal(a):

Hi all,

I thought my filter rules where fine now. But they are not.

I have in /etc/lvm/lvm.conf:

 filter = [ "r|/dev/sdb|","r|/dev/sdc|" ]

I scan my PVs:

[root@backuppc ~]# pvscan --cache
  WARNING: PV AvK0Vn-vAdJ-K4nf-0N1x-u1fR-dlWG-dJezdg on /dev/sdc was
already found on /dev/drbd1.
  WARNING: Disabling lvmetad cache which does not support duplicate PVs.
  WARNING: Not using lvmetad because duplicate PVs were found.
[root@backuppc ~]# pvscan
  WARNING: PV AvK0Vn-vAdJ-K4nf-0N1x-u1fR-dlWG-dJezdg on /dev/sdc was
already found on /dev/drbd1.
  WARNING: Disabling lvmetad cache which does not support duplicate PVs.
  WARNING: Not using lvmetad because duplicate PVs were found.
  PV /dev/sda2VG cl  lvm2 [15,00 GiB / 0free]
  PV /dev/drbd1   VG testlvm2 [3,00 GiB / 0free]
  Total: 2 [17,99 GiB] / in use: 2 [17,99 GiB] / in no VG: 0 [0   ]

So question again, why does it accept my /dev/sdc as PV discarding
filter rules?


'pvscan --cache' is different from other lvm commands - it controls filling
of 'lvmetad' from udev rules.

So if you plan to use lvmetad - you need to have uniqueness on 'global' level.

So that's why there is global_filter.

So you should copy  your filter  to  global_filter.
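
With the filter from this thread, the relevant lvm.conf section would look
something like this (a sketch; adjust to your devices):

  devices {
      filter        = [ "r|/dev/sdb|", "r|/dev/sdc|" ]
      global_filter = [ "r|/dev/sdb|", "r|/dev/sdc|" ]
  }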

Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] deadlock during lvm snapshot target

2016-11-29 Thread Zdenek Kabelac

Dne 29.11.2016 v 11:28 Tomaz Beltram napsal(a):

On 29. 11. 2016 09:38, Zdenek Kabelac wrote:


Please switch to newer version of lvm2.

Sequence with snapshot activation had been reworked to minimize possibility
to hit this kernel race - race is still there even with the latest kernel, but 
in the real world you should not have a much chance to hit it.

Thanks for the explanation. I reported this to Ubuntu launchpad. Please add
your votes.

https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1645636




Yep, just a minor comment - it's not 'fixed' - but the chance of hitting the race
has been considerably decreased - and we still have an example where it could
be demonstrated (and it's not just 'dm' - it's more generic - see Mikulas's post).


Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM filter ignored?

2017-01-02 Thread Zdenek Kabelac

Dne 2.1.2017 v 20:20 kn...@knebb.de napsal(a):

Hi,



So I tried:
filter = [ "r|sdb.*/|","a|drbd0/|","a|.*/|" ]



Just check the 'comment' for filters in lvm.conf -
there it is stated that you should NOT mix 'r' & 'a' together
(there is no need for this in 99.9% of cases).


Well, it writes "be careful mixing", but in the examples mixing is
shown. It appeared not to be so important.


Yep, looking at the example it's actually a bad one ;) will be fixed.




- either use plain list of rejects  'r'
- or set 'white-list' with all 'a'  finished with 'r .*'

To to be exact in your case:

Use just:[ "r|/dev/sdb|" ]


I set this not as you recommended and indeed it appears to be working,


When you reject a device - and then you accept everything - the device may still
be accepted via a 'different' link/mknod, whatever it finds...
And there are nowadays many links referencing the same device.

When you just reject sdb - you reject all its symlinks (implicitly).
But when you then, later in the same filter row, 'accept' sdb as some
'/dev/disk/by-...' path, you explicitly overrule this implicit reject.


By default, whatever is NOT rejected is 'accepted', in the sense that it has
not been rejected by any previous filter.

It's almost the same as having an explicit 'a|.*|' at the end, except it will not
accept the implicitly rejected symlinks ;)


OK - enough - I'm pretty sure no one can actually understand this :)

The clear message is - use either a WHITE-LIST or a plain REJECT LIST;
follow this rule and you always have clear logic.

That's basically all you need ;)
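
Two examples following that rule, using the devices from this thread (a sketch):

  # white-list: accept only what you expect, reject everything else
  filter = [ "a|^/dev/drbd0$|", "a|^/dev/sda2$|", "r|.*|" ]

  # plain reject list: only reject what you never want scanned
  filter = [ "r|/dev/sdb|" ]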



Thanks, even though I do not have a clue what I did wrong, as I remember I had
it this way. But I might be wrong, as it did not work as I configured it.


As said - it's hard to explain :)
And no - I've not been designing this logic...

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID

2017-03-20 Thread Zdenek Kabelac

Dne 20.3.2017 v 10:47 Gionatan Danti napsal(a):

Hi all,
any comments on the report below?

Thanks.


Please check the upstream behavior (git HEAD).
It will still take a while before the final release, so do not use it
regularly yet (as a few things may still change).

Not sure which other comment you are looking for.

Zdenek





On 09/03/2017 16:33, Gionatan Danti wrote:

On 09/03/2017 12:53, Zdenek Kabelac wrote:


Hmm - it would be interesting to see your 'metadata' -  it should be
still
quite good fit 128M of metadata for 512G  when you are not using
snapshots.

What's been your actual test scenario ?? (Lots of LVs??)



Nothing unusual - I had a single thinvol with an XFS filesystem used to
store an HDD image gathered using ddrescue.

Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
512GB volume with 128 KB chunks? My testing suggests something
different. For example, give it a look at this empty thinpool/thinvol:

[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV   VGAttr   LSize   Pool Origin Data%
Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvmewi--- 128.00m
 0
  thinpool vg_kvmtwi-aotz-- 500.00g 0.00
0.81 128.00k
  [thinpool_tdata] vg_kvmTwi-ao 500.00g
 0
  [thinpool_tmeta] vg_kvmewi-ao 128.00m
 0
  thinvol  vg_kvmVwi-a-tz-- 500.00g thinpool0.00
 0
  root vg_system -wi-ao  50.00g
 0
  swap vg_system -wi-ao   3.75g
 0

As you can see, as it is an empty volume, metadata is at only 0.81%. Let's
write 5 GB (1% of the thin data volume):

[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV   VGAttr   LSize   Pool Origin Data%
Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvmewi--- 128.00m
 0
  thinpool vg_kvmtwi-aotz-- 500.00g 1.00
1.80 128.00k
  [thinpool_tdata] vg_kvmTwi-ao 500.00g
 0
  [thinpool_tmeta] vg_kvmewi-ao 128.00m
 0
  thinvol  vg_kvmVwi-a-tz-- 500.00g thinpool1.00
 0
  root vg_system -wi-ao  50.00g
 0
  swap vg_system -wi-ao   3.75g
 0

Metadata grew by the same 1%. Accounting for the initial 0.81%
utilization, this means that a near-full data volume (with *no*
overprovisioning nor snapshots) will exhaust its metadata *before* really
becoming 100% full.

While I can absolutely understand that this is expected behavior when
using snapshots and/or overprovisioning, in this extremely simple case
metadata should not be exhausted before data. In other words, the
initial metadata creation process should *at least* consider that a
plain volume can be 100% full, and allocate accordingly.

The interesting part is that when not using MD, all is working properly:
metadata are about 2x their minimal value (as reported by
thin_metadata_size), and this provides an ample buffer for
snapshotting/overprovisioning. When using MD, the bad interaction between
RAID chunks and thin metadata chunks ends with a too-small metadata volume.

This can become very bad. Give a look at what happens when creating a
thin pool on a MD raid whose chunks are at 64 KB:

[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV   VGAttr   LSize   Pool Origin Data% Meta%
Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvmewi--- 128.00m
0
  thinpool vg_kvmtwi-a-tz-- 500.00g 0.00   1.58
64.00k
  [thinpool_tdata] vg_kvmTwi-ao 500.00g
0
  [thinpool_tmeta] vg_kvmewi-ao 128.00m
0
  root vg_system -wi-ao  50.00g
0
  swap vg_system -wi-ao   3.75g
0

Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now metadata can only address ~50% of thin volume space.


But as said - there is no guarantee the size will fit every possible
use case - the user is supposed to understand what kind of technology he is
using,
and when he opts out of automatic resize - he needs to deploy his own
monitoring.


True, but this trivial case should really work without
tuning/monitoring. In short, let it fail gracefully in a simple case...


Otherwise you would have to simply always create 16G metadata LV if you
do not want to run out of meta

Re: [linux-lvm] stripped LV with segments vs one segment

2017-04-10 Thread Zdenek Kabelac

Dne 10.4.2017 v 11:29 lejeczek napsal(a):

hi there

I could not extend my stripped LV, had 3 stripes and wanted to add one more.
Only way LVM let me do it was where I ended up with this:

  --- Segments ---
  Logical extents 0 to 751169:
Typestriped
Stripes3
Stripe size16.00 KiB
Stripe 0:
  Physical volume/dev/sdd
  Physical extents0 to 250389
Stripe 1:
  Physical volume/dev/sde
  Physical extents0 to 250389
Stripe 2:
  Physical volume/dev/sdc
  Physical extents0 to 250389

  Logical extents 751170 to 1001559:
Typelinear
Physical volume/dev/sdf
Physical extents0 to 250389

1st question - was this really the only way LVM would extend?
2nd - is there performance penalty with segments like above vs one stripped
segment?



Hi


Not really sure what you aim to do.

If you have an LV segment with 3 stripes - you have to keep the extension using
3 stripes as well - you can't have the first half of the LV spanning 3 disks and
then add a new LV segment as linear - as listed in this post.


Both segments must be striped.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] stripped LV with segments vs one segment

2017-04-10 Thread Zdenek Kabelac

Dne 10.4.2017 v 13:16 lejeczek napsal(a):



On 10/04/17 12:03, Zdenek Kabelac wrote:

Dne 10.4.2017 v 11:29 lejeczek napsal(a):

hi there

I could not extend my stripped LV, had 3 stripes and wanted to add one more.
Only way LVM let me do it was where I ended up with this:

  --- Segments ---
  Logical extents 0 to 751169:
Typestriped
Stripes3
Stripe size16.00 KiB
Stripe 0:
  Physical volume/dev/sdd
  Physical extents0 to 250389
Stripe 1:
  Physical volume/dev/sde
  Physical extents0 to 250389
Stripe 2:
  Physical volume/dev/sdc
  Physical extents0 to 250389

  Logical extents 751170 to 1001559:
Typelinear
Physical volume/dev/sdf
Physical extents0 to 250389

1st question - was this really the only way LVM would extend?
2nd - is there performance penalty with segments like above vs one stripped
segment?



Hi


Not really sure what you aim to do.

If you have LV segment with 3 stripes - you have to keep also extension
using 3 stripes -  you can't  have 1st. halve of LV spanning 3 disk and add
there a new LV segment as linear - as listed in this post.

Both segments must by striped.

Regards

Zdenek


I had 3 stripe LV, you know, three PVs, and wanted the LV to have 4 stripes,
wanted to add 4th PV, you can see it from above lvdisplay.
I tried these and each time it errored:
$ lvextend -v -i 4 -l+100%free dellH200.InternalB/0
$ lvextend -v -i 4 -l+100%pv dellH200.InternalB/0 /dev/sdf


You can't request 4 stripes (that needs 4 disks) and pass just the single /dev/sdf
device.



Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-13 Thread Zdenek Kabelac

Dne 13.4.2017 v 15:52 Xen napsal(a):

Stuart Gathman schreef op 13-04-2017 14:59:


If you are going to keep snapshots around indefinitely, the thinpools
are probably the way to go.  (What happens when you fill up those?
Hopefully it "freezes" the pool rather than losing everything.)


My experience is that the system crashes.

I have not tested this with a snapshot but a general thin pool overflow 
crashes the system.


Within half a minute, I think.

It is irrelevant whether the volumes had anything to do with the operation of
the system; i.e. writing to some mounted volumes that are in no other
use will still crash the system.


Hello

Just let's repeat.

A full thin-pool is NOT in any way comparable to a full filesystem.

A full filesystem ALWAYS has room for its metadata - it's not pretending to be
bigger - it has 'finite' space and expects this space to just BE there.


Now when you have a thin-pool - it causes quite a lot of trouble across a number of
layers.  Those are solvable and being fixed.


But rule #1 still applies - do not run your thin-pool out of space - it
will not always heal easily without losing data - there is no simple,
straightforward way to fix it (especially when the user cannot ADD any of the new
space he promised to have).


So monitoring the pool and taking action ahead of time is always a superior solution
to any later post-mortem system restores.
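
A minimal sketch of such monitoring from cron (the VG/pool names and the 80%
threshold are placeholders):

  #!/bin/sh
  # warn when thin-pool data or metadata usage crosses 80%
  lvs --noheadings -o lv_name,data_percent,metadata_percent vg/pool | \
  while read lv data meta; do
      d=${data%%[.,]*} m=${meta%%[.,]*}
      if [ "${d:-0}" -ge 80 ] || [ "${m:-0}" -ge 80 ]; then
          echo "thin-pool $lv is getting full: data=${data}% metadata=${meta}%"
      fi
  done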



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-14 Thread Zdenek Kabelac

Dne 14.4.2017 v 11:07 Gionatan Danti napsal(a):

Il 14-04-2017 10:24 Zdenek Kabelac ha scritto:


But it's currently impossible to expect you will fill the thin-pool to
full capacity and everything will continue to run smoothly - this is
not going to happen.


Even with EXT4 and errors=remount-ro?


While usage of 'remount-ro' may prevent any significant damage to the filesystem
as such - since the first problem detected by ext4 stops it - it's still not quite
trivial to proceed easily further.


The problem is not with 'stopping' access - but with gaining access back.

So in this case - you need to run 'fsck' - and this fsck usually needs more
space - and the complexity starts with where to get this space.


In the 'most trivial' case - you have the space in the VG - you just extend the
thin-pool, run 'fsck' and it works.
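
That trivial path, as a sketch (LV names and sizes are placeholders; grow data
or metadata depending on what actually ran out):

  lvextend -L +10G vg/pool                   # grow the pool data device
  lvextend --poolmetadatasize +1G vg/pool    # or grow the pool metadata
  fsck.ext4 -f /dev/vg/thinvol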


But then there are a number of cases, ending with the one where you run out of
metadata space, which has a maximal size of ~16G - so you can't even extend it,
simply because using any bigger size is unsupported.


So while every case has some way forward - none of them can
be easily automated.


And it's so much easier to monitor and prevent this from happening than to
solve these things later.


So all that is needed is - the user is aware of what he is using and takes the
proper action at the proper time.






However there are many different solutions for different problems -
and with current script execution - user may build his own solution -
i.e.  call
'dmsetup remove -f' for running thin volumes - so all instances get
'error' device   when pool is above some threshold setting (just like
old 'snapshot' invalidation worked) - this way user will just kill
thin volume user task, but will still keep thin-pool usable for easy
maintenance.



Interesting. However, the main problem with libvirt is that its pool/volume 
management fall apart when used on thin-pools. Basically, libvirt does not 
understand that a thinpool is a container for thin volumes (ie: 
https://www.redhat.com/archives/libvirt-users/2014-August/msg00010.html)


Well lvm2 provides the low-level tooling here

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-23 Thread Zdenek Kabelac

Dne 22.4.2017 v 18:32 Xen napsal(a):

Gionatan Danti schreef op 22-04-2017 9:14:

Il 14-04-2017 10:24 Zdenek Kabelac ha scritto:

However there are many different solutions for different problems -
and with current script execution - user may build his own solution -
i.e.  call
'dmsetup remove -f' for running thin volumes - so all instances get
'error' device   when pool is above some threshold setting (just like
old 'snapshot' invalidation worked) - this way user will just kill
thin volume user task, but will still keep thin-pool usable for easy
maintenance.



This is a very good idea - I tried it and it indeed works.


So a user script can execute dmsetup remove -f on the thin pool?

Oh no, for all volumes.

That is awesome, that means a errors=remount-ro mount will cause a remount 
right?


Well, 'remount-ro' will fail, but you will not be able to read anything
from the volume either.

So as said - many users, many different solutions are needed...

Currently lvm2 can't support that much variety and complexity...





However, it is not very clear to me what is the best method to monitor
the allocated space and trigger an appropriate user script (I
understand that versione > .169 has %checkpoint scripts, but current
RHEL 7.3 is on .166).

I had the following ideas:
1) monitor the syslog for the "WARNING pool is dd.dd% full" message;


This is what my script is doing of course. It is a bit ugly and a bit messy by 
now, but I could still clean it up :p.


However it does not follow syslog, but checks periodically. You can also 
follow with -f.


It does not allow for user specified actions yet.

In that case it would fulfill the same purpose as > 169, only a bit more poorly.


One more thing: from device-mapper docs (and indeed as observerd in my
tests), the "pool is dd.dd% full" message is raised one single time:
if a message is raised, the pool is emptied and refilled, no new
messages are generated. The only method I found to let the system
re-generate the message is to deactiveate and reactivate the thin pool
itself.


This is not my experience on LVM 111 from Debian.

For me new messages are generated when:

- the pool reaches any threshold again
- I remove and recreate any thin volume.

Because my system regenerates snapshots, I now get an email from my script 
when the pool is > 80%, every day.


So if I keep the pool above 80%, every day at 0:00 I get an email about it :p. 
Because syslog gets a new entry for it. This is why I know :p.


The explanation here is simple - when you create a new thin LV - there is
currently a full suspend - and before the suspend the pool is 'unmonitored';

after the resume it is monitored again - and you get your warning logged again.


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-23 Thread Zdenek Kabelac

Dne 23.4.2017 v 07:29 Xen napsal(a):

Zdenek Kabelac schreef op 22-04-2017 23:17:

That is awesome, that means a errors=remount-ro mount will cause a remount 
right?


Well 'remount-ro' will fail but you will not be able to read anything
from volume as well.


Well that is still preferable to anything else.

It is preferable to a system crash, I mean.

So if there is no other last resort, I think this is really the only last
resort that exists?


Or maybe one of the other things Gionatan suggested.


Currently lvm2 can't support that much variety and complexity...


I think it's simpler but okay, sure...

I think pretty much anyone would prefer a volume-read-errors system rather 
than a kernel-hang system.


I'm just curious - what do you think will happen when you have your
root_LV as a thin LV and the thin pool runs out of space - so 'root_LV'
is replaced with the 'error' target.

How do you think this will be ANY different from hanging your system?



It is just not of the same magnitude of disaster :p.


IMHO a reboot is still quite a fair solution in such a case.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] convert LV to physical device _in_place_?

2017-07-13 Thread Zdenek Kabelac

Dne 13.7.2017 v 15:25 Matthias Leopold napsal(a):

hi,

i'm fiddling around with LVM backed KVM raw disks, that i want to use 
_directly_ in oVirt virtualization (as "Direct LUN"). i would like to avoid 
"importing", dd, etc. if possible. in the KVM origin system exists a mapping 
of one iSCSI whole device (no partitions) to one PV to one VG to one LV per 
KVM disk. i can now present these iSCSI devices to the oVirt hosts, where i 
can only use them as "Direct LUN" without the LVM layer (i guess). so all i 
would need is to remove the LVM metadata from the iSCSI device preserving the 
"content" of the LV. pvremove seems to be too "intrusive" for this... i know 
this sounds rather naive and i didn't find any recipes for it. everybody talks 
about dd to a new device, which of course works (i tried it), but is exactly 
what i want to avoid when migrating very large disks (TB). i hope someone 
understands my concern and maybe has a solution





Hi

Yep, it really sounds a bit naive :)

A PV has a 'header', so the real 'data' is shifted by the PV header + lvm2 metadata,
and an LV also does not need to be sequential.

However, if you have a single-'segment' LV and you calculate the proper skip
offset (typically 1MB), you can try to use such a device directly
without lvm2 via a loop device mapping - see the losetup --offset option - but
you would need to do this every time you want to access such storage - so
there is no single magic 'flip' which would make an LV appear like a

regular device.
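
A sketch of such a mapping, assuming the usual 1MiB data offset and a
single-segment LV living on /dev/sdX (a placeholder):

  losetup -r --offset $((1024*1024)) --find --show /dev/sdX
  # prints e.g. /dev/loop0, which can then be mounted read-only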

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-26 Thread Zdenek Kabelac

Dne 26.4.2017 v 09:26 Gionatan Danti napsal(a):

Il 24-04-2017 23:59 Zdenek Kabelac ha scritto:

If you set  '--errorwhenfull y'  - it should instantly fail.


It's my understanding that "--errorwhenfull y" will instantly fail writes 
which imply new allocation requests, but writes to already-allocated space 
will be completed.


yes you understand it properly.



Is it possible, without messing directly with device mapper (via dmsetup), to
configure a strict "read-only" policy, where *all* writes (both to allocated
and unallocated space) will fail?


Nope it's not.



If it is not possible to do via lvm tools, what device-mapper target should
be used, and how?


At this moment it's not possible.
I do have some plans/ideas on how to work around this in user-space, but it's
non-trivial - especially on the recovery path.


It would be possible to 'reroute' the thin device to dm-delay and then point the
write path to error while leaving the read path as is - but that adds many new
states to handle,

so ATM it's in the queue...

Using ext4 with remount-ro is a fairly easy way to set up exactly this
logic.
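
E.g. (a sketch):

  mount -o errors=remount-ro /dev/vg/thinvol /mnt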

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-26 Thread Zdenek Kabelac

Dne 26.4.2017 v 10:10 Gionatan Danti napsal(a):


I'm not sure this is sufficient. In my testing, ext4 will *not* remount-ro on 
any error, rather only on erroneous metadata updates. For example, on a 
thinpool with "--errorwhenfull y", trying to overcommit data with a simple "dd 
if=/dev/zero of=/mnt/thinvol bs=1M count=1024 oflag=sync" will cause I/O 
errors (as shown by dmesg), but the filesystem is *not* immediately remounted 
read-only. Rather, after some time, a failed journal update will remount it 
read-only.


You need to use 'direct' write mode - otherwise you are just witnessing issues
related to page-cache flushing.


Every update of a file means an update of the journal - so you surely can lose
some data in flight - but every good piece of software needs to flush before doing
the next transaction - so with correctly working transactional software no data
should be lost.




XFS should behave similarly, with the exception that it will shutdown the 
entire filesystem (ie: not even reads are allowed) when metadata errors are 
detected (see note n.1).


Yep - XFS is slightly different - but it is getting improved; however some new
features are not enabled by default and the user needs to enable them.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-26 Thread Zdenek Kabelac

Dne 26.4.2017 v 15:37 Gionatan Danti napsal(a):


On 26/04/2017 13:23, Zdenek Kabelac wrote:


You need to use 'direct' write more - otherwise you are just witnessing
issues related with 'page-cache' flushing.

Every update of file means update of journal - so you surely can lose
some data in-flight - but every good software needs to the flush before
doing next transaction - so with correctly working transaction software
no data could be lost.


I used "oflag=sync" for this very reason - to avoid async writes, However, 
let's retry with "oflat=direct,sync".


This is the thinpool before filling:

[root@blackhole mnt]# lvs
   LV   VGAttr   LSize  Pool Origin Data%  Meta% Move Log 
Cpy%Sync Convert

   thinpool vg_kvmtwi-aot---  1.00g 87.66  12.01
   thinvol  vg_kvmVwi-aot---  2.00g thinpool43.83
   root vg_system -wi-ao 50.00g
   swap vg_system -wi-ao  7.62g

[root@blackhole storage]# mount | grep thinvol
/dev/mapper/vg_kvm-thinvol on /mnt/storage type ext4 
(rw,relatime,seclabel,errors=remount-ro,stripe=32,data=ordered)



Fill the thin volume (note that errors are raised immediately due to 
--errorwhenfull=y):


[root@blackhole mnt]# dd if=/dev/zero of=/mnt/storage/test.2 bs=1M count=300 
oflag=direct,sync

dd: error writing ‘/mnt/storage/test.2’: Input/output error
127+0 records in
126+0 records out
132120576 bytes (132 MB) copied, 14.2165 s, 9.3 MB/s

 From syslog:

Apr 26 15:26:24 localhost lvm[897]: WARNING: Thin pool vg_kvm-thinpool-tpool 
data is now 96.84% full.
Apr 26 15:26:27 localhost kernel: device-mapper: thin: 253:4: reached low 
water mark for data device: sending event.
Apr 26 15:26:27 localhost kernel: device-mapper: thin: 253:4: switching pool 
to out-of-data-space (error IO) mode
Apr 26 15:26:34 localhost lvm[897]: WARNING: Thin pool vg_kvm-thinpool-tpool 
data is now 100.00% full.


Despite write errors, the filesystem is not in read-only mode:



But you get a correct 'write' error - so from the application's POV - you get a
failing transaction update/write - so the app knows the 'data' was lost and should
not proceed with the next transaction - so it's in line with 'no data is lost', and
the filesystem is not damaged and is in a correct state (mountable).



Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-24 Thread Zdenek Kabelac

Dne 24.4.2017 v 15:49 Gionatan Danti napsal(a):



On 22/04/2017 23:22, Zdenek Kabelac wrote:

ATM there is even bug for 169 & 170 - dmeventd should generate message
at 80,85,90,95,100 - but it does it only once - will be fixed soon...


Mmm... quite a bug, considering how important monitoring is. All things
considered, what do you feel is the better approach to monitoring? Is it
possible to register for dm events?


Not that big a one - you always get 1 WARNING.
And releases 169 & 170 are clearly marked as developer releases - so they are
meant for testing and discovering these bugs...



Haven't seen a metadata error for quite a long time...
Since all the updates are CRC32-protected it's quite solid.


Great! Are the metadata writes somehow journaled or are they written in place?


Surely there is a journal.


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Sector size is assumed 512

2017-05-12 Thread Zdenek Kabelac

Dne 11.5.2017 v 12:39 Tomasz Lasko napsal(a):

Hi,

I'm not a part of the list or the project, just a random guy dropping by to 
say I found one suspicious thing:


after looking for what 's' size stands for, I found that your lvmcmdline.c 
source code 
 probably 
assumes that sector size is 512, but there are various sector sizes out in 
world (both for the hardware sector size and logical disk interfaces like 
SCSI) especially more and more popular 4096 byte sector size.


I wonder if apart of lvmcmdline.c above, also other parts of your software 
assume that sector size will always be 512. If yes, then I suggest rethinking 
if 4k sectors might break some operations in LVM.


By the way, I understand that when specifying command line parameter sector 
size, then it is the same for small 's' as capital 'S', right?  And the same 
goes for bytes ('b' is the same as 'B'), right?




Hi

Sector size is 'fixed' at 512b for dm devices.

If you work with 4K sector disks you need to use/specify size multiplied by 8.

It would get really messy if you had multiple PVs in a VG
with different sector sizes and you wanted to print e.g. the size of an LV in
sectors.


So nope - we can't change this fixed 'base' value.

Normally users work with sizes in MiB, GiB or eventually TiB.

lvm2 never advises users to go to sector-level precision, for numerous reasons.

If you care about a single sector - lvm2 is likely not the tool to consider
for usage.



Also note - the default alignment is 4MiB - so unless you override this
extent size to below 4K, you can't actually hit issues with wrong alignment

(and likely >99% of users never change this setting).

And BTW it's explicitly documented that a 'sector is 512b' - it has no attachment
to the physical device sector size...


Regards


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] on discards

2017-05-15 Thread Zdenek Kabelac

Dne 15.5.2017 v 05:16 Xen napsal(a):

On discards,

I have a thin pool that filled up when there was enough space, because the 
filesystem hadn't issued discards.


I have ext4 mounted with the discard option, I hope, because it is in the list 
of default mount options of tune2fs:


Default mount options:user_xattr acl discard

What can I do to ensure that it won't happen again?

Is the filesystem really mounted with discard?

It doesn't say so anywhere.




Hi


I'd not suggest mounting the fs with discard - but rather run 'fstrim' from your
cron once in a while.


A standalone 'small trim' on a deleted file is quite inefficient and will not
aggregate contiguous blocks for proper trimming.


For a thin-pool - the minimum size of a trim is e.g. a whole 64K chunk - and when
it sits on an SSD you may even need to trim multiple pool chunks to free something
on your SSD.


When you run 'fstrim', the whole region of unused blocks is trimmed - so it's
always the biggest possible chunk.
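
E.g. a weekly cron entry instead of the 'discard' mount option (the mount point
is a placeholder):

  # /etc/cron.d/fstrim
  0 3 * * 0  root  /usr/sbin/fstrim -v /mountpoint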


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-05-16 Thread Zdenek Kabelac

Dne 16.5.2017 v 09:53 Gionatan Danti napsal(a):

On 15/05/2017 17:33, Zdenek Kabelac wrote:> Ever tested this:


mount -o errors=remount-ro,data=journal ?


Yes, I tested it - same behavior: a full thinpool does *not* immediately put 
the filesystem in a read-only state, even when using sync/fsync and 
"errorwhenfull=y".


Hi

Somehow I think you've made a mistake during your test (or you have a
buggy kernel). Can you take a full log of your test showing that all options are

properly applied,

i.e. a dmesg log + /proc/self/mountinfo report showing all options used for the
mountpoint, and the kernel version in use?


IMHO you should get something like this in dmesg once your pool gets out of 
space and starts to return error on write:



Aborting journal on device dm-4-8.
EXT4-fs error (device dm-4): ext4_journal_check_start:60: Detected aborted 
journal
EXT4-fs (dm-4): Remounting filesystem read-only



Clearly when you specify 'data=journal', even a write failure of data will cause
a journal error and thus the remount-ro reaction (at least on my box it does) - but
such usage is noticeably slower compared with 'ordered' mode.



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvm.conf parameter for PVs to use for lvcreate ?

2017-05-29 Thread Zdenek Kabelac

Dne 23.5.2017 v 17:57 Oliver Rath napsal(a):

Hi list,

if I use multiple PVs, I'm able to select the PVs which should be used
by my LV, i.e.

lvcreate --size 10G --name mylv myvg /dev/sda3 /dev/sdb3

Is it possible to set these "/dev/sda3 /dev/sdb3" as default in lvm.conf
if nothing is explicitly given?




You can set which PVs are not 'allocatable'
(see 'man pvchange', the --allocatable option).

This should be approximately what you aim to have.
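
E.g. (a sketch; the device name is a placeholder):

  pvchange --allocatable n /dev/sdc3   # new LVs will no longer allocate from this PV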

Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM send/receive support?

2017-06-05 Thread Zdenek Kabelac

Dne 5.6.2017 v 10:48 Gionatan Danti napsal(a):

Hi Zdenek,
thanks for pointing me to thin_delta - very useful utility. Maybe I can code 
around it...


As an additional question, is direct lvm2 support for send/receive planned, or 
not?


Hi

It will certainly happen - just not sure in which year...
ATM higher prio tasks are on the table.


Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM send/receive support?

2017-06-06 Thread Zdenek Kabelac

Dne 5.6.2017 v 10:11 Gionatan Danti napsal(a):

Hi all,
I wonder if using LVM snapshots (even better: lvmthin snapshots, which are way 
faster) we can wire something like zfs or btrfs send/receive support.


In short: take a snapshot, do a full sync, take another snapshot and sync only 
the changed blocks.


I know that exists an utility called lvmsync[1], but it seem not maintained 
anymore. Any suggestion on how to do something similar?



Hi

Unfortunately there is no direct lvm2 support yet - but you can do it
manually via the thin-pool and its support tools.


Look at the tool 'thin_delta', which you can use for exactly the purpose you
describe.
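
A rough sketch of the manual flow (device names and thin device ids are
placeholders; check thin_delta(8) and the kernel thin-provisioning docs for the
exact options on your version):

  # snapshot the pool metadata so it can be read while the pool is live
  dmsetup message vg-pool-tpool 0 reserve_metadata_snap
  # diff the mappings of two thin devices (their device ids, e.g. 1 and 2)
  thin_delta -m --thin1 1 --thin2 2 /dev/mapper/vg-pool_tmeta > delta.xml
  dmsetup message vg-pool-tpool 0 release_metadata_snap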


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume

2017-09-14 Thread Zdenek Kabelac

Dne 14.9.2017 v 00:39 Dale Stephenson napsal(a):



On Sep 13, 2017, at 4:19 PM, Zdenek Kabelac <zkabe...@redhat.com> wrote:

Dne 13.9.2017 v 17:33 Dale Stephenson napsal(a):

Distribution: centos-release-7-3.1611.el7.centos.x86_64
Kernel: Linux 3.10.0-514.26.2.el7.x86_64
LVM: 2.02.166(2)-RHEL7 (2016-11-16)
Volume group consisted of an 8-drive SSD (500G drives) array, plus an 
additional SSD of the same size.  The array had 64 k stripes.
Thin pool had -Zn option and 512k chunksize (full stripe), size 3T with 
metadata volume 16G.  data was entirely on the 8-drive raid, metadata was 
entirely on the 9th drive.
Virtual volume “thin” was 300 GB.  I also filled it with dd so that it would be 
fully provisioned before the test.
Volume “thick” was also 300GB, just an ordinary volume also entirely on the 
8-drive array.
Four tests were run directly against each volume using fio-2.2.8: random read,
random write, sequential read, sequential write.  Single thread, 4k blocksize,
90s run time.


Hi

Can you please provide output of:

lvs -a -o+stripes,stripesize,seg_pe_ranges

so we can see how is your stripe placed on devices ?


Sure, thank you for your help:
# lvs -a -o+stripes,stripesize,seg_pe_ranges
   LV   VG Attr   LSize   Pool Origin Data%  Meta%  
Move Log Cpy%Sync Convert #Str Stripe PE Ranges
   [lvol0_pmspare]  volgr0 ewi---  16.00g   
 1 0  /dev/md127:867328-871423
   thickvolgr0 -wi-a- 300.00g   
 1 0  /dev/md127:790528-867327
   thin volgr0 Vwi-a-t--- 300.00g thinpool100.00
 0 0
   thinpool volgr0 twi-aot---   3.00t 9.77   0.13   
 1 0  thinpool_tdata:0-786431
   [thinpool_tdata] volgr0 Twi-ao   3.00t   
 1 0  /dev/md127:0-786431
   [thinpool_tmeta] volgr0 ewi-ao  16.00g   
 1 0  /dev/sdb4:0-4095

md127 is an 8-drive RAID 0

As you can see, there’s no lvm striping; I rely on the software RAID underneath 
for that.  Both thick and thin lvols are on the same PV.


SSDs typically ideally need writes in 512K chunks.


I could create the md to use 512k chunks for RAID 0, but I wouldn’t expect that 
to have any impact on a single threaded test using 4k request size.  Is there a 
hidden relationship that I’m unaware of?



Yep - it seems the setup in this case is the best fit.

If you can reevaluate different setups you may possibly get much higher 
throughput.


My guess would be - the best target layout would probably be striping across no
more than 2-3 disks with a bigger stripe block.


And then just 'join' the 'smaller' arrays together in lvm2 into 1 big LV.





(something like  'lvcreate -LXXX -i8 -I512k vgname’)


Would making lvm stripe on top of an md that already stripes confer any 
performance benefit in general, or for small (4k) requests in particular?


Rule #1 - try to avoid 'over-combining' things together.
 - Measure performance from the 'bottom' upward in your device stack.
If the underlying devices give poor speed - you can't make it better with any
super-smart disk layout on top of them.






Wouldn't be 'faster' to just concatenate 8 disks together instead of striping - 
or stripe only across 2 disk - and then you concatenate 4 such striped areas…


For sustained throughput I would expect striping of 8 disks to blow away 
concatenation — however, for small requests I wouldn’t expect any advantage.  
On a non-redundant array, I would expect a single threaded test using 4k 
requests is going to end up reading/writing data from exactly one disk 
regardless of whether the underlying drives are concatenated or stripes.

It always depends on which kind of load you expect the most.

I suspect spreading 4K blocks across 8 SSDs is likely very far from an ideal
layout.


Any SSD is typically very bad with 4K blocks - if you want to 'spread' the 
load over more SSDs, do not use less than 64K stripe chunks per SSD - this gives 
you (8*64) a 512K stripe size.


As for the thin-pool chunksize - if you plan to use lots of snapshots - keep the 
value as low as possible - a 64K or 128K thin-pool chunksize.
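A hedged example of creating such a pool (sizes and placement assumed, loosely 
matching the layout discussed above):

  lvcreate --type thin-pool -L3T --chunksize 128k --poolmetadatasize 16G \
           --zero n -n thinpool volgr0 /dev/md127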


But I'd still suggest reevaluating/benchmarking a setup where you use a much 
lower number of SSDs for load spreading - with bigger stripe chunks per 
device.  This should nicely improve performance in the case of 'bigger' writes

and not slow things down that much with 4K loads



What is the best choice for handling 4k request sizes?


Possibly NVMe can do a better job here.

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume

2017-09-14 Thread Zdenek Kabelac

Dne 14.9.2017 v 11:00 Zdenek Kabelac napsal(a):

Dne 14.9.2017 v 00:39 Dale Stephenson napsal(a):



On Sep 13, 2017, at 4:19 PM, Zdenek Kabelac <zkabe...@redhat.com> wrote:



md127 is an 8-drive RAID 0

As you can see, there’s no lvm striping; I rely on the software RAID 
underneath for that.  Both thick and thin lvols are on the same PV.


SSD typically do needs ideally write 512K chunks.


I could create the md to use 512k chunks for RAID 0, but I wouldn’t expect 
that to have any impact on a single threaded test using 4k request size.  Is 
there a hidden relationship that I’m unaware of?



Yep - it seems the setup in this case is the best fit.


Sorry my typo here - is NOT ;)


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-15 Thread Zdenek Kabelac

Dne 15.9.2017 v 09:34 Xen napsal(a):

Zdenek Kabelac schreef op 14-09-2017 21:05:



But if I do create snapshots (which I do every day) when the root and boot 
snapshots fill up (they are on regular lvm) they get dropped which is nice,


old snapshot are different technology for different purpose.


Again, what I was saying was to support the notion that having snapshots that 
may grow a lot can be a problem.



lvm2 makes them look the same - but underneath it's very different (and it's 
not just by age - but also for targeting different purpose).


- old-snaps are good for short-lived small snapshots - when a low number of 
changes is expected and it's not a big issue if the snapshot is 'lost'.


- thin-snaps are ideal for long-living objects with the possibility to take 
snaps of snaps of snaps, and you are guaranteed the snapshot will not 'just 
disappear' while you modify your origin volume...


Both have very different resources requirements and performance...

I am not sure the purpose of non-thin vs. thin snapshots is all that different 
though.


They are both copy-on-write in a certain sense.

I think it is the same tool with different characteristics.


There are cases where it's quite a valid option to take an old-snap of a thinLV 
and it will pay off...


Exactly the case where you use thin and want to make sure your temporary 
snapshot will not 'eat' all your thin-pool space, and you want to let the snapshot die.
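A sketch of that use (names/sizes assumed) - giving an explicit size makes the 
snapshot an old-style COW snapshot even though the origin is a thin volume, so it 
can never consume more than that and is simply invalidated when it fills up:

  lvcreate -s -L2G -n tmpsnap vg/thinvol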


Thin-pool still does not support shrinking - so if the thin-pool auto-grows to 
a big size - there is no way for lvm2 to reduce the thin-pool size...




That's just the sort of thing that in the past I have been keeping track of 
continuously (in unrelated stuff) such that every mutation also updated the 
metadata without having to recalculate it...


Would you prefer to spend all your RAM on keeping all the mapping information for 
all the volumes, and to put very complex code into the kernel to parse information 
which is technically already out-of-date the moment you get the result??


In 99.9% of runtime you simply don't need this info.

But the purpose of what you're saying is that the number of uniquely owned 
blocks by any snapshot is not known at any one point in time.


As long as 'thinLV' (i.e. your snapshot thinLV) is NOT active - there is 
nothing in kernel maintaining its dataset.  You can have lots of thinLV active 
and lots of other inactive.



Well pardon me for digging this deeply. It just seemed so alien that this 
thing wouldn't be possible.


I'd say it's very smart ;)

You can use only very small subset of 'metadata' information for individual 
volumes.


It becomes a rather big enterprise to install thinp for anyone!!!


It's enterprise level software ;)

Because to get it running takes no time at all!!! But to get it running well 
then implies huge investment.


In the most common scenarios - the user knows when he runs out of space - it will not 
be a 'pleasant' experience - but the user's data should be safe.


And then it depends how much energy/time/money user wants to put into 
monitoring effort to minimize downtime.


As has been said - disk-space is quite cheap.
So if you monitor and insert your new disk-space in time (enterprise...) you 
have a smaller set of problems than if you try to fight constantly with a 100% full 
thin-pool...


You still have problems even when you have 'enough' disk-space ;)
i.e. you select a small chunk-size and you want to extend the thin-pool data volume 
beyond its addressable capacity - each chunk-size has its final maximum data size


That means for me and for others that may not be doing it professionally or in 
a larger organisation, the benefit of spending all that time may not weigh up 
to the cost it has and the result is then that you keep stuck with a deeply 
suboptimal situation in which there is little or no reporting or fixing, all 
because the initial investment is too high.


You can always use normal device - it's really about the choice and purpose...




While personally I also like the bigger versus smaller idea because you don't 
have to configure it.


I'm still proposing to use different pools for different purposes...

Sometimes spreading the solution across existing logic is way easier
than trying to achieve some super-intelligent universal one...


Script is called at  50% fullness, then when it crosses 55%, 60%, ...
95%, 100%. When it drops bellow threshold - you are called again once
the boundary is crossed...


How do you know when it is at 50% fullness?


If you are proud sponsor of your electricity provider and you like the
extra heating in your house - you can run this in loop of course...



Thresholds are based on the mapped size of the whole thin-pool.

Thin-pool surely knows all the time how many blocks are allocated and free for
its data and metadata devices.


But didn't you just say you needed to process up to 16GiB to know this 
information?


Of course the thin-pool has to be aware of how much free space it has

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-15 Thread Zdenek Kabelac

Dne 15.9.2017 v 10:15 matthew patton napsal(a):



  From the two proposed solutions (lvremove vs lverror), I think I would  
prefer the second one.


I vote the other way. :)
First because 'remove' maps directly to the DM equivalent action which brought 
this about. Second because you are in fact deleting the object - ie it's not 
coming back. That it returns a nice and timely error code up the stack instead 
of the kernel doing 'wierd things' is an implementation detail.



It's not that easy.

lvm2 cannot just 'lose' a volume which is still mapped IN a table (even if it 
will be an error segment)


So the result of the operation will be some 'LV' in the lvm2 metadata,
which could possibly be flagged for 'automatic' removal later once it's no 
longer held in use.


There could be seen 'some' similarity with snapshot merge - where lvm2 
also maintains some 'fictional' volumes internally...


So 'lvm2' could possibly 'mask' the device as 'removed' - or it could keep it 
remapped to an error target - which could possibly be usable for other things.




Not to say 'lverror' might have a use of it's own as a "mark this device as in an error 
state and return EIO on every OP". Which implies you could later remove the flag and IO 
could resume subject to the higher levels not having already wigged out in some fashion. 
However why not change the behavior of 'lvchange -n' to do that on it's own on a previously 
activated entry that still has a ref count > 0? With '--force' of course


'lverror' could also be used for 'lvchange -an' - so not just for 'lvremoval' - and 
it could possibly be used for other volumes (not just thins) -


so you get an lvm2 mapping of 'dmsetup wipe_table'

('lverror'  would be actually something like   'lvconvert --replacewitherror' 
-  likely we would not add a new 'extra' command for this conversion)
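(For illustration - the low-level operation lvm2 would be wrapping here can already 
be done by hand today; the dm device name is assumed:

  dmsetup wipe_table vg-lvname    # replaces the table with an error target

after which every new I/O to the mapped device returns an error.)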




With respect to freezing or otherwise stopping further I/O to LV being used by 
virtual machines, the only correct/sane solution is one of 'power off' or 
'suspend'. Reaching into the VM to freeze individual/all filesystems but 
otherwise leave the VM running assumes significant knowledge of the VM's 
internals and the luxury of time.



And 'suspend' can be dropped from this list ;) as so far lvm2 sees a 
device left in suspend after command execution as a serious internal error,

and there is a long list of good reasons for not leaking suspended devices.

Suspend is designed as a short-lived 'state' of a device - it's not meant to be 
held in suspend for an undefined amount of time - it causes lots of trouble for 
various /dev scanning software (lvm2 included) - and as such it has races 
built-in :)



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume

2017-09-14 Thread Zdenek Kabelac

Dne 14.9.2017 v 12:57 Gionatan Danti napsal(a):

On 14/09/2017 11:37, Zdenek Kabelac wrote:

Sorry my typo here - is NOT ;)


Zdenek


Hi Zdenek,
as the only variable is the LVM volume type (fat/thick vs thin), why is the thin 
volume slower than the thick one?


I mean: all other things being equal, what is holding back the thin volume?




So few more question:

What is '/dev/sdb4'  ? - is it also some fast SSD ?

([thinpool_tmeta] volgr0 ewi-ao  16.00g  1 0  /dev/sdb4:0-4095
 - just checking to be sure your metadata device is not placed on rotational 
storage device)...


What is your thin-pool chunk size - is it 64K ?
- if you raise the thin-pool chunk size - does it get any better ?


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-13 Thread Zdenek Kabelac

Dne 13.9.2017 v 20:43 Xen napsal(a):


There is something else though.

You cannot set max size for thin snapshots?



We are moving here in right direction.

Yes - current thin-provisioning does not let you limit the maximum number of blocks 
an individual thinLV can address (and a snapshot is an ordinary thinLV)


Every thinLV can address  exactly   LVsize/ChunkSize  blocks at most.


This is part of the problem: you cannot calculate in advance what can happen, 
because by design, mayhem should not ensue, but what if your predictions are off?


Great - 'prediction' - we are getting on the same page - prediction is the big 
problem


Being able to set a maximum snapshot size before it gets dropped could be very 
nice.


You can't do that IN KERNEL.

The only tool which is able to calculate real occupancy - is the user-space 
thin_ls tool.


So all you need to do is to use the tool in user-space for this task.
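A rough sketch of that user-space task (device names assumed; check thin_ls(8) for 
the exact option spelling in your version of thin-provisioning-tools):

  dmsetup message vg-thinpool-tpool 0 reserve_metadata_snap
  thin_ls -m /dev/mapper/vg-thinpool_tmeta
  dmsetup message vg-thinpool-tpool 0 release_metadata_snap

thin_ls reports per-thinLV mapped vs. exclusively owned blocks - exactly the 'real 
occupancy' numbers discussed here.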


This behaviour is very safe on non-thin.

It is inherently risky on thin.




(I know there are already some listed in this
thread, but I’m wondering about those folks that think the script is
insufficient and believe this should be more standard.)


You really want to be able to set some minimum free space you want per volume.

Suppose I have three volumes of 10GB, 20GB and 3GB.

I may want the 20GB volume to be least important. The 3GB volume most 
important. The 10GB volume in between.


I want at least 100MB free on 3GB volume.

When free space on thin pool drops below ~120MB, I want the 20GB volume and 
the 10GB volumes to be frozen, no new extents for 30GB volume.


I want at least 500MB free on 10GB volume.

When free space on thin pool drops below ~520MB, I want the 20GB volume to be 
frozen, no new extents for 20GB volume.



So I would get 2 thresholds and actions:

- threshold for 3GB volume causing all others to be frozen
- threshold for 10GB volume causing 20GB volume to be frozen

This is easily scriptable and custom thing.

But it would be nice if you could set this threshold in LVM per volume?


This is the main issue - these 'data' are pretty expensive to 'mine' out of 
data structures.


That's the reason why the thin-pool is so fast and memory efficient inside the 
kernel - because it does not need all those details about how much data a 
thinLV eats from the thin-pool - the kernel target simply does not care - it only cares 
about referenced chunks


It's the user-space utility which is able to 'parse' all the structure
and take a 'global' picture. But of course it takes CPU and TIME and it's not 
'byte accurate' - that's why you need to start acting early, on some threshold.




But the most important thing is to freeze or drop snapshots I think.

And to ensure that this is default behaviour?


Why do you think this should be the default ?

The default is to auto-extend thin-data & thin-metadata when needed, if you set the 
threshold below 100%.
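i.e. something like this in lvm.conf (the values are just an example):

  activation {
      thin_pool_autoextend_threshold = 70
      thin_pool_autoextend_percent = 20
  }

which makes dmeventd grow the pool by 20% whenever it crosses 70% fullness.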


We can discuss whether it's a good idea to enable auto-extending by default - as we 
don't know if the free space in the VG is meant to be used for the thin-pool or there 
is some other plan the admin might have...



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] pvscan: bugs in manpage and implementation

2017-09-18 Thread Zdenek Kabelac

Dne 18.9.2017 v 09:52 Tom Hale napsal(a):

Hi,

MAN PAGE

In http://man7.org/linux/man-pages/man8/pvscan.8.html I see the
following issues:

* The string "-a--activate" appears several times. Should be:
   "-a|--activate"

* "-a|--activate y|n|ay" is mentioned, but later on:
"Only ay is applicable." Please remove "y|n|".

PROGRAM

# pvscan --activate ay
   Command does not accept option: --activate ay.

The message is confusing. It would be better to say "--activate requires
--cache"

In fact, why not drop the "ay" argument all together and just allow
"--cache --activate"?


Hi

In general - we use internal logic for parsing options and their parameters, and 
in this case the option 'takes' more parameters than 'pvscan' can accept.


We could possibly add an 'extra' special option, e.g.:
--pvscanautoactivation

but we considered at the given time that it's not worth it, since normally no user is 
ever supposed to use it in regular user space (since no one likes to see more 
and more individual options doing nearly the same thing)


This option is mostly targeted for execution inside udev rules - and since 
there was some 'evolution' in how to make this work - it's now not really 
worth making it more complicated - since we would have to keep the old option 
present anyway for backward compatibility.
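(For completeness - the generated udev/systemd invocation looks roughly like this, 
with the PV's major:minor substituted in:

  pvscan --cache --activate ay 8:16

so the 'ay' argument really only ever appears in that context.)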


Regards

Zdenek





___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Option to silence "WARNING: Sum of all thin volume sizes exceeds the size of thin pool"

2017-09-18 Thread Zdenek Kabelac

Dne 18.9.2017 v 21:07 Gionatan Danti napsal(a):

Il 18-09-2017 20:55 David Teigland ha scritto:

It's definitely an irritation, and I described a configurable alternative
here that has not yet been implemented:

https://bugzilla.redhat.com/show_bug.cgi?id=1465974

Is this the sort of topic where we should start making use of
https://github.com/lvmteam/lvm2/issues ?


Hi, the proposed solution on BZ entry seems somewhat too "invasive" to me.
I think it generally is a good idea to warn users about possibly 
unexpected behavior (i.e.: a full pool).


However, for the specific use-case of taking read-only snapshots, something as 
simple as a "--silencepoolsizewarn" flag (or a similar configuration variable) 
would do the trick without disrupting legitimate warnings.





We can possibly print the WARNING message only for the 1st thin LV which causes 
overprovisioning of the thin-pool.


As for a 'read-only' snapshot - it really doesn't matter - the overprovisioned 
pool can in the 'worst' case end up in an out-of-space condition (i.e. a fully 
rewritten origin)


Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Option to silence "WARNING: Sum of all thin volume sizes exceeds the size of thin pool"

2017-09-19 Thread Zdenek Kabelac

Dne 19.9.2017 v 10:49 Gionatan Danti napsal(a):

On 18/09/2017 23:10, matthew patton wrote:
If the warnings are not being emitted to STDERR then that needs to be fixed 
right off the bat.


The lines with WARNINGs are written to STDERR, at least on recent LVM versions.


'lvs -q blah' should squash any warnings.
'lvcreate' frankly shouldn't output anything unless invoked with '-v' 
anyway. So therefore '-q' should also squash warnings.


I'm not sure this is the right approach. I do not expect "-q" to remove *all* 
warnings, because warnings can be quite important.





There are a couple of troubles - there are always 'skilled X unskilled' users,
and also there are varying distro maintainers...

We are already facing some troubles when some distributions make changes 
to default configuration files which are not really ideal for 'default' users 
(in other words, opinions about defaults differ - moreover sometimes based on 
wrong blog posts from users...)


So we can introduce e.g. 'novice_user = 1' in lvm.conf - but it might be 
effectively dropped when a package maintainer decides that this way lvm2 produces fewer 
annoying messages around some commands.


But in the case of a thin-pool it's better to warn ahead instead of facing troubles 
later (as a full thin-pool is not going to be a pleasant experience...)


So we are looking for some solution which cannot be easily 'hidden' from 
everyone - so we do not end up with reports about 'hidden over-provisioning' 
causing system malfunctions.  Yet it should be 'easy' to bypass for a 
skilled admin.



IMHO the most convenient in my eyes is the usage of some sort of 'envvar':
LVM_SUPPRESS_POOL_WARNINGS


Since we already use similar logic to bypass i.e.  FD close() warnings
LVM_SUPPRESS_FD_WARNINGS
LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES
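Usage would then be simply (the pool variant being only a proposal at this point; 
the existing ones already work this way):

  LVM_SUPPRESS_FD_WARNINGS=1 lvcreate -T vg/pool -V10G -n thin1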


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-21 Thread Zdenek Kabelac

Dne 21.9.2017 v 16:49 Xen napsal(a):


However you would need LVM2 to make sure that only origin volumes are marked 
as critical.


The 'dmeventd'-executed binary - which can be a simple bash script called at the 
threshold levels - can be tuned to various naming logic.
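A minimal sketch of such a hook (the script name and removal policy are made up; 
the argument/environment interface is the one documented for the dmeventd thin 
plugin - double-check lvmthin(7) on your version):

  # lvm.conf
  dmeventd {
      thin_command = "/usr/local/sbin/thin_policy.sh"
  }

  # /usr/local/sbin/thin_policy.sh
  #!/bin/bash
  # dmeventd calls this at each threshold crossing with the pool (vg/lv) as $1
  pool=$1
  data=${DMEVENTD_THIN_POOL_DATA%%.*}   # data fullness in percent
  if [ "$data" -ge 95 ]; then
      lvremove -f vg/some_expendable_snapshot
  fi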


So far there is no plan to enforce 'naming' or 'tagging', since from user-base 
feedback we can see numerous ways of dealing with large-volume naming 
strategies, often driven by external tools/databases - so enforcing e.g. a 
specific tag would require changes in larger systems - compare that with rather 
simple tuning of a bash script...



I actually think that if I knew how to do multithreading in the kernel, I 
could have the solution in place in a day...


If I were in the position to do any such work to begin with... :(.

But you are correct that error target is almost the same thing.


It's the 'safest' option - it avoids any sort of further possible damage to the 
filesystem.


Note - a typical 'fs' may remount 'ro' at a reasonable threshold time - the 
precise point depends on the workload. If you have 'PB' arrays - surely leaving 
5% of free space is rather huge; if you work with GBs on a fast SSD - 
taking action at 70% might be better.


If at any time 'during' a write the user hits a 'full pool' - there is currently no 
other way than to stop using the FS - there are numerous ways -


You can replace the device with 'error'
You can replace the device with 'delay' that splits reads to thin and writes to 
error

There is just no way back - the FS should be checked (i.e. a full FS could be 
'restored' by deleting some files, but in the full thin-pool case the 'FS' needs 
to get consistent first) - so focusing on solving the full-pool case is like preparing 
for a missed battle - the focus should go into ensuring you do not hit a full pool; and on 
the 'sad' occasion of a 100% full pool - the worst case scenario is not all that 
bad - surely way better than the 4-year-old experience with an old kernel and old 
lvm2



What you would have to implement is to TAKE the space FROM them to
satisfy writing task to your 'active' volume and respect
prioritization...


Not necessary. Reserved space is a metric, not a real thing.

Reserved space by definition is a part of unallocated space.


How is this different from having a VG with 1TB where you allocate only e.g. 90% 
for the thin-pool and you have 10% of free space for 
'extension' of the thin-pool for your 'critical' moment?


I'm still not seeing any difference - except you would need to invest a lot of 
energy into handling this 'reserved' space inside the kernel.


With current versions of lvm2 you can handle these tasks in user-space, and 
quite early, before you reach a 'real' out-of-space condition.



In other words - tuning 'thresholds' in userspace's 'bash' script will
give you very same effect as if you are focusing here on very complex
'kernel' solution.


It's just not very complex.

You thought I wanted space consumption metric for all volumes including 
snapshots and then invididual attribution of all consumed space.



Maybe you can try the existing proposed solutions first and show the 'weak' points 
which are not solvable by them ?


We all agree we could not store a 10G thin volume in a 1G thin-pool - so there 
will always be the case of having a 'full pool'.


Either you handle reserves via 'early' remount-ro, or you keep some 
'spare' LV/space in the VG which you attach to the thin-pool 'when' needed...
Having such a 'great' level of free choice here is IMHO a big advantage, as it's 
always the 'admin' who decides how to use the available space in the best way - instead 
of keeping 'reserves' somewhere hidden in the kernel


Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Restoring snapshot gone bad

2017-09-22 Thread Zdenek Kabelac

Dne 22.9.2017 v 08:03 Mauricio Tavares napsal(a):

I have a lv, vmzone/desktop that I use as drive for a kvm guest;
nothing special here. I wanted to restore its snapshot so like I have
done many times before I shut guest down and then

lvconvert --merge vmzone/desktop_snap_20170921
   Logical volume vmzone/desktop is used by another device.
   Can't merge over open origin volume.
   Merging of snapshot vmzone/desktop_snap_20170921 will occur on next 
activation
  of vmzone/desktop.

What is it really trying to tell me? How to find out which other
device is using it?



Hi


When you want to merge a snapshot - the origin must be an unused volume (so not 
opened/mounted anywhere).


The merging process 'copies/restores' modified blocks in your origin volume - thus 
e.g. a filesystem would not be able to handle 'changes' happening underneath its 
hands...


So if the conditions which let you start the snapshot merge are not met, the 
operation is delayed - most likely to the 'next' activation of your origin volume, in 
which case it's pretty sure there is no user, so the merge can be started

(or you can 'lvchange --refresh')



lvdisplay tells me that

lvdisplay /dev/vmzone/desktop
   --- Logical volume ---
   LV Path/dev/vmzone/desktop
   LV Namedesktop
   VG Namevmzone
   LV UUID3hcB1L-rIRf-PHZQ-I55F-ZXhT-SnSZ-vThO8U
   LV Write Accessread/write
   LV Creation host, time duocismj01e9se, 2017-06-29 15:07:12 -0400
   LV snapshot status source of
  desktop_snap_20170921 [active]
   LV Status  available
   # open 2



here you can see 'non-zero' open count


   LV Size100.00 GiB
   Current LE 25600
   Segments   1
   Allocation inherit
   Read ahead sectors auto
   - currently set to 256
   Block device   252:6

When I do plain old lvs (or lvs -a -o +devices), the attribute entry
for the desktop looks like

   desktopvmzoneOwi-aos--- 100.00g

According to https://linux.die.net/man/8/lvs the "O" in Owi-aos--- means it
is merging a snapshot. But, what is its status? Based on how long it
has been that way, I think it is hung but I do not know what is
causing this hangup.


Nothing is hanging - it's just postponed to the next opportunity...
Progress of merging can be checked easily with the command 'lvs'.
Note: while the merging is in progress - you can already use the 'merged origin';
so e.g. if you merge a snapshot of your 'root' volume - on reboot and next 
activation you can already use the 'merged' result while the actual copying is 
processed in the background, and you can check its progress percentage.
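For the case above the flow would be roughly (the guest-shutdown step depends on 
your virtualization tooling and is only an assumed example):

  virsh shutdown desktop        # stop whatever keeps the LV open
  lvchange -an vmzone/desktop
  lvchange -ay vmzone/desktop   # the postponed merge starts on this activation
  lvs vmzone                    # watch the merge percentage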


Regards

Zdenek



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-21 Thread Zdenek Kabelac

Dne 21.9.2017 v 12:22 Xen napsal(a):

Hi,

thank you for your response once more.

Zdenek Kabelac schreef op 21-09-2017 11:49:


Hi

Of course this decision makes some tasks harder (i.e. there are surely
problems which would not even exist if it would be done in kernel)  -
but lots of other things are way easier - you really can't compare
those


I understand. But many times lack of integration of shared goal of multiple 
projects is also big problem in Linux.


And you also have project that do try to integrate shared goals like btrfs.


However if we *can* standardize on some tag or way of _reserving_ this 
space, I'm all for it.


Problems of a desktop user with 0.5TB SSD are often different with
servers using 10PB across multiple network-connected nodes.

I see you call for one standard - but it's very very difficult...


I am pretty sure that if you start out with something simple, it can extend 
into the complex.


We hope community will provide some individual scripts...
Not a big deal to integrate them into repo dir...


We have spend really lot of time thinking if there is some sort of
'one-ring-to-rule-them-all' solution - but we can't see it yet -
possibly because we know wider range of use-cases compared with
individual user-focused problem.


I think you have to start simple.


It's mostly about what can be supported 'globally'
and what is rather 'individual' customization.



You can never come up with a solution if you start out with the complex.

The only thing I ever said was:
- give each volume a number of extents or a percentage of reserved space if 
needed


Which can't be delivered with the current thinp technology.
It's simply too computationally invasive for our targeted performance.

The only deliverable we have is - you create a 'cron' job that does the hard 
'computing' once in a while - and takes some 'action' when individual 
'volumes' go out of their preconfigured boundaries.  (Often such logic is 
implemented outside of lvm2 - in some DB engine - since lvm2 itself is 
really NOT a high-performing DB - the ascii format has its age)


You can't get this 'percentage' logic online in kernel (aka while you update 
individual volume).




- for all the active volumes in the thin pool, add up these numbers
- when other volumes require allocation, check against free extents in the pool


I assume you possibly missed this logic of thin-p:

When you update origin - you always allocate FOR origin, but allocated chunk
remains claimed by snapshots (if there are any).

So if a snapshot shared all chunks with the origin at the beginning (so it basically 
consumed only some 'metadata' space and 0% real exclusively-owned space) - after a 
full rewrite of the origin your snapshot suddenly 'holds' all the old chunks 
(100% of its size)


So when you 'write' to the ORIGIN - it is your snapshot which becomes bigger in terms of 
individually/exclusively owned chunks - so if you have e.g. configured a snapshot 
to not consume more than XX% of your pool - you would simply need to recalculate 
this with every update of a shared chunk


And as has been already said - this is currently unsupportable 'online'

Another aspect here is - the thin-pool has no idea about the 'history' of volume 
creation - it does not know that volume X is a snapshot of volume Y - 
this is all only 'remembered' by the lvm2 metadata - in the kernel it's always 
like - volume X owns a set of chunks 1...

That's all kernel needs to know for a single thin volume to work.

You can do it with 'reasonable' delay in user-space upon 'triggers' of global 
threshold  (thin-pool fullness).



- possibly deny allocation for these volumes



Unsupportable in the 'kernel' without a rewrite, and you can e.g. 'work around' this 
by placing 'error' targets in place of less important thinLVs...


Imagine you would get pretty random 'denials' of your WRITE request depending 
on interaction with other snapshots



Surely if you use 'read-only' snapshots you may not see all the related problems, but 
such a very minor subclass of the whole provisioning solution is not worth 
special handling in the whole thin-p target.




I did not know or did not realize the upgrade paths of the DM module(s) and 
LVM2 itself would be so divergent.


lvm2 is  volume manager...

dm is implementation layer for different 'segtypes' (in lvm2 terminology).

So i.e. anyone can write his own 'volume manager'  and use 'dm'  - it's fully 
supported - dm is not tied to lvm2 and is openly designed  (and used by other 
projects)



So my apologies for that but obviously I was talking about a full-system 
solution (not partial).


yep - 2 different worlds

i.e. crypto, multipath,...




You have origin and 2 snaps.
You set different 'thresholds' for these volumes  -


I would not allow setting threshold for snapshots.

I understand that for dm thin target they are all the same.

But for this model it does not make sense because LVM talks of "origin" and 
"snapshots".



You then overwri

Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume

2017-09-13 Thread Zdenek Kabelac

Dne 13.9.2017 v 17:33 Dale Stephenson napsal(a):

Distribution: centos-release-7-3.1611.el7.centos.x86_64
Kernel: Linux 3.10.0-514.26.2.el7.x86_64
LVM: 2.02.166(2)-RHEL7 (2016-11-16)

Volume group consisted of an 8-drive SSD (500G drives) array, plus an 
additional SSD of the same size.  The array had 64 k stripes.
Thin pool had -Zn option and 512k chunksize (full stripe), size 3T with 
metadata volume 16G.  data was entirely on the 8-drive raid, metadata was 
entirely on the 9th drive.
Virtual volume “thin” was 300 GB.  I also filled it with dd so that it would be 
fully provisioned before the test.
Volume “thick” was also 300GB, just an ordinary volume also entirely on the 
8-drive array.

Four tests were run directly against each volume using fio-2.2.8: random read, 
random write, sequential read, sequential write.  Single thread, 4k blocksize, 
90s run time.


Hi

Can you please provide output of:

lvs -a -o+stripes,stripesize,seg_pe_ranges

so we can see how is your stripe placed on devices ?

SSD typically do needs ideally write 512K chunks.
(something like  'lvcreate -LXXX -i8 -I512k vgname' )

Wouldn't it be 'faster' to just concatenate 8 disks together instead of striping 
- or stripe only across 2 disks - and then concatenate 4 such striped areas...


64k stripes do not seem to be an ideal match in this case of 3 disks with 
512K blocks


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-14 Thread Zdenek Kabelac

Dne 14.9.2017 v 07:59 Xen napsal(a):

Zdenek Kabelac schreef op 13-09-2017 21:35:


We are moving here in right direction.

Yes - current thin-provisiong does not let you limit maximum number of
blocks individual thinLV can address (and snapshot is ordinary thinLV)

Every thinLV can address  exactly   LVsize/ChunkSize  blocks at most.


So basically the only options are allocation check with asynchronously derived 
intel that might be a few seconds late, as a way to execute some standard and 
general "prioritizing" policy, and an interventionalist policy that will 
(fs)freeze certain volumes depending on admin knowledge about what needs to 
happen in his/her particular instance.



Basically the user-land tool takes a runtime snapshot of the kernel metadata
(so it gets you information from some frozen point in time), then it processes the 
input data (up to 16GiB!) and outputs some number - like what the 
real number of uniquely allocated blocks in a thinLV is.  Typically a snapshot may share 
some blocks - or it could already be provisioning all blocks, in case the shared 
blocks were already modified.




Great - 'prediction' - we getting on the same page -  prediction is
big problem


Yes I mean my own 'system' I generally of course know how much data is on it 
and there is no automatic data generation.


However lvm2 is not 'Xen oriented' tool only.
We need to provide universal tool - everyone can adapt to their needs.

Since your needs are different from others needs.

But if I do create snapshots (which I do every day) when the root and boot 
snapshots fill up (they are on regular lvm) they get dropped which is nice, 


old snapshot are different technology for different purpose.



$ sudo ./thin_size_report.sh
[sudo] password for xen:
Executing self on linux/thin
Individual invocation for linux/thin

     name   pct   size
     -
     data    54.34% 21.69g
     sites    4.60%  1.83g
     home 6.05%  2.41g
     - +
     volumes 64.99% 25.95g
     snapshots    0.09% 24.00m
     - +
     used    65.08% 25.97g
     available   34.92% 13.94g
     - +
     pool size  100.00% 39.91g

The above "sizes" are not volume sizes but usage amounts.


With 'plain' lvs output - it's just an orientational number.
Basically the highest referenced chunk for a given thin volume.
This is a great approximation of size for a single thinLV.
But somewhat 'misleading' for thin devices created as snapshots...
(having shared blocks)

So you have no precise idea how many blocks are shared or uniquely owned by a 
device.


Removal of a snapshot might mean you release NOTHING from your thin-pool, if all 
the snapshot's blocks were shared with some other thin volumes



If you say that any additional allocation checks would be infeasible because 
it would take too much time per request (which still seems odd because the 
checks wouldn't be that computation intensive and even for 100 gigabyte you'd 
only have 25.000 checks at default extent size) -- of course you 
asynchronously collect the data.


Processing the mapping of up to 16GiB of metadata will not happen in 
milliseconds, and it consumes memory and CPU...




I mean I generally like the designs of the LVM team.

I think they are some of the most pleasant command line tools anyway...


We try really hard


On the other hand if all you can do is intervene in userland, then all LVM 
team can do is provide basic skeleton for execution of some standard scripts.


Yes - we give all the power to suit thin-p for individual needs to the user.




So all you need to do is to use the tool in user-space for this task.


So maybe we can have an assortment of some 5 interventionalist policies like:

a) Govern max snapshot size and drop snapshots when they exceed this
b) Freeze non-critical volumes when thin space drops below aggegrate values 
appropriate for the critical volumes

c) Drop snapshots when thin space <5% starting with the biggest one
d) Also freeze relevant snapshots in case (b)
e) Drop snapshots when exceeding max configured size in case of threshold reach.


But you are aware you can run such a task even with a cronjob.

So for example you configure a max size for a snapshot. When the snapshot exceeds the 
size it gets flagged for removal. But removal only happens when another condition 
is met (threshold reached).
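e.g. a crontab entry running a (hypothetical) policy script every few minutes:

  */5 * * * * root /usr/local/sbin/thin_policy.sh vg/thinpool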


We are already blamed for having way too many configurable knobs




So you would have 5 different interventions you could use that could be 
considered somewhat standard and the admit can just pick and choose or customize.




And we have a way longer list of actions we want to do ;) We have not yet come 
to any single conclusion on how to make such a thing manageable for a user...




But how expensive is it to do it say every 5 seconds?


If you have b

Re: [linux-lvm] Option to silence "WARNING: Sum of all thin volume sizes exceeds the size of thin pool"

2017-09-19 Thread Zdenek Kabelac

Dne 19.9.2017 v 16:14 David Teigland napsal(a):

On Tue, Sep 19, 2017 at 01:11:09PM +0200, Zdenek Kabelac wrote:

IMHO the most convenient in my eyes is a usage of some sort of 'envvar'
LVM_SUPPRESS_POOL_WARNINGS


I think we're looking at the wrong thing.  The root problem is what we're
warning about, not that the warning is being printed.  It doesn't make
sense to warn about the inherent nature of things.  When peole create a
linear LV, we don't print a warning that it's not redundant.  By warning
that the pool is overprovisioned, we also mislead people into thinking
that this is what they should worry about, when in fact it's free space in
the pool which is the real thing to worry about.  So I think the message
should be dropped and replaced with something more useful.

The main purpose of the warning is to really warn the user that auto-extension of 
the thin-pool is NOT configured (so no threshold is set up) - so the thin-pool is not 
set up in the 'preferred' way (when the user counts on the fact that the thin-pool 
can grow and auto-extension is enabled - the warning is not printed).


I think it's really useful to give this information to the user - since from 
my experience with users - many of them are simply unaware of the fact that when 
they take 3 snapshots they may need 3x more space than the origin volume.


So in the case where users do want to have 'critical' volumes always 'safe' from an 
out-of-space condition - the message tells them when the pool can't cover all the 
space for all thins.


Even on this list - people tend to think it is really easy to just drop 
snapshots like with old-snaps, and they even think it will happen auto-magically.


So IMHO we are better off giving the user really good info about what is going on.
Once we provide a more secure mechanism - we may possibly change the timing and 
the actual printed message.


A skilled user just ignores the message - so what is the major problem with it ?

Is it the 'severity' - so the message should be prefixed with "NOTE:" instead 
of "WARNING:"  (log_warn() -> log_print_unless_quiet())


We have a number of similar messages for other cases, so it's relatively common 
in lvm2 to give some guidance messages to users - just this one gets some 
extra tension (i.e. we could open a similar discussion about handling duplicate 
cases)


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Option to silence "WARNING: Sum of all thin volume sizes exceeds the size of thin pool"

2017-09-19 Thread Zdenek Kabelac

Dne 19.9.2017 v 16:34 matthew patton napsal(a):
LVM and thin in particular is not for noobies or novices. If they get burned then they deserve it for using a technology they didn't bother to study and learn 


Well, it's not for novices ;) yet I believe 'lvm2' should not be an easy 
weapon of mass destruction for users' data - we then get 'beaten' by rumors 
from the 'other side'



"create succeeded". Create succeeded is indicated by a 0 return code. Anything 
else is just noise. If the user wants noise, then need to use the '-v|--verbose' flag.



'lvcreate -q' is then likely wanted to be 'respected', as normally lvm2 tends 
to be a bit of a 'conversational' command.


But I think  '-q'  should be respected and it's not  with this log_warn().

So I think conversion to the log_print() level is possibly the goal here ??
(covering both sides of the story).

Since as it's been said - we emit more lines per command even about successful 
'middle' steps.



Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-21 Thread Zdenek Kabelac

Dne 20.9.2017 v 15:05 Xen napsal(a):

Gionatan Danti schreef op 18-09-2017 21:20:


Xen, I really think that the combination of hard-threshold obtained by
setting thin_pool_autoextend_threshold and thin_command hook for
user-defined script should be sufficient to prevent and/or react to
full thin pools.


I will hopefully respond to Zdenek's message later (and the one before that 
that I haven't responded to),



I'm all for the "keep it simple" on the kernel side.


But I don't mind if you focus on this,


That said, I would like to see some pre-defined scripts to easily
manage pool fullness. (...) but I would really
like the standardisation such predefined scripts imply.


And only provide scripts instead of kernel features.

Again, the reason I am also focussing on the kernel is because:

a) I am not convinced it cannot be done in the kernel
b) A kernel feature would make space reservation very 'standardized'.



Hi

Some more 'light' into the existing state as this is really not about what can 
and what cannot be done in kernel - as clearly you can do 'everything' in 
kernel - if you have the code for it...


I'm here explaining the position of lvm2 - which is a user-space project (since we 
are on the lvm2 list) - and lvm2 is using the 'existing' dm kernel target which 
provides thin-provisioning (and has its configurables). So this is the kernel 
piece and it differs from the user-space lvm2 counterpart.


Surely there is cooperation between these two - but anyone else can write some 
other 'dm'  target - and lvm2 can extend support for given  target/segment 
type if such target is used by users.


In practice your 'proposal' is quite different from the existing target - 
essentially a major rework if not a whole new re-implementation - as it's not 
the 'few line' patch extension which you might possibly believe/hope for.


I can (and effectively I already have) spend a lot of time explaining the 
existing logic and why it is really hardly doable with the current design, but we 
cannot work on support for a 'hypothetical' non-existing kernel target from the lvm2 
side - so you either need to start from the 'ground-zero' level of dm target design,
or you need to 'reevaluate' your vision to be more in touch with the existing 
kernel target's output...


However we believe our existing solution in 'user-space' can cover the most common 
use-cases, and we might just have 'big holes' in providing better documentation 
to explain the reasoning and guide users to use the existing technology in a more 
optimal way.




The point is that kernel features make it much easier to standardize and to 
put some space reservation metric in userland code (it becomes a default 
feature) and scripts remain a little bit off to the side.


Maintenance/devel/support of kernel code is more expensive - it's usually very 
easy to upgrade small 'user-space' encapsulated package - compared with major 
changes on kernel side.


So that's where the dm/lvm2 design comes from - do the 'minimum necessary' inside 
the kernel and maximize usage of user-space.


Of course this decision makes some tasks harder (i.e. there are surely 
problems which would not even exist if it would be done in kernel)  - but lots 
of other things are way easier - you really can't compare those


Yeah - standards are always a problem :) e.g. Xorg & Wayland -
but it's way better to play with user-space than playing with the kernel

However if we *can* standardize on some tag or way of _reserving_ this space, 
I'm all for it.


Problems of a desktop user with a 0.5TB SSD are often different from those of servers 
using 10PB across multiple network-connected nodes.


I see you call for one standard - but it's very very difficult...


I think a 'critical' tag in combination with the standard autoextend_threshold 
(or something similar) is too loose and ill-defined and not very meaningful.


We aim to deliver admins rock-solid bricks.

Whether you make a small house or build a Southfork out of them is then the 
admin's choice.

We have spent a really long time thinking about whether there is some sort of 
'one-ring-to-rule-them-all' solution - but we can't see it yet - possibly 
because we know a wider range of use-cases compared with an individual user-focused 
problem.


And I would prefer to set individual space reservation for each volume even if 
it can only be compared to 5% threshold values.


Which needs a 'different' kernel target driver (and possibly some way to 
kill/split the page-cache to work on a 'per-device' basis)


And just as an illustration of problems you need to start solving for this 
design:

You have origin and 2 snaps.
You set different 'thresholds' for these volumes  -
You then overwrite 'origin'  and you have to maintain 'data' for OTHER LVs.
So you get into the position where a 'WRITE' to the origin will invalidate a volume 
that is NOT even active (without lvm2 even being aware).
So suddenly rather simple individual thinLV targets will have to maintain the 
whole 'data set' and cooperate with all other active thin targets in case 
they 

Re: [linux-lvm] lv raid - how to read this?

2017-09-08 Thread Zdenek Kabelac

Dne 7.9.2017 v 15:12 lejeczek napsal(a):



On 07/09/17 10:16, Zdenek Kabelac wrote:

Dne 7.9.2017 v 10:06 lejeczek napsal(a):

hi fellas

I'm setting up a lvm raid0, 4 devices, I want raid0 and I understand & 
expect - there will be four stripes, all I care of is speed.

I do:
$ lvcreate --type raid0 -i 4 -I 16 -n 0 -l 96%pv intel.raid0-0 
/dev/sd{c..f} # explicitly four stripes


I see:
$ mkfs.xfs /dev/mapper/intel.sataA-0 -f
meta-data=/dev/mapper/intel.sataA-0 isize=512 agcount=32, agsize=30447488 blks
  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1    finobt=0, sparse=0
data =   bsize=4096 blocks=974319616, imaxpct=5
  =   sunit=4 swidth=131076 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal log   bsize=4096 blocks=475744, version=2
  =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

What puzzles me is xfs's:
  sunit=4  swidth=131076 blks
and I think - what the hexx?



Unfortunately 'swidth' in XFS has a different meaning than lvm2's stripe 
size parameter.


In lvm2 -


-i | --stripes    - how many disks
-I | --stripesize    - how much data before using next disk.

So  -i 4  & -I 16 gives  64KB  total stripe width



XFS meaning:

suinit = <RAID controller's stripe size in BYTES (or KiBytes when used with k)>
swidth = <# of data disks (don't count parity disks)>



 so real-world example 

# lvcreate --type striped -i4 -I16 -L1G -n r0 vg

or

# lvcreate --type raid0  -i4 -I16 -L1G -n r0 vg

# mkfs.xfs  /dev/vg/r0 -f
meta-data=/dev/vg/r0 isize=512    agcount=8, agsize=32764 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1    finobt=1, sparse=0, rmapbt=0, 
reflink=0

data =   bsize=4096 blocks=262112, imaxpct=25
 =   sunit=4  swidth=16 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal log   bsize=4096   blocks=552, version=2
 =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0


 and we have 

sunit=4 ...  4 * 4096 = 16KiB    (matching lvm2 -I16 here)
swidth=16 blks  ... 16 * 4096 = 64KiB
   so we have  64 as total width / size of single strip (sunit) ->  4 disks
   (matching  lvm2 -i4 option here)

Yep complex, don't ask... ;)
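(And if the autodetected geometry ever looks bogus - as in the report below - the 
values can always be given to mkfs.xfs explicitly; for the -i4 -I16 case that 
would be roughly:

  mkfs.xfs -f -d su=16k,sw=4 /dev/vg/r0 )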





In a LVM non-raid stripe scenario I've always remember it was: swidth = 
sunit * Y where Y = number of stripes, right?


I'm hoping some expert could shed some light, help me(maybe others too) 
understand what LVM is doing there? I'd appreciate.

many thanks, L.



Well, in the first place there is a major discrepancy in the naming:

You use intel.raid0-0   VG name
and then you mkfs device: /dev/mapper/intel.sataA-0  ??

While you should be accessing: /dev/intel.raid0/0

Are you sure you are not trying to overwrite some unrelated device here?

(As the numbers you show look unrelated, or you have a buggy kernel or blkid)



hi,
I renamed VG in the meantime,
I get xfs intricacy..
so.. question still stands..
why the xfs format does not do what I remember it always did in the past (on lvm 
non-raid but striped), like in your example


  =   sunit=4  swidth=16 blks
but I see instead:

  =   sunit=4 swidth=4294786316 blks

a whole lot:

$ xfs_info /__.aLocalStorages/0
meta-data=/dev/mapper/intel.raid0--0-0 isize=512 agcount=32, agsize=30768000 
blks
  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1    finobt=0 spinodes=0
data =   bsize=4096 blocks=984576000, imaxpct=5
  =   sunit=4 swidth=4294786316 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal   bsize=4096   blocks=480752, version=2
  =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

$ lvs -a -o +segtype,stripe_size,stripes,devices intel.raid0-0
   LV   VG    Attr   LSize   Pool Origin Data% Meta%  Move 
Log Cpy%Sync Convert Type   Stripe #Str Devices
   0    intel.raid0-0 rwi-aor--- 3.67t raid0 16.00k    4 
0_rimage_0(0),0_rimage_1(0),0_rimage_2(0),0_rimage_3(0)

   [0_rimage_0] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdc(0)
   [0_rimage_1] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdd(0)
   [0_rimage_2] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sde(0)
   [0_rimage_3] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdf(0)




Hi

I've checked even 128TiB sized device with mkfs.xfs with -i4 -I16

# lvs -a vg

  LV VG Attr  

Re: [linux-lvm] lv raid - how to read this?

2017-09-08 Thread Zdenek Kabelac

Dne 8.9.2017 v 11:39 lejeczek napsal(a):



On 08/09/17 10:34, Zdenek Kabelac wrote:

Dne 8.9.2017 v 11:22 lejeczek napsal(a):



On 08/09/17 09:49, Zdenek Kabelac wrote:

Dne 7.9.2017 v 15:12 lejeczek napsal(a):



On 07/09/17 10:16, Zdenek Kabelac wrote:

Dne 7.9.2017 v 10:06 lejeczek napsal(a):

hi fellas

I'm setting up a lvm raid0, 4 devices, I want raid0 and I understand & 
expect - there will be four stripes, all I care of is speed.

I do:
$ lvcreate --type raid0 -i 4 -I 16 -n 0 -l 96%pv intel.raid0-0 
/dev/sd{c..f} # explicitly four stripes


I see:
$ mkfs.xfs /dev/mapper/intel.sataA-0 -f
meta-data=/dev/mapper/intel.sataA-0 isize=512 agcount=32, 
agsize=30447488 blks

  =   sectsz=512 attr=2, projid32bit=1
  =   crc=1 finobt=0, sparse=0
data =   bsize=4096 blocks=974319616, imaxpct=5
  =   sunit=4 swidth=131076 blks
naming   =version 2  bsize=4096 ascii-ci=0 ftype=1
log  =internal log   bsize=4096 blocks=475744, version=2
  =   sectsz=512 sunit=4 blks, lazy-count=1
realtime =none   extsz=4096 blocks=0, rtextents=0

What puzzles me is xfs's:
  sunit=4  swidth=131076 blks
and I think - what the hexx?



Unfortunatelly  'swidth'  in XFS has different meaning than lvm2's  
stripe size parameter.


In lvm2 -


-i | --stripes    - how many disks
-I | --stripesize    - how much data before using next disk.

So  -i 4  & -I 16 gives  64KB  total stripe width



XFS meaning:

suinit = <RAID controller's stripe size in BYTES (or KiBytes when used with k)>

swidth = <# of data disks (don't count parity disks)>



 so real-world example 

# lvcreate --type striped -i4 -I16 -L1G -n r0 vg

or

# lvcreate --type raid0  -i4 -I16 -L1G -n r0 vg

# mkfs.xfs  /dev/vg/r0 -f
meta-data=/dev/vg/r0 isize=512 agcount=8, agsize=32764 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1 finobt=1, sparse=0, rmapbt=0, 
reflink=0

data =   bsize=4096 blocks=262112, imaxpct=25
 =   sunit=4 swidth=16 blks
naming   =version 2  bsize=4096 ascii-ci=0 ftype=1
log  =internal log   bsize=4096 blocks=552, version=2
 =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096 blocks=0, rtextents=0


 and we have 

sunit=4 ...  4 * 4096 = 16KiB    (matching lvm2 -I16 here)
swidth=16 blks  ... 16 * 4096 = 64KiB
   so we have  64 as total width / size of single strip (sunit) ->  4 disks
   (matching  lvm2 -i4 option here)

Yep complex, don't ask... ;)





In a LVM non-raid stripe scenario I've always remember it was: swidth = 
sunit * Y where Y = number of stripes, right?


I'm hoping some expert could shed some light, help me(maybe others too) 
understand what LVM is doing there? I'd appreciate.

many thanks, L.



We in the first place there is major discrepancy in the naming:

You use intel.raid0-0   VG name
and then you mkfs device: /dev/mapper/intel.sataA-0  ??

While you should be accessing: /dev/intel.raid0/0

Are you sure you are not trying to overwrite some unrelated device here?

(As your shown numbers looks unrelated, or you have buggy kernel or 
blkid)




hi,
I renamed VG in the meantime,
I get xfs intricacy..
so.. question still stands..
why xfs format does not do what I remember always did in the past(on lvm 
non-raid but stripped), like in your example


  =   sunit=4 swidth=16 blks
but I see instead:

  =   sunit=4 swidth=4294786316 blks

a whole lot:

$ xfs_info /__.aLocalStorages/0
meta-data=/dev/mapper/intel.raid0--0-0 isize=512 agcount=32, 
agsize=30768000 blks

  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1    finobt=0 spinodes=0
data =   bsize=4096 blocks=984576000, imaxpct=5
  =   sunit=4 swidth=4294786316 blks
naming   =version 2  bsize=4096 ascii-ci=0 ftype=1
log  =internal   bsize=4096 blocks=480752, version=2
  =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

$ lvs -a -o +segtype,stripe_size,stripes,devices intel.raid0-0
   LV   VG    Attr   LSize   Pool Origin Data% Meta%  
Move Log Cpy%Sync Convert Type Stripe #Str Devices
   0    intel.raid0-0 rwi-aor--- 3.67t raid0 16.00k    4 
0_rimage_0(0),0_rimage_1(0),0_rimage_2(0),0_rimage_3(0)

   [0_rimage_0] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdc(0)
   [0_rimage_1] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdd(0)
   [0_rimage_2] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sde(0)
   [0_rimage_3] intel.raid0-0 iwi-aor--- 938.96g linear 0 

Re: [linux-lvm] lv raid - how to read this?

2017-09-08 Thread Zdenek Kabelac

Dne 8.9.2017 v 11:22 lejeczek napsal(a):



On 08/09/17 09:49, Zdenek Kabelac wrote:

Dne 7.9.2017 v 15:12 lejeczek napsal(a):



On 07/09/17 10:16, Zdenek Kabelac wrote:

Dne 7.9.2017 v 10:06 lejeczek napsal(a):

hi fellas

I'm setting up a lvm raid0, 4 devices, I want raid0 and I understand & 
expect - there will be four stripes, all I care of is speed.

I do:
$ lvcreate --type raid0 -i 4 -I 16 -n 0 -l 96%pv intel.raid0-0 
/dev/sd{c..f} # explicitly four stripes


I see:
$ mkfs.xfs /dev/mapper/intel.sataA-0 -f
meta-data=/dev/mapper/intel.sataA-0 isize=512 agcount=32, agsize=30447488 
blks

  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1 finobt=0, sparse=0
data =   bsize=4096 blocks=974319616, imaxpct=5
  =   sunit=4 swidth=131076 blks
naming   =version 2  bsize=4096 ascii-ci=0 ftype=1
log  =internal log   bsize=4096 blocks=475744, version=2
  =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

What puzzles me is xfs's:
  sunit=4  swidth=131076 blks
and I think - what the hexx?



Unfortunatelly  'swidth'  in XFS has different meaning than lvm2's  stripe 
size parameter.


In lvm2 -


-i | --stripes    - how many disks
-I | --stripesize    - how much data before using next disk.

So  -i 4  & -I 16 gives  64KB  total stripe width



XFS meaning:

suinit = <RAID controller's stripe size in BYTES (or KiBytes when used with k)>

swidth = <# of data disks (don't count parity disks)>



 so real-world example 

# lvcreate --type striped -i4 -I16 -L1G -n r0 vg

or

# lvcreate --type raid0  -i4 -I16 -L1G -n r0 vg

# mkfs.xfs  /dev/vg/r0 -f
meta-data=/dev/vg/r0 isize=512    agcount=8, agsize=32764 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1    finobt=1, sparse=0, 
rmapbt=0, reflink=0

data =   bsize=4096 blocks=262112, imaxpct=25
 =   sunit=4  swidth=16 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal log   bsize=4096 blocks=552, version=2
 =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0


 and we have 

sunit=4 ...  4 * 4096 = 16KiB    (matching lvm2 -I16 here)
swidth=16 blks  ... 16 * 4096 = 64KiB
   so we have  64 as total width / size of single strip (sunit) ->  4 disks
   (matching  lvm2 -i4 option here)

Yep complex, don't ask... ;)





In an LVM non-raid stripe scenario I always remembered it was: swidth =
sunit * Y where Y = number of stripes, right?


I'm hoping some expert could shed some light, help me(maybe others too) 
understand what LVM is doing there? I'd appreciate.

many thanks, L.



Well, in the first place there is a major discrepancy in the naming:

You use intel.raid0-0   VG name
and then you run mkfs on the device: /dev/mapper/intel.sataA-0  ??

While you should be accessing: /dev/intel.raid0/0

Are you sure you are not trying to overwrite some unrelated device here?

(As the numbers you show look unrelated - or you have a buggy kernel or
blkid)




hi,
I renamed the VG in the meantime,
I get the xfs intricacy..
so.. the question still stands..
why does the xfs format not do what I remember it always did in the past (on LVM
non-raid but striped), like in your example


  =   sunit=4  swidth=16 blks
but I see instead:

  =   sunit=4 swidth=4294786316 blks

a whole lot:

$ xfs_info /__.aLocalStorages/0
meta-data=/dev/mapper/intel.raid0--0-0 isize=512 agcount=32, 
agsize=30768000 blks

  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1    finobt=0 spinodes=0
data =   bsize=4096 blocks=984576000, imaxpct=5
  =   sunit=4 swidth=4294786316 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal   bsize=4096 blocks=480752, version=2
  =   sectsz=512   sunit=4 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

$ lvs -a -o +segtype,stripe_size,stripes,devices intel.raid0-0
   LV   VG    Attr   LSize   Pool Origin Data% Meta%  
Move Log Cpy%Sync Convert Type Stripe #Str Devices
   0    intel.raid0-0 rwi-aor--- 3.67t raid0 16.00k    4 
0_rimage_0(0),0_rimage_1(0),0_rimage_2(0),0_rimage_3(0)

   [0_rimage_0] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdc(0)
   [0_rimage_1] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdd(0)
   [0_rimage_2] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sde(0)
   [0_rimage_3] intel.raid0-0 iwi-aor--- 938.96g linear 0 1 /dev/sdf(0)
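
(For reference: mkfs.xfs takes sunit/swidth from the I/O hints the block device
exposes, so a quick way to see where the odd swidth comes from is to inspect
those hints on the raid0 LV - the device path and dm minor below are assumptions,
adjust them to your setup:

$ blockdev --getiomin --getioopt /dev/intel.raid0-0/0
$ cat /sys/block/dm-X/queue/minimum_io_size /sys/block/dm-X/queue/optimal_io_size
### dm-X = the LV's minor number, see 'dmsetup info -c'

sunit should correspond to minimum_io_size and swidth to optimal_io_size,
both divided by the 4096-byte fs block size.)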




Hi

I've checked even 128TiB sized 

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-11 Thread Zdenek Kabelac

Dne 11.9.2017 v 12:55 Xen napsal(a):

Zdenek Kabelac schreef op 11-09-2017 12:35:


As thin-provisioning is about 'promising the space you can deliver
later when needed'  - it's not about hidden magic to make the space
out-of-nowhere.
The idea of planning to operate thin-pool on 100% fullness boundary is
simply not going to work well - it's  not been designed for that
use-case


I am going to rear my head again and say that a great many people would 
probably want a thin-provisioning that does exactly that ;-).




Wondering where they could get this idea...
We always communicate this clearly - do not plan to use a 100% full, unresizable
thin-pool as a part of your regular work-flow - it's always a critical situation,
often even leading to a system reboot and a full check of all volumes.


I mean you have it designed for auto-extension but there are also many people 
that do not want to auto-extend and just share available resources more flexibly.


For those people safety around 100% fullness boundary becomes more important.

I don't really think there is another solution for that.

I don't think BTRFS is really a good solution for that.

So what alternatives are there, Zdenek? LVM is really the only thing that 
feels "good" to us.




A thin-pool needs to be ACTIVELY monitored, and you either proactively add more PV
free space to the VG or eliminate unneeded 'existing' provisioned blocks
(fstrim, dropping snapshots, removal of unneeded thinLVs - whatever
comes to your mind to make more free space in the thin-pool). lvm2 now fully
supports calling 'smart' scripts directly out of dmeventd for such actions.
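
For illustration, a minimal sketch of the monitoring knobs involved - option names
as in recent lvm.conf, and the custom-script hook is only present in newer releases,
so treat that last line as an assumption for your particular version:

activation {
    # dmeventd starts acting once data or metadata usage crosses this percentage
    thin_pool_autoextend_threshold = 70
    # and tries to grow the pool by this percentage of its current size
    thin_pool_autoextend_percent = 20
}
dmeventd {
    # optionally run your own policy script instead of the plain auto-extension
    # thin_command = "/usr/local/sbin/thin-policy.sh"
}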



It's an illusion to hope anyone will be able to operate an lvm2 thin-pool at 100%
fullness reliably - there should always be enough room to give the 'scripts'
reaction time to gain some more space in time - so the thin-pool can serve free
chunks for provisioning - that's been the design - to deliver blocks when needed,

not to break the system.

Are there structural design inhibitions that would really prevent this thing 
from ever arising?


Yes, performance and resource consumption :)

And there is a fundamental difference between a full 'block device' sharing
space with other devices - compared with a single full filesystem - you can't
compare these 2 things at all.



Regards


Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-11 Thread Zdenek Kabelac

Dne 11.9.2017 v 17:31 Eric Ren napsal(a):

Hi Zdenek,

On 09/11/2017 09:11 PM, Zdenek Kabelac wrote:

[..snip...]


So don't expect lvm2 team will be solving this - there are more prio work


Sorry for interrupting your discussion. But, I just cannot help to ask:

It's not the first time I see "there are more prio work". So I'm wondering:
can upstream
consider making these high-priority work items available on the homepage [1] or a trello
tool [1]?


I really hope upstream can do so. Thus,

1. Users can expect what changes will likely happen for lvm.

2. It helps developer reach agreements on what problems/features should be on 
high

priority and avoid overlap efforts.



lvm2 is using  upstream community BZ located here:

https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper

You can check RHBZ easily for all lvm2 BZs
(it mixes RHEL/Fedora/Upstream)

We usually want to have the upstream BZ linked with the Community BZ,
but sometimes it's driven through other channels - not ideal - but still easily
search-able.


- lvmetad slows down activation much if there are a lot of PVs on system (say 
256 PVs, it takes >10s to pvscan

in my testing).


It should be the opposite case - unless something regressed recently...
Easiest is to write a test for the lvm2 test suite.

And eventually bisect which commit broke it...

- pvmove is slow. I know it's not fault of LVM. The time is almost spent in DM 
(the IO dispatch/copy).


Yeah - this is more or less a design issue inside the kernel - there are
some workarounds - but since the primary motivation was not to overload the
system - it's been left asleep a bit - since the focus moved to the 'raid' target
and these pvmove fixes are working with the old dm mirror target...
(i.e. try to use a bigger region_size for mirror in lvm.conf (over 512K)
and evaluate the performance - there is something wrong - but the core mirror
developer is busy with raid features ATM)
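
A hedged example of that tunable - it lives in the activation section of lvm.conf
and, depending on the lvm2 version, is called mirror_region_size or raid_region_size
(value in KiB):

activation {
    mirror_region_size = 2048   ### try values above the 512 KiB default
}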



- snapshot cannot be used in cluster environment. There is a usecase: user has 
a central backup system


Well, snapshot CANNOT work in a cluster.
What you can do is to split the snapshot and attach it to a different volume,
but exclusive access is simply required - there is no synchronization of
changes like with cmirrord for the old mirror.




If our upstream have a place to put and discuss what the prio works are, I 
think it will encourage me to do
more contributions - because I'm not 100% sure if it's a real issue and if 


You are always welcome to open a Community BZ (instead of trello/github/...).
Provide justification, present patches.

Of course I cannot hide :) that RH has some sort of influence over which bugs are more
important than the others...



it's a work that upstream hopes
to see, every engineer wants their work to be accepted by upstream :) I can 
try to go forward to do meaningful
work (research, testing...) as far as I can, if you experts can confirm that 
"that's a real problem. Go ahead!".


We do our best

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-11 Thread Zdenek Kabelac

Dne 11.9.2017 v 18:55 David Teigland napsal(a):

On Mon, Sep 11, 2017 at 03:11:06PM +0200, Zdenek Kabelac wrote:

Aye but does design have to be complete failure when condition runs out?


YES


I am not satisfied with the way thin pools fail when space is exhausted,
and we aim to do better.  Our goal should be that the system behaves at
least no worse than a file system reaching 100% usage on a normal LV.


We can't reach this goal anytime soon - unless we fix all those filesystems.

And there are other metrics - you can make it way more 'safe' for exhausted
space at the price of massively slowing down and serializing all writes...

I doubt we would find many users that would easily accept a massive slowdown of
their system just because the thin-pool can run out of space.


The global anonymous page-cache is really a hard thing to resolve...

But when you start to limit your usage of the thin-pool with some constraints,
you can get a much better behaving system.

i.e. using 'ext4' for a mounted 'data' LV should be relatively safe...

And again if you see actual kernel crash OOPS - this is of course a real
kernel bug for fixing...

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 13.9.2017 v 00:55 Gionatan Danti napsal(a):

Il 13-09-2017 00:41 Zdenek Kabelac ha scritto:

There are maybe a few worthy comments - XFS is great on standard big
volumes, but there used to be some hidden details when used on thinly
provisioned volumes on older RHEL (7.0, 7.1).

So now it depend how old distro you use (I'd probably highly recommend
upgrade to RH7.4 if you are on RHEL based distro)


Sure.


Basically 'XFS' does not have similar 'remount-ro' on error behavior
which 'extX' provides - but now XFS knows how to shutdown itself when
meta/data updates starts to fail - although you may need to tune some
'sysfs' params to get 'ideal' behavior.


True, with a catch: with the default data=ordered option, even ext4 does *not* 
remount read only when data writeout fails. You need to use both 
"errors=remount-ro" and "data=journal" which basically nobody uses.



Personally for smaller sized thin volumes I'd prefer 'ext4' over XFS -
unless you demand some specific XFS feature...


Thanks for the input. So, do you run your ext4 filesystem with data=journal? 
How they behave performane-wise?




As said, data=journal is a big performance killer (especially on SSD).

Personally  I prefer early 'shutdown' in case the situation becomes critical
(i.e. 95% fullness because some process gets crazy)

But you can write any advanced scripting logic to suit best your needs -

i.e. replace all thins on the thin-pool with the 'error' target
(which is as simple as using 'dmsetup remove --force' - this will
make all future reads/writes give you I/O errors).

Simply do all of this in user-space early enough, before the thin-pool can ever get NEAR to
being 100% full - the reaction is really quick - and you have at least 60 seconds
to solve the problem in the worst case.
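
A minimal sketch of that 'error target' replacement - the device name is an example
and this is an emergency measure, to be followed by proper cleanup or a reboot:

# dmsetup table vg-thinvol            ### note the current thin mapping first
# dmsetup remove --force vg-thinvol   ### loads an 'error' table (noflush) before removal
### any process still holding the device open now gets I/O errors instead of blocking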




Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-13 Thread Zdenek Kabelac

Dne 13.9.2017 v 02:04 matthew patton napsal(a):

'yes'

The filesystem may not be resident on the hypervisor (dom0) so 'dmsetup 
suspend' is probably more apropos. How well that propagates upward to the 
unwary client VM remains to be seen. But if one were running a NFS server using 
thin+xfs/ext4 then the 'fsfreeze' makes sense.




lvm2 is not 'expecting' someone will touch lvm2 controlled DM devices.

If you suspend a thinLV with dmsetup - you are in big 'danger' of freezing
further lvm2 processing - i.e. a command will try to scan the device list and will get
blocked on the suspend (even if lvm2 checks for 'suspended' dm devices to skip
them via an lvm.conf setting - there is a clear race).


So any solution which works outside lvm2 and changes the dm table outside of lvm2's
locking mechanism is hardly supportable - it can be used as a 'last weapon' -
but it should be clear to the user that the next proper step is to reboot the
machine.



Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-13 Thread Zdenek Kabelac

Dne 13.9.2017 v 10:28 Gionatan Danti napsal(a):

Il 13-09-2017 10:15 Zdenek Kabelac ha scritto:

Ohh this is pretty major constrain ;)




I can well imagine LVM will let you forcibly replace such an LV with an
error target - so instead of a thinLV - you will have a single 'error'
target snapshot - which could possibly even be auto-cleaned once the
volume use-count drops to 0 (lvmpolld/dmeventd monitoring,
whatever...)

(Of course - we are not solving what happens to the application
using/running out of such an error target - hopefully something not
completely bad)

This way - you get a very 'powerful' weapon to be used in those 'scriptlets',
so you can drop unneeded volumes ANYTIME you need to and reclaim their
resources...




This would be *really* great. I played with the dmsetup remove/error target and,
while it worked, it often froze LVM.

An integrated forced volume removal/switch to error target would be great.




A forcible remove (with some reasonable locking - so that i.e. 2 processes are not
playing with the same device :) - 'dmsetup remove --force' - replaces the
existing device with an 'error' target (with built-in noflush).

Anyway - if you see a reproducible problem with forcible removal - it needs to
be reported, as this is a real bug, and a BZ shall be opened...


Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 12.9.2017 v 19:14 Gionatan Danti napsal(a):

On 12/09/2017 16:37, Zdenek Kabelac wrote:

ZFS with zpools vs. thin with thin-pools running directly on top of the device.

If zpools are 'equally' fast as thins - and give you better protection
and more sane logic - then why is anyone still using thins???

I'd really love to see some benchmarks

Of course if you slow down the speed of the thin-pool, add way more
synchronization points and consume 10x more memory :) you can get better
behavior in those exceptional cases which are only hit by inexperienced
users who tend to intentionally use thin-pools in an incorrect way.


Having benchmarked them, I can reply :)

ZFS/ZVOLs surely are slower than thinp, full stop.
However, they are not *massively* slower.


Users interested in thin-provisioning are really mostly interested in 
performance - especially on multicore machines with lots of fast storage with 
high IOPS throughput  (some of them even expect it should be at least as good 
as linear)


So ATM it's preferred to have a more complex 'corner case' - which really mostly
never happens when the thin-pool is operated properly - and in the remaining use cases
you don't pay a higher price for having all data always in sync, and you also get
a way lower memory foot-print.

(I think especially ZFS is well known for nontrivial memory resource 
consumption)

As has been pointed out already a few times in this thread - lots of those
'reserved space' ideas can already be easily handled by just more advanced
scripting around notifications from dmeventd - if you keep thinking about it for a
while, you will at some point see the reasoning.


There is no difference whether you start to solve the problem around 70% fullness or at
100% - the main difference is that with some 'free space' in the thin-pool you can
resolve the problem way more easily and correctly.


Repeated again - whoever targets 100% full thin-pool usage has
misunderstood the purpose of thin-provisioning.


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-13 Thread Zdenek Kabelac

Dne 13.9.2017 v 04:23 matthew patton napsal(a):

I don't recall seeing an actual, practical, real-world example of why this 
issue got broached again. So here goes.

Create a thin LV on KVM dom0, put XFS/EXT4 on it, lay down (sparse) files as 
KVM virtual disk files.
Create and launch VMs and configure to suit. For example a dedicated VM for 
each of web server, a Tomcat server, and database. Let's call it a 'Stack'.
You're done configuring it.

You take a snapshot as a "restore point".
Then you present to your developers (or customers) a "drive-by" clone (snapshot) of the 
LV in which changes are typically quite limited (but could go up to full capacity) worth of 
overwrites depending on how much they test/play with it. You could have 500 such copies resident. 
Thin LV clones are damn convenient and mostly "free" and attractive for that purpose.




There is one point which IMHO would be way more worth investing resources into
ATM: whenever you have a snapshot, there is unfortunately no page-cache sharing.


So i.e. if you have 10 LVs being snapshots of a single origin, you get 10
different copies of the same data's pages in RAM.


But this is really hard problem to solve...


Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 13.9.2017 v 00:41 Gionatan Danti napsal(a):

Il 13-09-2017 00:16 Zdenek Kabelac ha scritto:

Dne 12.9.2017 v 23:36 Gionatan Danti napsal(a):

Il 12-09-2017 21:44 matthew patton ha scritto:



Again, please don't speak about things you don't know.
I am *not* interested in thin provisioning itself at all; on the other 
side, I find CoW and fast snapshots very useful.



Not going to comment on the KVM storage architecture - but with this statement -
you have a VERY simple usage:


Just minimize the chance for overprovisioning -

let's go by example:

you have  10  10GiB volumes  and you have 20 snapshots...


to not overprovision - you need 10 GiB * 30 LVs = a 300GiB thin-pool.

If that sounds like too much,

you can go with 150 GiB - to always 100% cover all 'base' volumes
and have some room for snapshots.


Now the fun begins - while monitoring is running -
you get a callback at 50%, 55%... 95%, 100%,
and at each moment you can do whatever action you need.


So assume 100GiB is the bare minimum for the base volumes - you ignore any
state with less than 66% occupancy of the thin-pool, and you start solving
problems at 85% (~128GiB) - you know some snapshot is better to be
dropped.
You may try 'harder' actions for higher percentages.
(you need to consider how many dirty pages you leave floating in your system
and other variables)

Also you pick with some logic the snapshot which you want to drop -
Maybe the oldest ?
(see airplane :) URL link)

Anyway - you have plenty of time to solve it still at this moment
without any danger of losing write operation...
All you can lose is some 'snapshot' which might have been present a
bit longer...  but that is supposedly fine with your model workflow...

Of course you are getting into serious problems if you try to keep all
these demo-volumes within 50GiB with massive overprovisioning ;)

There you have a much harder time deciding what should happen, what should be
removed, and whether it is possibly better to STOP everything and let the admin
decide what the ideal next step is.



Hi Zdenek,
I fully agree with what you said above, and I sincerely thank you for taking 
the time to reply.
However, I am not sure I understand *why* reserving space for a thin volume
seems a bad idea to you.


Lets have a 100 GB thin pool, and wanting to *never* run out of space in spite 
of taking multiple snapshots.
To achieve that, I need to a) carefully size the original volume, b) ask the 
thin pool to reserve the needed space and c) counting the "live" data (REFER 
in ZFS terms) allocated inside the thin volume.


Step-by-step example:
- create a 40 GB thin volume and subtract its size from the thin pool (USED 40 
GB, FREE 60 GB, REFER 0 GB);

- overwrite the entire volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
- snapshot the volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
- completely overwrite the original volume (USED 80 GB, FREE 20 GB, REFER 40 
GB);
- a new snapshot creation will fail (REFER is higher than FREE).

Result: thin pool is *never allowed* to fill. You need to keep track of 
per-volume USED and REFER space, but thinp performance should not be impacted 
in any manner. This is not theoretical: it is already working in this manner 
with ZVOLs and refreservation, *without* involing/requiring any advanced 
coupling/integration between block and filesystem layers.
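
For completeness, a hedged illustration of the two ZVOL flavors and the knob
described above - pool/volume names are examples:

# zfs create -V 40G tank/vol1              ### fully-preallocated: refreservation set by default
# zfs create -s -V 40G tank/vol2           ### sparse: no reservation
# zfs set refreservation=40G tank/vol2     ### re-add the guarantee by hand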


Don't get me wrong: I am sure that, if you choose to not implement this 
scheme, you have a very good reason to do that. Moreover, I understand that 
patches are welcome :)


But I would like to understand *why* this possibility is ruled out with such 
firmness.




There could be a simple answer and a complex one :)

I'd start with the simple one - already presented here -

when you write to an INDIVIDUAL thin volume target - the respective dm thin target
DOES manipulate a single btree set - it does NOT care that there are some other
snapshots and it never influences them -


You are asking here to heavily 'change' the thin-pool logic - so that writing to THIN volume A
can remove/influence volume B - this is very problematic for many reasons.


We can go into details of the BTree updates (that should really be discussed with
its authors on the dm channel ;)) - but I think the key element is capturing the
idea that the usage of thinLV A does not change thinLV B.






Now to your free 'reserved' space fiction :)
There is NO way to decide WHO deserves to use the reserve :)

Every thin volume is equal - (the fact that we call some thin LV a snapshot is
user-land fiction - in the kernel all thinLVs are just equal - every thinLV
references a set of thin-pool chunks) -


(for late-night thinking - what would be a snapshot of a snapshot which is fully
overwritten ;))


So when you now see that all thinLVs just map sets of chunks,
and all thinLVs can be active and running concurrently - how do you want to
use reserves in the thin-pool :) ?

When do you decide it ? (you need to see this is total race-land)
How do you actually orchestrate locking around this single point of failure ;) ?
You will surely come wit

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 12.9.2017 v 13:34 Gionatan Danti napsal(a):

On 12/09/2017 13:01, Zdenek Kabelac wrote:

There is very good reason why thinLV is fast - when you work with thinLV -
you work only with data-set for single thin LV.


Sad/bad news here - it's not going to work this way


No, I absolutely *do not want* thinp to automatically deallocate/trash some
provisioned blocks. Rather, I am all for something like "if free space is lower
than 30%, disable new snapshot *creation*".






# lvs -a
  LV  VG Attr   LSize  Pool Origin Data%  Meta%  Move Log 
Cpy%Sync Convert
  [lvol0_pmspare] vg ewi---  2,00m 

  lvol1   vg Vwi-a-tz-- 20,00m pool40,00 

  poolvg twi-aotz-- 10,00m 80,00  1,95 

  [pool_tdata]vg Twi-ao 10,00m 

  [pool_tmeta]vg ewi-ao  2,00m 


[root@linux export]# lvcreate -V10 vg/pool
  Using default stripesize 64,00 KiB.
  Reducing requested stripe size 64,00 KiB to maximum, physical extent size 
32,00 KiB.
  Cannot create new thin volume, free space in thin pool vg/pool reached 
threshold.


# lvcreate -s vg/lvol1
  Using default stripesize 64,00 KiB.
  Reducing requested stripe size 64,00 KiB to maximum, physical extent size 
32,00 KiB.
  Cannot create new thin volume, free space in thin pool vg/pool reached 
threshold.


# grep thin_pool_autoextend_threshold /etc/lvm/lvm.conf
# Configuration option activation/thin_pool_autoextend_threshold.
# thin_pool_autoextend_threshold = 70
thin_pool_autoextend_threshold = 70

So as you can see - lvm2 clearly prohibits you from creating a new thinLV
when you are above the defined threshold.


To keep things simple for a user - we have a single threshold value.


So what else is missing ?



lvm2 also DOES protect you from creation of a new thin-pool when the fullness
is above the lvm.conf-defined threshold - so nothing really new here...


Maybe I am missing something: this threshold is about new thin pools or new 
snapshots within a single pool? I was really speaking about the latter.


Yes - threshold applies to 'extension' as well as to creation of new thinLV.
(and snapshot is just a new thinLV)

Let me repeat: I do *not* want thinp to automatically drop anything. I simply
want it to disallow new snapshot/volume creation when unallocated space is too
low.


as said - already implemented

 Committed (fsynced) writes are safe, and this is very good. However, *many*

applications do not properly issue fsync(); this is a fact of life.

I absolutely *do not expect* thinp to automatically cope well with this 
applications - I full understand & agree that application *must* issue proper 
fsyncs.




Unfortunately neither lvm2 nor dm can be responsible for the whole kernel logic and
all user-land apps...


Yes - the anonymous page cache is somewhat of an Achilles' heel - but it's not a
problem of the thin-pool - all other 'provisioning' systems have some troubles.


So we really cannot fix it here.

You would need to prove that different strategy is better and fix linux kernel 
for this.


Until this moment - you need to use well-written user-land apps :) properly
syncing written data - or not use thin-provisioning (and others).


You can also minimize the amount of 'dirty' pages to avoid losing too much data
in case you hit a full thin-pool unexpectedly.

You can sync every second to minimize the amount of dirty pages.

Lots of things - all of them will in one way or another impact system
performance.
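
E.g. a hedged sketch of shrinking the dirty-page window system-wide - the values
are only illustrative:

# sysctl vm.dirty_background_bytes=67108864   ### start background writeback after 64MiB
# sysctl vm.dirty_bytes=268435456             ### hard limit of 256MiB of dirty data
# sysctl vm.dirty_expire_centisecs=100        ### consider pages 'old' after 1 second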



In the past, I testified that XFS take its relatively long time to recognize 
that a thin volume is unavailable - and many async writes can be lost in the 
process. Ext4 + data=journaled did a better job, but a) it is not the default 
filesystem in RH anymore and b) data=journaled is not the default option and 
has its share of problems.


journaled is very 'secure' - but also very slow

So depends what you aim for.

But this really cannot be solved on DM side...

So, if in the face of a near-full pool, thinp refuse me to create a new 
filesystem, I would be happy :)


So you are already happy right  :) ?
Your wish is upstream already for quite some time ;)

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 11.9.2017 v 15:46 Xen napsal(a):

Zdenek Kabelac schreef op 11-09-2017 15:11:


Thin-provisioning is - about 'postponing'  available space to be
delivered in time


That is just one use case.

Many more people probably use it for other use case.

Which is fixed storage space and thin provisioning of available storage.


You order some work which cost $100.
You have just $30, but you know, you will have $90 next week -
so the work can start


I know the typical use case that you advocate yes.


But it seems some users know it will cost $100, but they still think
the work could be done with $10 and it will 'just' work the same.


No that's not what people want.

People want efficient usage of data without BTRFS, that's all.



What's wrong with BTRFS?

Either you want the fs & block layer tied together - that's the btrfs/zfs approach -

or you want

a layered approach with separate 'fs' and block layers (the dm approach).

If you are advocating here to start mixing the 'dm' and 'fs' layers, just
because you do not want to use 'btrfs', you'll probably not gain much traction
here...




File system level failure can also not be critical because of using 
non-critical volume because LVM might fail even though filesystem does not 
fail or applications.


So my Laptop machine has 32G RAM - so you can have 60% of dirty-pages
those may raise pretty major 'provisioning' storm


Yes but still system does not need to crash, right.


We need to see EXACTLY which kind of crash you mean.

If you are using some older kernel - then please upgrade first and
provide a proper BZ case with a reproducer.

BTW you can imagine an out-of-space thin-pool with a thin volume and filesystem
as a FS where some writes end with a 'write-error'.



If you think there is an OS which keeps running uninterrupted while a number
of writes end with an 'error' - show it :) - maybe we should stop
working on Linux and switch to that (supposedly much better) different OS.




But we are talking about the generic case here, not about some individual sub-cases
where some limitation might give you the chance to rescue things better...


But no one in his right mind currently runs /rootvolume out of thin pool and 
in pretty much all cases probably it is only used for data or for example of 
hosting virtual hosts/containers/virtualized environments/guests.


You can have different pools and you can use rootfs  with thins to easily test 
i.e. system upgrades



So Data use for thin volume is pretty much intended/common/standard use case.

Now maybe amount of people that will be able to have running system after data 
volumes overprovision/fill up/crash is limited.


Most thin-pool users are AWARE of how to properly use it ;)  lvm2 tries to
minimize the (data-loss) impact of misused thin-pools - but we can't spend too
much effort there.


So what is important:
- 'committed' data (i.e. a transaction database) are never lost
- fsck after reboot should work.

If either of these 2 conditions does not hold - that's a serious bug.

But if you advocate for continued system use of an out-of-space thin-pool - then
I'd probably recommend starting to send patches... as an lvm2 developer I'm not
seeing this as the best time investment, but anyway...



However, from both a theoretical and practical standpoint being able to just 
shut down whatever services use those data volumes -- which is only possible 


Are you aware there is just one single page cache shared for all devices
in your system ?


if base system is still running -- makes for far easier recovery than anything 
else, because how are you going to boot system reliably without using any of 
those data volumes? You need rescue mode etc.


Again, do you have a use-case where you see a crash of a mounted data volume
on an overfilled thin-pool ?

On my system - I could easily umount such a volume after all 'write' requests
are timed out (eventually use a thin-pool with --errorwhenfull y for an instant
error reaction).
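
For reference, the switch looks like this on an existing pool - VG/pool names are
examples:

# lvchange --errorwhenfull y vg/thinpool
# lvs -o+lv_when_full vg   ### should now report 'error' instead of 'queue'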


So please can you stop repeating that an overfilled thin-pool with a thin LV data volume
kills/crashes the machine - unless you open a BZ and prove otherwise - you will
surely get 'fs' corruption, but nothing like a crashing OS can be observed on my
boxes.


We are here really interested in upstream issues - not in missing bug-fix
backports into every distribution and its every released version.



He might be able to recover his system if his system is still allowed to be 
logged into.


There is no problem with that as long as  /rootfs has consistently working fs!

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 11.9.2017 v 23:59 Gionatan Danti napsal(a):

Il 11-09-2017 12:35 Zdenek Kabelac ha scritto:

The first question here is - why do you want to use thin-provisioning ?


Because classic LVM snapshot behavior (slow write speed and linear performance 
decrease as snapshot count increases) make them useful for nightly backups only.


On the other side, the very fast CoW thinp's behavior mean very usable and 
frequent snapshots (which are very useful to recover from user errors).




There is a very good reason why a thinLV is fast - when you work with a thinLV -
you work only with the data-set for a single thin LV.

So you write to the thinLV and either you modify an existing exclusively owned chunk
or you duplicate and provision a new one.  A single thinLV does not care about
other thin volumes - this is very important to think about, and it's important
for reasonable performance and memory and cpu resource usage.



As thin-provisioning is about 'promising the space you can deliver
later when needed'  - it's not about hidden magic to make the space
out-of-nowhere.


I fully agree. In fact, I was asking about how to reserve space to *protect* 
critical thin volumes from "liberal" resource use by less important volumes. 


I think you need to think 'wider'.

You do not need to use a single thin-pool - you can have numerous thin-pools,
and for each one you can maintain separate thresholds (for now in your own
scripting - but doable with today's  lvm2)

Why would you want to place a 'critical' volume into the same pool
as some non-critical one ??

It's simply way easier to have critical volumes in a different thin-pool
where you might not even use over-provisioning.
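
E.g. a sketch of keeping a critical LV in its own, non-overprovisioned pool -
sizes and names are examples:

# lvcreate --type thin-pool -L 100G -n critical_pool vg
# lvcreate --type thin -V 100G --thinpool vg/critical_pool -n critical_lv
### virtual size == pool data size, so this pool is never overprovisioned
# lvcreate --type thin-pool -L 200G -n scratch_pool vg
### snapshots and 'liberal' volumes go into the scratch pool only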


I do *not* want to run at 100% data usage. Actually, I want to avoid it 
entirely by setting a reserved space which cannot be used for things as 
snapshot. In other words, I would very like to see a snapshot to fail rather 
than its volume becoming unavailable *and* corrupted.


Seems to me - everyone here looks for a solution where the thin-pool is used till
the very last chunk in the thin-pool is allocated - then some magical AI steps in,
decides smartly which 'other already allocated chunk' can be trashed
(possibly the one with minimal impact :)) - and the whole thing will continue
running at full speed ;)

Sad/bad news here - it's not going to work this way.

In ZFS words, there are object called ZVOLs - ZFS volumes/block devices, which 
can either be "fully-preallocated" or "sparse".


By default, they are "fully-preallocated": their entire nominal space is 
reseved and subtracted from the ZPOOL total capacity. Please note that this 


Fully-preallocated - sounds like thin-pool without overprovisioning to me...


# Snapshot creating - please see that, as REFER is very low (I did write 
nothig on the volume), snapshot creating is allowed


lvm2 also DOES protect you from creation of a new thin-pool when the fullness
is above the lvm.conf-defined threshold - so nothing really new here...



[root@blackhole ~]# zfs destroy tank/vol1@snap1
[root@blackhole ~]# dd if=/dev/zero of=/dev/zvol/tank/vol1 bs=1M count=500 
oflag=direct

500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 12.7038 s, 41.3 MB/s
[root@blackhole ~]# zfs list -t all
NAME    USED  AVAIL  REFER  MOUNTPOINT
tank    622M   258M    96K  /tank
tank/vol1   621M   378M   501M  -

# Snapshot creation now FAILS!


ZFS is a filesystem.

So let's repeat again :) the amount of problems inside a single filesystem is not
comparable with the block-device layer - it's an entirely different world of problems.


You can't really expect filesystem 'smartness' on block-layer.

That's the reason why we can see all those developers boldly stepping into the 
'dark waters' of  mixed filesystem & block layers.


lvm2/dm trusts in a different concept - it's possibly less efficient,
but possibly way more secure - where you have different layers,
and each layer can be replaced and is maintained separately.

The above surely is safe behavior: when free, unused space is too low to 
guarantee the reserved space, snapshot creation is disallowed.



ATM the thin-pool cannot somehow auto-magically 'drop' snapshots on its own.

And that's the reason why we have those monitoring features provided with
dmeventd - where you monitor the occupancy of the thin-pool, and when the
fullness goes above a defined threshold - some 'action' needs to happen.

It's really up to the admin to decide whether it's more important to make some
free space for an existing user writing his 10th copy of a 16GB movie :) or to erase
some snapshot with some important company work ;)

Just don't expect there will be some magical AI built into the thin-pool to make
such decisions :)


The user already has ALL the power to do this work - the main condition here is that
this happens much earlier than when your thin-pool gets exhausted!


It's really pointless trying to solve this issue after you are already
out of space...


Now leave ZWORLD, and back to thinp: it woul

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Zdenek Kabelac

Dne 12.9.2017 v 14:47 Xen napsal(a):

Zdenek Kabelac schreef op 12-09-2017 14:03:


Unfortunatelly lvm2 nor dm  can be responsible for whole kernel logic and
all user-land apps...


What Gionatan also means, or at least what I mean here is,

If functioning is chain and every link can be the weakest link.

Then sometimes you can build in a little redundancy so that other weak links 
do not break so easily. Or that your part can cover it.


Linux has had a mindset of reducing redundancy lately. So every bug anywhere 
can break the entire thing.


Example was something I advocated for myself.

I installed GRUB2 inside PV reserved space.

That means 2nd sector had PV, 1st sector had MBR-like boot sector.

libblkid stopped at MBR and did not recognise PV.



This bug has been reported (by me, even, to the libblkid maintainer) AND was already
fixed in the past.


Yes - surprise, software has bugs...

But to defend the libblkid maintainer's side a bit :) - this feature was not really
well documented from the lvm2 side...



You can sync every second to minimize amount of dirty pages

Lots of things  all of them will in some other the other impact
system performance


He said no people would be hurt by such a measure except people who wanted to 
unpack and compile kernel pure in page buffers ;-).



So clearly you need to spend resources effectively and support both groups...
Sometimes it is better to use large RAM (common laptops have 32G of RAM nowadays).
Sometimes it is better to have more 'data' securely and permanently stored...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Difference between Debian and some other distributions with thin provisioning

2017-09-30 Thread Zdenek Kabelac

Dne 29.9.2017 v 18:42 Jan Tulak napsal(a):

Hi guys,
I found out this difference and I'm not sure what is the cause. A
command for creating a thin LV, which works on Archlinux, Centos and
Fedora, fails on Debian and Ubuntu:

lvm lvcreate FOOvg1 \
 -T \
 -l 100%PVS \
 -n FOOvg1_thin001 \
 /dev/loop0 /dev/loop1 /dev/loop2


Hi

This command is not actually creating a 'thin' LV - only a thin-pool.
For a 'thin' LV you need to specify -V | --virtualsize.

The usage of -T is a 'smart' flag - telling lvm2 to do 'something'
thin/thin-pool oriented.


If you want to be 'exact' you can use --type thin or --type thin-pool, giving
you more errors and requiring the user to pass in more specific options.
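
For example (sizes are only illustrative) - one command creating both the pool and
the thin LV, and the equivalent explicit two-step form:

# lvcreate -T -L 10G -V 20G -n FOOvg1_thin001 FOOvg1/pool
### or, being explicit about the types:
# lvcreate --type thin-pool -L 10G -n pool FOOvg1
# lvcreate --type thin -V 20G -n FOOvg1_thin001 --thinpool FOOvg1/pool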




When it is created, lvs shows attributes "twi-a-tz--". But Debian and
Ubuntu complain that  "--name may only be given when creating a new
thin Logical volume or snapshot," even though the command states it
should be thin with -T. A script that tests this issue is at the end
of this email


Yep - the logic of the command has been internally enhanced over time, and when the user does
specify parameters in a way where lvm2 can interpret them uniquely - it will
do it.


BTW there is a step-by-step explanation around ;) of all the possible combinations
users can use to create a thin-pool, a thin volume, or both at the same time.



Do you know if there is a reason for this different behaviour?


It's not different - the newer version is 'enhanced' and just accepts a wider
syntax range.



Versions seem to be close enough for it not to be a change in LVM


Hmm, close enough 111 and 171 ??



behaviour, so I suspect some downstream or configuration changes. All
tested distributions were run in their current stable versions, up to
date. For example:


The original naming convention with pool names was to add the name next to VG 
name.


lvcreate -T -L10 vg/pool

A pool is a 'special' type of LV - thus it got this privileged naming, and the standard
option --name was normally left for the 'thin' LV when the user specifies -V.


Newer versions of lvm2 can use --name together with the --size specifier for the pool name
when there is no virtual size given, as that is a unique combination.


Debian:
# cat /etc/debian_version
8.9
# lvm version
   LVM version: 2.02.111(2) (2014-09-01)
   Library version: 1.02.90 (2014-09-01) >Driver version:  4.27.0


I'd suggest upgrading your Debian machine

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Distributed Locking of LVM

2017-08-31 Thread Zdenek Kabelac

Dne 31.8.2017 v 08:32 Kalyana sundaram napsal(a):

Thanks all people
I understand reboot/fencing is mandatory
I hope the visibility might be better in external locking tool like redis
With lvmlockd I find no deb available for ubuntu, and documentations for clvm 
to handle an issue is difficult to find


Unfortunately Debian-based distros are not 'the best fit' for anything lvm2
related - and it's not lvm2's fault ;) - and it's pretty hard to fix.


Maybe you can try some other distro ?
Or eventually building things from source ?


Regards


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-23 Thread Zdenek Kabelac

Dne 21.10.2017 v 16:33 Oleg Cherkasov napsal(a):

On 20. okt. 2017 21:35, John Stoffel wrote:

"Oleg" == Oleg Cherkasov  writes:


Oleg> On 19. okt. 2017 21:09, John Stoffel wrote:




Oleg> RAM 12Gb, swap around 12Gb as well.  /dev/sda is a hardware RAID1, the
Oleg> rest are RAID5.

Interesting, it's all hardware RAID devices from what I can see.


It is exactly what I wrote initially in my first message!



Can you should the *exact* commands you used to make the cache?  Are
you using lvcache, or bcache?  they're two totally different beasts.
I looked into bcache in the past, but since you can't remove it from
an LV, I decided not to use it.  I use lvcache like this:


I have used lvcache of course and here are commands from bash history:

lvcreate -L 1G -n primary_backup_lv_cache_meta primary_backup_vg /dev/sda5

### Allocate ~247G ib /dev/sda5 what has left of VG
lvcreate -l 100%FREE -n primary_backup_lv_cache primary_backup_vg /dev/sda5

lvconvert --type cache-pool --cachemode writethrough --poolmetadata 
primary_backup_vg/primary_backup_lv_cache_meta 
primary_backup_vg/primary_backup_lv_cache


lvconvert --type cache --cachepool primary_backup_vg/primary_backup_lv_cache 
primary_backup_vg/primary_backup_lv


### lvconvert failed because it required some extra extents in the VG, so I had to
reduce the cache LV and try again:


lvreduce -L 200M primary_backup_vg/primary_backup_lv_cache




Hi

Without any plans to interrupt the thoughts on the topic here - the explanation is
very simple.


A cache pool is made from a 'data' & a 'metadata' LV - so both need some space.
In the case of a 'cache pool' it's a pretty good plan to have both devices on the
fast spindle (SSD).


So can you please provide output of:

lvs -a -o+devices

so it could be easily validated that both the _cdata & _cmeta LVs are hosted by some SSD
device (it's not shown anywhere in the thread - so just to be sure we have
them on the right disks).


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] unable to remove snapshot of raid1 leg

2017-10-24 Thread Zdenek Kabelac

Dne 23.10.2017 v 18:40 Alexander 'Leo' Bergolth napsal(a):

On 10/23/2017 04:44 PM, Heinz Mauelshagen wrote:

LVM snapshots are meant to be used on the user visible raid1 LVs.
You found a bug allowing it to be used on its hidden legs.

Removing such per leg snapshot should be possible after the raid1 LV got
deactivated.


Hmm. This LV is currently in use. Any chance to remove it with the raid1
LV active?

Cheers,


Hi

Not with an lvm2 command.

I could probably imagine a way with several dmsetup commands, but it looks
fairly easier to do it during some 'offline' time.

-

The one 'risky' idea could be to try a 'vgcfgrestore' of the metadata from before taking
the snapshot, then (forcibly) remove it (dmsetup remove (-f)  vg-snapshot) so the
snapshot is no longer referencing the origin.

(force should not be needed - but if someone has the snapshot opened...)

Then you should be able to use  'lvchange --refresh vg'

and finally drop the 'left-over' cow volume with dmsetup remove.
(and make sure the snapshot is already gone - if not - try the remove again)

Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reattach cache

2017-11-22 Thread Zdenek Kabelac

Dne 22.11.2017 v 09:55 Xen napsal(a):

Ehm,

When you split a cache and later reattach it, LVM ensures it is in a 
consistent state right?


LVM is a bit old, I mean Ubuntu 16.04 version, so something about 133.



Hi

Sorry but version 133 is really ancient - the original purpose of the 'cache'
--splitcache was rather more debug oriented.


Clearing of the cache-pool metadata when a cache-pool is reattached to another LV
comes with version 2.02.162 - so your version is way too old...


In your case - just destroy the cache (--uncache) and do not try to reuse the
cache-pool unless you really know what you are doing.
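
I.e. (VG/LV names are examples):

# lvconvert --uncache vg/cached_lv   ### flushes dirty blocks and removes the cache-pool in one step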


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM hangs

2017-11-16 Thread Zdenek Kabelac

Dne 16.11.2017 v 12:02 Alexander 'Leo' Bergolth napsal(a):

On 2017-11-13 15:51, Zdenek Kabelac wrote:

Dne 13.11.2017 v 14:41 Alexander 'Leo' Bergolth napsal(a):

I have a EL7 desktop box with two sata harddisks and two ssds in a
LVM raid1 - thin pool - cache configuration. (Just migrated to this
setup a few weeks ago.)

After some days, individual processes start to block in disk wait.
I don't know if the problem resides in the cache-, thin- or raid1-layer
but the underlying block-devices are fully responsive.


It would be probably nice to see the result of 'dmsetup status'

I'd have guessed you are probably hitting  'frozen' raid state
which is unfortunate existing upstream bug.


As it just happened again, I have collected some additional info like
dmsetup status
dmsetup info -c (do the event counts look suspicious?)

https://leo.kloburg.at/tmp/lvm-blocks/2017-11-16/

I don't see any volume in "frozen" state.

I haven't rebooted the box yet. Maybe I provide some more info?




From a plain look over those files - it doesn't even seem there is anything
wrong with the dm devices as such.


So it looks like  possibly XFS got into some unhappy moment.

I'd probably recommend opening a regular Bugzilla case and attaching the files from
your directory.


You can check whether individual devices in the 'stack' are blocked,
i.e. try a 'dd' read from every 'dm' device to see if something is blocked.
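
A hedged sketch of such a check - it reads a few MiB from every active dm device,
and 'timeout' marks the ones that block:

for d in /dev/dm-*; do
    echo "== $d"
    timeout 10 dd if="$d" of=/dev/null bs=1M count=4 iflag=direct
done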


From the status, all devices look fully operational, and the process stack traces
also look reasonably idle.



I'm not sure how 'afs' is involved here - can you reproduce without afs ?


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM hangs

2017-11-13 Thread Zdenek Kabelac

Dne 13.11.2017 v 14:41 Alexander 'Leo' Bergolth napsal(a):

Hi!

I have a EL7 desktop box with two sata harddisks and two ssds in a
LVM raid1 - thin pool - cache configuration. (Just migrated to this
setup a few weeks ago.)

After some days, individual processes start to block in disk wait.
I don't know if the problem resides in the cache-, thin- or raid1-layer
but the underlying block-devices are fully responsive.

I have prepared some info at:
   http://leo.kloburg.at/tmp/lvm-blocks/

Do the stack backtraces provide enough information to locate the source
of the blocks?

I'd be happy to provide additional info, if necessary.
Meanwhile I'll disable the LVM cache layer to eliminate this potential
candidate.



Hi


It would probably be nice to see the result of 'dmsetup status'.

I'd have guessed you are probably hitting the 'frozen' raid state,
which is an unfortunate existing upstream bug.


Regards


Zdenek



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM hangs

2017-11-13 Thread Zdenek Kabelac

Dne 13.11.2017 v 16:12 Alexander 'Leo' Bergolth napsal(a):

Hi!

On 11/13/2017 03:51 PM, Zdenek Kabelac wrote:

Dne 13.11.2017 v 14:41 Alexander 'Leo' Bergolth napsal(a):

I have a EL7 desktop box with two sata harddisks and two ssds in a
LVM raid1 - thin pool - cache configuration. (Just migrated to this
setup a few weeks ago.)

After some days, individual processes start to block in disk wait.
I don't know if the problem resides in the cache-, thin- or raid1-layer
but the underlying block-devices are fully responsive.

I have prepared some info at:
    http://leo.kloburg.at/tmp/lvm-blocks/

Do the stack backtraces provide enough information to locate the source
of the blocks?

I'd be happy to provide additional info, if necessary.
Meanwhile I'll disable the LVM cache layer to eliminate this potential
candidate.


It would be probably nice to see the result of 'dmsetup status'



OK. Will be included next time.



I'd have guessed you are probably hitting  'frozen' raid state
which is unfortunate existing upstream bug.


Are you talking about RH bug 1388632?
https://bugzilla.redhat.com/show_bug.cgi?id=1388632

Unfortunately I can only view the google-cached version of the bugzilla
page, since the bug is restricted to internal view only.



that could be similar issue yes


But the google-cached version suggests that the bug is mainly hit when
removing the raid-backed cache pool under IO.

I my scenario, no modification (like cache removal) of the lvm setup was
done when the blocks occured.


Easiest is to check 'dmsetup status' - just to rule out whether it's the frozen raid
case.


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM hangs

2017-11-13 Thread Zdenek Kabelac

Dne 13.11.2017 v 18:41 Gionatan Danti napsal(a):

On 13/11/2017 16:20, Zdenek Kabelac wrote:


Are you talking about RH bug 1388632?
https://bugzilla.redhat.com/show_bug.cgi?id=1388632

Unfortunately I can only view the google-cached version of the bugzilla
page, since the bug is restricted to internal view only.



that could be similar issue yes


But the google-cached version suggests that the bug is mainly hit when
removing the raid-backed cache pool under IO.

I my scenario, no modification (like cache removal) of the lvm setup was
done when the blocks occured.


Easiest is to check  'dmsetup status' - just to exclude if it's frozen raid 
case.


Hi Zdenek,
due to how easy it is to trigger the bug, it seems a very serious problem to me.
As the bug report is for internal use only, can you shed some light on what
causes it and how to avoid it?


Specifically can you confirm that, if using an "old-school" mdadm RAID device, 
the bug does not apply?



IMHO this particular issue is probably not triggerable (at least not so
easily) by mdadm.


lvm2 has some sort of problem compared to mdadm - it's able to 'generate' more
device state changes per second than mdadm.


BZ is still being examined AFAIK


Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Why LVM doesn't send the dm-thin delete message immediately after volume creation failure

2017-12-04 Thread Zdenek Kabelac

Dne 4.12.2017 v 05:30 Ming-Hung Tsai napsal(a):

Hi All,

I'm not sure if it is a bug or an intention. If there's error in
volume creation, the function _lv_create_an_lv() invokes lv_remove()
to delete the newly created volume. However, in the case of creating
thin volumes, it just queues a "delete" message without sending it
immediately. The pending message won't be process until the next
lvcreate.

Why not send it before the end of _lv_create_an_lv(), to ensure the
synchronization between LVM and kernel? (i.e., no dangling volume in
kernel metadata, and the transaction ID is also synced)



Hi

This is unfortunately not as easy as it might look like.

This error path is quite hard, and lvm2 is not yet fully capable of handling all
the variants here.


So ATM it tries to do 'less harm', so that possible further recovery is simpler.

The reason here is this - when the kernel 'thin-pool' is deleting any thin device,
it needs some 'free' metadata space to handle this operation (as it never
overwrites existing btrees and uses journaling).


When the thin-pool fails to create a new thin device - there is a big
possibility that the reason for the failure is an 'out-of-metadata' condition.


Clearly there is no point in trying to 'delete' anything if thin-pool hits 
this state.


ATM lvm2's 'lvcreate' command is incapable of doing a 'metadata' resize operation
to gain some more space for a successful delete operation - so as such this
is left for a separate 'repair'.
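
That separate step is typically something like - size and names are examples:

# lvextend --poolmetadatasize +256M vg/pool   ### grow the pool's _tmeta to regain free metadata space
### or, for already damaged metadata:  lvconvert --repair vg/pool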

As always applies - do not force the thin-pool to run at its corner case - since
it's not yet possible to completely automate this in all scenarios.


As a protection - lvm2 should not even let you create a new thinLV if the free
space in metadata is below some certain safe level - but clearly there is a race
of time-of-check & time-of-use - so if there is some major load pushed into the
thin-pool and at the same time there is a creation attempt - we still may hit a
dead-spot (especially for small sized metadata).


So this briefly explains why we rather queue the 'delete' instead of directly
pushing it to the thin target.


To enhance this logic - we would need more 'statuses' during the operation to
make sure we are not operating on an already overfilled thin-pool - this will
eventually happen...


Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM2 - pvresize alert

2017-12-12 Thread Zdenek Kabelac

Dne 12.12.2017 v 14:13 VESELÍK Jan napsal(a):

Hello,

I would like to suggest a small tweak to the pvresize command. If you use the parameter
–setphysicalvolumesize, you get only a passive warning with this
potentially dangerous action. In comparison, with lvresize and the parameter –L -,
meaning making the logical volume smaller, you receive a warning and a prompt to
continue.


Is it possible to implement this kind of yes/no prompt before
every lvm action that could potentially lead to data loss?


Hi

Latest versions (>= 2.02.171) of lvm2 provides this prompt:


# pvs
  PV VG Fmt  Attr PSize   PFree
  /dev/loop0 vg lvm2 a--  <30,00g <30,00g


# pvresize --setphysical 1T /dev/loop0
  WARNING: /dev/loop0: Overriding real size 30,00 GiB. You could lose data.
/dev/loop0: Requested size 1,00 TiB exceeds real size 30,00 GiB. Proceed?  
[y/n]:



Regards

Zdenek

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] [dm-devel] dmsetup hangs forever

2017-10-26 Thread Zdenek Kabelac

Dne 26.10.2017 v 10:07 Zhangyanfei (YF) napsal(a):

Hello

I found an issue when using dmsetup in a situation where a udev event times out.

Dmsetup uses the dm_udev_wait function to sync with the udev event. When using
dmsetup to generate a new dm-disk, if the raw disk is abnormal (for example, an
ipsan disk with a hung IO request), the udevd daemon handling the dm-disk udev event
may time out and will not notify dmsetup by semaphore. And because
dm_udev_wait uses semop to sync with udevd, if the udevd event times out,
dmsetup will hang forever, even after the raw disk has recovered.

I wonder if we could use semtimedop instead of semop to add a timeout in the
dm_udev_wait function. If the udevd daemon times out when handling the dm event,
dm_udev_wait could time out too, and dmsetup could return an error.
the dm_udev_wait could timeout too, and the dmsetup could return error.


This is my patch, based on lvm2-2.02.115-3:



Hi


Unfortunately the same argument for why this can't really work still applies.

If dm starts to time out on its own - without coordination with udev -
your system's logic will end up in one big mess.

So if dm would handle the timeout - you would also need to provide a mechanism
to correct the associated services around it.


The main case here is - it's mandatory that it is udev finalizing any timeouts, so
it stays in sync with the db content.


Moreover, if you start to time out - you typically mask some system failure. In
the majority of cases I've ever seen - it has always been a bug from this category
(a buggy udev rule, or service). So it's always better to fix the bug than to keep
it masked.


AFAIK I'd like to see the semaphore go away - but it needs wider cooperation.


Regards

Zdenek


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

