Re: [qubes-users] Q4.0 - LVM Thin Pool volumes - lsblk returns very large (256kb) MIN-IO and DISC-GRAN values

Chris Laprise Sat, 25 May 2019 11:28:30 -0700

On 5/25/19 12:45 PM, Brendan Hoar wrote:

On Sat, May 25, 2019 at 12:09 PM Chris Laprise <[email protected]> wrote:
    It would be interesting if thin-lvm min transfer were the reason for
    this difference in behavior between fstrim and the filesystem.


Indeed. Pretty sure that is the case for some workloads.

    However, I think you're wrong to assume that any free block at any
    scale
    should be discarded at the lvm level. This behavior is probably a
    feature designed to prevent pool metadata use from exploding to the
    point where the volume becomes slow or unmanageable. Controlling
    metadata size is a serious issue with COW storage systems and at some
    point compromises must be made between data efficiency and metadata
    efficiency.
Agreed. I started with that assumption, but as I read through the docs I realized there was some performance-related balancing going on.
    On thin-lvm volumes, maxing-out the allocated metadata space can have
    serious consequences including loss of the entire pool. I experienced
    this myself several weeks ago and I was just barely able to manage
    recovery without reinstalling the whole system – it involved deleting
    and re-creating the thin-pool, then restoring all the volumes from
    backup.


Ouch!
I’m going to add an Issue/Feature request to add metadata store monitoring and alerts to the disk space widget. :)
---
I will note that the docs indicate that lvcreate uses the pool allocation size divided by the chunk size, times a multiplier, to determine the default metadata store size (assuming you don’t override the final value). So if you specify the chunk size, the “default” metadata store is *supposed* to scale...
One can also specify a safer (larger) metadata store during lvcreate, at the expense of file storage of course.
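
The scaling described above can be sketched in a few lines. This is only a rough illustration: the 64-bytes-per-chunk multiplier is what I recall from the lvcreate man page and should be treated as an assumption, and the 200 GiB pool is a made-up example. LVM also applies its own minimum and maximum limits, which are not modeled here.

```python
KiB = 1024
GiB = 1024 ** 3


def estimate_metadata_bytes(pool_bytes, chunk_bytes, bytes_per_chunk=64):
    """Estimate default thin-pool metadata size as (pool / chunk) * multiplier.

    bytes_per_chunk=64 is an assumed multiplier, not a value from this thread.
    """
    if pool_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("pool_bytes must be >= 0 and chunk_bytes must be > 0")
    chunks = -(-pool_bytes // chunk_bytes)  # ceiling division
    return chunks * bytes_per_chunk


# Hypothetical 200 GiB pool at the chunk sizes mentioned in this thread.
for chunk_kib in (64, 128, 256, 2048):
    size = estimate_metadata_bytes(200 * GiB, chunk_kib * KiB)
    print(f"{chunk_kib:>5} KiB chunks -> about {size // KiB} KiB of metadata")
```

Under this estimate, quadrupling the chunk size cuts the default metadata store to a quarter, which is why a small chunk size needs a deliberately larger metadata store.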

Based on my experience (two metadata meltdowns since moving to Qubes 4) I would open another issue to have Qubes double or triple the system's default metadata size after installation. Proportionally, the loss of data space is small and it's easy to implement using 'lvresize --poolmetadatasize'.
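
As a sketch of what doubling or tripling could look like, the helper below only builds the lvresize command line and does not run it. The volume group name qubes_dom0 and the 96 MiB starting size are assumptions for illustration; check the real values with 'lvs -a' first. The helper name is hypothetical.

```python
def metadata_resize_command(vg, pool, current_mib, factor=2):
    """Build an lvresize command that grows pool metadata to factor * current size.

    The '+' prefix asks lvresize to add to the existing size rather than
    set an absolute size.
    """
    if current_mib <= 0 or factor <= 1:
        raise ValueError("current_mib must be > 0 and factor must be > 1")
    grow_mib = current_mib * (factor - 1)
    return ["lvresize", "--poolmetadatasize", f"+{grow_mib}m", f"{vg}/{pool}"]


# Assumed names and sizes; adjust to what 'lvs -a' reports on your system.
cmd = metadata_resize_command("qubes_dom0", "pool00", current_mib=96, factor=2)
print(" ".join(cmd))
```

The printed line can be reviewed and then run as root in dom0. The volume group needs enough free extents to cover the increase.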

I ran across a discussion of chunk size guidance and one thing I’ll note is that for heavy COW workloads the recommendation was to keep the chunk size value at the low end but be sure to increase the metadata store size. I’ll see if I can find it in my browser history.
    Run the 'lvs' command and look at the Meta% column for pool00. If it's
    much more than 50% there is reason for concern, because if you put the
    system through a flurry of activity including cloning/snapshotting
    and/or modifying many small files then that figure could balloon close
    to 100% in a very short period.
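
The check described above can be scripted. The sketch below is a minimal example, not a Qubes tool: it assumes the 'metadata_percent' report field and the '--noheadings' and '--separator' options of lvs, and the function names and sample volume names are made up for illustration.

```python
import subprocess


def pools_over_threshold(lvs_output, threshold=50.0):
    """Return (name, percent) for volumes whose metadata usage exceeds threshold.

    Expects 'name|percent' lines; non-pool volumes have an empty percent field.
    """
    flagged = []
    for line in lvs_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 2 or not fields[1]:
            continue
        try:
            percent = float(fields[1])
        except ValueError:
            continue  # skip anything that is not a plain number
        if percent > threshold:
            flagged.append((fields[0], percent))
    return flagged


def read_lvs():
    """Run lvs (needs root in dom0) and return its raw output."""
    result = subprocess.run(
        ["lvs", "--noheadings", "--separator", "|",
         "-o", "lv_name,metadata_percent"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Canned sample so the parsing can be tried without LVM present.
sample = "  pool00|61.25\n  root|\n  vm-work-private|\n"
print(pools_over_threshold(sample))
```

On a real system, pass read_lvs() instead of the sample text.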


Will do!
In the end I am just puzzled why the default chunk is 256k and not 64k, though. I haven’t found a place in the qubes installer iso source where the size is overridden.

64k is the minimum but this increases when the pool size reaches certain thresholds. On my system, it's 128k. As for Red Hat switching to such a large (2MB) minimum size, I think it should be regarded as throwing up one's hands and giving up on the subject. IMO, it's too large and shouldn't be used.

FWIW, Red Hat's new COW storage system is a Frankenstein patchwork using xfs volumes like some kind of block layer. It looks about as elegant and comprehensible as their other gift to the world, systemd. They need to hire better engineers.

I think the only _good_ way to deal with COW metadata expansion, since it's always related to data fragmentation, is to keep expanding it and let system performance degrade accordingly. This simply makes de-fragmentation a maintenance issue (defrag to shrink metadata and get performance back). This is what Microsoft did with NTFS and it was the right choice; clinging to fixed metadata sizes is merely a state of denial that leads to people's disks suddenly becoming unusable.

I also ran across docs from Red Hat saying that the RHEL 7.4 to 7.5 transition moved from a default of 64KB to 2MB (possibly due to upstream?)...so discard on delete’s usefulness inside VMs may be even more constrained in the future if I read that right.
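
To make the constraint concrete: as I understand it, a thin pool can only release a chunk when a discard covers that whole chunk, so a larger chunk size means fewer freed ranges qualify. The sketch below just does that arithmetic. The function name and the byte offsets are illustrative, not taken from any real system.

```python
KiB = 1024
MiB = 1024 ** 2


def fully_covered_chunks(start, length, chunk_size):
    """Count the chunks that lie entirely inside [start, start + length)."""
    if start < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("start/length must be >= 0 and chunk_size must be > 0")
    first = -(-start // chunk_size)        # first chunk boundary at or after start
    last = (start + length) // chunk_size  # last chunk boundary at or before end
    return max(0, last - first)


# A 1 MiB file deleted at an illustrative offset of 3 MiB.
for chunk in (64 * KiB, 256 * KiB, 2 * MiB):
    n = fully_covered_chunks(3 * MiB, 1 * MiB, chunk)
    print(f"chunk {chunk // KiB} KiB: {n} whole chunk(s) could be released")
```

With 2 MiB chunks the same deletion covers no whole chunk, so nothing would be returned to the pool until neighbouring data is freed as well.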


It's a good bet that "upstream" in this case is Red Hat.

I’ll probably open a feature ticket asking for auto fstrim of the mounted rw filesystems on templates/templated VM shutdowns. As it is, I already do this manually on templates after every update and from time to time in VMs that see a lot of file churn.
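
One way the manual step could be automated is sketched below. It is not how Qubes does anything today: it picks read-write filesystems backed by a /dev/ device out of /proc/mounts-style text and builds one 'fstrim -v' command per mount point. The function names and the sample device names are hypothetical, and mount points containing escaped characters (such as spaces) are not handled.

```python
def rw_block_mounts(mounts_text):
    """Return mount points of read-write filesystems backed by /dev/ devices."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mount_point, _fstype, options = fields[:4]
        if device.startswith("/dev/") and "rw" in options.split(","):
            if mount_point not in points:
                points.append(mount_point)
    return points


def fstrim_commands(mounts_text):
    """Build one 'fstrim -v <mountpoint>' command per selected mount point."""
    return [["fstrim", "-v", point] for point in rw_block_mounts(mounts_text)]


# Canned sample in /proc/mounts format; device names are illustrative.
sample_mounts = (
    "/dev/xvda3 / ext4 rw,relatime,discard 0 0\n"
    "/dev/xvdb /rw ext4 rw,relatime 0 0\n"
    "proc /proc proc rw,nosuid 0 0\n"
    "/dev/xvdd /lib/modules ext3 ro,relatime 0 0\n"
)
for command in fstrim_commands(sample_mounts):
    print(" ".join(command))
```

On a live VM the input would come from reading /proc/mounts, and the commands would need to run as root before shutdown.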


--

Chris Laprise, [email protected]
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB  4AB3 1DC4 D106 F07F 1886
