On 5/25/19 12:45 PM, Brendan Hoar wrote:

On Sat, May 25, 2019 at 12:09 PM Chris Laprise <[email protected]> wrote:


    It would be interesting if thin-lvm min transfer were the reason for
    this difference in behavior between fstrim and the filesystem.


Indeed. Pretty sure that is the case for some workloads.

    However, I think you're wrong to assume that any free block at any
    scale
    should be discarded at the lvm level. This behavior is probably a
    feature designed to prevent pool metadata use from exploding to the
    point where the volume becomes slow or unmanageable. Controlling
    metadata size is a serious issue with COW storage systems and at some
    point compromises must be made between data efficiency and metadata
    efficiency.


Agreed. I started with that assumption but as I read through the docs I realized there was some performance-related balancing going on.

    On thin-lvm volumes, maxing-out the allocated metadata space can have
    serious consequences including loss of the entire pool. I experienced
    this myself several weeks ago and I was just barely able to manage
    recovery without reinstalling the whole system – it involved deleting
    and re-creating the thin-pool, then restoring all the volumes from
    backup.


Ouch!

I’m going to add an Issue/Feature request to add metadata store monitoring and alerts to the disk space widget. :)

---

I will note that the docs indicate that lvcreate uses the pool allocation size divided by the chunk size times a multiplier to determine the default metadata store size (assuming you don’t override the final value). So if you specify the chunk size the “default” metadata store is *supposed* to scale...

One can also specify a safer (larger) metadata store during lvcreate at the expense of file storage of course.

Based on my experience (two metadata meltdowns since moving to Qubes 4) I would open another issue to have Qubes double or triple the system's default metadata size after installation. Proportionally, the loss of data space is small and it's easy to implement using 'lvresize --poolmetadatasize'.
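To make the sizing rule discussed above concrete, here is a rough sketch of the arithmetic. It assumes the multiplier is 64 bytes per chunk, which is how I recall the lvcreate man page describing the default; check that against your LVM version. The function name and the 200 GiB pool size are hypothetical, chosen only for illustration.

```shell
# Estimate the default thin-pool metadata size in bytes.
# ASSUMPTION: default = pool_size / chunk_size * 64 bytes (recalled from
# the lvcreate man page; verify against your installed LVM version).
# Usage: estimate_meta_bytes <pool_bytes> <chunk_bytes>
estimate_meta_bytes() {
    echo $(( $1 / $2 * 64 ))
}

# Hypothetical 200 GiB pool (214748364800 bytes):
estimate_meta_bytes 214748364800 65536    # 64 KiB chunks  -> 209715200 (200 MiB)
estimate_meta_bytes 214748364800 262144   # 256 KiB chunks -> 52428800 (50 MiB)
```

Under that assumption, a 256k chunk size yields a quarter of the metadata space that 64k does for the same pool. Doubling or tripling the estimate, as suggested above, gives a candidate value for 'lvresize --poolmetadatasize'.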


I ran across a discussion of chunk size guidance and one thing I’ll note is that for heavy COW workloads the recommendation was to keep the chunk size value at the low end but be sure to increase the metadata store size. I’ll see if I can find it in my browser history.

    Run the 'lvs' command and look at the Meta% column for pool00. If it's
    much more than 50% there is reason for concern, because if you put the
    system through a flurry of activity including cloning/snapshotting
    and/or modifying many small files then that figure could balloon close
    to 100% in a very short period.


Will do!
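The Meta% check described above could be scripted along these lines. This is only a sketch: the function name is hypothetical, and the 50% threshold is just the rule of thumb from the quoted advice, not an LVM limit. The function reads 'lvs' output on stdin so it can be tried on sample text without root.

```shell
# Print the name of every thin pool whose Meta% exceeds a limit.
# Reads lines of "<lv_name> <metadata_percent>" on stdin. Volumes that
# are not pools have an empty second column and are skipped.
# Usage: pools_over_meta_limit <limit_percent>
pools_over_meta_limit() {
    awk -v limit="$1" 'NF >= 2 && ($2 + 0) > (limit + 0) { print $1 }'
}
```

On a live system it would be fed from something like:

    sudo lvs --noheadings -o lv_name,metadata_percent | pools_over_meta_limit 50

Any pool name it prints is above the threshold and worth a closer look.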

In the end I am just puzzled why the default chunk size is 256k and not 64k, though. I haven’t found a place in the Qubes installer ISO source where the size is overridden.

64k is the minimum but this increases when the pool size reaches certain thresholds. On my system, it's 128k. As for Red Hat switching to such a large (2MB) minimum size, I think it should be regarded as throwing up one's hands and giving up on the subject. IMO, it's too large and shouldn't be used.

FWIW, Red Hat's new COW storage system is a Frankenstein patchwork using xfs volumes like some kind of block layer. It looks about as elegant and comprehensible as their other gift to the world, systemd. They need to hire better engineers.

I think the only _good_ way to deal with COW metadata expansion, since it's always related to data fragmentation, is to keep expanding it and let system performance degrade accordingly. This simply makes defragmentation a maintenance issue (defrag to shrink metadata and get performance back). This is what Microsoft did with NTFS and it was the right choice; clinging to fixed metadata sizes is merely a state of denial that leads to people's disks suddenly becoming unusable.


I also ran across docs from Red Hat saying that the RHEL 7.4 to 7.5 transition moved from a default of 64KB to 2MB (possibly due to upstream?)...so discard on delete’s usefulness inside VMs may be even more constrained in the future if I read that right.

It's a good bet that "upstream" in this case is Red Hat.


I’ll probably open a feature ticket asking for auto fstrim of the mounted rw filesystems on templates/templated VM shutdowns. As it is, I already do this manually on templates after every update and from time to time in VMs that see a lot of file churn.
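Until such a feature exists, the manual trim could be scripted roughly as follows. This is a sketch, not anything Qubes ships: the function name is hypothetical, and it assumes that the filesystems worth trimming are those mounted read-write from a /dev/ device. It reads /proc/mounts-format lines on stdin and prints the matching mount points.

```shell
# List mount points of read-write filesystems backed by /dev/ devices.
# Input: lines in /proc/mounts format (device, mount point, type, options, ...).
# Note: mount points containing spaces appear escaped (\040) in /proc/mounts
# and are not handled here.
rw_block_mountpoints() {
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)rw(,|$)/ { print $2 }'
}
```

Inside a VM, before shutdown, it might be used like this:

    rw_block_mountpoints < /proc/mounts | while read -r mp; do sudo fstrim -v "$mp"; done

A simpler alternative is 'fstrim --all', which trims every mounted filesystem whose device supports discard.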

--

Chris Laprise, [email protected]
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB  4AB3 1DC4 D106 F07F 1886
