On 5/25/19 12:45 PM, Brendan Hoar wrote:
On Sat, May 25, 2019 at 12:09 PM Chris Laprise wrote:
It would be interesting if thin-lvm min transfer were the reason for
this difference in behavior between fstrim and the filesystem.
Indeed. Pretty sure that is the case for some workloads.
However, I think you're wrong to assume that any free block at any scale
should be discarded at the lvm level. This behavior is probably a
feature designed to prevent pool metadata use from exploding to the
point where the volume becomes slow or unmanageable. Controlling
metadata size is a serious issue with COW storage systems and at some
point compromises must be made between data efficiency and metadata
efficiency.
Agreed. I started with that assumption but as I read through the docs I
realized there was some performance-related balancing going on.
On thin-lvm volumes, maxing-out the allocated metadata space can have
serious consequences including loss of the entire pool. I experienced
this myself several weeks ago and I was just barely able to manage
recovery without reinstalling the whole system – it involved deleting
and re-creating the thin-pool, then restoring all the volumes from
backup.
Ouch!
I’m going to add an Issue/Feature request to add metadata store
monitoring and alerts to the disk space widget. :)
---
I will note that the docs indicate that lvcreate uses the pool
allocation size divided by the chunk size times a multiplier to
determine the default metadata store size (assuming you don’t override
the final value). So if you specify the chunk size the “default”
metadata store is *supposed* to scale...
One can also specify a safer (larger) metadata store during lvcreate at
the expense of file storage of course.
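
The sizing rule described above can be sketched roughly as follows. This is only an illustration: the 64-bytes-per-chunk multiplier and the 2 MiB floor are assumptions based on my reading of the lvmthin/lvcreate man pages and may differ between LVM versions, and the 200 GiB pool is a made-up example.

```python
# Rough estimate of the default thin-pool metadata size:
# (pool size / chunk size) * multiplier.
# ASSUMPTIONS: 64 bytes of metadata per chunk and a 2 MiB lower bound;
# verify both against the man pages for your LVM version.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BYTES_PER_CHUNK = 64      # assumed multiplier
MIN_METADATA = 2 * MIB    # assumed lower bound


def estimate_metadata_bytes(pool_bytes: int, chunk_bytes: int) -> int:
    """Return an estimated metadata size in bytes for a pool and chunk size."""
    if pool_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("pool and chunk sizes must be positive")
    chunks = -(-pool_bytes // chunk_bytes)  # ceiling division
    return max(chunks * BYTES_PER_CHUNK, MIN_METADATA)


if __name__ == "__main__":
    pool = 200 * GIB  # hypothetical pool size
    for chunk_kib in (64, 128, 256):
        est = estimate_metadata_bytes(pool, chunk_kib * KIB)
        print(f"chunk {chunk_kib:>3} KiB -> ~{est / MIB:.0f} MiB metadata")
```

Under these assumptions, halving the chunk size doubles the estimated metadata size, which is why a small chunk size calls for a larger metadata store.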
Based on my experience (two metadata meltdowns since moving to Qubes 4)
I would open another issue to have Qubes double or triple the system's
default metadata size after installation. Proportionally, the loss of
data space is small and it's easy to implement using 'lvresize
--poolmetadatasize'.
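
As a sketch of what doubling or tripling might look like, the helper below only builds the 'lvresize --poolmetadatasize' command line and does not run it. The volume group name qubes_dom0 and the current size of 96 MiB are assumptions for illustration; check the real values with 'lvs -a' before resizing anything.

```python
import shlex
from typing import List


def grow_metadata_cmd(vg: str, pool: str, current_mib: int, factor: int = 2) -> List[str]:
    """Build an lvresize command that grows pool metadata to factor * current size.

    The command is returned, not executed, so it can be reviewed first.
    """
    if current_mib <= 0:
        raise ValueError("current_mib must be positive")
    if factor < 2:
        raise ValueError("factor must be at least 2")
    extra_mib = current_mib * (factor - 1)  # amount to add on top of the current size
    return ["lvresize", "--poolmetadatasize", f"+{extra_mib}m", f"{vg}/{pool}"]


if __name__ == "__main__":
    # ASSUMPTION: VG "qubes_dom0", pool "pool00", 96 MiB of metadata today.
    cmd = grow_metadata_cmd("qubes_dom0", "pool00", current_mib=96, factor=2)
    print(shlex.join(cmd))
```

The volume group needs enough free extents for the extra metadata space, otherwise the resize will fail.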
I ran across a discussion of chunk size guidance and one thing I’ll note
is that for heavy COW workloads the recommendation was to keep the chunk
size value at the low end but be sure to increase the metadata store
size. I’ll see if I can find it in my browser history.
Run the 'lvs' command and look at the Meta% column for pool00. If it's
much more than 50% there is reason for concern, because if you put the
system through a flurry of activity including cloning/snapshotting
and/or modifying many small files then that figure could balloon close
to 100% in a very short period.
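
The check above could be automated along these lines. This is a minimal sketch, not a tested monitoring tool: it assumes 'lvs' accepts the metadata_percent report field with --noheadings and --separator, that it is run with root privileges in dom0, and it reuses the 50% figure from the paragraph above as the warning threshold.

```python
import subprocess
from typing import Dict, List

# ASSUMPTION: these lvs options and the metadata_percent field are available.
LVS_CMD = ["lvs", "--noheadings", "--separator", "|",
           "-o", "lv_name,metadata_percent"]


def parse_meta_percent(lvs_output: str) -> Dict[str, float]:
    """Map LV name -> Meta% for every line that reports a metadata percentage."""
    usage = {}
    for line in lvs_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue  # non-pool volumes leave the field empty
        try:
            usage[parts[0]] = float(parts[1])
        except ValueError:
            continue  # skip anything that is not a plain number
    return usage


def pools_over(usage: Dict[str, float], threshold: float = 50.0) -> List[str]:
    """Return the names of pools whose Meta% exceeds the threshold."""
    return sorted(name for name, pct in usage.items() if pct > threshold)


def read_meta_percent() -> Dict[str, float]:
    """Run lvs and parse its output (requires root and an LVM setup)."""
    result = subprocess.run(LVS_CMD, check=True, capture_output=True, text=True)
    return parse_meta_percent(result.stdout)
```

Calling pools_over(read_meta_percent()) would then list any pool above the threshold; an empty list means nothing needs attention yet.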
Will do!
In the end I am just puzzled why the default chunk is 256k and not 64k,
though. I haven’t found a place in the qubes installer iso source where
the size is overridden.
64k is the minimum but this increases when the pool size reaches certain
thresholds. On my system, it's 128k. As for Red Hat switching to such a
large (2MB) minimum size, I think it should be regarded as throwing up
one's hands and giving up on the subject. IMO, it's too large and
shouldn't be used.
FWIW, Red Hat's new COW storage system is a Frankenstein patchwork using
xfs volumes like some kind of block layer. It looks about as elegant and
comprehensible as their other gift to the world, systemd. They need to
hire better engineers.
I think the only _good_ way to deal with COW metadata expansion, since
it's always related to data fragmentation, is to keep expanding it and
let system performance degrade accordingly. This simply makes
de-fragmentation a maintenance issue (defrag to shrink metadata and get
performance back). This is what Microsoft did with NTFS and it was the
right choice; clinging to fixed metadata sizes is merely a state of
denial that leads to people's disks suddenly becoming unusable.
I also ran across docs from Red Hat saying that the 7.4 to 7.5 RHEL
transition moved from a default of 64KB to 2MB (possibly due to
upstream?)...so discard on delete’s usefulness inside VMs may be even
more constrained in the future if I read that right.
It's a good bet that "upstream" in this case is Red Hat.
I’ll probably open a feature ticket asking for auto fstrim of the
mounted rw filesystems on templates/templated VM shutdowns. As it is, I
already do this manually on templates after every update and from time
to time in VMs that see a lot of file churn.
--
Chris Laprise, [email protected]
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB 4AB3 1DC4 D106 F07F 1886