Re: [PATCH] qcow2: add discard-no-unref option

Jean-Louis Dupond Fri, 26 May 2023 07:31:18 -0700

On 26/05/2023 15:31, Hanna Czenczek wrote:

On 15.05.23 09:36, Jean-Louis Dupond wrote:

When we for example have a sparse qcow2 image and discard: unmap isenabled,there can be a lot of fragmentation in the image after some time.Surely on VM's

that do a lot of writes/deletes.
This causes the qcow2 image to grow even over 110% of its virtual size,
because the free gaps in the image get to small to allocate new
continuous clusters. So it allocates new space as the end of the image.


Disabling discard is not an option, as discard is needed to keep the
incremental backup size as low as possible. Without discard, the
incremental backups would become large, as qemu thinks it's just dirty
blocks but it doesn't know the blocks are empty/useless.
So we need to avoid fragmentation but also 'empty' the useless blocks in
the image to have a small incremental backup.

Next to that we also want to send the discards futher down the stack, so
the underlying blocks are still discarded.

Therefor we introduce a new qcow2 option "discard-no-unref". When
setting this option to true (defaults to false), the discard requests
will still be executed, but it will keep the offset of the cluster. And
it will also pass the discard request further down the stack (if
discard:unmap is enabled).
This will avoid fragmentation and for example on a fully preallocated
qcow2 image, this will make sure the image is perfectly continuous.

I don’t follow how this patch is cleaner than the “block: Add zeroesdiscard option” patch. That patch’s diff stat is +14/-5, this one’sis 120+/57-.

Multiple reasons :)

- It's made for this use-case only, as there might be no other use-casesexcept this one so the scope for discard=zeroes might be to big- This one still handles the discards and passes it to the lower layer,which might be an important reason for the fact you enable discards- The diffstat is mostly bigger because of indention changes inupdate_refcount

As for better, I don’t want to discount that, but at the same time Idon’t know the reasoning for it. As far as I understand, this patchmakes qcow2 retain the cluster mapping as-is, and only discards on thelower layer. So effectively, instead of marking the cluster as zeroon the qcow2 level, we do so on the filesystem level. I’m not sure Isee all the implications of that difference.

We want to keep the cluster mapping to avoid creating holes in the qcow2cluster.

But we still want to do discards for 2 reasons:

- Mark the cluster as ZERO, so that incremental backups using dirtybitmaps can just skip this block- Discard the data on the lower layer for efficiency reasons (forexample if the lower layer has some dedup/compression/whatever), westill want the lower layer to know the block has been emptied

The advantage I see is that this free up disk space, which thediscard=zeroes wouldn’t do. I wonder whether that couldn’t be solvedwith an orthogonal qcow2-only option, though, which would have theqcow2 driver always discard zeroed clusters.

This is an option also indeed. But we will end up with a similar patch(also in size).

On the other hand, one thing to note is that we’ve had performanceproblems in the past with holes in the filesystem level (on tmpfs atleast). qemu generally can get the information on which clusters arezero quicker from qcow2 than from the filesystem, which suggests thateven if we want to discard something on the filesystem layer, weprobably also want to mark it as zero on the qcow2 layer.

This is what we do in discard_in_l2_slice, we mark the entry asQCOW_OFLAG_ZERO when we receive a discard request.

I also have a small concern about fragmentation on the filesystemlayer – if you use these options to prevent fragmentation, with thispatch, you’re only doing so on the qcow2 layer. Because the clusteris then discarded in the filesystem, it’s possible to getfragmentation there, which might not be desirable. I don’t thinkthat’s too important, but I think it’d just be nice to have aconfiguration in which the guest can tell qemu what areas it doesn’tcare about, qemu marks these as zero, so that backups are moreefficient; but at the same time, everything stays allocated, so nofragmentatoin is introduced.

That would be an additional option/improvement indeed. But it's not thatthis patch makes the fragmentation worse then it's already the case whenyou enable discard.If you really want this you might even want your lower level storage tojust ignore discards instead of fixing it in qcow2.


Hanna


Thanks for your review
Jean-Louis

Re: [PATCH] qcow2: add discard-no-unref option

Reply via email to