On Fri, May 09, 2025 at 03:40:28PM -0500, Eric Blake wrote: > When mirroring, the goal is to ensure that the destination reads the > same as the source; this goal is met whether the destination is sparse > or fully-allocated (except when explicitly punching holes, then merely > reading zero is not enough to know if it is sparse, so we still want > to punch the hole). Avoiding a redundant write to zero (whether in > the background because the zero cluster was marked in the dirty > bitmap, or in the foreground because the guest is writing zeroes) when > the destination already reads as zero makes mirroring faster, and > avoids allocating the destination merely because the source reports as > allocated. > > The effect is especially pronounced when the source is a raw file. > That's because when the source is a qcow2 file, the dirty bitmap only > visits the portions of the source that are allocated, which tend to be > non-zero. But when the source is a raw file, > bdrv_co_is_allocated_above() reports the entire file as allocated so > mirror_dirty_init sets the entire dirty bitmap, and it is only later > during mirror_iteration that we change to consulting the more precise > bdrv_co_block_status_above() to learn where the source reads as zero. > > Remember that since a mirror operation can write a cluster more than > once (every time the guest changes the source, the destination is also > changed to keep up), and the guest can change whether a given cluster > reads as zero, is discarded, or has non-zero data over the course of > the mirror operation, we can't take the shortcut of relying on > s->target_is_zero (which is static for the life of the job) in > mirror_co_zero() to see if the destination is already zero, because > that information may be stale. Any solution we use must be dynamic in > the face of the guest writing or discarding a cluster while the mirror > has been ongoing. > > We could just teach mirror_co_zero() to do a block_status() probe of > the destination, and skip the zeroes if the destination already reads > as zero, but we know from past experience that extra block_status() > calls are not always cheap (tmpfs, anyone?), especially when they are > random access rather than linear. Use of block_status() of the source > by the background task in a linear fashion is not our bottleneck (it's > a background task, after all); but since mirroring can be done while > the source is actively being changed, we don't want a slow > block_status() of the destination to occur on the hot path of the > guest trying to do random-access writes to the source. > > So this patch takes a slightly different approach: any time we have to > track dirty clusters, we can also track which clusters are known to > read as zero. For sync=TOP or when we are punching holes from > "detect-zeroes":"unmap", the zero bitmap starts out empty, but > prevents a second write zero to a cluster that was already zero by an > earlier pass; for sync=FULL when we are not punching holes, the zero > bitmap starts out full if the destination reads as zero during > initialization. Either way, I/O to the destination can now avoid > redundant write zero to a cluster that already reads as zero, all > without having to do a block_status() per write on the destination. > > With this patch, if I create a raw sparse destination file, connect it > with QMP 'blockdev-add' while leaving it at the default "discard": > "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the > destination remains sparse rather than fully allocated. Meanwhile, a > destination image that is already fully allocated remains so unless it > was opened with "detect-zeroes": "unmap". And any time writing zeroes > is skipped, the job counters are not incremented. > > Signed-off-by: Eric Blake <ebl...@redhat.com> > > --- > > v3: also skip counters when skipping I/O [Sunny] > v4: rebase to earlier changes > --- > block/mirror.c | 107 ++++++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 93 insertions(+), 14 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
signature.asc
Description: PGP signature