On Fri, May 09, 2025 at 03:40:28PM -0500, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated (except when explicitly punching holes, then merely
> reading zero is not enough to know if it is sparse, so we still want
> to punch the hole).  Avoiding a redundant write to zero (whether in
> the background because the zero cluster was marked in the dirty
> bitmap, or in the foreground because the guest is writing zeroes) when
> the destination already reads as zero makes mirroring faster, and
> avoids allocating the destination merely because the source reports as
> allocated.
> 
> The effect is especially pronounced when the source is a raw file.
> That's because when the source is a qcow2 file, the dirty bitmap only
> visits the portions of the source that are allocated, which tend to be
> non-zero.  But when the source is a raw file,
> bdrv_co_is_allocated_above() reports the entire file as allocated so
> mirror_dirty_init sets the entire dirty bitmap, and it is only later
> during mirror_iteration that we change to consulting the more precise
> bdrv_co_block_status_above() to learn where the source reads as zero.
> 
> Remember that since a mirror operation can write a cluster more than
> once (every time the guest changes the source, the destination is also
> changed to keep up), and the guest can change whether a given cluster
> reads as zero, is discarded, or has non-zero data over the course of
> the mirror operation, we can't take the shortcut of relying on
> s->target_is_zero (which is static for the life of the job) in
> mirror_co_zero() to see if the destination is already zero, because
> that information may be stale.  Any solution we use must be dynamic in
> the face of the guest writing or discarding a cluster while the mirror
> has been ongoing.
> 
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that extra block_status()
> calls are not always cheap (tmpfs, anyone?), especially when they are
> random access rather than linear.  Use of block_status() of the source
> by the background task in a linear fashion is not our bottleneck (it's
> a background task, after all); but since mirroring can be done while
> the source is actively being changed, we don't want a slow
> block_status() of the destination to occur on the hot path of the
> guest trying to do random-access writes to the source.
> 
> So this patch takes a slightly different approach: any time we have to
> track dirty clusters, we can also track which clusters are known to
> read as zero.  For sync=TOP or when we are punching holes from
> "detect-zeroes":"unmap", the zero bitmap starts out empty, but
> prevents a second write zero to a cluster that was already zero by an
> earlier pass; for sync=FULL when we are not punching holes, the zero
> bitmap starts out full if the destination reads as zero during
> initialization.  Either way, I/O to the destination can now avoid
> redundant write zero to a cluster that already reads as zero, all
> without having to do a block_status() per write on the destination.
> 
> With this patch, if I create a raw sparse destination file, connect it
> with QMP 'blockdev-add' while leaving it at the default "discard":
> "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
> destination remains sparse rather than fully allocated.  Meanwhile, a
> destination image that is already fully allocated remains so unless it
> was opened with "detect-zeroes": "unmap".  And any time writing zeroes
> is skipped, the job counters are not incremented.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> 
> ---
> 
> v3: also skip counters when skipping I/O [Sunny]
> v4: rebase to earlier changes
> ---
>  block/mirror.c | 107 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 93 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>

Attachment: signature.asc
Description: PGP signature

Reply via email to