Vladimir, thank you for outlining the current state of affairs regarding 
efficient backup. I'd like to describe what we know about the 
image-expansion problem we're seeing with the current (qemu 4.2.0) code, 
just to be sure that your work addresses it.

In our use case, the image-expansion problem occurs only when the source 
disk file and the target backup file are in different file systems. Both 
files are qcow2 files, and as long as they both reside in the same file 
system, the target file winds up with roughly the same size as the source. 
But if the target is in another file system (we've tried a second ext4 
hard disk file system, a tmpfs file system, and fuse-based file systems 
such as s3fs), the target ends up with a size comparable to the nominal 
size of the source disk.

I think the expansion is related to this comment in 
qemu/include/block/block.h:

/**
 * bdrv_co_copy_range:
. . . .
 * Note: block layer doesn't emulate or fallback to a bounce buffer approach
 * because usually the caller shouldn't attempt offloaded copy any more (e.g.
 * calling copy_file_range(2)) after the first error, thus it should fall back
 * to a read+write path in the caller level.

The bdrv_co_copy_range() service does the right things with respect to 
skipping unallocated ranges in the source disk and not writing zeros to 
the target. But qemu gives up on using this service the first time an 
underlying copy_file_range() system call fails, and copy_file_range() 
always fails with EXDEV when the source and destination files are on 
different file systems. In this specific case (at least), I think that 
falling back to a bounce buffer approach would make sense so that we don't 
lose the rest of the logic in bdrv_co_copy_range. As it is, qemu falls 
back on a very high-level loop reading from the source and writing to the 
target. At this high level, reading an unallocated range from the source 
simply returns a buffer full of zeroes, with no indication that the range 
was unallocated. The zeroes are then written to the target as if they were 
real data.

As a quick experiment, I tried a very localized fallback when 
copy_file_range returns EXDEV in handle_aiocb_copy_range() in 
qemu/block/file-posix.c. It's not a great fix because it has to allocate 
and free a buffer on the spot and it does not head off future calls to 
copy_file_range that will also fail, but it does fix the image-expansion 
problem when crossing file systems. I can provide the patch if anyone 
wants to see it.
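
To give a rough idea of the shape of such a fallback, here is a standalone sketch. It is not the patch itself and not qemu code: the function names bounce_copy and copy_range_or_bounce and the 64 KiB buffer size are invented for the example, and it assumes 64-bit Linux with a glibc that provides copy_file_range(). Like my experiment, it allocates and frees the buffer on the spot and does nothing to head off later calls that will fail the same way.

```c
#define _GNU_SOURCE /* for copy_file_range() in glibc */
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes with pread()/pwrite() through a temporary buffer.
 * Returns 0 on success or -errno; -EIO if the source ends early.
 */
static int bounce_copy(int in_fd, off_t in_off, int out_fd, off_t out_off,
                       size_t len)
{
    size_t buf_size = len < 65536 ? len : 65536; /* hypothetical cap */
    char *buf;
    int ret = 0;

    if (len == 0) {
        return 0;
    }
    buf = malloc(buf_size);
    if (!buf) {
        return -ENOMEM;
    }
    while (len > 0 && ret == 0) {
        size_t want = len < buf_size ? len : buf_size;
        ssize_t got = pread(in_fd, buf, want, in_off);
        ssize_t done = 0;

        if (got < 0) {
            if (errno != EINTR) {
                ret = -errno;
            }
            continue;
        }
        if (got == 0) {
            ret = -EIO; /* unexpected end of source */
            continue;
        }
        while (done < got) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(got - done),
                               out_off + done);
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                ret = -errno;
                break;
            }
            done += w;
        }
        in_off += got;
        out_off += got;
        len -= (size_t)got;
    }
    free(buf);
    return ret;
}

/*
 * Try the offloaded copy first; if the kernel reports EXDEV, finish the
 * remaining range with the bounce buffer instead of failing the request.
 */
static int copy_range_or_bounce(int in_fd, off_t in_off, int out_fd,
                                off_t out_off, size_t len)
{
    while (len > 0) {
        /* On success the kernel advances in_off and out_off for us. */
        ssize_t n = copy_file_range(in_fd, &in_off, out_fd, &out_off, len, 0);

        if (n > 0) {
            len -= (size_t)n;
        } else if (n == 0) {
            return -EIO; /* no progress, e.g. source offset at EOF */
        } else if (errno == EXDEV) {
            return bounce_copy(in_fd, in_off, out_fd, out_off, len);
        } else if (errno != EINTR) {
            return -errno;
        }
    }
    return 0;
}
```

A cleaner version would presumably remember the EXDEV result per source/target pair and reuse one buffer, which is part of why I'd rather see this handled in the rework than in file-posix.c.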

I just wanted to get this aspect of the problem onto the table, to make 
sure it gets addressed in the current rework. Maybe it's a non-issue 
already.

- Bryan

