On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 17:57, Max Reitz wrote:
>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 17:23, Max Reitz wrote:
>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>>>
>>>>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>>>>> a one-cluster-sized buffer to share for all iterations. We also can't
>>>>>>>>>> allocate a buffer of full-request length, as it may be too large, so
>>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. Finally, the allocation logic
>>>>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>>>>
>>>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>>>>>>>>> ---
>>>>>>>>>>  block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>>>>  1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>>>> --- a/block/backup.c
>>>>>>>>>> +++ b/block/backup.c
>>>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>>>  #include "qemu/error-report.h"
>>>>>>>>>>  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>>>  typedef struct CowRequest {
>>>>>>>>>>      int64_t start_byte;
>>>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>>>      qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>>>  }
>>>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>>>> +/*
>>>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>>> + * error occurred, return a negative error number
>>>>>>>>>> + *
>>>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>>>
>>>>>>>>> s/to/to be/
>>>>>>>>>
>>>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>>>> + */
>>>>>>>>>>  static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>>>                                                        int64_t start,
>>>>>>>>>>                                                        int64_t end,
>>>>>>>>>>                                                        bool is_write_notifier,
>>>>>>>>>>                                                        bool *error_is_read,
>>>>>>>>>> -                                                      void **bounce_buffer)
>>>>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>>>>  {
>>>>>>>>>>      int ret;
>>>>>>>>>>      BlockBackend *blk = job->common.blk;
>>>>>>>>>> -    int nbytes;
>>>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>>>      int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>>>      assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>>>> -    }
>>>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>> -            *error_is_read = true;
>>>>>>>>>> +
>>>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>>>> +    while (remaining_bytes) {
>>>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>>>> +
>>>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>> +                *error_is_read = true;
>>>>>>>>>> +            }
>>>>>>>>>> +            goto fail;
>>>>>>>>>>          }
>>>>>>>>>> -        goto fail;
>>>>>>>>>> -    }
>>>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>>>> -                        job->write_flags);
>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>> -            *error_is_read = false;
>>>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>>>>> +                            job->write_flags);
>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>> +                *error_is_read = false;
>>>>>>>>>> +            }
>>>>>>>>>> +            goto fail;
>>>>>>>>>>          }
>>>>>>>>>> -        goto fail;
>>>>>>>>>> +
>>>>>>>>>> +        start += chunk;
>>>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>>>      }
>>>>>>>>>>      return nbytes;
>>>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>>      if (!job->use_copy_range) {
>>>>>>>>>> +        if (!bounce_buffer) {
>>>>>>>>>> +            size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>>>> +                             MAX(dirty_end - start, end - dirty_end));
>>>>>>>>>> +            bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>>>>> +        }
>>>>>>>>>
>>>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>>>
>>>>>>>> Oops, you are right, of course.
>>>>>>>>
>>>>>>>>> Anyway, I wonder whether it'd be better to just allocate this buffer
>>>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe
>>>>>>>>> add a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>>
>>>>>>>> Once per job will not work, as we may have several guest writes in
>>>>>>>> parallel and therefore several parallel copy-before-write operations.
>>>>>>>
>>>>>>> Hm.  I'm not quite happy with that because if the guest just issues many
>>>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>>>> amount of memory.
>>>>>>>
>>>>>>> It would be nice if there was a simple way to keep track of the total
>>>>>>> memory usage and let requests yield if they would exceed it.
>>>>>>
>>>>>> Agree, it should be fixed anyway.
>>>>>
>>>>> But still..
>>>>>
>>>>> Synchronous mirror allocates full-request buffers on guest write.  Is it
>>>>> correct?
>>>>>
>>>>> If we assume that it is correct to double the memory usage of guest
>>>>> operations, then for backup the problem is only in write_zero and
>>>>> discard, where guest-assumed memory usage should be zero.
>>>>
>>>> Well, but that is the problem.  I didn't say anything in v2, because I
>>>> only thought of normal writes and I found it fine to double the memory
>>>> usage there (a guest won't issue huge write requests in parallel).  But
>>>> discard/write-zeroes are a different matter.
>>>>
>>>>> And if we should distinguish writes from write_zeroes and discard, it's
>>>>> better to postpone this improvement until after the backup-top filter is
>>>>> merged.
>>>>
>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>> something, and wake that up at the end of backup_do_cow()?
>>>
>>> 1. Because if we _can_ allow doubling of memory, it's more effective not
>>> to restrict allocations on guest writes.  It just seems to be the more
>>> effective technique.
>>
>> But the problem with backup and zero writes/discards is that the memory
>> is not doubled.  The request doesn't need any memory, but the CBW
>> operation does, and maybe lots of it.
>>
>> So the guest may issue many zero writes/discards in parallel and thus
>> exhaust memory on the host.
>
> So this is the reason to separate writes from write-zeroes/discards.  So
> at least writes will be happy.  And I think that writes are more frequent
> requests than write-zero/discard.

But that makes it complicated for no practical gain whatsoever.

>>> 2. Anyway, I'd allow some always-available size to allocate - let it be
>>> one cluster, which will correspond to current behavior and prevent guest
>>> io hang in the worst case.
>>
>> The guest would only hang if we have to copy more than e.g. 64 MB at
>> a time.  At which point I think it's not unreasonable to sequentialize
>> requests.

Because of this.  How is it bad to start sequentializing writes when the
data exceeds 64 MB?

Max