01.06.2020 21:10, Vladimir Sementsov-Ogievskiy wrote:
Hi all!

This is the last part of the original "[RFC 00/24] backup performance: block_status + async"; the preparations are already merged. The series turns backup into a series of block_copy_async calls covering the whole disk, so we get block-status based parallel async requests out of the box (a conceptual sketch of the idea follows the results below). This gives a performance gain (numbers are run times; lower is better):

-----------------  ----------------  -------------  --------------------------  --------------------------  ----------------  -------------------------------
                   mirror(upstream)  backup(new)    backup(new, no-copy-range)  backup(new, copy-range-1w)  backup(upstream)  backup(upstream, no-copy-range)
hdd-ext4:hdd-ext4  18.86 +- 0.11     45.50 +- 2.35  19.22 +- 0.09               19.51 +- 0.09               22.85 +- 5.98     19.72 +- 0.35
hdd-ext4:ssd-ext4  8.99 +- 0.02      9.30 +- 0.01   8.97 +- 0.02                9.02 +- 0.02                9.68 +- 0.26      9.84 +- 0.12
ssd-ext4:hdd-ext4  9.09 +- 0.11      9.34 +- 0.10   9.34 +- 0.10                8.99 +- 0.01                11.37 +- 0.37     11.47 +- 0.30
ssd-ext4:ssd-ext4  4.07 +- 0.02      5.41 +- 0.05   4.05 +- 0.01                8.35 +- 0.58                9.83 +- 0.64      8.62 +- 0.35
hdd-xfs:hdd-xfs    18.90 +- 0.19     43.26 +- 2.47  19.62 +- 0.14               19.38 +- 0.16               19.55 +- 0.26     19.62 +- 0.12
hdd-xfs:ssd-xfs    8.93 +- 0.12      9.35 +- 0.03   8.93 +- 0.08                8.93 +- 0.05                9.79 +- 0.30      9.55 +- 0.15
ssd-xfs:hdd-xfs    9.15 +- 0.07      9.74 +- 0.28   9.29 +- 0.03                9.08 +- 0.05                10.85 +- 0.31     10.91 +- 0.30
ssd-xfs:ssd-xfs    4.06 +- 0.01      4.93 +- 0.02   4.04 +- 0.01                8.17 +- 0.42                9.52 +- 0.49      8.85 +- 0.46
ssd-ext4:nbd       9.96 +- 0.11      11.45 +- 0.15  11.45 +- 0.02               17.22 +- 0.06               34.45 +- 1.35     35.16 +- 0.37
nbd:ssd-ext4       9.84 +- 0.02      9.84 +- 0.04   9.80 +- 0.06                18.96 +- 0.06               30.89 +- 0.73     31.46 +- 0.21
-----------------  ----------------  -------------  --------------------------  --------------------------  ----------------  -------------------------------
I should add that the nbd results may be skewed by the fact that the node with the nbd server is my desktop, which was used for other tasks in parallel. Still, I don't think it really hurts the results.
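To illustrate the approach, here is a conceptual toy in Python, not the series' actual C implementation: ToyImage, block_status() and all the sizes are hypothetical stand-ins. The point is only the shape of the algorithm: walk the whole disk, skip chunks that block-status reports as unallocated, and keep a bounded number of async copy requests in flight.

  import asyncio
  import random

  CHUNK = 1 << 20      # 1 MiB copy granularity (illustrative)
  MAX_WORKERS = 8      # cf. the series' x-max-workers parameter

  class ToyImage:
      """In-memory stand-in for a disk image; some chunks are 'unallocated'."""
      def __init__(self, size):
          self.size = size
          self.data = {off: b'\xff' * CHUNK
                       for off in range(0, size, CHUNK)
                       if random.random() < 0.5}

      def block_status(self, offset):
          # Analogue of a block-status query: is this chunk allocated?
          return offset in self.data

  async def backup(src, dst):
      sem = asyncio.Semaphore(MAX_WORKERS)

      async def copy_chunk(off):
          async with sem:              # at most MAX_WORKERS requests in flight
              await asyncio.sleep(0)   # a real implementation awaits I/O here
              dst.data[off] = src.data[off]

      # Cover the whole disk, but issue copy requests only for chunks
      # that block-status reports as allocated.
      await asyncio.gather(*(copy_chunk(off)
                             for off in range(0, src.size, CHUNK)
                             if src.block_status(off)))

  src, dst = ToyImage(32 * CHUNK), ToyImage(0)
  asyncio.run(backup(src, dst))
  assert dst.data == src.data

In the series proper this loop lives in C in block/block-copy.c, and the number of parallel requests and the chunk size map to the x-max-workers and x-max-chunk parameters added by the series.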
The table shows that copy_range interacts badly with parallel async requests. copy_range brings a real performance gain only on filesystems that support it, like btrfs. But even on such filesystems I'm not sure it is a good default behavior: if the copy is offloaded, so that no real copy is done and the backup just links the same blocks as the original, then further writes from the guest will lead to fragmentation of the guest disk, whereas the aim of backup is to operate transparently for the guest. So, in addition to this series, I also suggest disabling copy_range by default (the series adds an x-use-copy-range parameter to control this per job; an illustrative QMP sketch follows the diffstat below).

===

How to test:

Prepare images. In the directories where you want to place the source and target images, create them with:

  for img in test-source test-target; do
      # create a 1000M raw image and fill it with a pattern
      # (1000 sequential 1M writes), so the source is fully allocated
      ./qemu-img create -f raw $img 1000M
      ./qemu-img bench -c 1000 -d 1 -f raw -s 1M -w --pattern=0xff $img
  done

Prepare a similar image for the nbd server, and start the server somewhere with:

  qemu-nbd --persistent --nocache -f raw IMAGE

Then run the benchmark, like this:

  ./bench-backup.py \
      --qemu new:../../x86_64-softmmu/qemu-system-x86_64 \
             upstream:/work/src/qemu/up-backup-block-copy-master/x86_64-softmmu/qemu-system-x86_64 \
      --dir hdd-ext4:/test-a hdd-xfs:/test-b ssd-ext4:/ssd ssd-xfs:/ssd-b \
      --test $(for fs in ext4 xfs; do echo hdd-$fs:hdd-$fs hdd-$fs:ssd-$fs \
                   ssd-$fs:hdd-$fs ssd-$fs:ssd-$fs; done) \
      --nbd 192.168.100.2 \
      --test ssd-ext4:nbd nbd:ssd-ext4

(You may simply reduce the number of directories/test-cases; use --help for help.)

===

Note that I included here "[PATCH] block/block-copy: block_copy_dirty_clusters: fix failure check", which was previously sent separately but has so far gone untouched on the mailing list. It may still be applied separately.

Vladimir Sementsov-Ogievskiy (20):
  block/block-copy: block_copy_dirty_clusters: fix failure check
  iotests: 129 don't check backup "busy"
  qapi: backup: add x-use-copy-range parameter
  block/block-copy: More explicit call_state
  block/block-copy: implement block_copy_async
  block/block-copy: add max_chunk and max_workers parameters
  block/block-copy: add ratelimit to block-copy
  block/block-copy: add block_copy_cancel
  blockjob: add set_speed to BlockJobDriver
  job: call job_enter from job_user_pause
  qapi: backup: add x-max-chunk and x-max-workers parameters
  iotests: 56: prepare for backup over block-copy
  iotests: 129: prepare for backup over block-copy
  iotests: 185: prepare for backup over block-copy
  iotests: 219: prepare for backup over block-copy
  iotests: 257: prepare for backup over block-copy
  backup: move to block-copy
  block/block-copy: drop unused argument of block_copy()
  simplebench: bench_block_job: add cmd_options argument
  simplebench: add bench-backup.py

 qapi/block-core.json                   |  11 +-
 block/backup-top.h                     |   1 +
 include/block/block-copy.h             |  45 +++-
 include/block/block_int.h              |   8 +
 include/block/blockjob_int.h           |   2 +
 block/backup-top.c                     |   6 +-
 block/backup.c                         | 170 ++++++++------
 block/block-copy.c                     | 183 ++++++++++++---
 block/replication.c                    |   1 +
 blockdev.c                             |  10 +
 blockjob.c                             |   6 +
 job.c                                  |   1 +
 scripts/simplebench/bench-backup.py    | 132 +++++++++++
 scripts/simplebench/bench-example.py   |   2 +-
 scripts/simplebench/bench_block_job.py |  13 +-
 tests/qemu-iotests/056                 |   8 +-
 tests/qemu-iotests/129                 |   3 +-
 tests/qemu-iotests/185                 |   3 +-
 tests/qemu-iotests/185.out             |   2 +-
 tests/qemu-iotests/219                 |  13 +-
 tests/qemu-iotests/257                 |   1 +
 tests/qemu-iotests/257.out             | 306 ++++++++++++-------------
 22 files changed, 640 insertions(+), 287 deletions(-)
 create mode 100755 scripts/simplebench/bench-backup.py
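As promised above, an illustrative QMP sketch of the new experimental knobs. The parameter names come from the patch subjects in this series; the exact QAPI types and defaults are defined in the patches themselves, so treat the values as examples only, and "drive0" and the target path are placeholders. Starting a backup with copy_range disabled and explicit limits might look like:

  { "execute": "drive-backup",
    "arguments": { "device": "drive0",
                   "target": "/ssd/test-target",
                   "sync": "full",
                   "x-use-copy-range": false,
                   "x-max-workers": 8,
                   "x-max-chunk": 1048576 } }

And since the series wires set_speed through to block-copy's new ratelimit, the usual block-job-set-speed command keeps throttling the running backup on top of the async copy:

  { "execute": "block-job-set-speed",
    "arguments": { "device": "drive0", "speed": 104857600 } }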
-- 
Best regards,
Vladimir
