Pinging, as Stefan's branch merged into master and now these series based on master.
16.08.2019 18:30, Vladimir Sementsov-Ogievskiy wrote: > Hi all! > > Here is an asynchronous scheme for handling fragmented qcow2 > reads and writes. Both qcow2 read and write functions loops through > sequential portions of data. The series aim it to parallelize these > loops iterations. > It improves performance for fragmented qcow2 images, I've tested it > as described below. > > v4 [perf results not updated]: > 01: new patch. Unrelated, but need to fix 026 before the series to > correctly fix it after :) > 02: - use coroutine_fn where appropriate (i.e. in aio_task_pool_new too) > - add Max's r-b > 03,04: add Max's r-b > 05: fix 026 output > > v3 (by Max's comments) [perf results not updated]: > > 01: - use coroutine_fn where appropriate !!!!!!!!!!!!!!!!!!!!!!! > - add aio_task_pool_free > - add some comments > - move header to include/block > - s/wait_done/waiting > 02: - Rewrite note about decryption in guest buffers [thx to Eric] > - separate g_assert_not_reached for QCOW2_CLUSTER_ZERO_* > - drop return after g_assert_not_reached > 03: - drop bytes_done and correctly use qiov_offset > - fix comment > 04: - move QCOW2_MAX_WORKERS to block/qcow2.h > - initialize ret in qcow2_co_preadv_part > Based-on: https://github.com/stefanha/qemu/commits/block > > > v2: changed a lot, as > 1. a lot of preparations around locks, hd_qiovs, threads for encryption > are done > 2. I decided to create separate file with async request handling API, to > reuse it for backup, stream and copy-on-read to improve their performance > too. Mirror and qemu-img convert has their own async request handling, > may be we'll be able finally merge all these similar code into one > feature. > Note that not all API calls used in qcow2, some will be needed on > following steps for parallelizing other io loops. > > About testing: > > I have four 4G qcow2 images (with default 64k block size) on my ssd disk: > t-seq.qcow2 - sequentially written qcow2 image > t-reverse.qcow2 - filled by writing 64k portions from end to the start > t-rand.qcow2 - filled by writing 64k portions (aligned) in random order > t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters > (see source code of image generation in the end for details) > > and I've done several runs like the following (sequential io by 1mb chunks): > > out=/tmp/block; echo > $out; cat /tmp/files | while read file; do for wr > in {"","-w"}; do echo "$file" $wr; ./qemu-img bench -c 4096 -d 1 -f qcow2 -n > -s 1m -t none $wr "$file" | grep 'Run completed in' | awk '{print $4}' >> > $out; done; done > > > short info about parameters: > -w - do writes (otherwise do reads) > -c - count of blocks > -s - block size > -t none - disable cache > -n - native aio > -d 1 - don't use parallel requests provided by qemu-img bench itself > > results: > > +---------------------------+---------+---------+ > | file | master | async | > +---------------------------+---------+---------+ > | /ssd/t-part-rand.qcow2 | 14.671 | 9.193 | > +---------------------------+---------+---------+ > | /ssd/t-part-rand.qcow2 -w | 11.434 | 8.621 | > +---------------------------+---------+---------+ > | /ssd/t-rand.qcow2 | 20.421 | 10.05 | > +---------------------------+---------+---------+ > | /ssd/t-rand.qcow2 -w | 11.097 | 8.915 | > +---------------------------+---------+---------+ > | /ssd/t-reverse.qcow2 | 17.515 | 9.407 | > +---------------------------+---------+---------+ > | /ssd/t-reverse.qcow2 -w | 11.255 | 8.649 | > +---------------------------+---------+---------+ > | /ssd/t-seq.qcow2 | 9.081 | 9.072 | > +---------------------------+---------+---------+ > | /ssd/t-seq.qcow2 -w | 8.761 | 8.747 | > +---------------------------+---------+---------+ > | /tmp/t-part-rand.qcow2 | 41.179 | 41.37 | > +---------------------------+---------+---------+ > | /tmp/t-part-rand.qcow2 -w | 54.097 | 55.323 | > +---------------------------+---------+---------+ > | /tmp/t-rand.qcow2 | 711.899 | 514.339 | > +---------------------------+---------+---------+ > | /tmp/t-rand.qcow2 -w | 546.259 | 642.114 | > +---------------------------+---------+---------+ > | /tmp/t-reverse.qcow2 | 86.065 | 96.522 | > +---------------------------+---------+---------+ > | /tmp/t-reverse.qcow2 -w | 46.557 | 48.499 | > +---------------------------+---------+---------+ > | /tmp/t-seq.qcow2 | 33.804 | 33.862 | > +---------------------------+---------+---------+ > | /tmp/t-seq.qcow2 -w | 34.299 | 34.233 | > +---------------------------+---------+---------+ > > > Performance gain is obvious, especially for read and especially for ssd. > For hdd there is a degradation for reverse case, but this is the most > impossible case and seems not critical. > > How images are generated: > > ==== gen-writes ====== > #!/usr/bin/env python > import random > import sys > > size = 4 * 1024 * 1024 * 1024 > block = 64 * 1024 > block2 = 1024 * 1024 > > arg = sys.argv[1] > > if arg in ('rand', 'reverse', 'seq'): > writes = list(range(0, size, block)) > > if arg == 'rand': > random.shuffle(writes) > elif arg == 'reverse': > writes.reverse() > elif arg == 'part-rand': > writes = [] > for off in range(0, size, block2): > wr = list(range(off, off + block2, block)) > random.shuffle(wr) > writes.extend(wr) > elif arg != 'seq': > sys.exit(1) > > for w in writes: > print 'write -P 0xff {} {}'.format(w, block) > > print 'q' > ========================== > > ===== gen-test-images.sh ===== > #!/bin/bash > > IMG_PATH=/ssd > > for name in seq reverse rand part-rand; do > IMG=$IMG_PATH/t-$name.qcow2 > echo createing $IMG ... > rm -f $IMG > qemu-img create -f qcow2 $IMG 4G > gen-writes $name | qemu-io $IMG > done > ============================== > > > Vladimir Sementsov-Ogievskiy (5): > tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache > block: introduce aio task pool > block/qcow2: refactor qcow2_co_preadv_part > block/qcow2: refactor qcow2_co_pwritev_part > block/qcow2: introduce parallel subrequest handling in read and write > > qapi/block-core.json | 2 +- > block/qcow2.h | 3 + > include/block/aio_task.h | 54 ++++ > block/aio_task.c | 124 ++++++++ > block/qcow2.c | 461 +++++++++++++++++++---------- > block/Makefile.objs | 2 + > block/trace-events | 1 + > tests/qemu-iotests/026.out | 18 +- > tests/qemu-iotests/026.out.nocache | 188 ++++++------ > 9 files changed, 591 insertions(+), 262 deletions(-) > create mode 100644 include/block/aio_task.h > create mode 100644 block/aio_task.c > -- Best regards, Vladimir