This series adds a new `aio-max-batch` parameter to IOThread and uses it in the Linux AIO backend to limit the batch size (the number of requests submitted to the kernel through io_submit(2)).
Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded value") changed MAX_EVENTS from 128 to 1024 to increase the number of in-flight requests. But this change also increased the potential maximum batch to 1024 elements.

The problem is noticeable when we have a lot of requests in flight and multiple queues attached to the same AIO context: in this case we can build very large batches. With a single queue, instead, the batch stays limited because io_submit(2) is called when the queue is unplugged. In practice, io_submit(2) is called only when there are no more queues plugged in or when the AIO queue is full (MAX_EVENTS = 1024).

I ran some benchmarks to choose 32 as the default batch value for Linux AIO. Below are the kIOPS measured with fio running in the guest (average over 3 runs):

                   |   master  |           with this series applied            |
                   |687f9f7834e| maxbatch=8|maxbatch=16|maxbatch=32|maxbatch=64|
# queues           | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs |
-- randread tests -|-----------|-----------------------------------------------|
bs=4k iodepth=1    | 193 | 188 | 204 | 198 | 194 | 202 | 201 | 213 | 195 | 201 |
bs=4k iodepth=8    | 241 | 265 | 247 | 248 | 249 | 250 | 257 | 269 | 270 | 240 |
bs=4k iodepth=64   | 216 | 202 | 257 | 269 | 269 | 256 | 258 | 271 | 254 | 251 |
bs=4k iodepth=128  | 212 | 177 | 267 | 253 | 285 | 271 | 245 | 281 | 255 | 269 |
bs=16k iodepth=1   | 130 | 133 | 137 | 137 | 130 | 130 | 130 | 130 | 130 | 130 |
bs=16k iodepth=8   | 130 | 137 | 144 | 137 | 131 | 130 | 131 | 131 | 130 | 131 |
bs=16k iodepth=64  | 130 | 104 | 137 | 134 | 131 | 128 | 131 | 128 | 137 | 128 |
bs=16k iodepth=128 | 130 | 101 | 137 | 134 | 131 | 129 | 131 | 129 | 138 | 129 |

1q  = virtio-blk device with a single queue
4qs = virtio-blk device with multiple queues (one queue per vCPU - 4)

I reported only the most significant tests, but I also ran other tests to make sure there were no regressions; here is the full report:
https://docs.google.com/spreadsheets/d/11X3_5FJu7pnMTlf4ZatRDvsnU9K3EPj6Mn3aJIsE4tI

Test environment:
- Disk: Intel Corporation NVMe Datacenter SSD [Optane]
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
- QEMU: qemu-system-x86_64 -machine q35,accel=kvm -smp 4 -m 4096 \
    ... \
    -object iothread,id=iothread0,aio-max-batch=${MAX_BATCH} \
    -device virtio-blk-pci,iothread=iothread0,num-queues=${NUM_QUEUES}
- benchmark: fio --ioengine=libaio --thread --group_reporting \
    --number_ios=200000 --direct=1 --filename=/dev/vdb \
    --rw=${TEST} --bs=${BS} --iodepth=${IODEPTH} --numjobs=16

Next steps:
- benchmark io_uring and use `aio-max-batch` there as well
- make MAX_EVENTS parametric by adding a new `aio-max-events` parameter

Comments and suggestions are welcome :-)

Thanks,
Stefano

Stefano Garzarella (3):
  iothread: generalize iothread_set_param/iothread_get_param
  iothread: add aio-max-batch parameter
  linux-aio: limit the batch size using `aio-max-batch` parameter

 qapi/misc.json            |  6 ++-
 qapi/qom.json             |  7 +++-
 include/block/aio.h       | 12 ++++++
 include/sysemu/iothread.h |  3 ++
 block/linux-aio.c         |  6 ++-
 iothread.c                | 82 ++++++++++++++++++++++++++++++++++-----
 monitor/hmp-cmds.c        |  2 +
 util/aio-posix.c          | 12 ++++++
 util/aio-win32.c          |  5 +++
 util/async.c              |  2 +
 qemu-options.hx           |  8 +++-
 11 files changed, 131 insertions(+), 14 deletions(-)

-- 
2.31.1