On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote:
> v3
> - Patch 5: Mention why dev_max_batch condition was dropped [Stefano]
> v2
> - Patch 1: "is not be freed" -> "is not freed" [Eric]
> - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
> [Stefano]
> - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
> - Patch 4: Removed whitespace hunk [Eric]
>
> The existing blk_io_plug() API is not block layer multi-queue friendly because
> the plug state is per-BlockDriverState.
>
> Change blk_io_plug()'s implementation so it is thread-local. This is done by
> introducing the blk_io_plug_call() function that block drivers use to batch
> calls while plugged. It is relatively easy to convert block drivers from
> .bdrv_co_io_plug() to blk_io_plug_call().
>
> Random read 4KB performance with virtio-blk on a host NVMe block device:
>
> iodepth  iops    change vs today
>       1   45612  -4%
>       2   87967  +2%
>       4  129872  +0%
>       8  171096  -3%
>      16  194508  -4%
>      32  208947  -1%
>      64  217647  +0%
>     128  229629  +0%
>
> The results are within the noise for these benchmarks. This is to be expected
> because this patch series does not change the plugging behavior for a single
> thread; only the location of the state changes (it is now thread-local).
>
> The following graph compares several approaches:
> https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
> - v7.2.0: before most of the multi-queue block layer changes landed.
> - with-blk_io_plug: today's post-8.0.0 QEMU.
> - blk_io_plug-thread-local: this patch series.
> - no-blk_io_plug: what happens when we simply remove plugging?
> - call-after-dispatch: what if we integrate plugging into the event loop? I
> decided against this approach in the end because it's more likely to
> introduce performance regressions since I/O submission is deferred until the
> end of the event loop iteration.
>
> Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
> others, all plugging approaches are more or less equivalent in this benchmark.
> It is also clear that QEMU 8.0.0 has lower performance than 7.2.0.
>
> The Ansible playbook, fio results, and a Jupyter notebook are available here:
> https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug
>
> Stefan Hajnoczi (6):
> block: add blk_io_plug_call() API
> block/nvme: convert to blk_io_plug_call() API
> block/blkio: convert to blk_io_plug_call() API
> block/io_uring: convert to blk_io_plug_call() API
> block/linux-aio: convert to blk_io_plug_call() API
> block: remove bdrv_co_io_plug() API
>
>  MAINTAINERS                       |   1 +
>  include/block/block-io.h          |   3 -
>  include/block/block_int-common.h  |  11 ---
>  include/block/raw-aio.h           |  14 ---
>  include/sysemu/block-backend-io.h |  13 +--
>  block/blkio.c                     |  43
>  block/block-backend.c             |  22 -
>  block/file-posix.c                |  38 ---
>  block/io.c                        |  37 ---
>  block/io_uring.c                  |  44 -
>  block/linux-aio.c                 |  41 +++-
>  block/nvme.c                      |  44 +++--
>  block/plug.c                      | 159 ++
>  hw/block/dataplane/xen-block.c    |   8 +-
>  hw/block/virtio-blk.c             |   4 +-
>  hw/scsi/virtio-scsi.c             |   6 +-
>  block/meson.build                 |   1 +
>  block/trace-events                |   6 +-
>  18 files changed, 239 insertions(+), 256 deletions(-)
> create mode 100644 block/plug.c
>
> --
> 2.40.1
>
Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan