Re: [PATCH v3 0/6] block: add blk_io_plug_call() API
On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote: > v3 > - Patch 5: Mention why dev_max_batch condition was dropped [Stefano] > v2 > - Patch 1: "is not be freed" -> "is not freed" [Eric] > - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event > [Stefano] > - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano] > - Patch 4: Removed whitespace hunk [Eric] > > The existing blk_io_plug() API is not block layer multi-queue friendly because > the plug state is per-BlockDriverState. > > Change blk_io_plug()'s implementation so it is thread-local. This is done by > introducing the blk_io_plug_call() function that block drivers use to batch > calls while plugged. It is relatively easy to convert block drivers from > .bdrv_co_io_plug() to blk_io_plug_call(). > > Random read 4KB performance with virtio-blk on a host NVMe block device: > > iodepth iops change vs today > 145612 -4% > 287967 +2% > 4 129872 +0% > 8 171096 -3% > 16 194508 -4% > 32 208947 -1% > 64 217647 +0% > 128 229629 +0% > > The results are within the noise for these benchmarks. This is to be expected > because the plugging behavior for a single thread hasn't changed in this patch > series, only that the state is thread-local now. > > The following graph compares several approaches: > https://vmsplice.net/~stefan/blk_io_plug-thread-local.png > - v7.2.0: before most of the multi-queue block layer changes landed. > - with-blk_io_plug: today's post-8.0.0 QEMU. > - blk_io_plug-thread-local: this patch series. > - no-blk_io_plug: what happens when we simply remove plugging? > - call-after-dispatch: what if we integrate plugging into the event loop? I > decided against this approach in the end because it's more likely to > introduce performance regressions since I/O submission is deferred until the > end of the event loop iteration. > > Aside from the no-blk_io_plug case, which bottlenecks much earlier than the > others, we see that all plugging approaches are more or less equivalent in > this > benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0. > > The Ansible playbook, fio results, and a Jupyter notebook are available here: > https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug > > Stefan Hajnoczi (6): > block: add blk_io_plug_call() API > block/nvme: convert to blk_io_plug_call() API > block/blkio: convert to blk_io_plug_call() API > block/io_uring: convert to blk_io_plug_call() API > block/linux-aio: convert to blk_io_plug_call() API > block: remove bdrv_co_io_plug() API > > MAINTAINERS | 1 + > include/block/block-io.h | 3 - > include/block/block_int-common.h | 11 --- > include/block/raw-aio.h | 14 --- > include/sysemu/block-backend-io.h | 13 +-- > block/blkio.c | 43 > block/block-backend.c | 22 - > block/file-posix.c| 38 --- > block/io.c| 37 --- > block/io_uring.c | 44 - > block/linux-aio.c | 41 +++- > block/nvme.c | 44 +++-- > block/plug.c | 159 ++ > hw/block/dataplane/xen-block.c| 8 +- > hw/block/virtio-blk.c | 4 +- > hw/scsi/virtio-scsi.c | 6 +- > block/meson.build | 1 + > block/trace-events| 6 +- > 18 files changed, 239 insertions(+), 256 deletions(-) > create mode 100644 block/plug.c > > -- > 2.40.1 > Thanks, applied to my block tree: https://gitlab.com/stefanha/qemu/commits/block Stefan signature.asc Description: PGP signature
Re: [PATCH v3 0/6] block: add blk_io_plug_call() API
Am 31.05.2023 um 21:50 hat Stefan Hajnoczi geschrieben: > Hi Kevin, > Do you want to review the thread-local blk_io_plug() patch series or > should I merge it? I haven't reviewed it in detail, but on the high level it looks good to me, and you already got reviews for the details. Acked-by: Kevin Wolf signature.asc Description: PGP signature
Re: [PATCH v3 0/6] block: add blk_io_plug_call() API
Hi Kevin, Do you want to review the thread-local blk_io_plug() patch series or should I merge it? Thanks, Stefan signature.asc Description: PGP signature
[PATCH v3 0/6] block: add blk_io_plug_call() API
v3 - Patch 5: Mention why dev_max_batch condition was dropped [Stefano] v2 - Patch 1: "is not be freed" -> "is not freed" [Eric] - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event [Stefano] - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano] - Patch 4: Removed whitespace hunk [Eric] The existing blk_io_plug() API is not block layer multi-queue friendly because the plug state is per-BlockDriverState. Change blk_io_plug()'s implementation so it is thread-local. This is done by introducing the blk_io_plug_call() function that block drivers use to batch calls while plugged. It is relatively easy to convert block drivers from .bdrv_co_io_plug() to blk_io_plug_call(). Random read 4KB performance with virtio-blk on a host NVMe block device: iodepth iops change vs today 145612 -4% 287967 +2% 4 129872 +0% 8 171096 -3% 16 194508 -4% 32 208947 -1% 64 217647 +0% 128 229629 +0% The results are within the noise for these benchmarks. This is to be expected because the plugging behavior for a single thread hasn't changed in this patch series, only that the state is thread-local now. The following graph compares several approaches: https://vmsplice.net/~stefan/blk_io_plug-thread-local.png - v7.2.0: before most of the multi-queue block layer changes landed. - with-blk_io_plug: today's post-8.0.0 QEMU. - blk_io_plug-thread-local: this patch series. - no-blk_io_plug: what happens when we simply remove plugging? - call-after-dispatch: what if we integrate plugging into the event loop? I decided against this approach in the end because it's more likely to introduce performance regressions since I/O submission is deferred until the end of the event loop iteration. Aside from the no-blk_io_plug case, which bottlenecks much earlier than the others, we see that all plugging approaches are more or less equivalent in this benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0. The Ansible playbook, fio results, and a Jupyter notebook are available here: https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug Stefan Hajnoczi (6): block: add blk_io_plug_call() API block/nvme: convert to blk_io_plug_call() API block/blkio: convert to blk_io_plug_call() API block/io_uring: convert to blk_io_plug_call() API block/linux-aio: convert to blk_io_plug_call() API block: remove bdrv_co_io_plug() API MAINTAINERS | 1 + include/block/block-io.h | 3 - include/block/block_int-common.h | 11 --- include/block/raw-aio.h | 14 --- include/sysemu/block-backend-io.h | 13 +-- block/blkio.c | 43 block/block-backend.c | 22 - block/file-posix.c| 38 --- block/io.c| 37 --- block/io_uring.c | 44 - block/linux-aio.c | 41 +++- block/nvme.c | 44 +++-- block/plug.c | 159 ++ hw/block/dataplane/xen-block.c| 8 +- hw/block/virtio-blk.c | 4 +- hw/scsi/virtio-scsi.c | 6 +- block/meson.build | 1 + block/trace-events| 6 +- 18 files changed, 239 insertions(+), 256 deletions(-) create mode 100644 block/plug.c -- 2.40.1