Hi Stefan, I tested this series with the vDPA block simulator in Linux v6.0. It worked well, but I had a segfault when blkio_map_mem_region() fails.
In my case, it failed because I forgot to increase the `memlock` limit with `ulimit -l unlimited` since the simulator still requires locking all the VM memory. QEMU failed with this messages: $ qemu-system-x86_64 ... \ -blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on qemu-system-x86_64: -device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1,iothread=iothread0: Failed to add blkio mem region 0x7f60bfe00000 with size 536870912: Bad address (os error 14) qemu-system-x86_64: -device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1,iothread=iothread0: Failed to add blkio mem region 0x7f60be400000 with size 16777216: Bad address (os error 14) [1] 20803 segmentation fault IIUC this could be related to a double call to ram_block_notifier_remove() in the error path of ram_block_added() (block/block-ram-registrar.c) callback. Maybe we should call blk_register_buf() only if r->ok is true. The stack trace is the following: #0 0x00005641a8b097dd in ram_block_notifier_remove (n=n@entry=0x5641ab8354d8) at ../hw/core/numa.c:850 #1 0x00005641a89c4701 in ram_block_added (n=0x5641ab8354d8, host=<optimized out>, size=<optimized out>, max_size=<optimized out>) at ../block/block-ram-registrar.c:20 #2 0x00005641a8b083af in ram_block_notify_add_single (rb=0x5641ab3d2810, opaque=0x5641ab8354d8) at ../hw/core/numa.c:820 #3 0x00005641a8b92d8d in qemu_ram_foreach_block (func=func@entry=0x5641a8b08370 <ram_block_notify_add_single>, opaque=opaque@entry=0x5641ab8354d8) at ../softmmu/physmem.c:3571 #4 0x00005641a8b097af in ram_block_notifier_add (n=n@entry=0x5641ab8354d8) at ../hw/core/numa.c:844 #5 0x00005641a89c474f in blk_ram_registrar_init (r=r@entry=0x5641ab8354d0, blk=<optimized out>) at ../block/block-ram-registrar.c:46 #6 0x00005641a8affe88 in virtio_blk_device_realize (dev=0x5641ab835230, errp=0x7ffc72657190) at ../hw/block/virtio-blk.c:1239 #7 0x00005641a8b4b716 in virtio_device_realize (dev=0x5641ab835230, errp=0x7ffc726571f0) at ../hw/virtio/virtio.c:3684 #8 0x00005641a8c2049b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc72657380) at ../hw/core/qdev.c:553 #9 0x00005641a8c24738 in property_set_bool (obj=0x5641ab835230, v=<optimized out>, name=<optimized out>, opaque=0x5641aa518820, errp=0x7ffc72657380) at ../qom/object.c:2273 #10 0x00005641a8c27683 in object_property_set (obj=obj@entry=0x5641ab835230, name=name@entry=0x5641a8e9719a "realized", v=v@entry=0x5641ab83f5b0, errp=errp@entry=0x7ffc72657380) at ../qom/object.c:1408 #11 0x00005641a8c2a97f in object_property_set_qobject (obj=obj@entry=0x5641ab835230, name=name@entry=0x5641a8e9719a "realized", value=value@entry=0x5641ab83f4f0, errp=errp@entry=0x7ffc72657380) at ../qom/qom-qobject.c:28 #12 0x00005641a8c27c84 in object_property_set_bool (obj=0x5641ab835230, name=0x5641a8e9719a "realized", value=<optimized out>, errp=0x7ffc72657380) at ../qom/object.c:1477 #13 0x00005641a8929b74 in pci_qdev_realize (qdev=<optimized out>, errp=<optimized out>) at ../hw/pci/pci.c:2218 #14 0x00005641a8c2049b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc726575b0) at ../hw/core/qdev.c:553 #15 0x00005641a8c24738 in property_set_bool (obj=0x5641ab82ce70, v=<optimized out>, name=<optimized out>, opaque=0x5641aa518820, errp=0x7ffc726575b0) at ../qom/object.c:2273 #16 0x00005641a8c27683 in object_property_set Thanks, Stefano On Tue, Sep 27, 2022 at 9:34 PM Stefan Hajnoczi <stefa...@redhat.com> wrote: > > v5: > - Drop "RFC" since libblkio 1.0 has been released and the library API is > stable > - Disable BDRV_REQ_REGISTERED_BUF if we run out of blkio_mem_regions. The > bounce buffer slow path is taken when there are not enough blkio_mem_regions > to cover guest RAM. [Hanna & David Hildenbrand] > - Call ram_block_discard_disable() when mem-region-pinned property is true or > absent [David Hildenbrand] > - Use a bounce buffer pool instead of allocating/freeing a buffer for each > request. This reduces the number of blkio_mem_regions required for bounce > buffers to 1 and avoids frequent blkio_mem_region_map/unmap() calls. > - Switch to .bdrv_co_*() instead of .bdrv_aio_*(). Needed for the bounce > buffer > pool's CoQueue. > v4: > - Patch 1: > - Add virtio-blk-vhost-user driver [Kevin] > - Drop .bdrv_parse_filename() and .bdrv_needs_filename for > virtio-blk-vhost-vdpa [Stefano] > - Add copyright and license header [Hanna] > - Drop .bdrv_parse_filename() in favor of --blockdev or json: [Hanna] > - Clarify that "filename" is always non-NULL for io_uring [Hanna] > - Check that virtio-blk-vhost-vdpa "path" option is non-NULL [Hanna] > - Fix virtio-blk-vhost-vdpa cache.direct=off logic [Hanna] > - Use macros for driver names [Hanna] > - Assert that the driver name is valid [Hanna] > - Update "readonly" property name to "read-only" [Hanna] > - Call blkio_detach_aio_context() in blkio_close() [Hanna] > - Avoid uint32_t * to int * casts in blkio_refresh_limits() [Hanna] > - Remove write zeroes and discard from the todo list [Hanna] > - Use PRIu32 instead of %d for uint32_t [Hanna] > - Fix error messages with buf-alignment instead of optimal-io-size [Hanna] > - Call map/unmap APIs since libblkio alloc/free APIs no longer do that > - Update QAPI schema QEMU version to 7.2 > - Patch 5: > - Expand the BDRV_REQ_REGISTERED_BUF flag passthrough and drop > assert(!flags) > in drivers [Hanna] > - Patch 7: > - Fix BLK->BDRV typo [Hanna] > - Make BlockRAMRegistrar handle failure [Hanna] > - Patch 8: > - Replace memory_region_get_fd() approach with qemu_ram_get_fd() > - Patch 10: > - Use (void)ret; to discard unused return value [Hanna] > - libblkio's blkio_unmap_mem_region() API no longer has a return value > - Check for registered bufs that cross RAMBlocks [Hanna] > - Patch 11: > - Handle bdrv_register_buf() errors [Hanna] > v3: > - Add virtio-blk-vhost-vdpa for vdpa-blk devices including VDUSE > - Add discard and write zeroes support > - Rebase and adopt latest libblkio APIs > v2: > - Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano] > - Use new blkioq_get_num_completions() API > - Implement .bdrv_refresh_limits() > > This patch series adds a QEMU BlockDriver for libblkio > (https://gitlab.com/libblkio/libblkio/), a library for high-performance block > device I/O. This work was presented at KVM Forum 2022 and slides are available > here: > https://static.sched.com/hosted_files/kvmforum2022/8c/libblkio-kvm-forum-2022.pdf > > The second patch adds the core BlockDriver and most of the libblkio API usage. > Three libblkio drivers are included: > - io_uring > - virtio-blk-vhost-user > - virtio-blk-vhost-vdpa > > The remainder of the patch series reworks the existing QEMU > bdrv_register_buf() > API so virtio-blk emulation efficiently map guest RAM for libblkio - some > libblkio drivers require that I/O buffer memory is pre-registered (think VFIO, > vhost, etc). > > Vladimir requested performance results that show the effect of the > BDRV_REQ_REGISTERED_BUF flag. I ran the patches against qemu-storage-daemon's > vhost-user-blk export with iodepth=1 bs=512 to see the per-request overhead > due > to bounce buffer allocation/mapping: > > Name IOPS Error > bounce-buf 4373.81 ± 0.01% > registered-buf 13062.80 ± 0.67% > > The BDRV_REQ_REGISTERED_BUF optimization version is about 3x faster. > > See the BlockDriver struct in block/blkio.c for a list of APIs that still need > to be implemented. The core functionality is covered. > > Regarding the design: each libblkio driver is a separately named BlockDriver. > That means there is an "io_uring" BlockDriver and not a generic "libblkio" > BlockDriver. This way QAPI and open parameters are type-safe and mandatory > parameters can be checked by QEMU. > > Stefan Hajnoczi (12): > coroutine: add flag to re-queue at front of CoQueue > blkio: add libblkio block driver > numa: call ->ram_block_removed() in ram_block_notifer_remove() > block: pass size to bdrv_unregister_buf() > block: use BdrvRequestFlags type for supported flag fields > block: add BDRV_REQ_REGISTERED_BUF request flag > block: return errors from bdrv_register_buf() > block: add BlockRAMRegistrar > exec/cpu-common: add qemu_ram_get_fd() > stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd() > blkio: implement BDRV_REQ_REGISTERED_BUF optimization > virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint > > MAINTAINERS | 7 + > meson_options.txt | 2 + > qapi/block-core.json | 53 +- > meson.build | 9 + > include/block/block-common.h | 9 + > include/block/block-global-state.h | 10 +- > include/block/block_int-common.h | 15 +- > include/exec/cpu-common.h | 1 + > include/hw/virtio/virtio-blk.h | 2 + > include/qemu/coroutine.h | 15 +- > include/sysemu/block-backend-global-state.h | 4 +- > include/sysemu/block-ram-registrar.h | 37 + > block.c | 14 + > block/blkio.c | 1017 +++++++++++++++++++ > block/blkverify.c | 4 +- > block/block-backend.c | 8 +- > block/block-ram-registrar.c | 54 + > block/crypto.c | 4 +- > block/file-posix.c | 1 - > block/gluster.c | 1 - > block/io.c | 101 +- > block/mirror.c | 2 + > block/nbd.c | 1 - > block/nvme.c | 20 +- > block/parallels.c | 1 - > block/qcow.c | 2 - > block/qed.c | 1 - > block/raw-format.c | 2 + > block/replication.c | 1 - > block/ssh.c | 1 - > block/vhdx.c | 1 - > hw/block/virtio-blk.c | 39 +- > hw/core/numa.c | 17 + > qemu-img.c | 6 +- > softmmu/physmem.c | 5 + > stubs/physmem.c | 13 + > tests/qtest/modules-test.c | 3 + > util/qemu-coroutine-lock.c | 9 +- > util/vfio-helpers.c | 5 +- > block/meson.build | 2 + > scripts/meson-buildoptions.sh | 3 + > stubs/meson.build | 1 + > 42 files changed, 1412 insertions(+), 91 deletions(-) > create mode 100644 include/sysemu/block-ram-registrar.h > create mode 100644 block/blkio.c > create mode 100644 block/block-ram-registrar.c > create mode 100644 stubs/physmem.c > > -- > 2.37.3 >