On 01/05/14 16:54, Stefan Hajnoczi wrote:
> This patch series switches virtio-blk data-plane from a custom Linux AIO
> request queue to the QEMU block layer.  The previous "raw files only"
> limitation is lifted.  All image formats and protocols can now be used
> with virtio-blk data-plane.
Nice.  Is there a git branch somewhere, so that we can test this on s390?

Christian

> How to review this series
> -------------------------
> I CCed the maintainer of each block driver that I modified.  You probably
> don't need to review the entire series, just your patch.
>
> From now on fd handlers, timers, BHs, and event loop wait must explicitly
> use BlockDriverState's AioContext instead of the main loop.  Use
> bdrv_get_aio_context(bs) to get the AioContext.  The following function
> calls need to be converted:
>
>  * qemu_aio_set_fd_handler() -> aio_set_fd_handler()
>  * timer_new*() -> aio_timer_new()
>  * qemu_bh_new() -> aio_bh_new()
>  * qemu_aio_wait() -> aio_poll(aio_context, true)
>
> For simple block drivers this modification suffices and they are now safe
> to use outside the QEMU global mutex.
>
> Block drivers that keep fd handlers, timers, or BHs registered when
> requests have been drained need a little more work.  Examples of this are
> network block drivers with keepalive timers, like iSCSI.
>
> This series adds a new bdrv_set_aio_context(bs, aio_context) function
> that moves a BlockDriverState into a new AioContext.  This function calls
> the block driver's optional .bdrv_detach_aio_context() and
> .bdrv_attach_aio_context() functions.  Implement detach/attach to move
> the fd handlers, timers, or BHs to the new AioContext.
>
> Finally, block drivers that manage their own child nodes also need to
> implement detach/attach because the generic block layer doesn't know
> about their children.  Both ->file and ->backing_hd are automatically
> taken care of, but blkverify, quorum, and VMDK need to manually propagate
> detach/attach to their children.
>
> I have audited and modified all block drivers.  Block driver maintainers,
> please check I did it correctly and didn't break your code.
>
> Background
> ----------
> The block layer is currently tied to the QEMU main loop for fd handlers,
> timer callbacks, and BHs.
> This means that even on hosts with many cores, parts of block I/O
> processing happen in one thread and depend on the QEMU global mutex.
>
> virtio-blk data-plane has shown that 1,000,000 IOPS is achievable if we
> use additional threads that are not under the QEMU global mutex.
>
> It is necessary to make the QEMU block layer aware that there may be more
> than one event loop.  This way BlockDriverState can be used from a thread
> without contention on the QEMU global mutex.
>
> This series builds on the aio_context_acquire/release() interface that
> allows a thread to temporarily grab an AioContext.  We add
> bdrv_set_aio_context(bs, aio_context) for changing which AioContext a
> BlockDriverState uses.
>
> The final patches convert virtio-blk data-plane to use the QEMU block
> layer and let the BlockDriverState run in the IOThread AioContext.
>
> What's next?
> ------------
> I have already made block I/O throttling work in another AioContext and
> will send the series out next week.
>
> In order to keep this series reviewable, I'm holding back those patches
> for now.  One could say I'm "throttling" them.
>
> Thank you, thank you, I'll be here all night!
>
> Stefan Hajnoczi (22):
>   block: use BlockDriverState AioContext
>   block: acquire AioContext in bdrv_close_all()
>   block: add bdrv_set_aio_context()
>   blkdebug: use BlockDriverState's AioContext
>   blkverify: implement .bdrv_detach/attach_aio_context()
>   curl: implement .bdrv_detach/attach_aio_context()
>   gluster: use BlockDriverState's AioContext
>   iscsi: implement .bdrv_detach/attach_aio_context()
>   nbd: implement .bdrv_detach/attach_aio_context()
>   nfs: implement .bdrv_detach/attach_aio_context()
>   qed: use BlockDriverState's AioContext
>   quorum: implement .bdrv_detach/attach_aio_context()
>   block/raw-posix: implement .bdrv_detach/attach_aio_context()
>   block/linux-aio: fix memory and fd leak
>   rbd: use BlockDriverState's AioContext
>   sheepdog: implement .bdrv_detach/attach_aio_context()
>   ssh: use BlockDriverState's AioContext
>   vmdk: implement .bdrv_detach/attach_aio_context()
>   dataplane: use the QEMU block layer for I/O
>   dataplane: delete IOQueue since it is no longer used
>   dataplane: implement async flush
>   raw-posix: drop raw_get_aio_fd() since it is no longer used
>
>  block.c                          |  88 +++++++++++++--
>  block/blkdebug.c                 |   2 +-
>  block/blkverify.c                |  47 +++++---
>  block/curl.c                     | 194 +++++++++++++++++++---
>  block/gluster.c                  |   7 +-
>  block/iscsi.c                    |  79 +++++++++----
>  block/linux-aio.c                |  24 +++-
>  block/nbd-client.c               |  24 +++-
>  block/nbd-client.h               |   4 +
>  block/nbd.c                      |  87 +++++++++------
>  block/nfs.c                      |  80 ++++++++++----
>  block/qed-table.c                |   8 +-
>  block/qed.c                      |  35 +++++-
>  block/quorum.c                   |  48 ++++++--
>  block/raw-aio.h                  |   3 +
>  block/raw-posix.c                |  82 ++++++++------
>  block/rbd.c                      |   5 +-
>  block/sheepdog.c                 | 118 +++++++++++++-------
>  block/ssh.c                      |  36 +++---
>  block/vmdk.c                     |  23 ++++
>  hw/block/dataplane/Makefile.objs |   2 +-
>  hw/block/dataplane/ioq.c         | 117 --------------------
>  hw/block/dataplane/ioq.h         |  57 ----------
>  hw/block/dataplane/virtio-blk.c  | 233 +++++++++++++++------------------
>  include/block/block.h            |  20 ++--
>  include/block/block_int.h        |  36 ++++++
>  26 files changed, 829 insertions(+), 630 deletions(-)
>  delete mode 100644 hw/block/dataplane/ioq.c
>  delete mode 100644 hw/block/dataplane/ioq.h