So the fine-grained locking series has grown from 2 parts to 3... This first part stops where we remove RFifoLock.
In the previous submission, the bulk of the series took care of associating an AioContext with a thread, so that aio_poll could run event handlers only on that thread. This was done to fix a race in bdrv_drain. There were two disadvantages: 1) reporting progress from aio_poll to the main thread was Really Hard. Despite being relatively sure of the correctness---also thanks to formal models---it's not the kind of code that I'd lightly donate to posterity. 2) the strict dependency between bdrv_drain and a single AioContext would have been bad for multiqueue.

Instead, this series does the same trick (do not run dataplane event handlers from the main thread) but reports progress directly at the BlockDriverState level. To do so, the mechanism to track in-flight requests is changed to a simple counter. This is more flexible than BdrvTrackedRequest, and lets the block/io.c code track precisely the initiation and completion of requests. In particular, the counter remains nonzero when a bottom half is required to "really" complete the request; bdrv_drain no longer relies on a non-blocking aio_poll to execute those bottom halves. It is also more efficient: while BdrvTrackedRequest has to remain for request serialisation, with this series we can in principle make BdrvTrackedRequest optional and enable it only when something (automatic RMW or copy-on-read) requires request serialisation.

In general this flows nicely, the only snag being patch 5. The problem there is that mirror calls bdrv_drain from an AIO callback, which used to be okay (because bdrv_drain more or less tried to guess when all AIO callbacks were done) but now causes a deadlock (because bdrv_drain really checks that AIO callbacks are complete). To add to the complication, the bdrv_drain happens far away from the AIO callback, in the coroutine that the AIO callback enters. The situation here is admittedly underdefined, and Stefan pointed out that the same solution is found in many other places in the QEMU block layer; therefore I think it's acceptable. I'm pointing it out explicitly though, because (together with patch 8) it is an example of the issues caused by the new, stricter definition of bdrv_drain.

Patches 1-7 organize bdrv_drain into separate phases, with a well-defined order between the BDS and its children. The phases are:

1) disable throttling;
2) disable io_plug;
3) call bdrv_drain callbacks;
4) wait for I/O to finish;
5) re-enable io_plug and throttling.

Previously, throttling and io_plug were handled by bdrv_flush_io_queue, which was repeatedly called as part of the I/O wait loop. This part of the series removes bdrv_flush_io_queue.

Patches 8-11 replace aio_poll with bdrv_drain so that the work done so far applies to all former callers of aio_poll.

Patches 12-13 introduce the logic that lets the main thread wait remotely for an iothread to drain the BDS. Patches 14-16 then achieve the purpose of this series, removing the long-running AioContext critical section from iothread_run and removing RFifoLock.

A few self-contained sketches of the main ideas (the in-flight counter, the bottom-half trick in mirror, the drain phases, and the recursive mutex) follow below, before the shortlog.
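To illustrate the in-flight counter idea, here is a minimal model in plain C11 and pthreads. It is not QEMU code: the names bs_inc_in_flight()/bs_dec_in_flight() are made up for the sketch, and a sleeping worker thread stands in for the real request plus its completion bottom half. The point is only that the counter is dropped when the request is *really* done, so drain simply waits for zero instead of guessing:

    /*
     * Self-contained model (plain C, not QEMU code) of the in-flight counter:
     * a request bumps a per-BDS counter when it is initiated and drops it only
     * after its completion bottom half has run; drain waits for zero.
     */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    typedef struct BDS {
        atomic_int in_flight;          /* outstanding requests, incl. pending BHs */
    } BDS;

    static void bs_inc_in_flight(BDS *bs) { atomic_fetch_add(&bs->in_flight, 1); }
    static void bs_dec_in_flight(BDS *bs) { atomic_fetch_sub(&bs->in_flight, 1); }

    /* Worker thread standing in for an I/O request plus its completion BH. */
    static void *request_thread(void *opaque)
    {
        BDS *bs = opaque;
        usleep(10000);                 /* the actual I/O */
        usleep(1000);                  /* the completion bottom half */
        bs_dec_in_flight(bs);          /* only now is the request "really" done */
        return NULL;
    }

    /* Drain: nothing to guess, just wait until the counter reaches zero. */
    static void bs_drain(BDS *bs)
    {
        while (atomic_load(&bs->in_flight) > 0) {
            usleep(100);               /* QEMU would run aio_poll() here instead */
        }
    }

    int main(void)
    {
        BDS bs = { .in_flight = 0 };
        pthread_t t[4];

        for (int i = 0; i < 4; i++) {
            bs_inc_in_flight(&bs);     /* counter goes up when the request starts */
            pthread_create(&t[i], NULL, request_thread, &bs);
        }
        bs_drain(&bs);
        printf("drained, in_flight=%d\n", atomic_load(&bs.in_flight));
        for (int i = 0; i < 4; i++) {
            pthread_join(t[i], NULL);
        }
        return 0;
    }

In QEMU the wait is of course the AioContext's aio_poll loop rather than a sleep; the model only shows the accounting.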
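The patch 5 trick can be shown the same way. Again this is a hand-rolled model, not the actual block/mirror.c code, with a one-slot queue standing in for the AioContext bottom-half list: the completion callback never re-enters the coroutine directly, it only schedules a bottom half, so by the time the coroutine runs again the request has already been accounted as complete:

    /*
     * Model (plain C, not QEMU code) of deferring coroutine re-entry from an
     * AIO callback to a bottom half; all names are invented for the sketch.
     */
    #include <stdio.h>

    typedef void BHFunc(void *opaque);

    /* A one-slot bottom-half queue standing in for the AioContext BH list. */
    static BHFunc *pending_bh;
    static void *pending_opaque;

    static void schedule_bh(BHFunc *fn, void *opaque)
    {
        pending_bh = fn;
        pending_opaque = opaque;
    }

    static void run_pending_bh(void)
    {
        if (pending_bh) {
            BHFunc *fn = pending_bh;
            pending_bh = NULL;
            fn(pending_opaque);
        }
    }

    static int in_flight;

    /* Stand-in for the mirror coroutine that eventually calls bdrv_drain(). */
    static void mirror_continue(void *opaque)
    {
        printf("coroutine re-entered with in_flight=%d; safe to drain\n",
               in_flight);
    }

    /* The AIO completion callback: account the request, defer the re-entry. */
    static void mirror_completion_cb(void *opaque)
    {
        in_flight--;                        /* the request is done... */
        schedule_bh(mirror_continue, NULL); /* ...re-enter only from a BH */
    }

    int main(void)
    {
        in_flight = 1;
        mirror_completion_cb(NULL);         /* callback runs inside the event loop */
        run_pending_bh();                   /* the next loop iteration runs the BH */
        return 0;
    }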
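The phase split of patches 1-7 boils down to the following skeleton. The helpers are stubs, not QEMU code; their names only echo the series' bdrv_no_throttling_begin/end and bdrv_io_unplugged_begin/end, and the walk over the BDS's children is left out:

    /*
     * Plain-C skeleton (not QEMU code) of the drain phases: throttling and
     * plugging are toggled exactly once around the wait loop, instead of being
     * poked on every iteration by bdrv_flush_io_queue as before.
     */
    #include <stdio.h>

    typedef struct {
        const char *name;
        int in_flight;
    } BDS;

    static void no_throttling_begin(BDS *bs) { printf("%s: throttling disabled\n", bs->name); }
    static void no_throttling_end(BDS *bs)   { printf("%s: throttling restored\n", bs->name); }
    static void io_unplugged_begin(BDS *bs)  { printf("%s: io_plug disabled\n", bs->name); }
    static void io_unplugged_end(BDS *bs)    { printf("%s: io_plug restored\n", bs->name); }
    static void drain_callbacks(BDS *bs)     { printf("%s: drain callbacks run\n", bs->name); }

    static void wait_for_io(BDS *bs)
    {
        while (bs->in_flight > 0) {    /* QEMU would run aio_poll() here */
            bs->in_flight--;           /* pretend a request completes */
        }
        printf("%s: no requests in flight\n", bs->name);
    }

    static void drain(BDS *bs)
    {
        no_throttling_begin(bs);       /* 1) disable throttling */
        io_unplugged_begin(bs);        /* 2) disable io_plug */
        drain_callbacks(bs);           /* 3) call bdrv_drain callbacks */
        wait_for_io(bs);               /* 4) wait for I/O to finish */
        io_unplugged_end(bs);          /* 5) re-enable io_plug ... */
        no_throttling_end(bs);         /*    ... and throttling */
    }

    int main(void)
    {
        BDS bs = { "disk0", 3 };
        drain(&bs);
        return 0;
    }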
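Finally, the RFifoLock to QemuRecMutex conversion is possible because, once the iothread no longer holds the AioContext around aio_poll, all that is needed is plain recursive locking, which POSIX already provides. The sketch below is plain pthreads, not the actual util/qemu-thread-posix.c code, but it shows roughly what that amounts to:

    /*
     * Plain pthread sketch (not QEMU code) of what a recursive mutex provides:
     * the same thread can take the lock again without deadlocking.
     */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t rec_lock;

    static void rec_lock_init(void)
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&rec_lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    int main(void)
    {
        rec_lock_init();

        pthread_mutex_lock(&rec_lock);
        pthread_mutex_lock(&rec_lock);  /* same thread, no deadlock */
        printf("recursive acquisition ok\n");
        pthread_mutex_unlock(&rec_lock);
        pthread_mutex_unlock(&rec_lock);

        pthread_mutex_destroy(&rec_lock);
        return 0;
    }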
The series passes check-block.sh with a few fixes applied for pre-existing bugs:

    vl: fix migration from prelaunch state
    migration: fix incorrect memory_global_dirty_log_start outside BQL
    qed: fix bdrv_qed_drain

Paolo

Paolo Bonzini (16):
  block: make bdrv_start_throttled_reqs return void
  block: move restarting of throttled reqs to block/throttle-groups.c
  block: introduce bdrv_no_throttling_begin/end
  block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
  mirror: use bottom half to re-enter coroutine
  block: add BDS field to count in-flight requests
  block: change drain to look only at one child at a time
  blockjob: introduce .drain callback for jobs
  block: wait for all pending I/O when doing synchronous requests
  nfs: replace aio_poll with bdrv_drain
  sheepdog: disable dataplane
  aio: introduce aio_context_in_iothread
  block: only call aio_poll from iothread
  iothread: release AioContext around aio_poll
  qemu-thread: introduce QemuRecMutex
  aio: convert from RFifoLock to QemuRecMutex

 async.c                         |  28 +---
 block.c                         |   4 +-
 block/backup.c                  |   7 +
 block/io.c                      | 281 +++++++++++++++++++++++++---------------
 block/linux-aio.c               |  13 +-
 block/mirror.c                  |  37 +++++-
 block/nfs.c                     |  50 ++++---
 block/qed-table.c               |  16 +--
 block/qed.c                     |   4 +-
 block/raw-aio.h                 |   2 +-
 block/raw-posix.c               |  16 +--
 block/sheepdog.c                |  19 +++
 block/throttle-groups.c         |  19 +++
 blockjob.c                      |  16 ++-
 docs/multiple-iothreads.txt     |  40 +++---
 include/block/aio.h             |  13 +-
 include/block/block.h           |   3 +-
 include/block/block_int.h       |  22 +++-
 include/block/blockjob.h        |   7 +
 include/block/throttle-groups.h |   1 +
 include/qemu/rfifolock.h        |  54 --------
 include/qemu/thread-posix.h     |   6 +
 include/qemu/thread-win32.h     |  10 ++
 include/qemu/thread.h           |   3 +
 iothread.c                      |  20 +--
 stubs/Makefile.objs             |   1 +
 stubs/iothread.c                |   8 ++
 tests/.gitignore                |   1 -
 tests/Makefile                  |   2 -
 tests/qemu-iotests/060          |   8 +-
 tests/qemu-iotests/060.out      |   4 +-
 tests/test-aio.c                |  22 ++--
 tests/test-rfifolock.c          |  91 -------------
 util/Makefile.objs              |   1 -
 util/qemu-thread-posix.c        |  13 ++
 util/qemu-thread-win32.c        |  25 ++++
 util/rfifolock.c                |  78 -----------
 37 files changed, 471 insertions(+), 474 deletions(-)
 delete mode 100644 include/qemu/rfifolock.h
 create mode 100644 stubs/iothread.c
 delete mode 100644 tests/test-rfifolock.c
 delete mode 100644 util/rfifolock.c

--
2.5.0