Paolo's patches have been getting us closer to multiqueue block layer support but there is a final set of changes required that has become clearer to me just recently. I'm curious if this matches Paolo's vision and whether anyone else has comments.

Multiqueue block layer means that I/O requests for a single disk image can be processed by multiple threads safely. Requests will be processed simultaneously where possible, but in some cases synchronization is necessary to protect shared metadata.

Imagine a virtio-blk device with multiple virtqueues, each with an ioeventfd that is handled by a different IOThread. Each IOThread should be able to process I/O requests and invoke completion functions in the AioContext that submitted the request.

Paolo has made key parts of AioContext and coroutine locks (e.g. CoQueue) thread-safe. Coroutine code can therefore safely execute in multiple IOThreads and locking works correctly.

That's not to say that block layer code and block drivers are thread-safe today. They are not because some code still relies on the fact that coroutines only execute in one AioContext. They rely on the AioContext acquire/release lock for thread safety.

We need to push the AioContext lock down into BlockDriverState so that thread-safety is not tied to a single AioContext but to the BlockDriverState itself. We also need to audit block layer code to identify places that assume everything is run from a single AioContext.

After this is done the final piece is to eliminate bdrv_set_aio_context(). BlockDriverStates should not be associated with an AioContext. Instead they should use whichever AioContext they are invoked under. The current thread's AioContext can be fetched using qemu_get_current_aio_context(). This is either the main loop AioContext or an IOThread AioContext.

The .bdrv_attach/detach_aio_context() callbacks will no longer be necessary in a world where block driver code is thread-safe and any AioContext can be used.

bdrv_drain_all() and friends do not require extensive modifications because the bdrv_wakeup() mechanism already works properly when there are multiple IOThreads involved.

Block jobs no longer need to be in the same AioContext as the BlockDriverState.
For simplicity we may choose to always run them in the main loop AioContext by default. This may have a performance impact on tight loops like bdrv_is_allocated() and the initial mirroring phase, but maybe not.

The upshot of all this is that bdrv_set_aio_context() goes away while all block driver code needs to be more aware of thread-safety. It can no longer assume that everything is called from one AioContext.

We should optimize file-posix.c and qcow2.c for maximum parallelism using fine-grained locks and other techniques. The remaining block drivers can use one CoMutex per BlockDriverState.

I'm excited that we're relatively close to multiqueue now. I don't want to jinx it by saying 2018 is the year of the multiqueue block layer, but I'll say it anyway :).

Thoughts?

Stefan
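
P.S. To make the "one CoMutex per BlockDriverState" fallback a bit more concrete, here is a rough sketch in plain C. It is only an illustration under some loud assumptions: pthread_mutex_t stands in for CoMutex, ordinary threads stand in for IOThreads, and SketchBDS, Worker, sketch_handle_request(), sketch_iothread() and sketch_run() are made-up names, not QEMU APIs. The point is just that the lock lives in the per-image state rather than in any AioContext, so any thread can submit requests against the same image.

```c
/*
 * Illustration only, not QEMU code.
 * Assumptions: pthread_mutex_t stands in for CoMutex, plain threads stand
 * in for IOThreads, and every name below is hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SketchBDS {
    pthread_mutex_t lock;   /* one lock per "BlockDriverState" */
    uint64_t allocated;     /* shared metadata protected by lock */
} SketchBDS;

typedef struct Worker {
    SketchBDS *bs;
    unsigned requests;      /* requests each thread submits */
} Worker;

/* Handle one request; only the shared metadata update is serialized. */
static void sketch_handle_request(SketchBDS *bs)
{
    pthread_mutex_lock(&bs->lock);
    bs->allocated++;
    pthread_mutex_unlock(&bs->lock);
}

/* Thread body: submit requests from whichever thread we happen to be. */
static void *sketch_iothread(void *opaque)
{
    Worker *w = opaque;

    for (unsigned i = 0; i < w->requests; i++) {
        sketch_handle_request(w->bs);
    }
    return NULL;
}

/* Run nthreads threads against one SketchBDS and return the final count. */
uint64_t sketch_run(unsigned nthreads, unsigned requests_per_thread)
{
    SketchBDS bs = { .allocated = 0 };
    Worker w = { .bs = &bs, .requests = requests_per_thread };
    pthread_t *threads = calloc(nthreads ? nthreads : 1, sizeof(*threads));
    int rc;

    assert(threads != NULL);
    rc = pthread_mutex_init(&bs.lock, NULL);
    assert(rc == 0);

    for (unsigned i = 0; i < nthreads; i++) {
        rc = pthread_create(&threads[i], NULL, sketch_iothread, &w);
        assert(rc == 0);
    }
    for (unsigned i = 0; i < nthreads; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }

    pthread_mutex_destroy(&bs.lock);
    free(threads);
    (void)rc;               /* silence unused warning under NDEBUG */
    return bs.allocated;
}
```

Calling sketch_run(4, 10000) from a main() returns 40000 because every update to the shared counter happens under the per-image lock, no matter which thread issued the request. A single lock like this is the simple option; file-posix.c and qcow2.c would presumably want something finer-grained.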