On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
>
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
>
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
>
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
>
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
>
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
>
> Signed-off-by: Kevin Wolf <[email protected]>
> ---
>  include/block/block-global-state.h |  1 +
>  block.c                            |  9 +++++++++
>  block/graph-lock.c                 | 23 ++++++++++++++++-------
>  tests/qemu-iotests/051.pc.out      |  6 +++---
>  4 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> index f347199bff..e570799f85 100644
> --- a/include/block/block-global-state.h
> +++ b/include/block/block-global-state.h
> @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
>  void bdrv_ref(BlockDriverState *bs);
>  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
>  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
>  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
>  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
>                               BlockDriverState *child_bs,
> diff --git a/block.c b/block.c
> index 6376452768..9c4f24f4b9 100644
> --- a/block.c
> +++ b/block.c
> @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
>      }
>  }
>
> +void bdrv_schedule_unref(BlockDriverState *bs)
Please add a doc comment explaining when and why this should be used.
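Something along these lines, maybe (just a sketch based on the commit
message, feel free to reword):

  /*
   * Schedule bdrv_unref() to run in the main loop once the graph writer
   * lock has been released (i.e. from bdrv_graph_wrunlock()).
   *
   * Use this instead of bdrv_unref() when the graph lock is held, because
   * bdrv_unref() drains and closes the node and therefore must not be
   * called with the lock taken.
   */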
> +{
> +    if (!bs) {
> +        return;
> +    }
> +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                            (QEMUBHFunc *) bdrv_unref, bs);
> +}
> +
>  struct BdrvOpBlocker {
>      Error *reason;
>      QLIST_ENTRY(BdrvOpBlocker) list;
> diff --git a/block/graph-lock.c b/block/graph-lock.c
> index 5e66f01ae8..395d387651 100644
> --- a/block/graph-lock.c
> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
>  void bdrv_graph_wrunlock(void)
>  {
>      GLOBAL_STATE_CODE();
> -    QEMU_LOCK_GUARD(&aio_context_list_lock);
>      assert(qatomic_read(&has_writer));
>
> +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> +        /*
> +         * No need for memory barriers, this works in pair with
> +         * the slow path of rdlock() and both take the lock.
> +         */
> +        qatomic_store_release(&has_writer, 0);
> +
> +        /* Wake up all coroutine that are waiting to read the graph */
s/coroutine/coroutines/
> +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    }
> +
>      /*
> -     * No need for memory barriers, this works in pair with
> -     * the slow path of rdlock() and both take the lock.
> +     * Run any BHs that were scheduled during the wrlock section and that
> +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
Referring directly to bdrv_schedule_unref() would help make it clearer
what you mean.
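For example: "Run any BHs that were scheduled during the wrlock section
(e.g. by bdrv_schedule_unref()) and that callers might expect to have
finished."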
> +     * only after restarting coroutines so that nested event loops in BHs don't
> +     * deadlock if their condition relies on the coroutine making progress.
>      */
> -    qatomic_store_release(&has_writer, 0);
> -
> -    /* Wake up all coroutine that are waiting to read the graph */
> -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    aio_bh_poll(qemu_get_aio_context());
Keeping a dedicated list of BDSes pending unref seems cleaner than using
aio_bh_poll(). There's a lot of stuff happening in the event loop
and I wonder if those other BHs might cause issues if they run at the
end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
an example of this, but other things could happen too (e.g. monitor
command processing).
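
Roughly what I have in mind (untested sketch; the QTAILQ entry field in
BlockDriverState and its name are made up, and this ignores the question
of whether the same node could be scheduled twice):

  /* Sketch only: assumes a new pending_unref_entry field of type
   * QTAILQ_ENTRY(BlockDriverState) in BlockDriverState. */
  static QTAILQ_HEAD(, BlockDriverState) pending_unref_bds =
      QTAILQ_HEAD_INITIALIZER(pending_unref_bds);

  void bdrv_schedule_unref(BlockDriverState *bs)
  {
      if (!bs) {
          return;
      }
      QTAILQ_INSERT_TAIL(&pending_unref_bds, bs, pending_unref_entry);
  }

  /* ...and at the end of bdrv_graph_wrunlock(), instead of aio_bh_poll(): */
  while (!QTAILQ_EMPTY(&pending_unref_bds)) {
      BlockDriverState *bs = QTAILQ_FIRST(&pending_unref_bds);
      QTAILQ_REMOVE(&pending_unref_bds, bs, pending_unref_entry);
      bdrv_unref(bs);
  }

That way only the deferred unrefs run here and nothing else from the
event loop gets pulled in.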
>  }
>
>  void coroutine_fn bdrv_graph_co_rdlock(void)
> diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
> index 4d4af5a486..7e10c5fa1b 100644
> --- a/tests/qemu-iotests/051.pc.out
> +++ b/tests/qemu-iotests/051.pc.out
> @@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> @@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
>
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> --
> 2.41.0
>
