On 08.05.2025 at 16:09, Fiona Ebner wrote:
> While bdrv_drained_begin() is now marked as GRAPH_UNLOCKED, TSA is not
> able to catch a remaining instance where the function is called with
> the graph lock held in bdrv_change_aio_context(). This is because the
> call is preceded by a bdrv_graph_rdunlock_main_loop(), but for callers
> that hold the exclusive lock that does not actually release the lock.
And releasing it wouldn't be correct anyway.

> In combination with block-stream, there is a deadlock that can happen
> because of this [0]. In particular, it can happen that
>
> main thread                          IO thread
> 1. acquires write lock
>                                      in blk_co_do_preadv_part():
>                                      2. have non-zero blk->in_flight
>                                      3. try to acquire read lock
> 4. begin drain
>
> Steps 3 and 4 might be switched. Draining will poll and get stuck,
> because it will see the non-zero in_flight counter. But the IO thread
> will not make any progress either, because it cannot acquire the read
> lock.
>
> More granular draining is not trivially possible, because
> bdrv_change_aio_context() can recursively call itself e.g. via
> bdrv_child_change_aio_context().
>
> [0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63...@virtuozzo.com/
>
> Reported-by: Andrey Drobyshev <andrey.drobys...@virtuozzo.com>
> Signed-off-by: Fiona Ebner <f.eb...@proxmox.com>
> ---
>  block.c                            | 91 ++++++++++++++++++++----------
>  block/backup.c                     |  2 +
>  block/blklogwrites.c               |  4 ++
>  block/blkverify.c                  |  2 +
>  block/block-backend.c              |  4 ++
>  block/commit.c                     |  4 ++
>  block/mirror.c                     |  5 ++
>  block/qcow2.c                      |  2 +
>  block/quorum.c                     |  4 ++
>  block/replication.c                |  7 +++
>  block/snapshot.c                   |  2 +
>  block/stream.c                     | 10 ++--
>  block/vmdk.c                       | 10 ++++
>  blockdev.c                         | 17 ++++--
>  blockjob.c                         |  6 ++
>  include/block/block-global-state.h |  8 ++-
>  tests/unit/test-bdrv-drain.c       | 18 ++++++
>  tests/unit/test-bdrv-graph-mod.c   | 10 ++++
>  18 files changed, 164 insertions(+), 42 deletions(-)
>
> diff --git a/block.c b/block.c
> index 10052027cd..d84ac2f394 100644
> --- a/block.c
> +++ b/block.c
> @@ -1720,12 +1720,14 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
>  open_failed:
>      bs->drv = NULL;
>
> +    bdrv_drain_all_begin();
>      bdrv_graph_wrlock();
>      if (bs->file != NULL) {
>          bdrv_unref_child(bs, bs->file);
>          assert(!bs->file);
>      }
>      bdrv_graph_wrunlock();
> +    bdrv_drain_all_end();

I feel this patch is doing too
much at once, pushing drain up three levels to all callers of
bdrv_unref_child(). Please split it, doing only one level per patch,
i.e. the first patch only marks bdrv_change_aio_context() GRAPH_RDLOCK,
removes the double locking and moves the drain (which becomes drain_all)
to bdrv_try_change_aio_context(). Then the next patch marks that one
GRAPH_RDLOCK, etc. This will make it more obvious that each step is done
correctly.

For example, checking this hunk I noticed that bdrv_unref_child()
doesn't document that child->bs must be drained, but doing everything at
once, it's not obvious if and which other functions have the same
problem.

> @@ -7702,14 +7699,12 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
>      }
>
>  /*
> - * Change bs's and recursively all of its parents' and children's AioContext
> - * to the given new context, returning an error if that isn't possible.
> - *
> - * If ignore_child is not NULL, that child (and its subgraph) will not
> - * be touched.
> + * Use bdrv_try_change_aio_context() or bdrv_try_change_aio_context_locked().
>  */
> -int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> -                                BdrvChild *ignore_child, Error **errp)
> +static int bdrv_try_change_aio_context_common(BlockDriverState *bs,
> +                                              AioContext *ctx,
> +                                              BdrvChild *ignore_child,
> +                                              Error **errp)

This is exactly the same as bdrv_try_change_aio_context_locked(), so
this should be called bdrv_try_change_aio_context_locked() and the
wrapper below should go away. Don't forget to add TSA annotations to
functions whose interface you touch (but I already mentioned this
above).

> {
>     Transaction *tran;
>     GHashTable *visited;

I'll defer review of the rest of the patch to the next version when it's
split. As it is, it's too hard to understand where each specific change
comes from. (That is, I could probably work it out for each individual
hunk, but I'd lose the big picture of what I'm really looking at and if
the changes are complete.)

Kevin