Am 09.01.25 um 11:47 schrieb Kevin Wolf: > Am 08.01.2025 um 13:46 hat Fiona Ebner geschrieben: >> Setting blk->root is a graph change operation and thus needs to be >> protected by the block graph write lock in blk_remove_bs(). The >> assignment to blk->root in blk_insert_bs() is already protected by >> the block graph write lock. > > Hm, if that's the case, then we should also enforce this in the > declaration of BlockBackend: > > BdrvChild * GRAPH_RDLOCK_PTR root; > > However, this results in more compiler failures that we need to fix. You > caught the only remaining writer, but the lock is only fully effective > if all readers take it, too.
I started giving this a try, but quickly ran into some issues/questions: 1. For global state code, is it preferred to use GRAPH_RDLOCK_GUARD_MAINLOOP() to cover the whole function or better to use bdrv_graph_rd(un)lock_main_loop() to keep the locked section as small as necessary? I feel like the former can be more readable, e.g. in blk_insert_bs(), blk_new_open(), where blk->root is used in conditionals. 2. In particular, protecting blk->root means that blk_bs() needs to have the read lock. In fact, blk_bs() is reading blk->root twice in a row, so it seems like it could suffer from a potential NULL pointer dereference (or I guess after compiler optimization a potential use-after-free)? Since blk_bs() is IO_CODE() and not a coroutine, I tried to mark it GRAPH_RDLOCK and move on to the callers. However, one caller is blk_nb_sectors() which itself is called by blk_get_geometry(). Both of these are manually-written coroutine wrappers: > commit 81f730d4d0e8af9c0211c3fedf406df0046341a9 > Author: Paolo Bonzini <pbonz...@redhat.com> > Date: Fri Apr 7 17:33:03 2023 +0200 > > block, block-backend: write some hot coroutine wrappers by hand > > The introduction of the graph lock is causing blk_get_geometry, a hot > function > used in the I/O path, to create a coroutine. However, the only part that > really > needs to run in coroutine context is the call to > bdrv_co_refresh_total_sectors, > which in turn only happens in the rare case of host CD-ROM devices. > > So, write by hand the three wrappers on the path from blk_co_get_geometry > to > bdrv_co_refresh_total_sectors, so that the coroutine wrapper is only > created > if bdrv_nb_sectors actually calls bdrv_refresh_total_sectors. Both the blk_bs() and blk_nb_sectors() functions are IO_CODE(), but not coroutines, and callers of blk_get_geometry are already in the device code. I'm not sure how to proceed here, happy to hear suggestions :) ---snip--- >> diff --git a/block/block-backend.c b/block/block-backend.c >> index c93a7525ad..9678615318 100644 >> --- a/block/block-backend.c >> +++ b/block/block-backend.c >> @@ -887,9 +887,9 @@ void blk_remove_bs(BlockBackend *blk) >> */ >> blk_drain(blk); >> root = blk->root; >> - blk->root = NULL; >> >> bdrv_graph_wrlock(); >> + blk->root = NULL; >> bdrv_root_unref_child(root); >> bdrv_graph_wrunlock(); >> } > > I think the 'root = blk->root' needs to be inside the locked section, > too. Otherwise blk->root could change during bdrv_graph_wrlock() (which > has a nested event loop) and root would be stale. I assume clang would > complain about this with the added GRAPH_RDLOCK_PTR. Oh I see, good catch! Best Regards, Fiona