Suggestions hold pattern.. programmers obviously understand control logic. Systems built with operational flaws. Will freeze up when u start repairing it while it's running.
On Thu, Jan 9, 2025, 4:48 AM Kevin Wolf <kw...@redhat.com> wrote: > Am 08.01.2025 um 13:46 hat Fiona Ebner geschrieben: > > Setting blk->root is a graph change operation and thus needs to be > > protected by the block graph write lock in blk_remove_bs(). The > > assignment to blk->root in blk_insert_bs() is already protected by > > the block graph write lock. > > Hm, if that's the case, then we should also enforce this in the > declaration of BlockBackend: > > BdrvChild * GRAPH_RDLOCK_PTR root; > > However, this results in more compiler failures that we need to fix. You > caught the only remaining writer, but the lock is only fully effective > if all readers take it, too. > > > In particular, the graph read lock in blk_co_do_flush() could > > previously not ensure that blk_bs(blk) would always return the same > > value during the locked section, which could lead to a segfault [0] in > > combination with migration [1]. > > > > From the user-provided backtraces in the forum thread [1], it seems > > like blk_co_do_flush() managed to get past the > > blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a > > non-NULL value during the check, but then, when calling > > bdrv_co_flush(), blk_bs(blk) returned NULL. > > > > [0]: > > > > > 0 bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287 > > > 1 bdrv_co_flush (bs=0x0) at ../block/io.c:2948 > > > 2 bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901 > > > > [1]: https://forum.proxmox.com/threads/158072 > > > > Cc: qemu-sta...@nongnu.org > > Signed-off-by: Fiona Ebner <f.eb...@proxmox.com> > > --- > > block/block-backend.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/block/block-backend.c b/block/block-backend.c > > index c93a7525ad..9678615318 100644 > > --- a/block/block-backend.c > > +++ b/block/block-backend.c > > @@ -887,9 +887,9 @@ void blk_remove_bs(BlockBackend *blk) > > */ > > blk_drain(blk); > > root = blk->root; > > - blk->root = NULL; > > > > bdrv_graph_wrlock(); > > + blk->root = NULL; > > bdrv_root_unref_child(root); > > bdrv_graph_wrunlock(); > > } > > I think the 'root = blk->root' needs to be inside the locked section, > too. Otherwise blk->root could change during bdrv_graph_wrlock() (which > has a nested event loop) and root would be stale. I assume clang would > complain about this with the added GRAPH_RDLOCK_PTR. > > Kevin > > >