On Mon, Oct 27, 2025 at 4:55 AM Kevin Wolf <[email protected]> wrote:
>
> Am 24.10.2025 um 20:07 hat Wesley Hershberger geschrieben:
>
> Sorry, Wesley, that this turns out to be a bit more complicated! We'll
> probably need some further discussion before we can know if or which
> adjustments to the patch are needed.
Hi Kevin, thanks for your thoughts.
> Before I send this, I just had another thought: Why does copy-on-read
> even drop the child in bdrv_cor_filter_drop()? Wouldn't the normal
> mode of operation be that for a short time you have both the cor node
> and its parent point to the same child node, and that would just work?
> I see commit messages about conflicting node permissions (3108a15cf09 by
> Vladimir), but I don't understand this. A filter without parents
> shouldn't try to take any permissions.
>
> So another option for never letting the situation arise would be that
> cor_filter_bs just keeps its child until it's really deleted.
It sounds like you'd be open to a patch that partially reverts 3108a15cf09?
In my local testing I did verify that preventing the child from being detached
resolves the bug and am currently double-checking it with this diff:
diff --git a/block.c b/block.c
index 8848e9a7ed..72261ea1d4 100644
--- a/block.c
+++ b/block.c
@@ -5386,17 +5386,13 @@ bdrv_replace_node_noperm(BlockDriverState *from,
*
* With auto_skip=false the error is returned if from has a parent which should
* not be updated.
- *
- * With @detach_subchain=true @to must be in a backing chain of @from. In this
- * case backing link of the cow-parent of @to is removed.
*/
static int GRAPH_WRLOCK
bdrv_replace_node_common(BlockDriverState *from, BlockDriverState *to,
- bool auto_skip, bool detach_subchain, Error **errp)
+ bool auto_skip, Error **errp)
{
Transaction *tran = tran_new();
g_autoptr(GSList) refresh_list = NULL;
- BlockDriverState *to_cow_parent = NULL;
int ret;
GLOBAL_STATE_CODE();
@@ -5405,17 +5401,6 @@ bdrv_replace_node_common(BlockDriverState
*from, BlockDriverState *to,
assert(to->quiesce_counter);
assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
- if (detach_subchain) {
- assert(bdrv_chain_contains(from, to));
- assert(from != to);
- for (to_cow_parent = from;
- bdrv_filter_or_cow_bs(to_cow_parent) != to;
- to_cow_parent = bdrv_filter_or_cow_bs(to_cow_parent))
- {
- ;
- }
- }
-
/*
* Do the replacement without permission update.
* Replacement may influence the permissions, we should calculate new
@@ -5427,11 +5412,6 @@ bdrv_replace_node_common(BlockDriverState
*from, BlockDriverState *to,
goto out;
}
- if (detach_subchain) {
- /* to_cow_parent is already drained because from is drained */
- bdrv_remove_child(bdrv_filter_or_cow_child(to_cow_parent), tran);
- }
-
refresh_list = g_slist_prepend(refresh_list, to);
refresh_list = g_slist_prepend(refresh_list, from);
@@ -5450,7 +5430,7 @@ out:
int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
Error **errp)
{
- return bdrv_replace_node_common(from, to, true, false, errp);
+ return bdrv_replace_node_common(from, to, true, errp);
}
int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
@@ -5466,7 +5446,7 @@ int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
bdrv_drained_begin(child_bs);
bdrv_graph_wrlock();
- ret = bdrv_replace_node_common(bs, child_bs, true, true, errp);
+ ret = bdrv_replace_node_common(bs, child_bs, true, errp);
bdrv_graph_wrunlock();
bdrv_drained_end(child_bs);
@@ -5917,17 +5897,7 @@ int bdrv_drop_intermediate(BlockDriverState
*top, BlockDriverState *base,
updated_children = g_slist_prepend(updated_children, c);
}
- /*
- * It seems correct to pass detach_subchain=true here, but it triggers
- * one more yet not fixed bug, when due to nested aio_poll loop
we switch to
- * another drained section, which modify the graph (for example, removing
- * the child, which we keep in updated_children list). So, it's a TODO.
- *
- * Note, bug triggered if pass detach_subchain=true here and run
- * test-bdrv-drain. test_drop_intermediate_poll() test-case will crash.
- * That's a FIXME.
- */
- bdrv_replace_node_common(top, base, false, false, &local_err);
+ bdrv_replace_node_common(top, base, false, &local_err);
bdrv_graph_wrunlock();
if (local_err) {
> Vladimir, do you remember what the specific problem was?
I'd be happy to submit this if Vladimir is happy that it won't cause
other issues
(I've run make check-block with it and saw no failures).
Thanks so much!
~Wesley