Am 24.10.2025 um 20:07 hat Wesley Hershberger geschrieben:
> Some operations insert an implicit node above the top node in the block
> graph (e.g. block-stream or blockdev-backup). The implicit node is
> removed when the operation finishes. If QMP query-named-block-nodes is
> called while the operation is removing the implicit node, it may observe
> a node with no children, causing a segfault.

How can a QMP command run in the middle of removing a node? Is this the
real bug?

You say block-stream is affected, so let's look at stream.c. The
interesting part here is bdrv_cor_filter_drop(s->cor_filter_bs).

bdrv_drop_filter() calls bdrv_drained_begin() and bdrv_graph_wrlock(),
which can run arbitrary callbacks, but not QMP commands. It's too eary
anyway, the filter is still in the graph at this point.

Between bdrv_replace_node_common(), which removes the node from its
parent, and bdrv_unref(cor_filter_bs), I don't see any place that could
run a QMP command.

Does cor_filter_bs have a refcount > 1 before running stream_prepare()
or stream_clean()?

Aha, your previous commit message [1] is actually clearer on this:

    The cor_filter_bs was added to the blockjob as the main BDS (by
    passing it to block_job_create), so bdrv_cor_filter_drop doesn't
    actually remove it from global_bdrv_states.

[1] https://patchew.org/QEMU/[email protected]/

So we _are_ creating a state in which cor_filter_bs still exists, but
doesn't have a child any more. Which is rather untypical for a filter
(in fact, it's against the definition of a filter). And your two
different patches address this from two different angles:

1. Don't even let this situation arise. We need to make sure that
   cor_filter_bs never exists without a child - or at least, that it's
   happening only for a short time while we know that no other code is
   running. This is what your previous patch attempted.

2. Make sure that everything else in QEMU can deal with a filter node
   that doesn't have a child. This is what this one does.

Hanna, do you have an opinion on these two options?

I'm not sure myself, but I see that both aren't mutually exclusive. I
would say that having 1. is certainly a good thing that makes everything
else simpler and less likely to fail, so that's the one I would take in
any case. I'm not completely sure if that means v2 of "stream: Remove
bdrv from job in .clean()", or if another version that just removes this
one node from the job would be better. We do have bdrv_drain_all_begin()
later in stream_prepare(), which must be assumed to run any sorts of
callbacks, so removing the node in stream_clean() might be too late in
some cases.

And then we could think of having this patch for 2., probably split
into two patches - though what tells us that this is complete? I would
be surprised if there aren't more places in QEMU that assume that a
filter node has exactly one child. So I think in this case we would have
to audit the rest of the block layer to make sure we caught all of them.

Hm, I think I am relatively sure now actually that 2. is a bad idea...

So what if we don't actually do 2., but then add an assertion to
bdrv_cor_filter_drop() that verifies that the refcount is 1 before
dropping the filter node? This should help us make sure that the patch
for 1. actually does what it's supposed to do.

> This is hypothesized to only affect the block-stream operation as other
> operations use the workaround bdc4c4c5e372756a5ba3fb3a61e585b02f0dd7f4
> or do not detach their children during cleanup (see
> 3108a15cf09865456d499b08fe14e3dbec4ccbb3).
> 
> This backtrace was observed in #3149 on a relatively recent build. The
> bs passed to bdrv_refresh_filename is the cor_filter_bs from the
> block-stream operation; bs->implicit was "true".
> 
> 0  bdrv_refresh_filename (bs=0x5efed72f8350)
>     at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
> 1  0x00005efea73cf9dc in bdrv_block_device_info
>     (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
>     at block/qapi.c:62
> 2  0x00005efea7391ed3 in bdrv_named_nodes_list
>     (flat=<optimized out>, errp=0x7ffeb829ebd8)
>     at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
> 3  0x00005efea7471993 in qmp_query_named_block_nodes
>     (has_flat=<optimized out>, flat=<optimized out>, errp=0x7ffeb829ebd8)
>     at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
> 4  qmp_marshal_query_named_block_nodes
>     (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
>     at qapi/qapi-commands-block-core.c:553
> 5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0)
>     at qapi/qmp-dispatch.c:128
> 6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430)
>     at util/async.c:219
> ...

This is one change...

> The get_allocated_file_size change resolves a second segfault after the
> first was resolved. Here, bdrv_filter_bs returns NULL as the
> bs (cor_filter_bs) has no children:
> 
> 0  bdrv_co_get_allocated_file_size (bs=<optimized out>)
>     at /usr/src/qemu-1:10.1.0+ds-5ubuntu2+test8/b/qemu/block.c:6018
> 1  0x0000631d078522be in bdrv_co_get_allocated_file_size_entry
>     (opaque=0x7ffd375c5dd0) at block/block-gen.c:288
> 2  0x0000631d07929ec2 in coroutine_trampoline
>     (i0=<optimized out>, i1=<optimized out>)
>     at util/coroutine-ucontext.c:175
> ...

...and this is logically a separate one.

So if we were to go down this route, I think we would have a full patch
serie to make filter nodes without a child safe. These two parts would
be separate patches, and I'm almost sure that we would get more patches
for the series to make it fully safe everywhere.

As I said above, I think this is too hard to get correct to be a good
idea, though.

> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3149
> Buglink: https://bugs.launchpad.net/bugs/2126951
> Suggested-by: Vladimir Sementsov-Ogievskiy <[email protected]>
> Signed-off-by: Wesley Hershberger <[email protected]>
> ---
> As discussed in the previous thread:
> https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg05458.html
> 
> This resolves the issue with my reproducer.
> 
> make check-block passes locally.
> 
> Please let me know if any adjustments to the patch are needed.
> 
> Thanks for the quick & helpful reviews!

Sorry, Wesley, that this turns out to be a bit more complicated! We'll
probably need some further discussion before we can know if or which
adjustments to the patch are needed.

Before I send this, I just had another thought: Why does copy-on-read
even drop the child in bdrv_cor_filter_drop()? Wouldn't the normal
mode of operation be that for a short time you have both the cor node
and its parent point to the same child node, and that would just work?
I see commit messages about conflicting node permissions (3108a15cf09 by
Vladimir), but I don't understand this. A filter without parents
shouldn't try to take any permissions.

So another option for never letting the situation arise would be that
cor_filter_bs just keeps its child until it's really deleted.

Vladimir, do you remember what the specific problem was?

Kevin


Reply via email to