Am 24.10.2025 um 20:07 hat Wesley Hershberger geschrieben:
> Some operations insert an implicit node above the top node in the block
> graph (e.g. block-stream or blockdev-backup). The implicit node is
> removed when the operation finishes. If QMP query-named-block-nodes is
> called while the operation is removing the implicit node, it may observe
> a node with no children, causing a segfault.
How can a QMP command run in the middle of removing a node? Is this the
real bug?
You say block-stream is affected, so let's look at stream.c. The
interesting part here is bdrv_cor_filter_drop(s->cor_filter_bs).
bdrv_drop_filter() calls bdrv_drained_begin() and bdrv_graph_wrlock(),
which can run arbitrary callbacks, but not QMP commands. It's too eary
anyway, the filter is still in the graph at this point.
Between bdrv_replace_node_common(), which removes the node from its
parent, and bdrv_unref(cor_filter_bs), I don't see any place that could
run a QMP command.
Does cor_filter_bs have a refcount > 1 before running stream_prepare()
or stream_clean()?
Aha, your previous commit message [1] is actually clearer on this:
The cor_filter_bs was added to the blockjob as the main BDS (by
passing it to block_job_create), so bdrv_cor_filter_drop doesn't
actually remove it from global_bdrv_states.
[1] https://patchew.org/QEMU/[email protected]/
So we _are_ creating a state in which cor_filter_bs still exists, but
doesn't have a child any more. Which is rather untypical for a filter
(in fact, it's against the definition of a filter). And your two
different patches address this from two different angles:
1. Don't even let this situation arise. We need to make sure that
cor_filter_bs never exists without a child - or at least, that it's
happening only for a short time while we know that no other code is
running. This is what your previous patch attempted.
2. Make sure that everything else in QEMU can deal with a filter node
that doesn't have a child. This is what this one does.
Hanna, do you have an opinion on these two options?
I'm not sure myself, but I see that both aren't mutually exclusive. I
would say that having 1. is certainly a good thing that makes everything
else simpler and less likely to fail, so that's the one I would take in
any case. I'm not completely sure if that means v2 of "stream: Remove
bdrv from job in .clean()", or if another version that just removes this
one node from the job would be better. We do have bdrv_drain_all_begin()
later in stream_prepare(), which must be assumed to run any sorts of
callbacks, so removing the node in stream_clean() might be too late in
some cases.
And then we could think of having this patch for 2., probably split
into two patches - though what tells us that this is complete? I would
be surprised if there aren't more places in QEMU that assume that a
filter node has exactly one child. So I think in this case we would have
to audit the rest of the block layer to make sure we caught all of them.
Hm, I think I am relatively sure now actually that 2. is a bad idea...
So what if we don't actually do 2., but then add an assertion to
bdrv_cor_filter_drop() that verifies that the refcount is 1 before
dropping the filter node? This should help us make sure that the patch
for 1. actually does what it's supposed to do.
> This is hypothesized to only affect the block-stream operation as other
> operations use the workaround bdc4c4c5e372756a5ba3fb3a61e585b02f0dd7f4
> or do not detach their children during cleanup (see
> 3108a15cf09865456d499b08fe14e3dbec4ccbb3).
>
> This backtrace was observed in #3149 on a relatively recent build. The
> bs passed to bdrv_refresh_filename is the cor_filter_bs from the
> block-stream operation; bs->implicit was "true".
>
> 0 bdrv_refresh_filename (bs=0x5efed72f8350)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
> 1 0x00005efea73cf9dc in bdrv_block_device_info
> (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
> at block/qapi.c:62
> 2 0x00005efea7391ed3 in bdrv_named_nodes_list
> (flat=<optimized out>, errp=0x7ffeb829ebd8)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
> 3 0x00005efea7471993 in qmp_query_named_block_nodes
> (has_flat=<optimized out>, flat=<optimized out>, errp=0x7ffeb829ebd8)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
> 4 qmp_marshal_query_named_block_nodes
> (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
> at qapi/qapi-commands-block-core.c:553
> 5 0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0)
> at qapi/qmp-dispatch.c:128
> 6 0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430)
> at util/async.c:219
> ...
This is one change...
> The get_allocated_file_size change resolves a second segfault after the
> first was resolved. Here, bdrv_filter_bs returns NULL as the
> bs (cor_filter_bs) has no children:
>
> 0 bdrv_co_get_allocated_file_size (bs=<optimized out>)
> at /usr/src/qemu-1:10.1.0+ds-5ubuntu2+test8/b/qemu/block.c:6018
> 1 0x0000631d078522be in bdrv_co_get_allocated_file_size_entry
> (opaque=0x7ffd375c5dd0) at block/block-gen.c:288
> 2 0x0000631d07929ec2 in coroutine_trampoline
> (i0=<optimized out>, i1=<optimized out>)
> at util/coroutine-ucontext.c:175
> ...
...and this is logically a separate one.
So if we were to go down this route, I think we would have a full patch
serie to make filter nodes without a child safe. These two parts would
be separate patches, and I'm almost sure that we would get more patches
for the series to make it fully safe everywhere.
As I said above, I think this is too hard to get correct to be a good
idea, though.
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3149
> Buglink: https://bugs.launchpad.net/bugs/2126951
> Suggested-by: Vladimir Sementsov-Ogievskiy <[email protected]>
> Signed-off-by: Wesley Hershberger <[email protected]>
> ---
> As discussed in the previous thread:
> https://lists.gnu.org/archive/html/qemu-devel/2025-10/msg05458.html
>
> This resolves the issue with my reproducer.
>
> make check-block passes locally.
>
> Please let me know if any adjustments to the patch are needed.
>
> Thanks for the quick & helpful reviews!
Sorry, Wesley, that this turns out to be a bit more complicated! We'll
probably need some further discussion before we can know if or which
adjustments to the patch are needed.
Before I send this, I just had another thought: Why does copy-on-read
even drop the child in bdrv_cor_filter_drop()? Wouldn't the normal
mode of operation be that for a short time you have both the cor node
and its parent point to the same child node, and that would just work?
I see commit messages about conflicting node permissions (3108a15cf09 by
Vladimir), but I don't understand this. A filter without parents
shouldn't try to take any permissions.
So another option for never letting the situation arise would be that
cor_filter_bs just keeps its child until it's really deleted.
Vladimir, do you remember what the specific problem was?
Kevin