Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> This test uses a callback of an I/O function (blk_aio_preadv)
> to modify the graph, using bdrv_attach_child.
> This is simply not allowed anymore. I/O cannot change the graph.

The callback of an I/O function isn't I/O, though. It is code _after_
the I/O has completed. If this doesn't work any more, it feels like this
is a bug.

I think it becomes a bit more obvious when you translate the AIO into
the equivalent coroutine function:

    void coroutine_fn foo(...)
    {
        GLOBAL_STATE_CODE();

        blk_co_preadv(...);
        detach_by_parent_aio_cb(...);
    }

This is obviously correct code that must work. Calling an I/O function
from a GS function is allowed, and of course the GS function may
continue to do GS stuff after the I/O function returned.

(Actually, I'm not sure if it really works when blk is not in the main
AioContext, but your API split patches claim that it does, so let's
assume for the moment that this is already true :-))

> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
> and introduce bdrv_drained_begin_no_poll", the test would simply
> be at risk of failure, because if bdrv_replace_child_noperm()
> (called to modify the graph) would call a drain,
> then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce,
> that specifically asserts that we are not in a coroutine.
> 
> Now that we fixed the behavior, the drain will invoke a bh in the
> main loop, so we don't have such problem. However, this test is still
> illegal and fails because we forbid graph changes from I/O paths.
> 
> Once we add the required subtree_drains to protect
> bdrv_replace_child_noperm(), the key problem in this test is in:

Probably a question for a different patch, but why is a subtree drain
required instead of just a normal node drain? It feels like a bigger
hammer than what is needed for replacing a single child.

> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
> /* Drain and check the expected result */
> bdrv_subtree_drained_begin(parent_b);
> 
> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
> modifies the graph and is invoked during bdrv_subtree_drained_begin().
> The call stack is the following:
> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
> and enters the coroutine running blk_aio_read_entry()
> 2. blk_aio_read_entry() performs the read and then schedules a bh to
>    complete (blk_aio_complete)
> 3. at this point, subtree_drained_begin() kicks in and waits for all
>    in_flight requests, polling
> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
> 5. blk_aio_complete *first* invokes the callback
>    (detach_by_parent_aio_cb) and then decrements the in_flight counter
> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
>    so both bdrv_unref_child() and bdrv_attach_child() will have
>    subtree_drains inside. And this causes a deadlock, because the
>    nested drain will wait for in_flight counter to go to zero, which
>    is only happening once the drain itself finishes.

So the problem has nothing to do with detach_by_parent_aio_cb() being
an I/O function in the sense of locking rules (which it isn't), but with
calling a drain operation while having the in_flight counter increased.

In other words, an AIO callback like detach_by_parent_aio_cb() must
never call drain - and it doesn't before your changes to
bdrv_replace_child_noperm() break it. How confident are we that this
only breaks tests and not real code?

Part of the problem is probably that drain is serving two slightly
different purposes inside the block layer (just make sure that nothing
touches the node any more) and callers outside of the block layer (make
sure that everything has completed and no more callbacks will be
called). This bug sits exactly in the difference between those two
concepts - we're waiting for more than we would have to wait for, and it
causes a deadlock in this combination.

I guess it could be fixed if BlockBackend accounted for requests that
are already completed, but their callback hasn't yet. blk_drain() would
then have to wait for those requests, but blk_root_drained_poll()
wouldn't because these requests don't affect the root node any more.

> Different story is test_detach_by_driver_cb(): in this case,
> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
> but it is instead called as a bh running in the main loop by
> detach_by_driver_cb_drained_begin(), the callback for
> .drained_begin().
> 
> This test was added in 231281ab42 and part of the series
> "Drain fixes and cleanups, part 3"
> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
> but as explained above I believe that it is not valid anymore, and
> can be discarded.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>

I feel throwing tests away just because they don't pass any more is a
bit too simple. :-)

Kevin


Reply via email to