Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben: > Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin() > is not a good idea: the callback might be called when running > a drain in a coroutine, and bdrv_drained_begin_poll() does not > handle that case, resulting in assertion failure.
I remembered that we talked about this only recently on IRC, but it didn't make any sense to me again when I read this commit message. So I think we need --verbose. The .drained_begin callback was always meant to run outside of coroutine context, so the unexpected part isn't that it calls a function that can't run in coroutine context, but that it is already called itself in coroutine context. The problematic path is bdrv_replace_child_noperm() which then calls bdrv_parent_drained_begin_single(poll=true). Polling in coroutine context is dangerous, it can cause deadlocks because the caller of the coroutine can't make progress. So I believe this call is already wrong in coroutine context. Now I don't know the call path up to bdrv_replace_child_noperm(), but as far as I remember, that was another function that was originally never run in coroutine context. Maybe we have good reason to change this, I can't point to anything that would be inherently wrong with it, but I would still be curious in which context it does run in a coroutine now. Anyway, whatever the specific place is, I believe we must drop out of coroutine context _before_ calling bdrv_parent_drained_begin_single(), not only in callbacks called by it. > Instead, bdrv_do_drained_begin with no recursion and poll > will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce) > but will firstly check if we are already in a coroutine, and exit > from that via bdrv_co_yield_to_drain(). > > Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com> Kevin