On 3/13/23 13:29, Fiona Ebner wrote:
>> In fact, shouldn't request queuing be enabled at the _end_ of
>> bdrv_drained_begin (once the BlockBackend has reached a quiescent
>> state on its own terms), rather than at the beginning (which leads to
>> deadlocks like this one)?
> Couldn't this lead to scenarios where a busy or malicious guest, which
> continues to submit new requests, slows down draining or even prevents
> it from finishing?
Possibly, but there are also .drained_begin/.drained_end callbacks that
can be defined in order to apply backpressure. (For some other devices,
there's also aio_disable_external/aio_enable_external, which do the
equivalent of request queuing but without the deadlocks.)
Since starting the queuing of requests at the end of bdrv_drained_begin
wouldn't hurt correctness, and it would fix this kind of deadlock, I
think it would be worth giving it a try.
Paolo