Re: Deadlock with SATA CD I/O and eject

John Levon Tue, 19 Sep 2023 05:58:08 -0700

On Tue, Sep 19, 2023 at 12:54:59PM +0200, Kevin Wolf wrote:

> > In the meantime, we start processing the blk_drain() code, so by the time 
> > this
> > blk_pread() actually gets handled, quiesce is set, and we get stuck in the
> > blk_wait_while_drained().
> > 
> > I don't know the qemu block stack well enough to propose an actual fix.
> > 
> > Experimentally, waiting for ->in_flight to drop to zero *before* we quiesce 
> > in
> > blk_remove_bs() via an AIO_WAIT_WHILE() avoids the symptom, but I'm pretty 
> > sure
> > that's just a band-aid instead of fixing the deadlock.
> 
> Related discussion: 
> https://lists.gnu.org/archive/html/qemu-block/2023-03/msg00284.html
> 
> Actually, it seems we never fixed that problem either?
> 
> Back then I suggested that blk_drain*() should disable request queuing
> because its callers don't want to quiesce the BlockBackend, but rather
> get their own requests completed. I think this approach would solve this
> one as well.


In this case though, it's not their own requests right? We have incoming I/O
from the guest and the eject is a separate operation.

So why it would be OK to disable queuing and have ongoing I/Os (i.e.
blk->in_flight > 0) racing against the block remove?

> Your experiment with moving the queuing later is basically what Paolo

I think it's much more flaky than that, isn't it? It's just clearing out the
*current* pending queue, but nothing would stop new I/Os from being started
before we dropped into the poll for blk->in_flight to be zero again. In my case,
this just happens to work because the prior tray open notification has stopped
new I/O being filed, right?

thanks
john

Re: Deadlock with SATA CD I/O and eject

Reply via email to