On Wed, May 18, 2022 at 02:43:50PM +0200, Paolo Bonzini wrote:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
> 
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
> 
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> 
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise results in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
> happen without releasing the aiocontext lock)
>
> 
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
> 
> > We might suppose that many callbacks are called under drain and in
> > GLOBAL_STATE, which should be enough, but from our experimentation in
> > the previous series we saw that currently not everything is under drain,
> > leaving some operations unprotected (remember assert_graph_writable
> > temporarily disabled, since drain coverage for bdrv_replace_child_noperm
> > was not 100%?).
> > Therefore we need to add more drains. But isn't drain what we decided to
> > drop at the beginning? Why isn't drain good?
> 
> To sum up the patch ordering deadlock that we have right now:
> 
> * in some cases, graph manipulations are protected by the AioContext lock
> 
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
> 
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
> 
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
> 
> So I'd rather go back to Emanuele's draining approach.  It may not be
> beautiful, but it allows progress.  Once that is in place, we can remove the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

Me too, I don't think the rwlock was particularly nice either.

Stefan

Attachment: signature.asc
Description: PGP signature

Reply via email to