On Fri, 04/07 14:54, Fam Zheng wrote:
>
> main loop                                     iothread
> -----------------------------------------------------------------------
> blockdev_snapshot
>   aio_context_acquire(bs->ctx)
>   bdrv_flush(bs)
>     bdrv_co_flush(bs)
>       ...
>       qemu_coroutine_yield(co)
>     BDRV_POLL_WHILE()
>       aio_context_release(bs->ctx)
>                                               aio_context_acquire(bs->ctx)
>                                               ...
>                                                 aio_co_wake(co)
>       aio_poll(qemu_aio_context)              ...
>         co_schedule_bh_cb()                   ...
>           qemu_coroutine_enter(co)            ...
>             /* (A) bdrv_co_flush(bs)          /* (B) I/O on bs */
>                continues... */
>                                               aio_context_release(bs->ctx)
After talking to Kevin on IRC: the aio_context_acquire() in the iothread could
be the one in the vq handler:
main loop                                     iothread
-----------------------------------------------------------------------
blockdev_snapshot
  aio_context_acquire(bs->ctx)
                                              virtio_scsi_data_plane_handle_cmd
  bdrv_drained_begin(bs)
  bdrv_flush(bs)
    bdrv_co_flush(bs)                           aio_context_acquire(bs->ctx).enter
      ...
      qemu_coroutine_yield(co)
    BDRV_POLL_WHILE()
      aio_context_release(bs->ctx)
                                                aio_context_acquire(bs->ctx).return
                                                ...
                                                  aio_co_wake(co)
      aio_poll(qemu_aio_context)                ...
        co_schedule_bh_cb()                     ...
          qemu_coroutine_enter(co)              ...
            /* (A) bdrv_co_flush(bs)            /* (B) I/O on bs */
               continues... */
                                                aio_context_release(bs->ctx)
      aio_context_acquire(bs->ctx)
Note that in this special case, bdrv_drained_begin() doesn't do the "release,
poll, acquire" sequence in BDRV_POLL_WHILE, because bs->in_flight == 0. The
iothread therefore stays blocked in aio_context_acquire() until the release
inside bdrv_flush()'s BDRV_POLL_WHILE, and then issues I/O on bs inside the
drained section. Could this be the root cause of the race? (Ed's test case
showed that, apart from vq handlers, block jobs in the iothread can also
trigger the same race pattern. For this part we need John's patches to pause
block jobs in bdrv_drained_begin.)
Fam