On 10.04.2017 at 17:05, Fam Zheng wrote:
> BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then
> polling the main context, which relies on the yielded coroutine
> continuing on bs->ctx and notifying qemu_aio_context with
> bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong
> because if the coroutine is entered from the main loop, co->ctx will
> be qemu_aio_context. As a result of the "release, poll, acquire" loop
> of BDRV_POLL_WHILE, race conditions happen when both the main thread
> and the iothread access the same BDS:
>
>   main loop                               iothread
> -----------------------------------------------------------------------
>   blockdev_snapshot
>     aio_context_acquire(bs->ctx)
>                                           virtio_scsi_data_plane_handle_cmd
>     bdrv_drained_begin(bs->ctx)
>     bdrv_flush(bs)
>       bdrv_co_flush(bs)                     aio_context_acquire(bs->ctx).enter
>         ...
>         qemu_coroutine_yield(co)
>       BDRV_POLL_WHILE()
>         aio_context_release(bs->ctx)
>                                             aio_context_acquire(bs->ctx).return
>                                               ...
>                                                 aio_co_wake(co)
>         aio_poll(qemu_aio_context)              ...
>           co_schedule_bh_cb()                   ...
>             qemu_coroutine_enter(co)            ...
>
>               /* (A) bdrv_co_flush(bs)          /* (B) I/O on bs */
>                     continues... */
>             aio_context_release(bs->ctx)
>                                             aio_context_acquire(bs->ctx)
>
> Note that in the above case, bdrv_drained_begin() doesn't do the
> "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0.
>
> Fix this by using bdrv_coroutine_enter to enter the coroutine in the
> right context.
>
> iotests 109 output is updated because the coroutine reenter flow during
> mirror job completion is different (now through co_queue_wakeup, instead
> of the unconditional qemu_coroutine_switch before), making the job len
> at the end different.
>
> Signed-off-by: Fam Zheng <f...@redhat.com>
> ---
>  block/block-backend.c      |  4 ++--
>  block/io.c                 | 14 +++++++-------
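For reference, if I read the series correctly, bdrv_coroutine_enter()
boils down to entering the coroutine in the AioContext of the BDS rather
than that of the caller. A rough sketch, not necessarily the exact code
in the patch:

    void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
    {
        /* Enter (or schedule) co in bs's home AioContext, so that
         * co->ctx ends up as bs->ctx no matter which thread starts
         * the I/O. */
        aio_co_enter(bdrv_get_aio_context(bs), co);
    }

This way the request coroutine is always reentered in bs->ctx, and
bdrv_wakeup() can notify qemu_aio_context as BDRV_POLL_WHILE expects.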
These changes are okay. The question is whether we need more of them. We
do have a few more callers of qemu_coroutine_create():

* blkverify, quorum: Here, we are already in coroutine context and
  therefore the caller has made sure that we're in the right AioContext.
  The usual functions simply inherit it, which is correct.

* nbd-client: Has its own handling for getting the coroutine into the
  right AioContext, I'll assume it's okay.

* sheepdog: co_read_response() calls aio_co_wake() immediately after
  qemu_coroutine_create(). Is this allowed? I don't think co->ctx is
  even initialised at this time. Is this a bug introduced in Paolo's
  9d456654 ('block: explicitly acquire aiocontext in callbacks that
  need it')? Anyway, it is only called in AioContext callbacks, so
  we're fine. (A sketch of the pattern follows below.)

  do_req() is called from the main loop context in a few instances. Not
  sure if this is a problem, but maybe using bdrv_coroutine_enter()
  there would be safer.

* 9p, migration: Outside the block layer, trivially okay.

* nbd/server: nbd_co_client_start() runs in the caller's AioContext.
  Should be fine, it doesn't directly issue any requests, but just
  spawns other coroutines that are put in the right AioContext.

* qemu-img, tests: Nobody cares about AioContexts there.

So I think we should have another look at sheepdog; the rest seems to be
fine.
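To make the sheepdog pattern concrete, the code in question looks
roughly like this (paraphrased from block/sheepdog.c, not verbatim):

    static void co_read_response(void *opaque)
    {
        BDRVSheepdogState *s = opaque;

        if (!s->co_recv) {
            /* A freshly created coroutine has never been entered, so
             * nothing has assigned co->ctx yet at this point. */
            s->co_recv = qemu_coroutine_create(aio_read_response, opaque);
        }

        /* aio_co_wake() reads co->ctx to decide where to run the
         * coroutine, which looks dubious for the very first call. */
        aio_co_wake(s->co_recv);
    }

Since co->ctx is only assigned when the coroutine is actually entered
for the first time, that first aio_co_wake() would look at an
uninitialised pointer.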