Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
Am 18.05.2017 um 10:06 hat Stefan Hajnoczi geschrieben: > On Thu, May 18, 2017 at 8:57 AM, Kevin Wolf wrote: > > Am 17.05.2017 um 22:16 hat Eric Blake geschrieben: > >> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote: > >> > Calling aio_poll() directly may have been fine previously, but this is > >> > the future, man! > >> > >> lol > >> > >> > The difference between an aio_poll() loop and > >> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext > >> > around aio_poll(). > >> > > >> > This allows the IOThread to run fd handlers or BHs to complete the > >> > request. Failure to release the AioContext causes deadlocks. > >> > > >> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object > >> > iothread. > >> > >> I'm surprised at how many separate hangs we actually had! > > > > How hard would it be to write some test cases for this? Dataplane has > > a serious lack of automated testing. > > And this hang doesn't even require in-flight guest I/O, so it would be > easy to reproduce in qemu-iotests if we add an IOThread mode. I don't think I would make it a separate mode (because people would run only one or the other - when did you last run qcow2 v2 tests?), but just add some test cases that make use of it during the normal -qcow2 or -raw run. Kevin
Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
On Thu, May 18, 2017 at 8:57 AM, Kevin Wolf wrote: > Am 17.05.2017 um 22:16 hat Eric Blake geschrieben: >> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote: >> > Calling aio_poll() directly may have been fine previously, but this is >> > the future, man! >> >> lol >> >> > The difference between an aio_poll() loop and >> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext >> > around aio_poll(). >> > >> > This allows the IOThread to run fd handlers or BHs to complete the >> > request. Failure to release the AioContext causes deadlocks. >> > >> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object >> > iothread. >> >> I'm surprised at how many separate hangs we actually had! > > How hard would it be to write some test cases for this? Dataplane has > a serious lack of automated testing. And this hang doesn't even require in-flight guest I/O, so it would be easy to reproduce in qemu-iotests if we add an IOThread mode. Stefan
Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
On Wed, May 17, 2017 at 9:16 PM, Eric Blake wrote: > On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote: >> diff --git a/block/io.c b/block/io.c >> index cc56e90..f0041cd 100644 >> --- a/block/io.c >> +++ b/block/io.c >> @@ -2031,9 +2031,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector >> *qiov, int64_t pos, >> Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, >> &data); >> >> bdrv_coroutine_enter(bs, co); >> -while (data.ret == -EINPROGRESS) { >> -aio_poll(bdrv_get_aio_context(bs), true); >> -} >> +BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS); >> return data.ret; >> } > > Do we have other culprits (not necessarily for vmsave, but for other > situations), where we should be using BDRV_POLL_WHILE in separate > patches? For example, a quick grep show that at least hw/9pfs/9p.c makes > a direct call to aio_poll(,true) in a while loop. virtio-9p doesn't use IOThread (dataplane) so it is not affected. It also doesn't use BlockDriverState so it cannot use BDRV_POLL_WHILE(). I did grep for other aio_poll() callers and didn't spot any obvious cases that need to be fixed. They tend to run in situations where this deadlock cannot occur. Stefan