On Tue, Apr 10, 2018 at 12:49:42PM +0800, Peter Xu wrote:
> Eric Auger reported the problem days ago that OOB broke ARM when running
> with libvirt:
>
> http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
>
> The problem was that the monitor dispatcher bottom half was bound to
> qemu_aio_context now, which could be polled unexpectedly in block code.

And TPM and 9P code, which all use nested event loops.

> We should keep the dispatchers run in iohandler_ctx just like what we
> did before the Out-Of-Band series (chardev uses qio, and qio binds
> everything with iohandler_ctx).
>
> If without this change, QMP dispatcher might be run even before reaching
> main loop in block IO path, for example, in a stack like (the ARM case,
> "cont" command handler run even during machine init phase):
>
> #0 qmp_cont ()
> #1 0x00000000006bd210 in qmp_marshal_cont ()
> #2 0x0000000000ac05c4 in do_qmp_dispatch ()
> #3 0x0000000000ac07a0 in qmp_dispatch ()
> #4 0x0000000000472d60 in monitor_qmp_dispatch_one ()
> #5 0x000000000047302c in monitor_qmp_bh_dispatcher ()
> #6 0x0000000000acf374 in aio_bh_call ()
> #7 0x0000000000acf428 in aio_bh_poll ()
> #8 0x0000000000ad5110 in aio_poll ()
> #9 0x0000000000a08ab8 in blk_prw ()
> #10 0x0000000000a091c4 in blk_pread ()
> #11 0x0000000000734f94 in pflash_cfi01_realize ()
> #12 0x000000000075a3a4 in device_set_realized ()
> #13 0x00000000009a26cc in property_set_bool ()
> #14 0x00000000009a0a40 in object_property_set ()
> #15 0x00000000009a3a08 in object_property_set_qobject ()
> #16 0x00000000009a0c8c in object_property_set_bool ()
> #17 0x0000000000758f94 in qdev_init_nofail ()
> #18 0x000000000058e190 in create_one_flash ()
> #19 0x000000000058e2f4 in create_flash ()
> #20 0x00000000005902f0 in machvirt_init ()
> #21 0x00000000007635cc in machine_run_board_init ()
> #22 0x00000000006b135c in main ()
>
> Actually the problem is more severe than that. After we switched to the
> qemu AIO handler it means the monitor dispatcher code can even be called
> with nested aio_poll(), then it can be an explicit aio_poll() inside
> another main loop aio_poll() which could be racy too.
>
> Switch to use the iohandler_ctx for monitor dispatchers.
>
> My sincere thanks to Eric Auger who offered great help during both
> debugging and verifying the problem. The ARM test was carried out by
> applying this patch upon QEMU 2.12.0-rc0 and problem is gone after the
> patch.
>
> A quick test of mine shows that after this patch applied we can pass all
> raw iotests even with OOB on by default.
>
> CC: Eric Blake <ebl...@redhat.com>
> CC: Markus Armbruster <arm...@redhat.com>
> CC: Stefan Hajnoczi <stefa...@redhat.com>
> CC: Fam Zheng <f...@redhat.com>
> Reported-by: Eric Auger <eric.au...@redhat.com>
> Tested-by: Eric Auger <eric.au...@redhat.com>
> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
> v2:
> - enhanced commit message
> ---
>  monitor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/monitor.c b/monitor.c
> index 51f4cf480f..39f8ee17ba 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4467,7 +4467,7 @@ static void monitor_iothread_init(void)
>       * have assumption to be run on main loop thread. It would be
>       * nice that one day we can remove this assumption in the future.
>       */
> -    mon_global.qmp_dispatcher_bh = aio_bh_new(qemu_get_aio_context(),
> +    mon_global.qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
>                                                monitor_qmp_bh_dispatcher,
>                                                NULL);

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
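
For anyone reading along, the failure mode in the backtrace can be modelled in a few lines. This is only a toy sketch and not QEMU code: ToyContext, toy_bh_schedule(), toy_poll() and the other toy_* names are made up for illustration, and the model assumes nothing more than "a nested poll runs the pending bottom halves of the context it polls, and no others".

```c
/* Toy model only: none of these types or functions are QEMU's API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BH 8

typedef void (*ToyBHFunc)(void *opaque);

typedef struct {
    ToyBHFunc cb;
    void *opaque;
    bool scheduled;
} ToyBH;

typedef struct {
    ToyBH bhs[MAX_BH];
    size_t n_bh;
} ToyContext;

/* Register a bottom half with exactly one context and mark it pending. */
static void toy_bh_schedule(ToyContext *ctx, ToyBHFunc cb, void *opaque)
{
    assert(ctx->n_bh < MAX_BH);
    ctx->bhs[ctx->n_bh++] = (ToyBH){ cb, opaque, true };
}

/* Run every pending bottom half of this context, and only this context. */
static void toy_poll(ToyContext *ctx)
{
    for (size_t i = 0; i < ctx->n_bh; i++) {
        if (ctx->bhs[i].scheduled) {
            ctx->bhs[i].scheduled = false;
            ctx->bhs[i].cb(ctx->bhs[i].opaque);
        }
    }
}

/* Stand-in for the monitor dispatcher: counts how often it ran. */
static void toy_monitor_dispatch(void *opaque)
{
    int *runs = opaque;
    (*runs)++;
}

/*
 * Stand-in for a synchronous read during device realize: it polls the
 * block-layer context in a nested loop, like blk_prw() in the backtrace.
 */
static void toy_blocking_read(ToyContext *block_ctx)
{
    toy_poll(block_ctx);
}

/*
 * Returns how many times the monitor dispatcher ran during "machine init",
 * i.e. before any main loop iteration happened.
 */
int toy_dispatches_during_init(bool use_iohandler_ctx)
{
    ToyContext block_ctx = { 0 };      /* plays the role of qemu_aio_context */
    ToyContext iohandler_ctx = { 0 };  /* plays the role of iohandler_ctx */
    int runs = 0;

    toy_bh_schedule(use_iohandler_ctx ? &iohandler_ctx : &block_ctx,
                    toy_monitor_dispatch, &runs);
    toy_blocking_read(&block_ctx);     /* nested poll, main loop not reached */
    return runs;
}
```

In this model toy_dispatches_during_init(false) reports one early dispatch (the pre-patch binding), while toy_dispatches_during_init(true) reports none, because the nested poll never touches the context the dispatcher is bound to.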