On Tue, Apr 10, 2018 at 12:49:42PM +0800, Peter Xu wrote:
> Eric Auger reported a few days ago that OOB broke ARM when running
> with libvirt:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
> 
> The problem is that the monitor dispatcher bottom half is now bound to
> qemu_aio_context, which can be polled unexpectedly in block code.

And TPM and 9P code, which all use nested event loops.
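
To make the failure mode concrete, here is a toy model in plain C. It is
only a sketch and not QEMU code: ToyContext, toy_bh_schedule(), toy_poll()
and the other toy_* names are hypothetical stand-ins, and each context
holds a single pending bottom half. It assumes the nested loop polls only
the main context, which is what the backtrace in the commit message
suggests for blk_prw().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT the QEMU API: a context with one pending bottom half. */
typedef void (*ToyBHFunc)(void *opaque);

typedef struct ToyContext {
    ToyBHFunc cb;
    void *opaque;
    int scheduled;
} ToyContext;

/* Bind a bottom half to a context and mark it pending. */
static void toy_bh_schedule(ToyContext *ctx, ToyBHFunc cb, void *opaque)
{
    assert(!ctx->scheduled);    /* the toy context has a single slot */
    ctx->cb = cb;
    ctx->opaque = opaque;
    ctx->scheduled = 1;
}

/* Run the pending bottom half of this context only; returns 1 if one ran. */
static int toy_poll(ToyContext *ctx)
{
    if (!ctx->scheduled) {
        return 0;
    }
    ctx->scheduled = 0;
    ctx->cb(ctx->opaque);
    return 1;
}

/* Stand-in for the monitor dispatcher: counts how often it was invoked. */
static void toy_dispatcher(void *opaque)
{
    (*(int *)opaque)++;
}

/* Stand-in for a synchronous read that spins a nested loop on main_ctx. */
static void toy_nested_read(ToyContext *main_ctx)
{
    toy_poll(main_ctx);
}

/* Returns how many times the dispatcher ran during "device init". */
int toy_dispatches_during_init(int bind_to_main_ctx)
{
    ToyContext main_ctx = { 0 };
    ToyContext iohandler_ctx = { 0 };
    int dispatched = 0;

    toy_bh_schedule(bind_to_main_ctx ? &main_ctx : &iohandler_ctx,
                    toy_dispatcher, &dispatched);
    toy_nested_read(&main_ctx);     /* nested loop before the main loop runs */
    return dispatched;
}
```

With the dispatcher bound to the polled context, the nested read runs it
early; bound to a separate context, it stays pending until that context
is polled.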

> We should keep the dispatchers running in iohandler_ctx, just as they
> did before the Out-Of-Band series (chardev uses qio, and qio binds
> everything to iohandler_ctx).
> 
> Without this change, the QMP dispatcher might run in the block IO path
> even before the main loop is reached, for example in a stack like this
> (the ARM case, where the "cont" command handler runs during the machine
> init phase):
> 
>         #0  qmp_cont ()
>         #1  0x00000000006bd210 in qmp_marshal_cont ()
>         #2  0x0000000000ac05c4 in do_qmp_dispatch ()
>         #3  0x0000000000ac07a0 in qmp_dispatch ()
>         #4  0x0000000000472d60 in monitor_qmp_dispatch_one ()
>         #5  0x000000000047302c in monitor_qmp_bh_dispatcher ()
>         #6  0x0000000000acf374 in aio_bh_call ()
>         #7  0x0000000000acf428 in aio_bh_poll ()
>         #8  0x0000000000ad5110 in aio_poll ()
>         #9  0x0000000000a08ab8 in blk_prw ()
>         #10 0x0000000000a091c4 in blk_pread ()
>         #11 0x0000000000734f94 in pflash_cfi01_realize ()
>         #12 0x000000000075a3a4 in device_set_realized ()
>         #13 0x00000000009a26cc in property_set_bool ()
>         #14 0x00000000009a0a40 in object_property_set ()
>         #15 0x00000000009a3a08 in object_property_set_qobject ()
>         #16 0x00000000009a0c8c in object_property_set_bool ()
>         #17 0x0000000000758f94 in qdev_init_nofail ()
>         #18 0x000000000058e190 in create_one_flash ()
>         #19 0x000000000058e2f4 in create_flash ()
>         #20 0x00000000005902f0 in machvirt_init ()
>         #21 0x00000000007635cc in machine_run_board_init ()
>         #22 0x00000000006b135c in main ()
> 
> Actually the problem is more severe than that.  Now that the dispatcher
> is bound to the QEMU AIO context, the monitor dispatcher code can even
> be called from a nested aio_poll(), that is, an explicit aio_poll()
> inside another main loop aio_poll(), which could be racy too.
> 
> Switch to use the iohandler_ctx for monitor dispatchers.
> 
> My sincere thanks to Eric Auger, who offered great help during both
> debugging and verifying the problem.  The ARM test was carried out by
> applying this patch on top of QEMU 2.12.0-rc0, and the problem is gone
> with the patch applied.
> 
> A quick test of mine shows that with this patch applied we pass all
> raw iotests even with OOB on by default.
> 
> CC: Eric Blake <ebl...@redhat.com>
> CC: Markus Armbruster <arm...@redhat.com>
> CC: Stefan Hajnoczi <stefa...@redhat.com>
> CC: Fam Zheng <f...@redhat.com>
> Reported-by: Eric Auger <eric.au...@redhat.com>
> Tested-by: Eric Auger <eric.au...@redhat.com>
> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
> v2:
> - enhanced commit message
> ---
>  monitor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 51f4cf480f..39f8ee17ba 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4467,7 +4467,7 @@ static void monitor_iothread_init(void)
>       * have assumption to be run on main loop thread.  It would be
>       * nice that one day we can remove this assumption in the future.
>       */
> -    mon_global.qmp_dispatcher_bh = aio_bh_new(qemu_get_aio_context(),
> +    mon_global.qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
>                                                monitor_qmp_bh_dispatcher,
>                                                NULL);

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
