Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context
On Sat, 3 Feb 2024 at 06:30, Michael Tokarev wrote:
>
> 03.02.2024 12:01, Michael Tokarev wrote:
> ...
> > This change broke something in 7.2. I'm still debugging it, will
> > come with a follow-up once some more details are found, I'll also
> > check current master with and without this commit.
> >
> > The prob happens with multiple suspend-resume cycles, - with this
> > change applied, guest does not work as expected after *second*
> > suspend-resume.
>
> So, it turned out the prob here exists on master too, and manifests
> itself the same way on 7.2.9 or on 8.2.1, - in all cases where we
> have this change applied it works (or breaks) equally.
>
> A (simple) reproducer so far is a hibernate test, - it fails *only*
> after suspend-to-ram, but works fine after just hibernate.
>
> I used just an initrd (with a drive image used for swap -
> for hibernation space).
>
>   qemu-img create s.img 256M
>   mkswap s.img
>   qemu-system-x86_64 \
>    -serial stdio -vga none -display none -parallel none -net none \
>    -machine q35 \
>    -drive file=s.img,if=ide,format=raw \
>    -m 256 \
>    -monitor unix:ttyS0,server,nowait \
>    -kernel /boot/vmlinuz-6.1.0-15-amd64 \
>    -initrd /boot/initrd.img-6.1.0-15-amd64 \
>    -append "shell=/bin/sh console=ttyS0 root=none"
>
> There, in the guest (it has busybox only here):
>
>   # swapon /dev/sda
>   # echo mem > /sys/power/state
>   (system_wakeup on the monitor)
>   # echo disk > /sys/power/state
>
> The system will hibernate but *not* turn off power, qemu
> will continue running, while all console messages are the
> same as when it works fine. qemu process is spinning up
> with 100% cpu usage at this stage.
>
> Without the intermediate suspend-to-ram or without the
> commit in question, qemu process will exit normally at
> this stage.
>
> This is a somewhat pathological test case, but I see it as an
> indicator of something else being wrong, like we aren't saving
> or restoring some state now which we should do.
>
> The tight loop also suggests we're not having success in there.

I'm unable to reproduce this. QEMU v8.2.0, v8.1.0, and even v7.2.0 all
spin with 100% CPU usage on my machine. It looks to me like this
behavior is not a regression caused by this commit.

Do v7.2.0, v8.1.0, and v8.2.0 terminate QEMU on your machine?

Stefan
Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context
03.02.2024 12:01, Michael Tokarev wrote:
...
> This change broke something in 7.2. I'm still debugging it, will
> come with a follow-up once some more details are found, I'll also
> check current master with and without this commit.
>
> The prob happens with multiple suspend-resume cycles, - with this
> change applied, guest does not work as expected after *second*
> suspend-resume.

So, it turned out the prob here exists on master too, and manifests
itself the same way on 7.2.9 or on 8.2.1, - in all cases where we
have this change applied it works (or breaks) equally.

A (simple) reproducer so far is a hibernate test, - it fails *only*
after suspend-to-ram, but works fine after just hibernate.

I used just an initrd (with a drive image used for swap -
for hibernation space).

  qemu-img create s.img 256M
  mkswap s.img
  qemu-system-x86_64 \
   -serial stdio -vga none -display none -parallel none -net none \
   -machine q35 \
   -drive file=s.img,if=ide,format=raw \
   -m 256 \
   -monitor unix:ttyS0,server,nowait \
   -kernel /boot/vmlinuz-6.1.0-15-amd64 \
   -initrd /boot/initrd.img-6.1.0-15-amd64 \
   -append "shell=/bin/sh console=ttyS0 root=none"

There, in the guest (it has busybox only here):

  # swapon /dev/sda
  # echo mem > /sys/power/state
  (system_wakeup on the monitor)
  # echo disk > /sys/power/state

The system will hibernate but *not* turn off power, qemu
will continue running, while all console messages are the
same as when it works fine. qemu process is spinning up
with 100% cpu usage at this stage.

Without the intermediate suspend-to-ram or without the
commit in question, qemu process will exit normally at
this stage.

This is a somewhat pathological test case, but I see it as an
indicator of something else being wrong, like we aren't saving
or restoring some state now which we should do.

The tight loop also suggests we're not having success in there.

Thanks,

/mjt
Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context
16.01.2024 22:00, Stefan Hajnoczi wrote:
> monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
> polled during nested event loops. The coroutine currently reschedules
> itself in the main loop's qemu_aio_context AioContext, which is polled
> during nested event loops. One known problem is that QMP device-add
> calls drain_call_rcu(), which temporarily drops the BQL, leading to all
> sorts of havoc like other vCPU threads re-entering device emulation code
> while another vCPU thread is waiting in device emulation code with
> aio_poll().
>
> Paolo Bonzini suggested running non-coroutine QMP handlers in the
> iohandler AioContext. This avoids trouble with nested event loops. His
> original idea was to move coroutine rescheduling to
> monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
> because we don't know if the QMP handler needs to run in coroutine
> context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
> been nicer since it's associated with the monitor implementation and not
> as general as qmp_dispatch(), which is also used by qemu-ga.
>
> A number of qemu-iotests need updated .out files because the order of
> QMP events vs QMP responses has changed.
>
> Solves Issue #1933.
>
> Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use
> drain_call_rcu in in qmp_device_add")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> Buglink: https://issues.redhat.com/browse/RHEL-17369
> Signed-off-by: Stefan Hajnoczi

This change broke something in 7.2. I'm still debugging it, will
come with a follow-up once some more details are found, I'll also
check current master with and without this commit.

The prob happens with multiple suspend-resume cycles, - with this
change applied, guest does not work as expected after *second*
suspend-resume.

JFYI for now.

Thanks,

/mjt
Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context
On 16.01.24 at 20:00, Stefan Hajnoczi wrote:
> monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
> polled during nested event loops. The coroutine currently reschedules
> itself in the main loop's qemu_aio_context AioContext, which is polled
> during nested event loops. One known problem is that QMP device-add
> calls drain_call_rcu(), which temporarily drops the BQL, leading to all
> sorts of havoc like other vCPU threads re-entering device emulation code
> while another vCPU thread is waiting in device emulation code with
> aio_poll().
>
> Paolo Bonzini suggested running non-coroutine QMP handlers in the
> iohandler AioContext. This avoids trouble with nested event loops. His
> original idea was to move coroutine rescheduling to
> monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
> because we don't know if the QMP handler needs to run in coroutine
> context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
> been nicer since it's associated with the monitor implementation and not
> as general as qmp_dispatch(), which is also used by qemu-ga.
>
> A number of qemu-iotests need updated .out files because the order of
> QMP events vs QMP responses has changed.
>
> Solves Issue #1933.
>
> Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use
> drain_call_rcu in in qmp_device_add")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> Buglink: https://issues.redhat.com/browse/RHEL-17369
> Signed-off-by: Stefan Hajnoczi

With the patch I can no longer see any do_qmp_dispatch_bh() calls run in
vCPU threads. I also did a bit of smoke testing with some other QMP and
QGA commands and didn't find any obvious breakage, so:

Tested-by: Fiona Ebner

P.S. Unfortunately, the patch does not solve the other issue I came
across back then [0] with snapshot_save_job_bh() being executed during a
vCPU thread's aio_poll(). See also [1].
I suppose this would need some other mechanism to solve, or could it
also be scheduled to the iohandler AioContext? It's not directly related
to your patch of course, just mentioning it, because it's a similar
theme.

[0]: https://lore.kernel.org/qemu-devel/31757c45-695d-4408-468c-c2de560af...@proxmox.com/
[1]: https://gitlab.com/qemu-project/qemu/-/issues/2111

Best Regards,
Fiona
Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context
On 16.01.2024 at 20:00, Stefan Hajnoczi wrote:
> monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
> polled during nested event loops. The coroutine currently reschedules
> itself in the main loop's qemu_aio_context AioContext, which is polled
> during nested event loops. One known problem is that QMP device-add
> calls drain_call_rcu(), which temporarily drops the BQL, leading to all
> sorts of havoc like other vCPU threads re-entering device emulation code
> while another vCPU thread is waiting in device emulation code with
> aio_poll().
>
> Paolo Bonzini suggested running non-coroutine QMP handlers in the
> iohandler AioContext. This avoids trouble with nested event loops. His
> original idea was to move coroutine rescheduling to
> monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
> because we don't know if the QMP handler needs to run in coroutine
> context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
> been nicer since it's associated with the monitor implementation and not
> as general as qmp_dispatch(), which is also used by qemu-ga.
>
> A number of qemu-iotests need updated .out files because the order of
> QMP events vs QMP responses has changed.
>
> Solves Issue #1933.
>
> Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use
> drain_call_rcu in in qmp_device_add")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> Buglink: https://issues.redhat.com/browse/RHEL-17369
> Signed-off-by: Stefan Hajnoczi

Reviewed-by: Kevin Wolf
[PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context
monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.
Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use
drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi
---
 monitor/qmp.c                                 | 17
 qapi/qmp-dispatch.c                           | 24 +-
 tests/qemu-iotests/060.out                    |  4 +-
 tests/qemu-iotests/071.out                    |  4 +-
 tests/qemu-iotests/081.out                    | 16 ++--
 tests/qemu-iotests/087.out                    | 12 +--
 tests/qemu-iotests/108.out                    |  2 +-
 tests/qemu-iotests/109                        |  4 +-
 tests/qemu-iotests/109.out                    | 78 ---
 tests/qemu-iotests/117.out                    |  2 +-
 tests/qemu-iotests/120.out                    |  2 +-
 tests/qemu-iotests/127.out                    |  2 +-
 tests/qemu-iotests/140.out                    |  2 +-
 tests/qemu-iotests/143.out                    |  2 +-
 tests/qemu-iotests/156.out                    |  2 +-
 tests/qemu-iotests/176.out                    | 16 ++--
 tests/qemu-iotests/182.out                    |  2 +-
 tests/qemu-iotests/183.out                    |  4 +-
 tests/qemu-iotests/184.out                    | 32
 tests/qemu-iotests/185                        |  6 +-
 tests/qemu-iotests/185.out                    | 45 +--
 tests/qemu-iotests/191.out                    | 16 ++--
 tests/qemu-iotests/195.out                    | 16 ++--
 tests/qemu-iotests/223.out                    | 16 ++--
 tests/qemu-iotests/227.out                    | 32
 tests/qemu-iotests/247.out                    |  2 +-
 tests/qemu-iotests/273.out                    |  8 +-
 tests/qemu-iotests/308                        |  4 +-
 tests/qemu-iotests/308.out                    |  4 +-
 tests/qemu-iotests/tests/file-io-error        |  5 +-
 tests/qemu-iotests/tests/iothreads-resize.out |  2 +-
 tests/qemu-iotests/tests/qsd-jobs.out         |  4 +-
 32 files changed, 207 insertions(+), 180 deletions(-)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 6eee450fe4..a239945e8d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -321,14 +321,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
             qemu_coroutine_yield();
         }

-        /*
-         * Move the coroutine from iohandler_ctx to qemu_aio_context for
-         * executing the command handler so that it can make progress if it
-         * involves an AIO_WAIT_WHILE().
-         */
-        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-        qemu_coroutine_yield();
-
         /* Process request */
         if (req_obj->req) {
             if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
@@ -355,15 +347,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
         }

         qmp_request_free(req_obj);
-
-        /*
-         * Yield and reschedule so the main loop stays responsive.
-         *
-         * Move back to iohandler_ctx so that nested event loops for
-         * qemu_aio_context don't start new monitor commands.
-         */
-        aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
-        qemu_coroutine_yield();
     }
     qatomic_set(&qmp_dispatcher_co, NULL);
 }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 28b6bb..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -206,9 +206,31 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
     assert(!(oob && qemu_in_coroutine()));
     assert(monitor_cur() == NULL);
     if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+if