On 26.04.2023 at 16:31, Fiona Ebner wrote:
> On 20.04.23 at 08:55, Paolo Bonzini wrote:
> > On Thu, 20 Apr 2023 at 08:11, Markus Armbruster <arm...@redhat.com> wrote:
> > >
> > > So, splicing in a bottom half unmoored monitor commands from the main
> > > loop. We weren't aware of that, as our commit messages show.
> > >
> > > I guess the commands themselves don't care; all they need is the BQL.
> > >
> > > However, did we unwittingly change what can get blocked? Before,
> > > monitor commands could block only the main thread. Now they can also
> > > block vCPU threads. Impact?
> >
> > Monitor commands could always block vCPU threads through the BQL(*).
> > However, aio_poll() only runs in the vCPU threads in very special cases;
> > typically associated to resetting a device which causes a blk_drain() on
> > the device's BlockBackend. So it is not a performance issue.
>
> AFAIU, all generated coroutine wrappers use aio_poll. In my backtrace
> aio_poll happens via blk_pwrite for a pflash device. So a bit more
> often than "very special cases" ;)

Yes, it's a common thing for devices that start requests from the vcpu
thread when handling I/O (as opposed to devices that use an eventfd or
similar mechanisms).

> > However, liberal reuse of the main block layer AioContext could indeed
> > be a *correctness* issue. I need to re-read Fiona's report instead of
> > stopping at the first three lines because it's the evening. :)
>
> For me, being called in a vCPU thread caused problems with a custom QMP
> function patched in by Proxmox. The function uses a newly opened
> BlockBackend and calls qemu_mutex_unlock_iothread() after which
> qemu_get_current_aio_context() returns 0x0 (when running in the main
> thread, it still returns the main thread's AioContext). It then calls
> blk_pwritev which is also a generated coroutine wrapper and the
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch of the AIO_WAIT_WHILE_INTERNAL macro fails.
>
> Sounds like there's room for improvement in our code :/ I'm not aware
> of something similar in upstream QEMU.

Yes, even if it didn't crash immediately, calling blk_*() without
holding a lock is invalid. In many cases, this is the BQL. If you don't
hold it while calling the function from a vcpu thread, you could run
into races with the main thread, which would probably be very painful to
debug.

Kevin