On 2020-09-18 19:23, Zhenyu Ye wrote:
> Thread 5 (LWP 4802):
> #0 0x0000ffff83086b54 in syscall () at /lib64/libc.so.6
> #1 0x0000ffff834598b8 in io_submit () at /lib64/libaio.so.1
> #2 0x0000aaaae851e89c in ioq_submit (s=0xfffd3c001bb0) at ../block/linux-aio.c:299
> #3 0x0000aaaae851eb50 in laio_io_unplug (bs=0xaaaaef0f2340, s=0xfffd3c001bb0) at ../block/linux-aio.c:344
> #4 0x0000aaaae8559f1c in raw_aio_unplug (bs=0xaaaaef0f2340) at ../block/file-posix.c:2063
> #5 0x0000aaaae8538344 in bdrv_io_unplug (bs=0xaaaaef0f2340) at ../block/io.c:3135
> #6 0x0000aaaae8538360 in bdrv_io_unplug (bs=0xaaaaef0eb020) at ../block/io.c:3140
> #7 0x0000aaaae8496104 in blk_io_unplug (blk=0xaaaaef0e8f20) at ../block/block-backend.c:2147
> #8 0x0000aaaae830e1a4 in virtio_blk_handle_vq (s=0xaaaaf0374280, vq=0xffff700fc1d8) at ../hw/block/virtio-blk.c:796
> #9 0x0000aaaae82e6b68 in virtio_blk_data_plane_handle_output (vdev=0xaaaaf0374280, vq=0xffff700fc1d8) at ../hw/block/dataplane/virtio-blk.c:165
> #10 0x0000aaaae83878fc in virtio_queue_notify_aio_vq (vq=0xffff700fc1d8) at ../hw/virtio/virtio.c:2325
> #11 0x0000aaaae838ab50 in virtio_queue_host_notifier_aio_poll (opaque=0xffff700fc250) at ../hw/virtio/virtio.c:3545
> #12 0x0000aaaae85fab3c in run_poll_handlers_once (ctx=0xaaaaef0a87b0, now=77604310618960, timeout=0xffff73ffdf78) at ../util/aio-posix.c:398
> #13 0x0000aaaae85fae5c in run_poll_handlers (ctx=0xaaaaef0a87b0, max_ns=4000, timeout=0xffff73ffdf78) at ../util/aio-posix.c:492
> #14 0x0000aaaae85fb078 in try_poll_mode (ctx=0xaaaaef0a87b0, timeout=0xffff73ffdf78) at ../util/aio-posix.c:535
> #15 0x0000aaaae85fb180 in aio_poll (ctx=0xaaaaef0a87b0, blocking=true) at ../util/aio-posix.c:571
> #16 0x0000aaaae8027004 in iothread_run (opaque=0xaaaaeee79a00) at ../iothread.c:73
> #17 0x0000aaaae85f269c in qemu_thread_start (args=0xaaaaef0a8d10) at ../util/qemu-thread-posix.c:521
> #18 0x0000ffff831428bc in () at /lib64/libpthread.so.0
> #19 0x0000ffff8308aa1c in () at /lib64/libc.so.6

I can see how blocking in a slow io_submit can cause trouble for the main thread. I think one way to fix it (until it's made truly async in new kernels) is to move the io_submit call to a thread pool, perhaps wrapped in a coroutine.

I'm not sure a QMP timeout is a complete solution, because in this exact situation we would still be stuck in a blocked state until the timeout expires.

Fam
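
To make the thread-pool idea a bit more concrete, here is a rough, language-agnostic sketch in Python rather than QEMU code. It does not use QEMU's own thread pool or coroutine APIs; slow_submit, submit_in_pool and heartbeat are hypothetical names, and the blocking io_submit is simulated with an event wait. The point is only that the event loop keeps servicing other work while the blocking call sits in a worker thread.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def slow_submit(batch, release):
    """Hypothetical stand-in for a blocking io_submit().

    It blocks the calling thread until `release` is set, which mimics a
    submission that stalls in the kernel.
    """
    released = release.wait(timeout=5.0)
    return len(batch), released


async def submit_in_pool(pool, batch, release):
    # Hand the blocking call to a worker thread; this coroutine yields
    # to the event loop until the worker finishes.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, slow_submit, batch, release)


async def heartbeat(ticks, release):
    # Stands for other event-loop work (e.g. answering a monitor command).
    # It can only make progress if the loop is not blocked by the submit.
    done = 0
    for _ in range(ticks):
        await asyncio.sleep(0.01)
        done += 1
    release.set()  # let the simulated submit complete
    return done


async def demo():
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        (submitted, released), beats = await asyncio.gather(
            submit_in_pool(pool, ["req0", "req1", "req2"], release),
            heartbeat(5, release),
        )
    return submitted, released, beats
```

Running asyncio.run(demo()) lets the heartbeat finish all of its ticks while the submit is still pending. If slow_submit were called directly from the event loop instead, the heartbeat would not run until the wait timed out, which is the same shape of problem as in the backtrace above.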