I added more logging, and the hang with 4096 MB of RAM on Mac appears to be in virtio_blk_handle_vq:
#define VIRTIO_BLK_UNUSUAL_ITER_COUNT 1024

bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
{
    VirtIOBlockReq *req;
    MultiReqBuffer mrb = {};
    bool progress = false;
    uint32_t all_iters = 0;
    uint32_t progress_iters = 0;

    aio_context_acquire(blk_get_aio_context(s->blk));
    blk_io_plug(s->blk);

    do {
        ++all_iters;
        virtio_queue_set_notification(vq, 0);

        while ((req = virtio_blk_get_request(s, vq))) {
            progress = true;
            ++progress_iters;
            if (virtio_blk_handle_request(req, &mrb)) {
                virtqueue_detach_element(req->vq, &req->elem, 0);
                virtio_blk_free_request(req);
                break;
            }
            qemu_spin_warning(
                progress_iters, VIRTIO_BLK_UNUSUAL_ITER_COUNT,
                "Warning: virtio_blk_handle_vq spun %u times with progress.\n",
                progress_iters);
        }

        qemu_spin_warning(
            all_iters, VIRTIO_BLK_UNUSUAL_ITER_COUNT,
            "Warning: virtio_blk_handle_vq spun %u times total.\n", /* <-- this printed */
            all_iters);

        virtio_queue_set_notification(vq, 1);
    } while (!virtio_queue_empty(vq)); /* <-- makes me think the virtio queue is corrupted */

    if (mrb.num_reqs) {
        virtio_blk_submit_multireq(s->blk, &mrb);
    }

    blk_io_unplug(s->blk);
    aio_context_release(blk_get_aio_context(s->blk));
    return progress;
}

On Tue, Sep 18, 2018 at 11:57 AM Frank Yang <l...@google.com> wrote:

> We also only get those reports from users with 4G RAM configured, so it
> could also have to do with overflow.
>
> On Tue, Sep 18, 2018 at 11:57 AM Frank Yang <l...@google.com> wrote:
>
>> That seems to be the case, since our 15 second detector is reset if the
>> main loop runs its timers again, so no main loop iterations happened since
>> that aio_dispatch_handlers call (we use a looper abstraction for it).
>>
>> On Tue, Sep 18, 2018 at 8:56 AM Paolo Bonzini <pbonz...@redhat.com> wrote:
>>
>>> On 15/09/2018 20:41, Frank Yang via Qemu-devel wrote:
>>> > We have not reproduced this hang so far; this is from user crash
>>> > reports that triggered our hang detector (where 15+ seconds pass
>>> > without the main loop / VCPU threads being able to go back and ping
>>> > their loopers in main loop / vcpu threads).
>>> >
>>> > 0x00000001024e9fcb(qemu-system-x86_64 -exec.c:511)flatview_translate
>>> > 0x00000001024f2390(qemu-system-x86_64 -memory.h:1865)address_space_lduw_internal_cached
>>> > 0x000000010246ff11(qemu-system-x86_64 -virtio-access.h:166)virtio_queue_set_notification
>>> > 0x00000001024fa2c9(qemu-system-x86_64+ 0x000a72c9)virtio_blk_handle_vq
>>> > 0x00000001024746ee(qemu-system-x86_64 -virtio.c:1521)virtio_queue_host_notifier_aio_read
>>> > 0x0000000103a5ed8a(qemu-system-x86_64 -aio-posix.c:406)aio_dispatch_handlers
>>> > 0x0000000103a5ecc8(qemu-system-x86_64 -aio-posix.c:437)aio_dispatch
>>> > 0x0000000103a5c158(qemu-system-x86_64 -async.c:261)aio_ctx_dispatch
>>> > 0x0000000103a92103(qemu-system-x86_64 -gmain.c:3072)g_main_context_dispatch
>>> > 0x0000000103a5e4ad(qemu-system-x86_64 -main-loop.c:224)main_loop_wait
>>> > 0x0000000102468ab8(qemu-system-x86_64 -vl.c:2172)main_impl
>>> > 0x0000000102461a3a(qemu-system-x86_64 -vl.c:3332)run_qemu_main
>>> > 0x000000010246eef3(qemu-system-x86_64 -main.cpp:577)enter_qemu_main_loop(int, char**)
>>> > 0x00000001062b63a9(libQt5Core.5.dylib -qthread_unix.cpp:344)QThreadPrivate::start(void*)
>>> > 0x00007fff65118660
>>> > 0x00007fff6511850c
>>> > 0x00007fff65117bf8
>>> > 0x00000001062b623f(libQt5Core.5.dylib+ 0x0002623f)
>>>
>>> To be clear, is aio_dispatch_handlers running for 15+ seconds?
>>>
>>> None of the patches you point out are related, however.
>>>
>>> Paolo
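The outer do/while in the instrumented virtio_blk_handle_vq only exits once virtio_queue_empty() reports an empty queue. If the queue keeps looking non-empty while the inner loop pops nothing, all_iters keeps climbing and progress_iters does not, which is consistent with the all_iters warning firing. Below is a toy model of that loop shape, not QEMU's VirtQueue: struct toy_vq, toy_pop(), toy_handle_vq(), the pop_fails flag and the max_outer cap are all hypothetical, and the cap exists only so the spin can be observed instead of hanging. It also suggests that the free-running 16-bit indices wrapping from 65535 to 0 do not, on their own, keep such a loop spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a split virtqueue's indices. Hypothetical: this is NOT
 * QEMU's VirtQueue, only enough state to mirror the loop shape. */
struct toy_vq {
    uint16_t avail_idx;      /* index the "guest" has published */
    uint16_t last_avail_idx; /* next index the "device" will consume */
    bool pop_fails;          /* simulate a pop that yields no request */
};

/* Empty when everything published has been consumed. Plain equality
 * is enough even across the 65535 -> 0 wrap of the 16-bit indices. */
static bool toy_queue_empty(const struct toy_vq *vq)
{
    return vq->avail_idx == vq->last_avail_idx;
}

/* Stand-in for fetching one request; returns false when none is taken. */
static bool toy_pop(struct toy_vq *vq)
{
    if (toy_queue_empty(vq) || vq->pop_fails) {
        return false;
    }
    vq->last_avail_idx++; /* uint16_t wraps from 65535 to 0 */
    return true;
}

struct loop_stats {
    uint32_t all_iters;      /* outer do/while passes */
    uint32_t progress_iters; /* requests actually popped */
    bool gave_up;            /* true if the cap stopped the loop */
};

/* Same do/while shape as the instrumented virtio_blk_handle_vq, but the
 * outer loop is capped at max_outer passes so a spin is reported rather
 * than hanging the caller. */
static struct loop_stats toy_handle_vq(struct toy_vq *vq, uint32_t max_outer)
{
    struct loop_stats st = { 0, 0, false };

    assert(max_outer > 0); /* a zero cap would never poll the queue */
    do {
        if (st.all_iters == max_outer) {
            st.gave_up = true;
            break;
        }
        ++st.all_iters;
        while (toy_pop(vq)) {
            ++st.progress_iters;
        }
    } while (!toy_queue_empty(vq));
    return st;
}
```

For example, a healthy queue with last_avail_idx = 65530 and avail_idx = 4 pops 10 requests across the wrap in a single outer pass (all_iters == 1, progress_iters == 10). A queue with avail_idx = 1, last_avail_idx = 0 and pop_fails set instead runs into the cap with all_iters == 1024 and progress_iters == 0, which is one way only the outer counter could keep growing.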