(Not reproducible locally) On Thu, Sep 20, 2018 at 7:16 AM Frank Yang <l...@google.com> wrote:
> I have added more logging code and it seems that there is a hang that
> happens with 4096 MB RAM on Mac in virtio_blk_handle_vq:
>
> #define VIRTIO_BLK_UNUSUAL_ITER_COUNT 1024
>
> bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
> {
>     VirtIOBlockReq *req;
>     MultiReqBuffer mrb = {};
>     bool progress = false;
>     uint32_t all_iters = 0;
>     uint32_t progress_iters = 0;
>
>     aio_context_acquire(blk_get_aio_context(s->blk));
>     blk_io_plug(s->blk);
>
>     do {
>         ++all_iters;
>
>         virtio_queue_set_notification(vq, 0);
>
>         while ((req = virtio_blk_get_request(s, vq))) {
>             progress = true;
>             ++progress_iters;
>             if (virtio_blk_handle_request(req, &mrb)) {
>                 virtqueue_detach_element(req->vq, &req->elem, 0);
>                 virtio_blk_free_request(req);
>                 break;
>             }
>
>             qemu_spin_warning(
>                 progress_iters,
>                 VIRTIO_BLK_UNUSUAL_ITER_COUNT,
>                 "Warning: virtio_blk_handle_vq spun %u times with progress.\n",
>                 progress_iters);
>         }
>
>         qemu_spin_warning(
>             all_iters,
>             VIRTIO_BLK_UNUSUAL_ITER_COUNT,
>             "Warning: virtio_blk_handle_vq spun %u times total.\n",  <---- this printed
>             all_iters);
>
>         virtio_queue_set_notification(vq, 1);
>     } while (!virtio_queue_empty(vq));  <---- makes me think the virtio queue is corrupted
>
>     if (mrb.num_reqs) {
>         virtio_blk_submit_multireq(s->blk, &mrb);
>     }
>
>     blk_io_unplug(s->blk);
>     aio_context_release(blk_get_aio_context(s->blk));
>     return progress;
> }
>
>
> On Tue, Sep 18, 2018 at 11:57 AM Frank Yang <l...@google.com> wrote:
>
>> We also only get those reports from users with 4G RAM configured, so it
>> could also have to do with overflow.
>>
>> On Tue, Sep 18, 2018 at 11:57 AM Frank Yang <l...@google.com> wrote:
>>
>>> That seems to be the case, since our 15-second detector is reset whenever
>>> the main loop runs its timers again, so no main loop iterations have
>>> happened since that aio_dispatch_handlers call (we use a looper
>>> abstraction for it).
>>>
>>> On Tue, Sep 18, 2018 at 8:56 AM Paolo Bonzini <pbonz...@redhat.com> wrote:
>>>
>>>> On 15/09/2018 20:41, Frank Yang via Qemu-devel wrote:
>>>> > We have not reproduced this hang so far; this is from user crash
>>>> > reports that triggered our hang detector (where 15+ seconds pass
>>>> > without the main loop / VCPU threads being able to go back and ping
>>>> > their loopers in the main loop / vcpu threads).
>>>> >
>>>> > 0x00000001024e9fcb (qemu-system-x86_64 exec.c:511) flatview_translate
>>>> > 0x00000001024f2390 (qemu-system-x86_64 memory.h:1865) address_space_lduw_internal_cached
>>>> > 0x000000010246ff11 (qemu-system-x86_64 virtio-access.h:166) virtio_queue_set_notification
>>>> > 0x00000001024fa2c9 (qemu-system-x86_64 + 0x000a72c9) virtio_blk_handle_vq
>>>> > 0x00000001024746ee (qemu-system-x86_64 virtio.c:1521) virtio_queue_host_notifier_aio_read
>>>> > 0x0000000103a5ed8a (qemu-system-x86_64 aio-posix.c:406) aio_dispatch_handlers
>>>> > 0x0000000103a5ecc8 (qemu-system-x86_64 aio-posix.c:437) aio_dispatch
>>>> > 0x0000000103a5c158 (qemu-system-x86_64 async.c:261) aio_ctx_dispatch
>>>> > 0x0000000103a92103 (qemu-system-x86_64 gmain.c:3072) g_main_context_dispatch
>>>> > 0x0000000103a5e4ad (qemu-system-x86_64 main-loop.c:224) main_loop_wait
>>>> > 0x0000000102468ab8 (qemu-system-x86_64 vl.c:2172) main_impl
>>>> > 0x0000000102461a3a (qemu-system-x86_64 vl.c:3332) run_qemu_main
>>>> > 0x000000010246eef3 (qemu-system-x86_64 main.cpp:577) enter_qemu_main_loop(int, char**)
>>>> > 0x00000001062b63a9 (libQt5Core.5.dylib qthread_unix.cpp:344) QThreadPrivate::start(void*)
>>>> > 0x00007fff65118660
>>>> > 0x00007fff6511850c
>>>> > 0x00007fff65117bf8
>>>> > 0x00000001062b623f (libQt5Core.5.dylib + 0x0002623f)
>>>>
>>>> To be clear, is aio_dispatch_handlers running for 15+ seconds?
>>>>
>>>> None of the patches you point out are related, however.
>>>>
>>>> Paolo