On Tue, 09/11 17:30, Paolo Bonzini wrote:
> On 11/09/2018 16:12, Fam Zheng wrote:
> > On Tue, 09/11 13:32, Paolo Bonzini wrote:
> >> On 10/09/2018 16:56, Fam Zheng wrote:
> >>> We have this unwanted call stack:
> >>>
> >>>   > ...
> >>>   > #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
> >>>   > #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
> >>>   > #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
> >>>   > #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
> >>>   > #17 0x00005586607885da in run_poll_handlers_once
> >>>   > #18 0x000055866078880e in try_poll_mode
> >>>   > #19 0x00005586607888eb in aio_poll
> >>>   > #20 0x0000558660784561 in aio_wait_bh_oneshot
> >>>   > #21 0x00005586602b9582 in virtio_scsi_dataplane_stop
> >>>   > #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
> >>>   > #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
> >>>   > #24 0x00005586605ab808 in virtio_pci_common_write
> >>>   > #25 0x0000558660242396 in memory_region_write_accessor
> >>>   > #26 0x00005586602425ab in access_with_adjusted_size
> >>>   > #27 0x0000558660245281 in memory_region_dispatch_write
> >>>   > #28 0x00005586601e008e in flatview_write_continue
> >>>   > #29 0x00005586601e01d8 in flatview_write
> >>>   > #30 0x00005586601e04de in address_space_write
> >>>   > #31 0x00005586601e052f in address_space_rw
> >>>   > #32 0x00005586602607f2 in kvm_cpu_exec
> >>>   > #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
> >>>   > #34 0x000055866078bde7 in qemu_thread_start
> >>>   > #35 0x00007f5784906594 in start_thread
> >>>   > #36 0x00007f5784639e6f in clone
> >>>
> >>> Avoid it with the aio_disable_external/aio_enable_external pair, so
> >>> that no vq poll handlers can be called in aio_wait_bh_oneshot.
> >>
> >> I don't understand. We are in the vCPU thread, so not in the
> >> AioContext's home thread. Why is aio_wait_bh_oneshot polling rather
> >> than going through the aio_wait_bh path?
> >
> > What do you mean by 'aio_wait_bh path'? Here is aio_wait_bh_oneshot:
>
> Sorry, I meant the "atomic_inc(&wait_->num_waiters);" path. But if this
> backtrace is obtained without dataplane, that's the answer I was seeking.
>
> > void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
> > {
> >     AioWaitBHData data = {
> >         .cb = cb,
> >         .opaque = opaque,
> >     };
> >
> >     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> >
> >     aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
> >     AIO_WAIT_WHILE(&data.wait, ctx, !data.done);
> > }
> >
> > ctx is qemu_aio_context here, so there's no interaction with IOThread.
>
> In this case, it should be okay to have the reentrancy, what is the bug
> that this patch is fixing?
The same symptom as in the previous patch: virtio_scsi_handle_cmd_vq hangs.
The reason it hangs is fixed by the previous patch, but I don't think it
should be invoked at all while we are in the middle of
virtio_scsi_dataplane_stop(). Applying either one of the two patches avoids
the problem, but this one is the more superficial fix (roughly the sketch
below).

What do you think?

Fam
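
For reference, a minimal sketch of where the pair goes. The surrounding
function is abridged and the BH callback name is illustrative rather than
a quote of the exact code in the tree; the point is only that the
aio_disable_external()/aio_enable_external() pair brackets the
aio_wait_bh_oneshot() call in virtio_scsi_dataplane_stop():

    static void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
    {
        VirtIOSCSI *s = VIRTIO_SCSI(vdev);
        ...
        /* Block "external" handlers, including the vq host notifier poll
         * handlers, so that aio_poll() inside aio_wait_bh_oneshot() cannot
         * re-enter virtio_scsi_handle_cmd_vq while dataplane is stopping.
         */
        aio_disable_external(s->ctx);
        /* BH callback name below is illustrative */
        aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
        aio_enable_external(s->ctx);
        ...
    }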