On Fri, 21 Apr 2023 at 19:21, Gurchetan Singh
<gurchetansi...@chromium.org> wrote:
>
> On Fri, Apr 21, 2023 at 9:00 AM Stefan Hajnoczi <stefa...@gmail.com> wrote:
> >
> > On Thu, 20 Apr 2023 at 21:13, Gurchetan Singh
> > <gurchetansi...@chromium.org> wrote:
> > >
> > > Both gfxstream and cross-domain (and even newer versions of
> > > virglrenderer: see VIRGL_RENDERER_ASYNC_FENCE_CB) like to signal
> > > fence completion on threads ("callback threads") that are
> > > different from the thread that processes the command queue
> > > ("main thread").
> > >
> > > This is generally possible with locking, and this is what we do
> > > in crosvm and other virtio-gpu 1.1 implementations. However, on
> > > QEMU a deadlock is observed if virtio_gpu_ctrl_response_nodata(..)
> > > [used in the fence callback] is used from a thread that is not the
> > > main thread.
> > >
> > > The reason is that the main thread takes the big QEMU lock (BQL)
> > > somewhere while processing the command queue, and
> > > virtio_gpu_ctrl_response_nodata(..) needs that lock. If you add in
> > > the lock needed to protect &g->fenceq from concurrent access by the
> > > main thread and the callback threads, you can end up with deadlocks.
> > >
> > > It's possible to work around this by scheduling the return of the
> > > fence descriptors via aio_bh_schedule_oneshot_full(..), but that
> > > somewhat negates the rationale for the asynchronous callbacks.
> > >
> > > I also played around with aio_context_acquire()/aio_context_release();
> > > it doesn't seem to help.
> > >
> > > Is signaling the virtio_queue outside of the main thread possible?
> > > If so, how?
> >
> > Yes, if you want a specific thread to process a virtqueue, monitor the
> > virtqueue's host_notifier (aka ioeventfd) from that thread. That's how
> > --device virtio-blk,iothread= works. It attaches the host_notifier to
> > the IOThread's AioContext. Whenever the guest kicks the virtqueue, the
> > IOThread will respond instead of QEMU's main loop thread.
> >
> > That said, I don't know the threading model of your code. Could you
> > explain which threads are involved? Do you want to process all buffers
> > from the virtqueue in a specific thread, or only these fence buffers?
>
> Only the fence callback would be signalled via these callback threads.
> The virtqueue would not be processed by the callback thread.
>
> There can be multiple callback threads: for example, one for the
> gfxstream context and another for the Wayland context. These threads
> could be a C++ thread, a Rust thread or any other.
>
> The strategy used by crosvm is to have a mutex around the fence state
> to synchronize between the multiple callback threads (which signal
> fences) and the main thread (which generates fences).
>
> I tried to use aio_context_acquire(..)/aio_context_release(..) to
> synchronize these threads, but that results in a deadlock. I think
> those functions may assume an IOThread and not necessarily any thread.
> It seems aio_context_acquire(..) succeeds for the callback thread,
> though.
>
> Here's what I tried (rather than this patch, which works but is less
> ideal than the solution below):
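
(For reference, a minimal sketch of the host_notifier/IOThread approach
described above, loosely modelled on how virtio-blk dataplane attaches a
virtqueue to an IOThread's AioContext. The function name start_dataplane
is made up for illustration; error handling and the matching detach/stop
path are omitted.)

    #include "qemu/osdep.h"
    #include "hw/virtio/virtio.h"
    #include "hw/virtio/virtio-bus.h"
    #include "sysemu/iothread.h"

    /* Hand virtqueue n of vdev over to an IOThread's AioContext. */
    static void start_dataplane(VirtIODevice *vdev, IOThread *iothread, int n)
    {
        BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
        VirtQueue *vq = virtio_get_queue(vdev, n);
        AioContext *ctx = iothread_get_aio_context(iothread);

        /* Ask the transport to use an ioeventfd for guest kicks. */
        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), n, true);

        /*
         * Monitor the ioeventfd from the IOThread's AioContext; from now
         * on the IOThread, not the main loop, runs the virtqueue handler.
         */
        aio_context_acquire(ctx);
        virtio_queue_aio_attach_host_notifier(vq, ctx);
        aio_context_release(ctx);
    }
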
I don't have time to study the code and understand the threading model,
but one thing stood out:

> Callback Thread deadlocked with above patch:
>
> [Switching to thread 56 (Thread 0x7ffd2c9ff640 (LWP 8649))]
> #0 futex_wait (private=0, expected=2, futex_word=0x5555569893a0
> <qemu_global_mutex>) at ../sysdeps/nptl/futex-internal.h:146
> 146 in ../sysdeps/nptl/futex-internal.h
> (gdb) bt
> #0 futex_wait (private=0, expected=2, futex_word=0x5555569893a0
> <qemu_global_mutex>) at ../sysdeps/nptl/futex-internal.h:146
> #1 __GI___lll_lock_wait (futex=futex@entry=0x5555569893a0
> <qemu_global_mutex>, private=0) at ./nptl/lowlevellock.c:49
> #2 0x00007ffff7098082 in lll_mutex_lock_optimized
> (mutex=0x5555569893a0 <qemu_global_mutex>) at
> ./nptl/pthread_mutex_lock.c:48
> #3 ___pthread_mutex_lock (mutex=0x5555569893a0 <qemu_global_mutex>)
> at ./nptl/pthread_mutex_lock.c:93
> #4 0x00005555560019da in qemu_mutex_lock_impl (mutex=0x5555569893a0
> <qemu_global_mutex>, file=0x5555561df60c "../softmmu/physmem.c",
> line=2581) at ../util/qemu-thread-posix.c:94
> #5 0x0000555555afdf2d in qemu_mutex_lock_iothread_impl
> (file=0x5555561df60c "../softmmu/physmem.c", line=2581) at
> ../softmmu/cpus.c:504
> #6 0x0000555555d69a1a in prepare_mmio_access (mr=0x555556c10ca0) at
> ../softmmu/physmem.c:2581
> #7 0x0000555555d6b7a8 in address_space_stl_internal
> (as=0x5555578270f8, addr=4276125700, val=33, attrs=..., result=0x0,
> endian=DEVICE_LITTLE_ENDIAN) at
> /home/lenovo/qemu/memory_ldst.c.inc:318
> #8 0x0000555555d6b91d in address_space_stl_le (as=0x5555578270f8,
> addr=4276125700, val=33, attrs=..., result=0x0) at
> /home/lenovo/qemu/memory_ldst.c.inc:357
> #9 0x0000555555a0657c in pci_msi_trigger (dev=0x555557826ec0,
> msg=...) at ../hw/pci/pci.c:356
> #10 0x0000555555a02a5a in msi_send_message (dev=0x555557826ec0,
> msg=...) at ../hw/pci/msi.c:379
> #11 0x0000555555a045a0 in msix_notify (dev=0x555557826ec0, vector=1)
> at ../hw/pci/msix.c:542
> #12 0x0000555555ac8780 in virtio_pci_notify (d=0x555557826ec0,
> vector=1) at ../hw/virtio/virtio-pci.c:77
> #13 0x0000555555d15ddc in virtio_notify_vector (vdev=0x55555783ffd0,
> vector=1) at ../hw/virtio/virtio.c:1985
> #14 0x0000555555d17393 in virtio_irq (vq=0x555558716d80) at
> ../hw/virtio/virtio.c:2461
> #15 0x0000555555d17439 in virtio_notify (vdev=0x55555783ffd0,
> vq=0x555558716d80) at ../hw/virtio/virtio.c:2473
> #16 0x0000555555b98732 in virtio_gpu_ctrl_response (g=0x55555783ffd0,
> cmd=0x7ffdd80068b0, resp=0x7ffd2c9fe150, resp_len=24) at
> ../hw/display/virtio-gpu.c:177
> #17 0x0000555555b9879c in virtio_gpu_ctrl_response_nodata
> (g=0x55555783ffd0, cmd=0x7ffdd80068b0, type=VIRTIO_GPU_RESP_OK_NODATA)
> at ../hw/display/virtio-gpu.c:189
> #18 0x0000555555ba6b82 in virtio_gpu_rutabaga_fence_cb
> (user_data=93825028849616, fence_data=...) at
> ../hw/display/virtio-gpu-rutabaga.c:860
> #19 0x00007ffff75b9381 in
> _ZN131_$LT$rutabaga_gfx..rutabaga_utils..RutabagaFenceClosure$LT$T$GT$$u20$as$u20$rutabaga_gfx..rutabaga_utils..RutabagaFenceCallback$GT$4call17hcbf1b4ac094a2a60E.llvm.14356420492591683714
> () at /usr/local/lib/librutabaga_gfx_ffi.so
> #20 0x00007ffff75d501b in
> rutabaga_gfx::cross_domain::cross_domain::CrossDomainWorker::run () at
> /usr/local/lib/librutabaga_gfx_ffi.so
> #21 0x00007ffff75dd651 in
> std::sys_common::backtrace::__rust_begin_short_backtrace () at
> /usr/local/lib/librutabaga_gfx_ffi.so
> #22 0x00007ffff75c62ba in
> core::ops::function::FnOnce::call_once{{vtable.shim}} () at
> /usr/local/lib/librutabaga_gfx_ffi.so
> #23 0x00007ffff75e3b73 in alloc::boxed::{impl#44}::call_once<(), dyn
> core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>
> (self=..., args=()) at library/alloc/src/boxed.rs:1940
> #24 alloc::boxed::{impl#44}::call_once<(), alloc::boxed::Box<dyn
> core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>,
> alloc::alloc::Global> (self=0x5555572e27d0, args=())
> at library/alloc/src/boxed.rs:1940
> #25 std::sys::unix::thread::{impl#2}::new::thread_start
> (main=0x5555572e27d0) at library/std/src/sys/unix/thread.rs:108
> #26 0x00007ffff7094b43 in start_thread (arg=<optimized out>) at
> ./nptl/pthread_create.c:442
> #27 0x00007ffff7126a00 in clone3 () at
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This Rust thread is calling QEMU emulation code that is not designed to
run outside the big QEMU lock (virtio_notify()). I think
prepare_mmio_access() is deadlocking on the big QEMU lock. Another
thread must be holding it (maybe waiting for this thread?) and
therefore the hang occurs. There is also some additional setup, like
rcu_register_thread(), that may be missing in this Rust thread.

The conservative approach is to only call QEMU emulation functions like
virtio_notify() from a QEMU thread. Some parts of QEMU are thread-safe
and don't need the big QEMU lock, but that requires carefully calling
only safe functions, and at first glance I guess this patch series
isn't doing that.

Stefan
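
(For illustration, a minimal sketch of the conservative approach: the
callback thread never touches virtio state itself and instead defers
fence completion to the main loop with a bottom half, so virtio_notify()
runs in a QEMU thread with the BQL held. FenceCompletion,
complete_fence_bh and fence_cb_from_callback_thread are made-up names;
locking of g->fenceq and error handling are omitted.)

    #include "qemu/osdep.h"
    #include "block/aio.h"
    #include "hw/virtio/virtio-gpu.h"

    typedef struct FenceCompletion {
        VirtIOGPU *g;
        struct virtio_gpu_ctrl_command *cmd;
    } FenceCompletion;

    /* Runs in the main loop thread, with the BQL held. */
    static void complete_fence_bh(void *opaque)
    {
        FenceCompletion *fc = opaque;

        virtio_gpu_ctrl_response_nodata(fc->g, fc->cmd,
                                        VIRTIO_GPU_RESP_OK_NODATA);
        g_free(fc);
    }

    /* Called from a rutabaga callback thread; only schedules work. */
    static void fence_cb_from_callback_thread(VirtIOGPU *g,
                                              struct virtio_gpu_ctrl_command *cmd)
    {
        FenceCompletion *fc = g_new0(FenceCompletion, 1);

        fc->g = g;
        fc->cmd = cmd;
        aio_bh_schedule_oneshot(qemu_get_aio_context(),
                                complete_fence_bh, fc);
    }

This is essentially the aio_bh_schedule_oneshot_full(..) workaround
mentioned at the top of the thread: it trades a small scheduling hop for
never calling virtio_notify() outside the big QEMU lock.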