On Thu, May 22, 2025 at 10:03 AM Akihiko Odaki <akihiko.od...@daynix.com> wrote:
>
> On 2025/05/22 15:45, Alex Bennée wrote:
> > Akihiko Odaki <akihiko.od...@daynix.com> writes:
> >
> >> On 2025/05/22 1:42, Alex Bennée wrote:
> >>> From: Manos Pitsidianakis <manos.pitsidiana...@linaro.org>
> >>> This commit fixes an indefinite hang when using VIRTIO GPU blob
> >>> objects
> >>> under TCG in certain conditions.
> >>> The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
> >>> MemoryRegion and attaches it to an offset on a PCI BAR of the
> >>> VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
> >>> it.
> >>> Because virglrenderer commands are not thread-safe they are only
> >>> called on the main context and QEMU performs the cleanup in three steps
> >>> to prevent a use-after-free scenario where the guest can access the
> >>> region after it’s unmapped:
> >>> 1. From the main context, the region’s field finish_unmapping is
> >>> false
> >>>      by default, so it sets a variable cmd_suspended, increases the
> >>>      renderer_blocked variable, deletes the blob subregion, and unparents
> >>>      the blob subregion causing its reference count to decrement.
> >>> 2. From an RCU context, the MemoryView gets freed, the FlatView gets
> >>>      recalculated, the free callback of the blob region
> >>>      virtio_gpu_virgl_hostmem_region_free is called which sets the
> >>>      region’s field finish_unmapping to true, allowing the main thread
> >>>      context to finish replying to the command
> >>> 3. From the main context, the command is processed again, but this
> >>> time
> >>>      finish_unmapping is true, so virgl_renderer_resource_unmap can be
> >>>      called and a response is sent to the guest.
> >>> It happens so that under TCG, if the guest has no timers configured
> >>> (and
> >>> thus no interrupt will cause the CPU to exit), the RCU thread does not
> >>> have enough time to grab the locks and recalculate the FlatView.
> >>> That’s not a big problem in practice since most guests will assume a
> >>> response will happen later in time and go on to do different things,
> >>> potentially triggering interrupts and allowing the RCU context to run.
> >>> If the guest waits for the unmap command to complete though, it blocks
> >>> indefinitely. Attaching to the QEMU monitor and force quitting the guest
> >>> allows the cleanup to continue.
> >>> There's no reason why the FlatView recalculation can't occur right
> >>> away
> >>> when we delete the blob subregion, however. It does not, because when we
> >>> create the subregion we set the object as its own parent:
> >>>       memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
> >>> The extra reference is what prevents freeing the memory region
> >>> object in
> >>> the memory transaction of deleting the subregion.
> >>> This commit changes the owner object to the device, which removes
> >>> the
> >>> extra owner reference in the memory region and causes the MR to be
> >>> freed right away in the main context.
> >>> Acked-by: Michael S. Tsirkin <m...@redhat.com>
> >>> Signed-off-by: Manos Pitsidianakis <manos.pitsidiana...@linaro.org>
> >>> Reviewed-by: Alex Bennée <alex.ben...@linaro.org>
> >>> Tested-by: Alex Bennée <alex.ben...@linaro.org>
> >>> Message-Id: <20250410122643.1747913-3-manos.pitsidiana...@linaro.org>
> >>> Cc: qemu-sta...@nongnu.org
> >>> ---
> >>>    hw/display/virtio-gpu-virgl.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>> diff --git a/hw/display/virtio-gpu-virgl.c
> >>> b/hw/display/virtio-gpu-virgl.c
> >>> index 71a7500de9..8fbe4e70cc 100644
> >>> --- a/hw/display/virtio-gpu-virgl.c
> >>> +++ b/hw/display/virtio-gpu-virgl.c
> >>> @@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
> >>>        vmr->g = g;
> >>>        mr = g_new0(MemoryRegion, 1);
> >>>    -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size,
> >>> data);
> >>> +    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
> >>>        memory_region_add_subregion(&b->hostmem, offset, mr);
> >>>        memory_region_set_enabled(mr, true);
> >>>
> >>
> >> I suggest dropping this patch for now due to the reason I pointed out
> >> for the first version of this series.
> >
> > This fixes an actual bug - without it we get a hang.
> >
>
> I understand that but it also introduces a regression; "[PATCH v3 14/20]
> ui/gtk-gl-area: Remove extra draw call in refresh" is also a similar case.
>
> Ideally such a bug should be fixed without regression, but I understand
> it is sometimes difficult to do that and postponing the bug resolution
> until figuring out the correct way does not make sense.
>
> In such a case, a bug should be fixed minimizing the regression and the
> documentation of the regression should be left in the code.
>
> In particular, this patch can cause use-after-free whether TCG is used
> or not. Instead, I suggest to avoid freeing memory regions at all on
> TCG. It will surely leak memory, but won't result in use-after-free at
> least and the other accelerators will be unaffected.
>
> Regards,
> Akihiko Odaki

We tested this fix with ASAN and didn't see anything. Do you have a
test case in mind that can reproduce this use-after-free? It'd help
make a certain decision on whether to drop this patch or not. I'm not
doubting that this can cause a use-after-free by the way, it's just
that it is hypothetical only. If it causes a use-after-free for sure
we should definitely drop it.

> Instead, I suggest to avoid freeing memory regions at all on
> TCG. It will surely leak memory, but won't result in use-after-free at
> least and the other accelerators will be unaffected.

Leaking memory for blob objects is also not ideal, since they are
frequently allocated. It's memory-safe but the leak can accumulate
over time.

-- 
Manos Pitsidianakis
Emulation and Virtualization Engineer at Linaro Ltd

Reply via email to