On Thu, Aug 21, 2025 at 03:47:06PM +0100, Peter Maydell wrote: > On Thu, 21 Aug 2025 at 15:28, Peter Xu <pet...@redhat.com> wrote: > > > On Thu, 21 Aug 2025 at 13:40, Peter Maydell <peter.mayd...@linaro.org> > > > wrote: > > > > In memory_region_unref_subregion(), subregion->container is NULL. > > > > > > > > This is because in memory_region_del_subregion() we do: > > > > > > > > subregion->container = NULL; > > > > > > > > and then after that we call > > > > memory_region_unref_subregion(subregion); > > > > which dereferences subregion->container. > > > > > > > > Won't this always SEGV ? > > > Peter, could you try the v3 version patch 8/9 instead? > > > > https://lore.kernel.org/all/20240708-san-v3-8-b03f671c4...@daynix.com/ > > > > I still prefer that one, and I hope that one doesn't have this issue. > > That one fails like this: > qemu-system-arm: ../../system/memory.c:1799: memory_region_finalize: > Assertion `!mr->container' failed. > > See the discussion on v2 (which was the same for this patch): > https://lore.kernel.org/all/cafeaca9ktsjwf1rabpm5nv9ufukqzzk6+qo4pef4+rtirni...@mail.gmail.com/
My apologies, it was a long discussion and I totally forgot that.. I remember I provided a draft somewhere during the discussion, anyway.. I redid it, and attached a formal patch below that will contain the removal of the mr->container check (plus auto-detach when it happens). The hope is this should work.. and comparing to the v8 of Akihiko's, it won't make MR's own refcount any more complicated. If necessary, I can send a formal patch. Thanks, ===8<=== >From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001 From: Peter Xu <pet...@redhat.com> Date: Thu, 21 Aug 2025 12:59:02 -0400 Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references Currently, QEMU refcounts the MR by always taking it from the owner. It should be non-issue if all the owners of MRs will properly detach all the MRs from their parents by memory_region_del_subregion() when the owner will be freed. However, it turns out many of the device emulations do not do that properly. It might be a challenge to fix all of such occurances. Without fixing it, QEMU faces circular reference issue when an owner can contain more than one MRs, then when they are linked within each other in form of subregions, those links keep the owner alive forever, causing memory leaks even if all the external refcounts are released for the owner. To fix that, teach memory API to stop refcount on MRs that share the same owner because if they share the lifecycle of the owner, then they share the same lifecycle between themselves. Meanwhile, allow auto-detachments of MRs during finalize() of MRs even against its container, as long as they belong to the same owner. The latter is needed because now it's possible to have MR finalize() happen in any order when they exactly share the same lifespan. In this case, we should allow finalize() to happen in any order of either the parent or child MR, however it should be guaranteed that the mr->container shares the same owner of this MR to be finalized. Proper document this behavior in code. This patch is heavily based on the work done by Akihiko Odaki: https://lore.kernel.org/r/cafeaca8dv40fgsci76r4yep1p-sp_qjnrdd2ozpxjx5wrs0...@mail.gmail.com Signed-off-by: Peter Xu <pet...@redhat.com> --- docs/devel/memory.rst | 9 +++++++-- system/memory.c | 45 ++++++++++++++++++++++++++++++++++--------- 2 files changed, 43 insertions(+), 11 deletions(-) diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst index 57fb2aec76..1367c7caf7 100644 --- a/docs/devel/memory.rst +++ b/docs/devel/memory.rst @@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle. They take effect as soon as the region is made visible. This can be immediately, later, or never. -Destruction of a memory region happens automatically when the owner -object dies. +Destruction of a memory region happens automatically when the owner object +dies. Note that the MRs under the same owner can be destroyed in any order +when the owner object dies. It's because the MRs will share the same +lifespan of the owner, no matter if its a container or child MR. It's +suggested to always cleanly detach the MRs under the owner object when the +owner object is going to be destroyed, however it is not required, as the +memory core will automatically detach the links when MRs are destroyed. If however the memory region is part of a dynamically allocated data structure, you should call object_unparent() to destroy the memory region diff --git a/system/memory.c b/system/memory.c index 5646547940..d7f6ad9be2 100644 --- a/system/memory.c +++ b/system/memory.c @@ -1796,16 +1796,36 @@ static void memory_region_finalize(Object *obj) { MemoryRegion *mr = MEMORY_REGION(obj); - assert(!mr->container); - - /* We know the region is not visible in any address space (it - * does not have a container and cannot be a root either because - * it has no references, so we can blindly clear mr->enabled. - * memory_region_set_enabled instead could trigger a transaction - * and cause an infinite loop. + /* + * Each memory region (that can be dynamically freed..) must has an + * owner, and it always has the same lifecycle of its owner. It means + * when reaching here, the memory region's owner refcount is zero. + * + * Here it is possible that the MR has: + * + * (1) mr->container set, which means this MR can be a subregion of a + * container MR, in this case it must share the same owner + * (otherwise the container should have kept a refcount of this + * MR's owner), or, + * + * (2) mr->subregions non-empty, which means this MR can be a container + * of other MRs (share the owner or not). + * + * We know the MR, or any MR that is attached to this one as either + * container or children, is not visible in any address space, because + * otherwise the address space should have taken at least one refcount + * of this MR's owner. So we can blindly clear mr->enabled. + * + * memory_region_set_enabled instead could trigger a transaction and + * cause an infinite loop. */ mr->enabled = false; memory_region_transaction_begin(); + if (mr->container) { + /* Must share the owner; see above comments */ + assert(mr->container->owner == mr->owner); + memory_region_del_subregion(mr->container, mr); + } while (!QTAILQ_EMPTY(&mr->subregions)) { MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions); memory_region_del_subregion(mr, subregion); @@ -2625,7 +2645,10 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion) memory_region_transaction_begin(); - memory_region_ref(subregion); + if (mr->owner != subregion->owner) { + memory_region_ref(subregion); + } + QTAILQ_FOREACH(other, &mr->subregions, subregions_link) { if (subregion->priority >= other->priority) { QTAILQ_INSERT_BEFORE(other, subregion, subregions_link); @@ -2683,7 +2706,11 @@ void memory_region_del_subregion(MemoryRegion *mr, assert(alias->mapped_via_alias >= 0); } QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link); - memory_region_unref(subregion); + + if (mr->owner != subregion->owner) { + memory_region_unref(subregion); + } + memory_region_update_pending |= mr->enabled && subregion->enabled; memory_region_transaction_commit(); } -- 2.50.1 -- Peter Xu