On Thu, Aug 21, 2025 at 03:47:06PM +0100, Peter Maydell wrote:
> On Thu, 21 Aug 2025 at 15:28, Peter Xu <pet...@redhat.com> wrote:
> > > On Thu, 21 Aug 2025 at 13:40, Peter Maydell <peter.mayd...@linaro.org> 
> > > wrote:
> > > > In memory_region_unref_subregion(), subregion->container is NULL.
> > > >
> > > > This is because in memory_region_del_subregion() we do:
> > > >
> > > >     subregion->container = NULL;
> > > >
> > > > and then after that we call
> > > >     memory_region_unref_subregion(subregion);
> > > > which dereferences subregion->container.
> > > >
> > > > Won't this always SEGV ?
> 
> > Peter, could you try the v3 version patch 8/9 instead?
> >
> > https://lore.kernel.org/all/20240708-san-v3-8-b03f671c4...@daynix.com/
> >
> > I still prefer that one, and I hope that one doesn't have this issue.
> 
> That one fails like this:
> qemu-system-arm: ../../system/memory.c:1799: memory_region_finalize:
> Assertion `!mr->container' failed.
> 
> See the discussion on v2 (which was the same for this patch):
> https://lore.kernel.org/all/cafeaca9ktsjwf1rabpm5nv9ufukqzzk6+qo4pef4+rtirni...@mail.gmail.com/

My apologies, it was a long discussion and I totally forgot that..

I remember I provided a draft somewhere during the discussion, anyway.. I
redid it, and attached a formal patch below that will contain the removal
of the mr->container check (plus auto-detach when it happens).  The hope is
this should work.. and comparing to the v8 of Akihiko's, it won't make MR's
own refcount any more complicated.

If necessary, I can send a formal patch.

Thanks,

===8<===
>From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001
From: Peter Xu <pet...@redhat.com>
Date: Thu, 21 Aug 2025 12:59:02 -0400
Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references

Currently, QEMU refcounts the MR by always taking it from the owner.

It should be non-issue if all the owners of MRs will properly detach all
the MRs from their parents by memory_region_del_subregion() when the owner
will be freed.  However, it turns out many of the device emulations do not
do that properly.  It might be a challenge to fix all of such occurances.

Without fixing it, QEMU faces circular reference issue when an owner can
contain more than one MRs, then when they are linked within each other in
form of subregions, those links keep the owner alive forever, causing
memory leaks even if all the external refcounts are released for the owner.

To fix that, teach memory API to stop refcount on MRs that share the same
owner because if they share the lifecycle of the owner, then they share the
same lifecycle between themselves.

Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
against its container, as long as they belong to the same owner.

The latter is needed because now it's possible to have MR finalize() happen
in any order when they exactly share the same lifespan.  In this case, we
should allow finalize() to happen in any order of either the parent or
child MR, however it should be guaranteed that the mr->container shares the
same owner of this MR to be finalized.

Proper document this behavior in code.

This patch is heavily based on the work done by Akihiko Odaki:

https://lore.kernel.org/r/cafeaca8dv40fgsci76r4yep1p-sp_qjnrdd2ozpxjx5wrs0...@mail.gmail.com

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 docs/devel/memory.rst |  9 +++++++--
 system/memory.c       | 45 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
index 57fb2aec76..1367c7caf7 100644
--- a/docs/devel/memory.rst
+++ b/docs/devel/memory.rst
@@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle.  
They take effect
 as soon as the region is made visible.  This can be immediately, later,
 or never.
 
-Destruction of a memory region happens automatically when the owner
-object dies.
+Destruction of a memory region happens automatically when the owner object
+dies.  Note that the MRs under the same owner can be destroyed in any order
+when the owner object dies.  It's because the MRs will share the same
+lifespan of the owner, no matter if its a container or child MR.  It's
+suggested to always cleanly detach the MRs under the owner object when the
+owner object is going to be destroyed, however it is not required, as the
+memory core will automatically detach the links when MRs are destroyed.
 
 If however the memory region is part of a dynamically allocated data
 structure, you should call object_unparent() to destroy the memory region
diff --git a/system/memory.c b/system/memory.c
index 5646547940..d7f6ad9be2 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1796,16 +1796,36 @@ static void memory_region_finalize(Object *obj)
 {
     MemoryRegion *mr = MEMORY_REGION(obj);
 
-    assert(!mr->container);
-
-    /* We know the region is not visible in any address space (it
-     * does not have a container and cannot be a root either because
-     * it has no references, so we can blindly clear mr->enabled.
-     * memory_region_set_enabled instead could trigger a transaction
-     * and cause an infinite loop.
+    /*
+     * Each memory region (that can be dynamically freed..) must has an
+     * owner, and it always has the same lifecycle of its owner.  It means
+     * when reaching here, the memory region's owner refcount is zero.
+     *
+     * Here it is possible that the MR has:
+     *
+     * (1) mr->container set, which means this MR can be a subregion of a
+     *     container MR, in this case it must share the same owner
+     *     (otherwise the container should have kept a refcount of this
+     *     MR's owner), or,
+     *
+     * (2) mr->subregions non-empty, which means this MR can be a container
+     *     of other MRs (share the owner or not).
+     *
+     * We know the MR, or any MR that is attached to this one as either
+     * container or children, is not visible in any address space, because
+     * otherwise the address space should have taken at least one refcount
+     * of this MR's owner.  So we can blindly clear mr->enabled.
+     *
+     * memory_region_set_enabled instead could trigger a transaction and
+     * cause an infinite loop.
      */
     mr->enabled = false;
     memory_region_transaction_begin();
+    if (mr->container) {
+        /* Must share the owner; see above comments */
+        assert(mr->container->owner == mr->owner);
+        memory_region_del_subregion(mr->container, mr);
+    }
     while (!QTAILQ_EMPTY(&mr->subregions)) {
         MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
         memory_region_del_subregion(mr, subregion);
@@ -2625,7 +2645,10 @@ static void 
memory_region_update_container_subregions(MemoryRegion *subregion)
 
     memory_region_transaction_begin();
 
-    memory_region_ref(subregion);
+    if (mr->owner != subregion->owner) {
+        memory_region_ref(subregion);
+    }
+
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->priority >= other->priority) {
             QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2683,7 +2706,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
         assert(alias->mapped_via_alias >= 0);
     }
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_unref(subregion);
+
+    if (mr->owner != subregion->owner) {
+        memory_region_unref(subregion);
+    }
+
     memory_region_update_pending |= mr->enabled && subregion->enabled;
     memory_region_transaction_commit();
 }
-- 
2.50.1


-- 
Peter Xu


Reply via email to