On Fri, Nov 07, 2025 at 11:16:56AM +0900, Akihiko Odaki wrote:
> On 2025/11/07 2:50, Peter Xu wrote:
> > On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
> > > Generally speaking, we will not necessarily "always" get an issue report
> > > when things went wrong with memory management. A bug in memory management
> > > may not cause an immediate crash but corrupt the memory state which you 
> > > will
> > > find only later. The end result of memory corruption may look random and
> > > result in a hard-to-debug issue report. A user may not even bother writing
> > > an issue report at all; this is especially true for this kind of corner
> > > cases that rarely happen.
> > > 
> > > There should have been no such a hazard of memory corruption if the code 
> > > did
> > > exactly what the documentation said in the first place. The consistency of
> > > the code and the documentation is essential, especially for this kind of
> > > complex and fundamental code.
> > 
> > Do you have encountered such case before?
> > 
> > I wasn't expecting that, because what you were saying looks more like what
> > Linux kernel would have a bug in mm.  QEMU is still special as it has the
> > default unassigned MR trapping everything by default, meanwhile normally
> > what is moving is MMIO / alias regions rather than real ramblocks.  It
> > means when things go wrong we have much higher chance to trap them
> > properly.
> 
> When I said "memory management" I meant the methods we use to allocate and
> free memory (the Linux equivalents would be kmalloc()/free()/kref), not the
> MM tracking or unassigned MR trapping behavior you mentioned. The unassigned
> MR trap and MMIO/alias movement are a separate concern and don’t change the
> underlying risk.
> 
> Concrete example: imagine an alias is allocated with g_new() and freed
> immediately after object_unparent(). If that alias accidentally becomes the
> FlatView root, destroying the FlatView later will call memory_region_unref()
> and produce a use-after-free. We cannot predict what memory_region_unref()
> will read or write in that scenario — the result can be arbitrary memory
> corruption that surfaces much later as a hard-to-debug, intermittent
> problem. Users often won’t file an issue for these rare corner cases.

OK I see what you meant now.  Yes it's a valid concern.

> 
> > 
> > I also confess though that I'm pretty conservative on fixing things with
> > hypothetical issues.  In general, I prefer fixing things with explicit
> > problems, so we know how to measure and justify a fix (depending on how
> > aggressive the fix is and how much maintanence burden it will bring to
> > QEMU).  Without a real problem, it's harder to quantify that even if such
> > evaluation will also normally be subjective too.
> 
> Regarding your preference to fix only explicit problems: I understand the
> conservatism, but here are the facts we need to weigh:
> 
> - The documentation claims we may free aliases because
>   memory_region_ref() is never called, yet there is code that does call
>   memory_region_ref().
> - The patch adds code to align behavior with the documentation.
> 
> The significance of both potential impacts (the behavioral divergence for
> devices other than pci-bridge, and the added complexity needed for
> consistency) may be subjective and hypothetical, but that applies equally to
> both sides.
> 
> In this case, the long-term reliability and maintainability of QEMU depend
> on having the code behave as documented. Correctness should take precedence
> over simplicity.

Fair enough.

Let's then still try to see whether we can brainstorm something that can
still at least avoid the "let's clear a remote pointer in a finalize(),
because it looks safe" approach.. I'm not sure if I'm the only one that
feels nervous with it.

Fundamentally, if you can remotely clear a pointer, it means it's not used
at all. In practise, that's correct because as I also discussed before I
don't think RCU readers use flatview->root at all.  It was almost only
because we have some very light references on flatview->root.  The major
"hidden" reference is actually the key in flat_views hash, however I don't
think it will have any stale root MR VA as key, as long as after a proper
commit() completes.

In short, I want to discuss with you on whether we can completely remove
the flatview->root reference, instead of resetting it in finalize().

Since that'll need to be justified, I prepared some sample patches myself.
It survives all my smoke tests.  Once again, please treat them as comments
only.  Would you think that's slightly better?  Attached at the end.

IMHO removing view->root makes sure nothing will surprise us in the future
v.s. remote resets.

Thanks,

===8<===

>From 4b495d935dfb145ed7ff57b3f6e4d6b9cb5ce038 Mon Sep 17 00:00:00 2001
From: Peter Xu <[email protected]>
Date: Fri, 7 Nov 2025 10:47:40 -0500
Subject: [PATCH 1/3] memory: Remove flatview root reference in
 mtree_print_dispatch()

This is so far only used to dump an extra "[ROOT]" tag when the termination
MR is the root MR.  This is so far the only real necessary reference to
flatview's root.  Remove it.  We lose this tag, but the hope is with the
MR's name still being available it's still clear on what it represents.

This paves way for a complete removal of root MR reference on flatviews.

Signed-off-by: Peter Xu <[email protected]>
---
 system/memory-internal.h | 3 +--
 system/memory.c          | 4 ++--
 system/physmem.c         | 5 ++---
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/system/memory-internal.h b/system/memory-internal.h
index 46f758fa7e..c5841a603c 100644
--- a/system/memory-internal.h
+++ b/system/memory-internal.h
@@ -35,8 +35,7 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView 
*fv);
 void address_space_dispatch_compact(AddressSpaceDispatch *d);
 void address_space_dispatch_free(AddressSpaceDispatch *d);
 
-void mtree_print_dispatch(struct AddressSpaceDispatch *d,
-                          MemoryRegion *root);
+void mtree_print_dispatch(struct AddressSpaceDispatch *d);
 
 /* returns true if end is big endian. */
 static inline bool devend_big_endian(enum device_endian end)
diff --git a/system/memory.c b/system/memory.c
index 8b84661ae3..0d14e60d26 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -3573,8 +3573,8 @@ static void mtree_print_flatview(gpointer key, gpointer 
value,
     }
 
 #if !defined(CONFIG_USER_ONLY)
-    if (fvi->dispatch_tree && view->root) {
-        mtree_print_dispatch(view->dispatch, view->root);
+    if (fvi->dispatch_tree) {
+        mtree_print_dispatch(view->dispatch);
     }
 #endif
 
diff --git a/system/physmem.c b/system/physmem.c
index c9869e4049..26ae6e3acd 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -4257,7 +4257,7 @@ static void mtree_print_phys_entries(int start, int end, 
int skip, int ptr)
 #define MR_SIZE(size) (int128_nz(size) ? (hwaddr)int128_get64( \
                            int128_sub((size), int128_one())) : 0)
 
-void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
+void mtree_print_dispatch(AddressSpaceDispatch *d)
 {
     int i;
 
@@ -4270,13 +4270,12 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, 
MemoryRegion *root)
                                 " [ROM]", " [watch]" };
 
         qemu_printf("      #%d @" HWADDR_FMT_plx ".." HWADDR_FMT_plx
-                    " %s%s%s%s%s",
+                    " %s%s%s%s",
             i,
             s->offset_within_address_space,
             s->offset_within_address_space + MR_SIZE(s->size),
             s->mr->name ? s->mr->name : "(noname)",
             i < ARRAY_SIZE(names) ? names[i] : "",
-            s->mr == root ? " [ROOT]" : "",
             s == d->mru_section ? " [MRU]" : "",
             s->mr->is_iommu ? " [iommu]" : "");
 
-- 
2.50.1


>From f96e31623f6d13e27b835f06b3c131e91590b13f Mon Sep 17 00:00:00 2001
From: Peter Xu <[email protected]>
Date: Fri, 7 Nov 2025 10:59:15 -0500
Subject: [PATCH 2/3] memory: Introduce flatview's root name

One other reason to reference a Flatview's root MR is to fetch a name.
Remember that inside Flatview itself so as to drop this reference to
Flatview's root MR.

Signed-off-by: Peter Xu <[email protected]>
---
 include/system/memory.h | 1 +
 system/memory.c         | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/system/memory.h b/include/system/memory.h
index 3bd5ffa5e0..301cda1a8f 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1202,6 +1202,7 @@ struct FlatView {
     unsigned nr_allocated;
     struct AddressSpaceDispatch *dispatch;
     MemoryRegion *root;
+    char *root_name;
 };
 
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
diff --git a/system/memory.c b/system/memory.c
index 0d14e60d26..86891c50b4 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -266,6 +266,7 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
     view = g_new0(FlatView, 1);
     view->ref = 1;
     view->root = mr_root;
+    view->root_name = g_strdup(mr_root ? memory_region_name(mr_root) : 
"(none)");
     memory_region_ref(mr_root);
     trace_flatview_new(view, mr_root);
 
@@ -301,6 +302,7 @@ static void flatview_destroy(FlatView *view)
         memory_region_unref(view->ranges[i].mr);
     }
     g_free(view->ranges);
+    g_free(view->root_name);
     memory_region_unref(view->root);
     g_free(view);
 }
@@ -3522,8 +3524,7 @@ static void mtree_print_flatview(gpointer key, gpointer 
value,
         qemu_printf("\n");
     }
 
-    qemu_printf(" Root memory region: %s\n",
-      view->root ? memory_region_name(view->root) : "(none)");
+    qemu_printf(" Root memory region: %s\n", view->root_name);
 
     if (n <= 0) {
         qemu_printf(MTREE_INDENT "No rendered FlatView\n\n");
-- 
2.50.1


>From 19f4451a1a7ea2ea3c0c41849f9c118f2b909293 Mon Sep 17 00:00:00 2001
From: Peter Xu <[email protected]>
Date: Fri, 7 Nov 2025 11:10:19 -0500
Subject: [PATCH 3/3] memory: Remove Flatview->root reference completely

memory.rst has this paragraph, which is currently inaccurate:

  This exceptional usage is valid because aliases and containers only help
  QEMU building the guest's memory map; they are never accessed directly.
  memory_region_ref and memory_region_unref are never called on aliases or
  containers, and the above situation then cannot happen.  Exploiting this
  exception is rarely necessary, and therefore it is discouraged, but
  nevertheless it is used in a few places.

It was inaccurate because flatviews so far can take a refcount on an alias
MR, which may cause an object_unparent() on top of alias MR unsafe too.

However, Flatview logically doesn't need to take refcount on its root MR
that was used to generate the topology.

When generating the topology, the root MR is used within the whole process.
Said that, it doesn't need the refcount because it keeps holding BQL
without releasing, so the root MR cannot have chance to be detached or
freed.

There's another very implicit use of the root MR, which is as the index of
the flat_views hash.  However that is fine too without refcount, because
for each memory whole transaction, it will guarantee that after memory
region commit()ed and returned, the flat_views hash will not have any key
that points to an invalid root MR.  It's because the role is the MR needs
to be fully detached first before object_unparent(), then it guarantees it
won't appear in any of the Flatviews.

Signed-off-by: Peter Xu <[email protected]>
---
 include/system/memory.h |  1 -
 system/memory.c         | 13 +++++++------
 system/trace-events     |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/system/memory.h b/include/system/memory.h
index 301cda1a8f..d383081155 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1201,7 +1201,6 @@ struct FlatView {
     unsigned nr;
     unsigned nr_allocated;
     struct AddressSpaceDispatch *dispatch;
-    MemoryRegion *root;
     char *root_name;
 };
 
diff --git a/system/memory.c b/system/memory.c
index 86891c50b4..6576e673d4 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -265,9 +265,7 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
 
     view = g_new0(FlatView, 1);
     view->ref = 1;
-    view->root = mr_root;
     view->root_name = g_strdup(mr_root ? memory_region_name(mr_root) : 
"(none)");
-    memory_region_ref(mr_root);
     trace_flatview_new(view, mr_root);
 
     return view;
@@ -294,7 +292,7 @@ static void flatview_destroy(FlatView *view)
 {
     int i;
 
-    trace_flatview_destroy(view, view->root);
+    trace_flatview_destroy(view, view->root_name);
     if (view->dispatch) {
         address_space_dispatch_free(view->dispatch);
     }
@@ -303,7 +301,6 @@ static void flatview_destroy(FlatView *view)
     }
     g_free(view->ranges);
     g_free(view->root_name);
-    memory_region_unref(view->root);
     g_free(view);
 }
 
@@ -315,8 +312,7 @@ static bool flatview_ref(FlatView *view)
 void flatview_unref(FlatView *view)
 {
     if (qatomic_fetch_dec(&view->ref) == 1) {
-        trace_flatview_destroy_rcu(view, view->root);
-        assert(view->root);
+        trace_flatview_destroy_rcu(view, view->root_name);
         call_rcu(view, flatview_destroy, rcu);
     }
 }
@@ -751,6 +747,11 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
     int i;
     FlatView *view;
 
+    /*
+     * Holding BQL makes sure @mr (if non-NULL) will not be gone from under
+     * us, hence we do not need a refcount while using it.
+     */
+    assert(bql_locked());
     view = flatview_new(mr);
 
     if (mr) {
diff --git a/system/trace-events b/system/trace-events
index 82856e44f2..21efd8047e 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -24,8 +24,8 @@ memory_region_ram_device_read(int cpu_index, void *mr, 
uint64_t addr, uint64_t v
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, 
uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" 
size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr 
'%s' listener '%s' synced (global=%d)"
 flatview_new(void *view, void *root) "%p (root %p)"
-flatview_destroy(void *view, void *root) "%p (root %p)"
-flatview_destroy_rcu(void *view, void *root) "%p (root %p)"
+flatview_destroy(void *view, char *root) "%p (root %p)"
+flatview_destroy_rcu(void *view, char *root) "%p (root %p)"
 global_dirty_changed(unsigned int bitmask) "bitmask 0x%"PRIx32
 
 # physmem.c
-- 
2.50.1


-- 
Peter Xu


Reply via email to