On Wed, Oct 29, 2025 at 01:06:47PM +0900, Akihiko Odaki wrote:
> On 2025/10/29 7:09, Peter Xu wrote:
> > On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
> > > docs/devel/memory.rst says "memory_region_ref and memory_region_unref
> > > are never called on aliases", but these functions are called for
> > > FlatView roots, which can be aliases.
> > 
> > IMHO the quoted doc was in a special context, where it was talking about
> > the example of address_space_map() holding adhoc refcounts of a MR, in
> > which case "memory_region_ref and memory_region_unref are never called on
> > aliases" was correct..
> > 
> > The full context:
> > 
> >    ...
> >    If you break this rule, the following situation can happen:
> >    - the memory region's owner had a reference taken via memory_region_ref
> >      (for example by address_space_map)
> >    - the region is unparented, and has no owner anymore
> >    - when address_space_unmap is called, the reference to the memory 
> > region's
> >      owner is leaked.
> >    There is an exception to the above rule: it is okay to call
> >    object_unparent at any time for an alias or a container region.  It is
> >    therefore also okay to create or destroy alias and container regions
> >    dynamically during a device's lifetime.
> >    This exceptional usage is valid because aliases and containers only help
> >    QEMU building the guest's memory map; they are never accessed directly.
> >    memory_region_ref and memory_region_unref are never called on aliases
> >    or containers, and the above situation then cannot happen.  Exploiting
> >    this exception is rarely necessary, and therefore it is discouraged,
> >    but nevertheless it is used in a few places.
> >    ...
> > 
> > So I can't say the doc is wrong, but maybe it can be at least be clearer on
> > the scope of that sentence.. indeed.
> 
> I think statement "it is okay to call object_unparent at any time for an
> alias or a container region" can be corrected. Practically, developers will
> want call object_unparent() only when:
> - the memory region is not added to a container and
> - there is no manual references created with memory_region_ref().
> 
> These two conditions can be satisfied by auditing the device code that owns
> the memory region instead of multiple devices.

Yes.  I think there're other ways to implicitly taking mr refcounts though
(e.g. directly used as root address space when address_space_init()).  From
that POV maybe the 1st requirement isn't as special.

> 
> > 
> > > 
> > > This causes object overwrite hazard in pci-bridge. Specifically,
> > > pci_bridge_region_init() expects that there are no references to
> > > w->alias_io after object_unparent() is called, allowing it to reuse the
> > > associated storage. However, if a parent bus still holds a reference to
> > > the existing object as a FlatView's root, the storage is still in use,
> > > leading to an overwrite. This hazard can be confirmed by adding the
> > > following code to pci_bridge_region_init():
> > > 
> > > PCIDevice *parent_dev = parent->parent_dev;
> > > assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
> > >         PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
> > 
> > What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
> > used anywhere..  because it looks like the bridge code uses MRs to operate
> > rather than address spaces.
> > 
> > Does it mean we can drop the two ASes?  Then if they're the only holder of
> > the refcounts of these problematic MRs, does it solve the problem too in an
> > easier way?  Maybe there're other issues you want to fix too with this 
> > patch?
> 
> Apparently we cannot drop the ASes. See commit 55fa4be6f76a ("virtio-pci:
> fix memory_region_find for VirtIOPCIRegion's MR"), which introduced them.

Ah, this is definitely obscure.. at least it should have some comments
explaining why the ASes are there.

Now reading a bit into the problem, I'm not even sure if this is the right
thing to do, starting from ffa8a3e3b2 where it starts to introduce
memory_region_find() for virtio_address_space_lookup().

I don't know the piece of code well enough to say, but IMHO logically it
shouldn't need to depend on global address space information for the
lookup.

> 
> I don't know any other existing devices affected by this FlatView behavior,
> but it is also difficult to show that they are *not* affected because it
> requires traversing MemoryRegion graphs that span across several devices.
> 
> We will also need to update the documentation for future devices, but it is
> not trivial either as the condition where aliases are referenced from
> FlatView is complex.
> 
> Considering that, I think this patch is a pragmatic solution that ensures
> correctness of object_unparent() on aliases.

I think this patch should still be the last resort, let's still try to
discuss if there's other options.

For example, afaiu RCU readers at least do not rely on view->root to be
present, can we already release the refcount for the view->root within the
BQL section?  I mean something like this:

===8<===
diff --git a/system/memory.c b/system/memory.c
index 8b84661ae3..ceb774530f 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -301,7 +301,6 @@ static void flatview_destroy(FlatView *view)
         memory_region_unref(view->ranges[i].mr);
     }
     g_free(view->ranges);
-    memory_region_unref(view->root);
     g_free(view);
 }
 
@@ -314,7 +313,16 @@ void flatview_unref(FlatView *view)
 {
     if (qatomic_fetch_dec(&view->ref) == 1) {
         trace_flatview_destroy_rcu(view, view->root);
+        /* Root pointer must exist until now */
         assert(view->root);
+        /*
+         * Release the root pointer first without waiting for a grace
+         * period, as the root is not used by RCU readers.  Early releasing
+         * of root MR helps stablizing alias MR refcounts in use cases like
+         * pci_bridge_region_init(), where the caller might want to reuse
+         * the same MR right away.
+         */
+        g_clear_pointer(&view->root, memory_region_unref);
         call_rcu(view, flatview_destroy, rcu);
     }
 }
===8<===

That at least do not introduce weak-refcount concepts.

> 
> > 
> > > 
> > > This assertion fails when running:
> > > meson test -C build qtest-x86_64/bios-tables-test \
> > >      '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
> > > 
> > > Make the references of FlatView roots "weak" (i.e., remove the
> > > reference to a root automatically removed when it is finalized) to
> > > avoid calling memory_region_ref and memory_region_unref and fix the
> > > hazard with pci-bridge.
> > > 
> > > Alternative solutions (like removing the "never called on aliases"
> > > statement or detailing the exception) were rejected because the alias
> > > invariant is still relied upon in several parts of the codebase, and
> > > updating existing code to align with a new condition is non-trivial.
> > > 
> > > Signed-off-by: Akihiko Odaki <[email protected]>
> > > ---
> > >   system/memory.c | 8 ++++++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/system/memory.c b/system/memory.c
> > > index 8b84661ae36c..08fe5e791224 100644
> > > --- a/system/memory.c
> > > +++ b/system/memory.c
> > > @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
> > >       view = g_new0(FlatView, 1);
> > >       view->ref = 1;
> > >       view->root = mr_root;
> > > -    memory_region_ref(mr_root);
> > >       trace_flatview_new(view, mr_root);
> > >       return view;
> > > @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
> > >           memory_region_unref(view->ranges[i].mr);
> > >       }
> > >       g_free(view->ranges);
> > > -    memory_region_unref(view->root);
> > >       g_free(view);
> > >   }
> > > @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
> > >   static void memory_region_finalize(Object *obj)
> > >   {
> > >       MemoryRegion *mr = MEMORY_REGION(obj);
> > > +    gpointer key;
> > > +    gpointer value;
> > > +
> > > +    if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
> > > +        ((FlatView *)value)->root = NULL;
> > > +    }
> > 
> > This is definitely very tricky.. The translation path (from
> > AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
> > view->root.. however at least I saw this:
> > 
> > void flatview_unref(FlatView *view)
> > {
> >      if (qatomic_fetch_dec(&view->ref) == 1) {
> >          trace_flatview_destroy_rcu(view, view->root);
> >          assert(view->root);                            <-------------------
> >          call_rcu(view, flatview_destroy, rcu);
> >      }
> > }
> > 
> > I wonder how it didn't already crash.
> 
> In case of pci-bridge, I guess flatview_unref() is synchronously called, but
> memory_region_unref(view->root) is not because of flatview_destroy() is
> delayed with RCU.

True.

> 
> > 
> > The other stupid but working solution is we can always make the 6 aliases
> > to not be reused, IOW we can always use dynamic MRs considering
> > pci_bridge_update_mappings() should be rare?
> 
> Perhaps we may introduce memory_region_new_alias() (that calls object_new())
> and allow calling object_unparent() only for aliases created with the
> function.

IMHO we can see feasibility of above "early unref view->root" idea, then
this one, before the original solution.

Thanks,

-- 
Peter Xu


Reply via email to