On Wed, Oct 29, 2025 at 01:06:47PM +0900, Akihiko Odaki wrote:
> On 2025/10/29 7:09, Peter Xu wrote:
> > On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
> > > docs/devel/memory.rst says "memory_region_ref and memory_region_unref
> > > are never called on aliases", but these functions are called for
> > > FlatView roots, which can be aliases.
> >
> > IMHO the quoted doc was in a special context, where it was talking about
> > the example of address_space_map() holding adhoc refcounts of a MR, in
> > which case "memory_region_ref and memory_region_unref are never called on
> > aliases" was correct..
> >
> > The full context:
> >
> > ...
> > If you break this rule, the following situation can happen:
> > - the memory region's owner had a reference taken via memory_region_ref
> > (for example by address_space_map)
> > - the region is unparented, and has no owner anymore
> > - when address_space_unmap is called, the reference to the memory
> > region's
> > owner is leaked.
> > There is an exception to the above rule: it is okay to call
> > object_unparent at any time for an alias or a container region. It is
> > therefore also okay to create or destroy alias and container regions
> > dynamically during a device's lifetime.
> > This exceptional usage is valid because aliases and containers only help
> > QEMU building the guest's memory map; they are never accessed directly.
> > memory_region_ref and memory_region_unref are never called on aliases
> > or containers, and the above situation then cannot happen. Exploiting
> > this exception is rarely necessary, and therefore it is discouraged,
> > but nevertheless it is used in a few places.
> > ...
> >
> > So I can't say the doc is wrong, but maybe it can be at least be clearer on
> > the scope of that sentence.. indeed.
>
> I think statement "it is okay to call object_unparent at any time for an
> alias or a container region" can be corrected. Practically, developers will
> want call object_unparent() only when:
> - the memory region is not added to a container and
> - there is no manual references created with memory_region_ref().
>
> These two conditions can be satisfied by auditing the device code that owns
> the memory region instead of multiple devices.
Yes. I think there're other ways to implicitly taking mr refcounts though
(e.g. directly used as root address space when address_space_init()). From
that POV maybe the 1st requirement isn't as special.
>
> >
> > >
> > > This causes object overwrite hazard in pci-bridge. Specifically,
> > > pci_bridge_region_init() expects that there are no references to
> > > w->alias_io after object_unparent() is called, allowing it to reuse the
> > > associated storage. However, if a parent bus still holds a reference to
> > > the existing object as a FlatView's root, the storage is still in use,
> > > leading to an overwrite. This hazard can be confirmed by adding the
> > > following code to pci_bridge_region_init():
> > >
> > > PCIDevice *parent_dev = parent->parent_dev;
> > > assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
> > > PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
> >
> > What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
> > used anywhere.. because it looks like the bridge code uses MRs to operate
> > rather than address spaces.
> >
> > Does it mean we can drop the two ASes? Then if they're the only holder of
> > the refcounts of these problematic MRs, does it solve the problem too in an
> > easier way? Maybe there're other issues you want to fix too with this
> > patch?
>
> Apparently we cannot drop the ASes. See commit 55fa4be6f76a ("virtio-pci:
> fix memory_region_find for VirtIOPCIRegion's MR"), which introduced them.
Ah, this is definitely obscure.. at least it should have some comments
explaining why the ASes are there.
Now reading a bit into the problem, I'm not even sure if this is the right
thing to do, starting from ffa8a3e3b2 where it starts to introduce
memory_region_find() for virtio_address_space_lookup().
I don't know the piece of code well enough to say, but IMHO logically it
shouldn't need to depend on global address space information for the
lookup.
>
> I don't know any other existing devices affected by this FlatView behavior,
> but it is also difficult to show that they are *not* affected because it
> requires traversing MemoryRegion graphs that span across several devices.
>
> We will also need to update the documentation for future devices, but it is
> not trivial either as the condition where aliases are referenced from
> FlatView is complex.
>
> Considering that, I think this patch is a pragmatic solution that ensures
> correctness of object_unparent() on aliases.
I think this patch should still be the last resort, let's still try to
discuss if there's other options.
For example, afaiu RCU readers at least do not rely on view->root to be
present, can we already release the refcount for the view->root within the
BQL section? I mean something like this:
===8<===
diff --git a/system/memory.c b/system/memory.c
index 8b84661ae3..ceb774530f 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -301,7 +301,6 @@ static void flatview_destroy(FlatView *view)
memory_region_unref(view->ranges[i].mr);
}
g_free(view->ranges);
- memory_region_unref(view->root);
g_free(view);
}
@@ -314,7 +313,16 @@ void flatview_unref(FlatView *view)
{
if (qatomic_fetch_dec(&view->ref) == 1) {
trace_flatview_destroy_rcu(view, view->root);
+ /* Root pointer must exist until now */
assert(view->root);
+ /*
+ * Release the root pointer first without waiting for a grace
+ * period, as the root is not used by RCU readers. Early releasing
+ * of root MR helps stablizing alias MR refcounts in use cases like
+ * pci_bridge_region_init(), where the caller might want to reuse
+ * the same MR right away.
+ */
+ g_clear_pointer(&view->root, memory_region_unref);
call_rcu(view, flatview_destroy, rcu);
}
}
===8<===
That at least do not introduce weak-refcount concepts.
>
> >
> > >
> > > This assertion fails when running:
> > > meson test -C build qtest-x86_64/bios-tables-test \
> > > '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
> > >
> > > Make the references of FlatView roots "weak" (i.e., remove the
> > > reference to a root automatically removed when it is finalized) to
> > > avoid calling memory_region_ref and memory_region_unref and fix the
> > > hazard with pci-bridge.
> > >
> > > Alternative solutions (like removing the "never called on aliases"
> > > statement or detailing the exception) were rejected because the alias
> > > invariant is still relied upon in several parts of the codebase, and
> > > updating existing code to align with a new condition is non-trivial.
> > >
> > > Signed-off-by: Akihiko Odaki <[email protected]>
> > > ---
> > > system/memory.c | 8 ++++++--
> > > 1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/system/memory.c b/system/memory.c
> > > index 8b84661ae36c..08fe5e791224 100644
> > > --- a/system/memory.c
> > > +++ b/system/memory.c
> > > @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
> > > view = g_new0(FlatView, 1);
> > > view->ref = 1;
> > > view->root = mr_root;
> > > - memory_region_ref(mr_root);
> > > trace_flatview_new(view, mr_root);
> > > return view;
> > > @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
> > > memory_region_unref(view->ranges[i].mr);
> > > }
> > > g_free(view->ranges);
> > > - memory_region_unref(view->root);
> > > g_free(view);
> > > }
> > > @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
> > > static void memory_region_finalize(Object *obj)
> > > {
> > > MemoryRegion *mr = MEMORY_REGION(obj);
> > > + gpointer key;
> > > + gpointer value;
> > > +
> > > + if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
> > > + ((FlatView *)value)->root = NULL;
> > > + }
> >
> > This is definitely very tricky.. The translation path (from
> > AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
> > view->root.. however at least I saw this:
> >
> > void flatview_unref(FlatView *view)
> > {
> > if (qatomic_fetch_dec(&view->ref) == 1) {
> > trace_flatview_destroy_rcu(view, view->root);
> > assert(view->root); <-------------------
> > call_rcu(view, flatview_destroy, rcu);
> > }
> > }
> >
> > I wonder how it didn't already crash.
>
> In case of pci-bridge, I guess flatview_unref() is synchronously called, but
> memory_region_unref(view->root) is not because of flatview_destroy() is
> delayed with RCU.
True.
>
> >
> > The other stupid but working solution is we can always make the 6 aliases
> > to not be reused, IOW we can always use dynamic MRs considering
> > pci_bridge_update_mappings() should be rare?
>
> Perhaps we may introduce memory_region_new_alias() (that calls object_new())
> and allow calling object_unparent() only for aliases created with the
> function.
IMHO we can see feasibility of above "early unref view->root" idea, then
this one, before the original solution.
Thanks,
--
Peter Xu