On 2025/11/06 5:07, Peter Xu wrote:
On Mon, Nov 03, 2025 at 08:18:00PM +0900, Akihiko Odaki wrote:
It unfortunately does not work for pci-bridge. It has the following
function:

void pci_bridge_update_mappings(PCIBridge *br)
{
     PCIBridgeWindows *w = &br->windows;

     /* Make updates atomic to: handle the case of one VCPU updating the bridge
      * while another accesses an unaffected region. */
     memory_region_transaction_begin();
     pci_bridge_region_del(br, w);
     pci_bridge_region_cleanup(br, w);
     pci_bridge_region_init(br);
     memory_region_transaction_commit();
}

object_unparent() happens in pci_bridge_region_cleanup().
pci_bridge_region_init() reuses the storage.
memory_region_transaction_commit() triggers flatview_unref(), but it needs
to happen before pci_bridge_region_init().
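
The ordering hazard can be sketched with a toy model. This is not QEMU
code: ToyRegion, ToyView and every toy_* function are hypothetical names
made up for illustration, and the reference counting is deliberately
simplified. It only shows what goes wrong, in general, when storage is
re-initialized while a deferred reference to its previous contents is
still outstanding.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in QEMU. */
typedef struct ToyRegion { int generation; int refs; } ToyRegion;
typedef struct ToyView { ToyRegion *region; int generation_seen; } ToyView;
typedef struct ToyOutcome { bool released_original; int refs_left; } ToyOutcome;

/* (Re)initialize the storage; the owner holds one reference. */
static void toy_region_init(ToyRegion *r, int generation)
{
    r->generation = generation;
    r->refs = 1;
}

/* A view takes a reference and remembers which object it saw. */
static ToyView toy_view_build(ToyRegion *r)
{
    ToyView v = { r, r->generation };
    r->refs++;
    return v;
}

/* Drop the view's reference; report whether it hit the original object. */
static bool toy_view_release(ToyView *v)
{
    bool same = (v->region->generation == v->generation_seen);
    v->region->refs--;
    return same;
}

/* Hazardous order: cleanup, reuse the storage, then the deferred release. */
static ToyOutcome toy_reuse_before_release(void)
{
    ToyRegion storage;
    ToyView view;
    ToyOutcome out;

    toy_region_init(&storage, 1);
    view = toy_view_build(&storage);
    storage.refs--;                   /* cleanup: owner drops its reference */
    toy_region_init(&storage, 2);     /* storage reused too early */
    out.released_original = toy_view_release(&view);
    out.refs_left = storage.refs;     /* new object lost the owner's reference */
    return out;
}

/* Safe order: the view is released before the storage is reused. */
static ToyOutcome toy_release_before_reuse(void)
{
    ToyRegion storage;
    ToyView view;
    ToyOutcome out;

    toy_region_init(&storage, 1);
    view = toy_view_build(&storage);
    storage.refs--;                   /* cleanup: owner drops its reference */
    out.released_original = toy_view_release(&view);
    toy_region_init(&storage, 2);     /* reuse only after the release */
    out.refs_left = storage.refs;
    return out;
}
```

In the hazardous order the stale view decrements the count of the new
object, so the new object ends up with zero references although its owner
still believes it holds one. In the safe order the count stays at one.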

memory_region_transaction_commit() also has an undesirable characteristic
that its effect may be delayed due to nesting. To make sure flatview_unref()
happens with a particular call of memory_region_transaction_commit(), you
need to traverse the possible call graph that leads to the function.
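
The nesting behaviour can be sketched as a depth counter. This is a
simplified model under the assumption that only the outermost commit
performs the deferred work; the toy_* names are hypothetical and are not
the QEMU implementation.

```c
#include <assert.h>

/* Toy model only; none of these names exist in QEMU. */
static unsigned toy_depth;        /* current transaction nesting level */
static unsigned toy_flush_count;  /* how many times deferred work ran */

static void toy_transaction_begin(void)
{
    toy_depth++;
}

static void toy_transaction_commit(void)
{
    assert(toy_depth > 0);
    toy_depth--;
    if (toy_depth == 0) {
        /* Only the outermost commit performs the deferred work. */
        toy_flush_count++;
    }
}

/* A callee that looks self-contained when read in isolation. */
static void toy_update_mappings(void)
{
    toy_transaction_begin();
    /* ... delete, clean up and re-create regions here ... */
    toy_transaction_commit();  /* flushes only if no caller holds a transaction */
}

/* A caller that wraps the callee in its own transaction. */
static unsigned toy_nested_caller(void)
{
    unsigned flushes_after_inner;

    toy_transaction_begin();
    toy_update_mappings();
    flushes_after_inner = toy_flush_count;  /* inner commit was deferred */
    toy_transaction_commit();               /* the flush happens here */
    return flushes_after_inner;
}
```

Calling toy_update_mappings() directly flushes at its own commit, but
when it is reached through toy_nested_caller() the flush is postponed to
the caller's commit. Whether a given commit flushes therefore depends on
every caller, which is why the call graph has to be examined.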

So I'm afraid I don't think there is a better way to ensure correctness
without a codebase-wide audit.

Ah indeed, I missed that. :(

One way to work around this is to provide a helper (abstracted from the
current memory_region_transaction_commit) to enforce a flatview reset
before reusing.  I feel that is overkill too, though at least it would
also avoid weak-refs.

Enforcing a FlatView reset for *one* memory_region_transaction_commit() call is incompatible with nesting, which requires delaying the reset until all memory_region_transaction_commit() calls have finished.


I think in practice I'd vote we fix pci-bridge only, either with your other
proposal to dynamically allocate the alias MRs, or something like you
posted previously:

https://lore.kernel.org/all/[email protected]/#t

Personally, I don't mind fixing pci-bridge only even if we don't audit the
whole code base.  The audit work is time consuming, and I'd simply trust
the tests from all the QEMU users covering whatever devices are still being
used. We will always get an issue report when something goes wrong.

What do you think?

Generally speaking, we will not necessarily "always" get an issue report when things go wrong with memory management. A bug in memory management may not cause an immediate crash but may corrupt the memory state, which you will discover only later. The end result of memory corruption may look random and result in a hard-to-debug issue report. A user may not even bother writing an issue report at all; this is especially true for this kind of corner case, which rarely happens.

There would have been no such hazard of memory corruption if the code had done exactly what the documentation says in the first place. Consistency between the code and the documentation is essential, especially for this kind of complex and fundamental code.

Regards,
Akihiko Odaki
