Hi,
On 02/12/2025 22:08, Milan Djokic wrote:
Hi Julien,
On 11/27/25 11:22, Julien Grall wrote:
We have changed the vIOMMU design from a 1-N to an N-N mapping between
vIOMMU and pIOMMU. Considering the single-vIOMMU model limitation
pointed out by Volodymyr (SID overlaps), a vIOMMU-per-pIOMMU model
turned out to be the only proper solution.
I am not sure I fully understand. My assumption with the single vIOMMU
is that you would have a virtual SID mapped to a (pIOMMU, physical SID)
pair.
In the original single-vIOMMU implementation, vSID was also equal to
pSID; we didn't have a SW mapping layer between them. Once the SID
overlap issue was discovered with this model, I switched to the vIOMMU-
per-pIOMMU model. The alternative was to introduce a SW mapping layer
and stick with a single-vIOMMU model. IMO, a vSID->pSID mapping layer
would overcomplicate the design, especially for the handling of PCI RC
StreamIDs. On the other hand, if even the multi-vIOMMU model introduces
problems that I am not yet aware of, adding a complex mapping layer
would be the only viable solution.
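To illustrate, such a mapping layer would roughly look like this (a
minimal sketch; the structure and helper names are made up, not from
the series):

/* Hypothetical vSID -> (pIOMMU, pSID) mapping layer for a single-vIOMMU
 * model. Types as in Xen (xen/types.h); names are illustrative. */
struct vsid_map_entry {
    uint32_t vsid;            /* StreamID seen by the guest */
    unsigned int piommu_idx;  /* which physical SMMU instance */
    uint32_t psid;            /* StreamID on that physical SMMU */
};

/* Per-domain table, filled at domain creation from assigned devices. */
struct vsid_map {
    unsigned int nr_entries;
    struct vsid_map_entry *entries;
};

static const struct vsid_map_entry *
vsid_lookup(const struct vsid_map *map, uint32_t vsid)
{
    unsigned int i;

    for ( i = 0; i < map->nr_entries; i++ )
        if ( map->entries[i].vsid == vsid )
            return &map->entries[i];

    return NULL; /* vSID not assigned to this domain */
}

Every emulated command and fault would have to be translated through
such a table, and PCI RC iommu-map ranges rewritten per domain, which
is the complexity I would like to avoid.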
> Does this mean that in your solution you will end up with multiple
> vPCIs as well and then map pBDF == vBDF? (This is because the SID has
> to be fixed at boot.)
>
The important thing which I haven't mentioned here is that our focus is
on non-PCI devices for this feature at the moment. If I'm not mistaken,
Arm PCI passthrough is still a work in progress, so our plan was to
implement full vIOMMU PCI support in the future, once PCI passthrough
support is complete for Arm. Of course, we need to make sure that the
vIOMMU design provides a suitable infrastructure for PCI.
To answer your question: yes, we will have multiple vPCI nodes with this
model, establishing a 1-1 vSID-pSID mapping (the same iommu-map range
between pPCI and vPCI).
As for a 1-1 pBDF-to-vBDF mapping, I'm not sure it is necessary. My
understanding is that the vBDF->pBDF mapping does not affect the
vSID->pSID mapping. Am I wrong here?
From my understanding, the mapping between a vBDF and a vSID is set up
at domain creation (as this is described in ACPI/Device-Tree). As PCI
devices can be hotplugged, if you want to enforce vSID == pSID, then you
indirectly need to enforce vBDF == pBDF.
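To make the constraint concrete: with the standard iommu-map binding,
the SID is derived linearly from the RID (i.e. the BDF), so reusing the
physical iommu-map ranges in the virtual DT ties the two together.
Roughly (a sketch; the struct and helper are illustrative):

/* An iommu-map entry <rid-base &smmu sid-base length> maps a PCI
 * requester ID (BDF) to a StreamID. If the virtual DT reuses the
 * physical ranges, vSID == pSID can only hold when vBDF == pBDF. */
struct iommu_map_entry {
    uint32_t rid_base;  /* first requester ID covered */
    uint32_t sid_base;  /* StreamID assigned to rid_base */
    uint32_t length;    /* number of consecutive RIDs covered */
};

static uint32_t rid_to_sid(const struct iommu_map_entry *e, uint32_t rid)
{
    ASSERT(rid - e->rid_base < e->length);
    return e->sid_base + (rid - e->rid_base);
}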
[...]
- **Runtime Configuration**: Introduces a `viommu` boot parameter for
dynamic enablement.
A separate vIOMMU device is exposed to the guest for every physical
IOMMU in the system.
The vIOMMU feature is designed to provide a generic vIOMMU framework
and a backend implementation for the target IOMMU as separate
components.
The backend implementation contains the IOMMU-specific structures and
command handling (only SMMUv3 is currently supported).
This structure allows potential reuse of the stage-1 feature for other
IOMMU types.
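In code terms, the intended split looks roughly like this (a sketch
only; the actual structure and hook names in the series may differ):

/* Generic vIOMMU framework; the backend supplies the IOMMU-type-specific
 * emulation hooks. Only an SMMUv3 backend exists today; other backends
 * (e.g. IPMMU) could be plugged in later. Names are illustrative. */
struct viommu_ops {
    int (*domain_init)(struct domain *d, unsigned int piommu_idx);
    void (*domain_destroy)(struct domain *d);
    int (*mmio_read)(struct vcpu *v, paddr_t addr, unsigned int len,
                     register_t *val);
    int (*mmio_write)(struct vcpu *v, paddr_t addr, unsigned int len,
                      register_t val);
};

/* One emulated instance per physical IOMMU (the N-N model). */
struct viommu {
    const struct viommu_ops *ops;  /* e.g. &vsmmuv3_ops */
    paddr_t base;                  /* MMIO base exposed to the guest */
    void *priv;                    /* backend-specific state */
};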
Security Considerations
=======================
**viommu security benefits:**
- Stage-1 translation ensures guest devices cannot perform unauthorized
DMA (device I/O address mappings are managed by the guest).
- The emulated IOMMU removes the guest's direct dependency on IOMMU
hardware, while maintaining domain isolation.
Sorry, I don't follow this argument. Are you saying that it would be
possible to emulate an SMMUv3 vIOMMU on top of the IPMMU?
No, this would not work. The emulated IOMMU has to match the pIOMMU
type. The argument only points out that we are emulating the IOMMU, so
the guest does not need a direct HW interface for IOMMU functions.
Sorry, but I am still missing how this is a security benefit.
[...]
2. Observation:
---------------
Guests can now invalidate stage-1 caches; invalidations need forwarding
to the SMMUv3 hardware to maintain coherence.
**Risk:**
Failing to propagate cache invalidations could leave stale mappings in
place, enabling access through old translations and possibly data
leakage or misrouting.
You are referring to data leakage/misrouting between two devices owned
by the same guest, right? Xen would still be in charge of flushing when
the stage-2 is updated.
Yes, this risk could affect only guests, not Xen.
But it would affect only a single guest, right? IOW, it is not possible
for guest A to leak data to guest B even if we don't properly invalidate
stage-1. Correct?
**Mitigation:** *(Handled by design)*
This feature ensures that guest-initiated invalidations are correctly
forwarded to the hardware,
preserving IOMMU coherency.
How is this a mitigation? You have to properly handle commands. If you
don't properly handle them, then yes it will break.
Not really a mitigation, I will remove it. The guest is responsible for
regularly issuing invalidation requests to mitigate this risk.
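For reference, the forwarding path amounts to something like this (a
sketch using Linux-style CMDQ_OP_* opcode names; the helpers and field
accesses are made up):

/* A guest stage-1 invalidation popped from the emulated command queue
 * is sanitized and re-issued on the physical SMMUv3. The scope is
 * clamped to the VMID Xen assigned to the domain, so a guest can never
 * invalidate on behalf of another domain. */
static int vsmmuv3_forward_tlbi(struct domain *d, uint64_t cmd[2])
{
    switch ( cmd_opcode(cmd) )                /* illustrative helper */
    {
    case CMDQ_OP_TLBI_NH_VA:
    case CMDQ_OP_TLBI_NH_ASID:
        cmd_set_vmid(cmd, d->arch.p2m.vmid);  /* clamp to this domain */
        return psmmu_cmdq_issue(d_to_psmmu(d), cmd);
    default:
        return -EINVAL;  /* not a stage-1 invalidation command */
    }
}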
4. Observation:
---------------
The code includes transformations to handle nested translation versus
standard modes, and uses guest-configured command queues and their
commands (e.g., `CMD_CFGI_STE`) as well as event notifications.
**Risk:**
Malicious or malformed queue commands from guests could bypass
validation, manipulate SMMUv3 state,
or cause system instability.
**Mitigation:** *(Handled by design)*
Built-in validation of command queue entries and sanitization mechanisms
ensure only permitted configurations
are applied.
This is true as long as we didn't make a mistake in the
configuration ;).
Yes, but I don’t see anything we can do to prevent configuration mistakes.
There is nothing really preventing it. Same for ...
This is supported via additions in `vsmmuv3` and `cmdqueue`
handling code.
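To illustrate the kind of validation this refers to, a rough sketch
(helper names are made up):

/* Before a guest CMD_CFGI_STE takes effect, the SID must belong to a
 * device assigned to that guest, and the shadow STE installed in the
 * physical stream table must keep stage-2 pointing at Xen's P2M
 * regardless of what the guest wrote. */
static int vsmmuv3_handle_cfgi_ste(struct domain *d, uint64_t cmd[2])
{
    uint32_t sid = cmd_get_sid(cmd);
    uint64_t gste[8];  /* an STE is 64 bytes */

    if ( !sid_assigned_to(d, sid) )
        return -EPERM;  /* guest may only touch STEs of its own devices */

    if ( read_guest_ste(d, sid, gste) )
        return -EFAULT;

    sanitize_ste(gste);  /* mask fields the guest must not control */
    return install_shadow_ste(d, sid, gste);  /* stage-2 set by Xen */
}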
5. Observation:
---------------
Device Tree modifications enable device assignment and configuration:
guest DT fragments (e.g., `iommus`) are added via `libxl`.
**Risk:**
Erroneous or malicious Device Tree injection could result in device
misbinding or guest access to unauthorized
hardware.
The DT fragments are not security supported and will never be, at least
until there is a libfdt that is able to detect malformed Device-Trees
(I haven't checked whether this has changed recently).
But shouldn't this still be considered a risk? Similar to the previous
observation, the system integrator should ensure that DT fragments are
correct.
... this one. I agree they are risks, but they don't provide much input
into the design of the vIOMMU.
I am a lot more concerned about the scheduling part, because the
resources are shared.
My understanding is that there is only a single physical event queue.
Xen would be responsible for handling the events in the queue and
forwarding them to the respective guests. If so, it is not clear what
you mean by "disable event queue".
I was referring to the emulated IOMMU event queue. The idea is to make
it optional for guests. When disabled, events won't be propagated to
the guest.
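Roughly, the idea is the following (a sketch; helper names are
illustrative):

/* Xen always owns and drains the single physical event queue. Each
 * event is attributed to a domain via its SID and copied into that
 * domain's emulated event queue only if the guest's vIOMMU event queue
 * is enabled; otherwise Xen consumes (logs/drops) it. */
static void psmmu_evtq_drain(struct psmmu *smmu)
{
    uint64_t evt[4];  /* an event record is 32 bytes */

    while ( psmmu_evtq_pop(smmu, evt) == 0 )
    {
        struct domain *d = domain_for_sid(smmu, evt_get_sid(evt));

        if ( d && d->viommu && d->viommu->evtq_enabled )
            vsmmuv3_inject_event(d, evt);  /* copy to guest evtq + vIRQ */
        else
            printk(XENLOG_WARNING "SMMU event with no consumer\n");
    }
}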
But Xen will still receive the events, correct? If so, how does it make
it better?
Performance Impact
==================
With the inclusion of IOMMU stage-1 and nested translation, performance
overhead is introduced compared to the existing, stage-2-only usage in
Xen. Once mappings are established, translations should not introduce
significant overhead.
Emulated paths may introduce moderate overhead, primarily affecting
device initialization and event handling.
The performance impact highly depends on the target CPU's capabilities.
Testing is performed on QEMU virt and Renesas R-Car (QEMU-emulated)
platforms.
I am afraid QEMU is not a reliable platform for performance testing.
Don't you have real HW with vIOMMU support?
Yes, I will provide performance measurements for Renesas HW as well.
FWIW, I don't need to know the performance right now. I am mostly
pointing out that if you want to provide performance numbers, then they
should really come from real HW rather than QEMU.
Cheers,
--
Julien Grall