Re: [PATCH 3/3] amd_iommu: Generate XT interrupts when xt support is enabled

Sairaj Kodilkar Thu, 04 Dec 2025 00:24:53 -0800



On 12/3/2025 11:39 PM, Alejandro Jimenez wrote:

Hi alejandro,

Hi Sairaj,

On 11/18/25 3:24 AM, Sairaj Kodilkar wrote:
AMD IOMMU uses MMIO registers 0x170-0x180 to generate the interrupts
when guest enables xt support through control register [IntCapXTEn]. The
"the interrupts" is a bit vague, considering that it is only aspecific type of interrupts that originate from the IOMMU itself.I think the quote from the spec is pretty clear and we might want touse just that:
"
When MMIO 0x18[IntCapXTEn]=1, interrupts originating from the IOMMUitself are sent based on the programming in XT IOMMU Interrupt ControlRegisters in MMIO 0x170-0x180 instead of the programming in theIOMMU's MSI capability registers.
"
I'd prefer if this was documented in the code comments, but I would goa step further and mention the specific interrupts, e.g.:
The mapping to MMIO offsets and interrupts controlled by each is asfollows:
XT IOMMU General Interrupt Control Register (0x170): Event Logexception interrupts and Completion wait interrupts
XT IOMMU PPR Interrupt Control Register (0x178): PPR Log exceptioninterrupts
XT IOMMU GA Log Interrupt Control Register (0x180): GA Log exceptioninterrupts (irrelevant in vIOMMU case)

I think we can avoid mentioning other two registers (PPR and GAlog) aswe do not plan to support these

features as of now.

guest programs these registers with appropriate vector and destination
ID instead of writing to PCI MSI capability.

Current AMD vIOMMU is capable of generating interrupts only through PCI
MSI capability and does not care about xt mode. Because of this AMD
vIOMMU cannot generate event log interrupts when the guest has enabled
xt mode.
At first I thought that statement was not correct. If that were thecase, I was wondering why we don't currently have issues withCompletion Wait, since they are also controlled by MMIO 0x170 offset.But the Linux driver doesn't rely on the completion interrupt AFAICT,it just sets the completion store bit and monitors the semaphore todetect the completion. We might not be so lucky with other OSs, sogood catch.


Maybe this patch qualifies for the fixes tag ?

Introduce a new flag "intcapxten" which is set when guest writes control
And similarly to the xten case in PATCH 2, we also need to migratethis new field. It can be added to the your proposed vmstate_xt.


Right

register [IntCapXTEn] (bit 51) and use vector and destination field in
the XT MMIO register (0x170) to support XT mode.

Signed-off-by: Sairaj Kodilkar <[email protected]>
---
  hw/i386/amd_iommu.c  | 51 ++++++++++++++++++++++++++++++++++++++------
  hw/i386/amd_iommu.h  |  3 +++
  hw/i386/trace-events |  1 +
  3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 7f08fc31111a..c6bc13c93679 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c

@@ -200,18 +200,52 @@ static void amdvi_assign_andq(AMDVIState *s,hwaddr addr, uint64_t val)

     amdvi_writeq_raw(s, addr, amdvi_readq(s, addr) & val);
  }
  +union mmio_xt_intr {
+    uint64_t val;
+    struct {
+        uint64_t rsvd_1:2,
+                 destination_mode:1,
+                 rsvd_2:5,
+                 destination_lo:24,
+                 vector:8,
+                 delivery_mode:1,
+                 rsvd_3:15,
+                 destination_hi:8;
+    };
+};

We should define this union mmio_xt_intr in amd_iommu.h header, rightafter struct irte_ga.


Right.

+
+static void amdvi_build_xt_msi_msg(AMDVIState *s, MSIMessage *msg)
+{
+    union mmio_xt_intr xt_reg;
+    struct X86IOMMUIrq irq;
+
+    xt_reg.val = amdvi_readq(s, AMDVI_MMIO_XT_GEN_INTR);
+
+    irq.vector = xt_reg.vector;
+    irq.delivery_mode = xt_reg.delivery_mode;
+    irq.dest_mode = xt_reg.destination_mode;
+    irq.dest = (xt_reg.destination_hi << 24) | xt_reg.destination_lo;
+    irq.trigger_mode = 0;
+    irq.redir_hint = 0;
+
Based on my reading from the MSI details, hardcoding redir_hint=0results in the dest_mode field essentially being a nop, since themessages will be delivered in physical mode i.e. only to the APIC IDin the dest field.
From Intel's SDM Vol3, 12.11.1 Message Address Register Format:
• If RH is 0, then the DM bit is ignored and the message is sent aheadindependent of whether the physical or logical destination mode is used.

This statement sounds ambiguious. My understanding is that the RH fieldis used to determinethe "lowest priority processor" among all possible cpus that can receivethe interrupt. This meansthat in case of physical destination mode, it should forward theinterrupt to the cpu with matchingdestination mode but in case of logical destination mode it shouldforward the interrupt to the

cpu with lowest priority.

I confirmed this by looking at kernel KVM function -->kvm_irq_delivery_to_apic(). So I think thewhat SDM is trying to tell by "DM bit is ignored is that" is that itwill not try to find the cpuwith lowest priority interrupt but rather forward it to the APIChardware directly (i.e. to all the

logical processors in case of logical destination mode).

So we don't need to add a error report here.

I am not sure if the current implementations of AMD IOMMU drivers everuse the logical mode, but I am thinking we should at least catch thatcase with a warning e.g.
if (xt_reg.destination_mode) {
    error_report_once(...
}
A follow up question, since you chose to explicitly set redir_hint to0, and something that bothers me about the remap functions is thatthey set:
irq->redir_hint = irte.lo.fields_remap.rq_eoi;
where AFAICT, redir_hint and rq_eoi are semantically different andcontrol unrelated behaviors. So I've been wondering why theseassignments are done. No need to answer this specifically, but if youhave a better understanding of it please let me know.

Currently I don't have idea about this. The specification does notprovide any relation between

rq_eoi and redir_hint.

+ x86_iommu_irq_to_msi_message(&irq, msg);
+}
+
  static void amdvi_generate_msi_interrupt(AMDVIState *s)
  {
      MSIMessage msg = {};
-    MemTxAttrs attrs = {
-        .requester_id = pci_requester_id(&s->pci->dev)
-    };
  -    if (msi_enabled(&s->pci->dev)) {
+    if (s->intcapxten) {
+        trace_amdvi_generate_msi_interrupt("XT GEN");
+        amdvi_build_xt_msi_msg(s, &msg);
+    } else if (msi_enabled(&s->pci->dev)) {
+        trace_amdvi_generate_msi_interrupt("MSI");
          msg = msi_get_message(&s->pci->dev, 0);

- address_space_stl_le(&address_space_memory, msg.address,msg.data,

-                             attrs, NULL);
+    } else {
+        trace_amdvi_generate_msi_interrupt("NO MSI");
+        return;
      }
+    apic_get_class(NULL)->send_msi(&msg);

This is great. The method of writing to the address space directlystill confuses me, using the APIC helper for MSI delivery seems to bethe appropriate way.

  }
    static uint32_t get_next_eventlog_entry(AMDVIState *s)

@@ -1490,6 +1524,7 @@ static inline void amdvi_mmio_get_name(hwaddraddr,

      MMIO_REG_TO_STRING(AMDVI_MMIO_PPR_BASE, name)
      MMIO_REG_TO_STRING(AMDVI_MMIO_PPR_HEAD, name)
      MMIO_REG_TO_STRING(AMDVI_MMIO_PPR_TAIL, name)
+    MMIO_REG_TO_STRING(AMDVI_MMIO_XT_GEN_INTR, name)
      default:
          name = "UNHANDLED";
      }

@@ -1549,6 +1584,7 @@ static voidamdvi_handle_control_write(AMDVIState *s)

                          AMDVI_MMIO_CONTROL_CMDBUFLEN);
      s->ga_enabled = !!(control & AMDVI_MMIO_CONTROL_GAEN);
      s->xten = !!(control & AMDVI_MMIO_CONTROL_XTEN) && s->xtsup;

+ s->intcapxten = !!(control & AMDVI_MMIO_CONTROL_INTCAPXTEN) &&s->xtsup;

I think that s->intcapxten should check for s->xten instead ofs->xtsup, that way we create only one dependency on the inputparameter, and the rest flows logically from what the guest driverconfigures.


I did think about it quite a bit, the specs does not provide any dependency
between intcapxten and xtsup, that's why I avoided.

On that topic (I should have mentioned this on patch 2), I think thatit might be reasonable to make s->xten also conditional on whethers->ga_enabled is true. The spec just says they should both be set:
"In systems with x2APIC enabled, software must set MMIO 0x18[XTEn]=1and MMIO 0x18[GAEn]=1.This enables the use of the 128-bit IRTE formatwith 32-bit destination field. Even if Guest Virtual APIC
will not be used, software must set MMIO 0x18[GAEn]=1.
"
my understanding is that we can't support X2APIC mode without 128-bitIRTE format (which is controlled by ga_enabled), so it makes sense toblock X2APIC (i.e. xten) unless ga_enabled is already set.

Yep I agree with this one, we can enforce ga_enabled but then we cannothave


 s->intcapxten = !!(control & AMDVI_MMIO_CONTROL_INTCAPXTEN) && s->xten;

because it'll create a dependency between s->intcapxten and s->ga_enabled


So in the end we'd have something like:

s->ga_enabled = !!(control & AMDVI_MMIO_CONTROL_GAEN);
s->xten = !!(control & AMDVI_MMIO_CONTROL_XTEN) &&
      s->xtsup && s->ga_enabled;  // this should go in PATCH 2
s->intcapxten = !!(control & AMDVI_MMIO_CONTROL_INTCAPXTEN) && s->xten;

What do you think?

PS: just a small formatting suggestion: Please keep the charater perline within

~72 so that its easier to read on the email client

Thanks
Sairaj

Re: [PATCH 3/3] amd_iommu: Generate XT interrupts when xt support is enabled

Reply via email to