On Wed, 2023-03-08 at 18:09 -0500, Peter Xu wrote: > On Mon, Mar 06, 2023 at 05:28:24PM +0000, David Woodhouse wrote: > > Indeed, I don't think we care about the in-kernel I/OAPIC. I don't > > think we care about the kernel knowing about e.g. "GSI #11" at all. We > > can just deliver it as MSI (for the I/OAPIC) or using KVM_INTERRUPT and > > the interrupt window as we do for the PIC. Which is why I'd happily rip > > that out and let it be delivered via the APIC intercept at 0xfeexxxxx. > > > > The existing code which just keeps IRQ routes updated when they're > > valid is kind of OK, and well-behaved guests can function. But it isn't > > *right* in the case where they aren't valid. > > > > What *ought* to happen is that the IOMMU should raise a fault at the > > moment the interrupt occurs, if the translation isn't valid. And we > > don't have that at all. > > Right, that's definitely not ideal as an emulator. > > > > > As for why I care? I don't really *need* it, as I have everything I > > need for Xen PIRQ support already merged in > > https://gitlab.com/qemu-project/qemu/-/commit/6096cf7877 > > > > So while the thread at > > https://lore.kernel.org/qemu-devel/aaef9961d210ac1873153bf3cf01d984708744fc.ca...@infradead.org/ > > was partly driven by expecting to need this for Xen PIRQ support > > (because in $DAYJOB I did those things in the other order and the PIRQ > > support ended up just being a trivial different translator like the > > IOMMU's IR)... I'd still quite like to fix it up in QEMU anyway, just > > for correctness and fidelity in the faulting cases. > > > > We can do more efficient invalidation too, rather than blowing away the > > entire routing table every time. Just disconnect the IRQFD for the > > specific interrupts that get invalidated, and let them get fixed up > > again next time they occur. > > I'm curious whether there's anything else beside the "correctness of > emulation" reason.
Nah, at this point it's just because it annoys me. I did this in another VMM and it works nicely, and I'd like QEMU to do the same. ;) > I would think it nice if it existed or trivial to have as what you said. I > just don't know whether it's as easy, at least so far a new kernel > interface seems still needed, allowing a kernel irq to be paused until > being translated by QEMU from some channel we provide. It doesn't need a new kernel interface. With IRQ remapping we have a userspace I/OAPIC, so how we deliver its MSIs is entirely up to us — we absolutely don't need the kernel to have *any* KVM_IRQ_ROUTING_IRQCHIP entries. The only IRQs that are handled fully in the kernel are events arriving on some eventfd which is bound as an IRQFD to some IRQ in the KVM routing table. (Mostly MSIs coming from VFIO). If we want to "pause" one of those, all we have to do is unbind the IRQFD and then we can handle that eventfd in userspace instead. Which is what we do *anyway* in the case where IRQFD support isn't available. In https://lore.kernel.org/kvm/20201027143944.648769-1-dw...@infradead.org/ I *optimised* that dance, so userspace doesn't have to stop listening on the eventfd when the IRQFD is bound; the IRQFD code consumes the event first. But we can live without that in older kernels. Basically, it's just treating the IRQFD support as an *optimisation*, and retaining the 'slow path' of handling the event in userspace and injecting the resulting MSI. It's just that in that slow path, as we're translating and injecting the MSI, we *also* update the IRQ routing table and reattach the IRQFD for the next time that interrupt fires. Like a cache. And we stash an invalidation cookie (for Intel-IOMMU the IRTE index) for each routing table entry, so that when asked to invalidate a *range* of cookies (that's how the Intel IOMMU invalidate works), we can just detach the IRQFD from those ones, letting them get handled in userspace next time and retranslated (with associated fault report, as appropriate).
smime.p7s
Description: S/MIME cryptographic signature