Correction -- the desynchronization appears to be on the DisINTx line.

Host:
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

Guest:
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

This is with the driver stuck, not receiving any interrupts in the guest 
despite the card issuing them every 1ms.
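
For anyone wanting to compare the two sides without eyeballing lspci output,
here is a minimal sketch that decodes the same two bits from raw config space.
It relies on the standard PCI layout (Command register at offset 0x04, bit 10 =
Interrupt Disable, shown by lspci as DisINTx; Status register at offset 0x06,
bit 3 = Interrupt Status, shown as INTx). The helper names and the use of the
Linux sysfs "config" file are my own assumptions for illustration, not
anything taken from QEMU or VFIO.

```python
import struct

PCI_COMMAND = 0x04                    # 16-bit Command register offset
PCI_COMMAND_INTX_DISABLE = 1 << 10    # lspci prints this bit as DisINTx
PCI_STATUS_INTERRUPT = 1 << 3         # lspci prints this bit as INTx


def decode_intx(config: bytes) -> dict:
    """Decode DisINTx and INTx from the first bytes of PCI config space."""
    # Command (0x04) and Status (0x06) are adjacent little-endian 16-bit words.
    command, status = struct.unpack_from("<HH", config, PCI_COMMAND)
    return {
        "DisINTx": bool(command & PCI_COMMAND_INTX_DISABLE),
        "INTx": bool(status & PCI_STATUS_INTERRUPT),
    }


def read_config(bdf: str) -> bytes:
    """Hypothetical helper: read the config header of a device via sysfs.

    bdf is a full address such as "0000:01:00.0" (placeholder value).
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)
```

Running decode_intx(read_config(...)) on the host and again inside the guest
while the driver is stuck should show the same DisINTx mismatch as the dumps
above, assuming the device address is substituted on each side.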

----- Original Message -----
> From: "Timothy Pearson" <tpear...@raptorengineering.com>
> To: "qemu-devel" <qemu-devel@nongnu.org>
> Sent: Friday, March 11, 2022 12:35:45 PM
> Subject: XIVE VFIO kernel resample failure in INTx mode under heavy load

> All,
> 
> I've been struggling for some time with what is looking like a potential bug
> in QEMU/KVM on the POWER9 platform.  It appears that in XIVE mode, when the
> in-kernel IRQ chip is enabled, an external device that rapidly asserts IRQs
> via the legacy INTx level mechanism will only receive one interrupt in the
> KVM guest.
> 
> Changing any one of those items appears to avoid the glitch, e.g. XICS mode
> with the in-kernel IRQ chip works (all interrupts are passed through), and
> XIVE mode with the in-kernel IRQ chip disabled also works.  We are also not
> seeing any problems in XIVE mode with the in-kernel chip from MSI/MSI-X
> devices.
> 
> The device in question is a real time card that needs to raise an interrupt
> every 1ms.  It works perfectly on the host, but fails in the guest -- with the
> in-kernel IRQ chip and XIVE enabled, it receives exactly one interrupt, at
> which point the host continues to see INTx+ but the guest sees INTx-, and the
> IRQ handler in the guest kernel is never reentered.
> 
> We have also seen some very rare glitches where, over a long period of time,
> we can enter a similar deadlock in XICS mode.  Disabling the in-kernel IRQ
> chip in XIVE mode will also lead to the lockup with this device, since the
> userspace IRQ emulation cannot keep up with the rapid interrupt firing
> (measurements show around 100ms required for processing each interrupt in
> the user mode).
> 
> My understanding is the resample mechanism does some clever tricks with level
> IRQs, but that QEMU needs to check if the IRQ is still asserted by the device
> on guest EOI.  Since a failure here would explain these symptoms, I'm
> wondering if there is a bug in either QEMU or KVM for POWER / pSeries (sPAPR)
> where the IRQ is not resampled and therefore not re-fired in the guest?
> 
> Unfortunately I lack the resources at the moment to dig through the QEMU
> codebase and try to find the bug.  Any IBMers here that might be able to help
> out?  I can provide access to a test setup if desired.
> 
> Thanks!
