Correction -- the desynchronization appears to be on the DisINTx line.

Host:
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Guest:
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

This is with the driver stuck, not receiving any interrupts in the guest
despite the card issuing them every 1ms.

----- Original Message -----
> From: "Timothy Pearson" <tpear...@raptorengineering.com>
> To: "qemu-devel" <qemu-devel@nongnu.org>
> Sent: Friday, March 11, 2022 12:35:45 PM
> Subject: XIVE VFIO kernel resample failure in INTx mode under heavy load

> All,
>
> I've been struggling for some time with what is looking like a potential
> bug in QEMU/KVM on the POWER9 platform. It appears that in XIVE mode, when
> the in-kernel IRQ chip is enabled, an external device that rapidly asserts
> IRQs via the legacy INTx level mechanism will only receive one interrupt
> in the KVM guest.
>
> Changing any one of those items appears to avoid the glitch, e.g. XICS
> mode with the in-kernel IRQ chip works (all interrupts are passed through),
> and XIVE mode with the in-kernel IRQ chip disabled also works. We are also
> not seeing any problems in XIVE mode with the in-kernel chip from MSI/MSI-X
> devices.
>
> The device in question is a real time card that needs to raise an
> interrupt every 1ms. It works perfectly on the host, but fails in the
> guest -- with the in-kernel IRQ chip and XIVE enabled, it receives exactly
> one interrupt, at which point the host continues to see INTx+ but the
> guest sees INTx-, and the IRQ handler in the guest kernel is never
> reentered.
>
> We have also seen some very rare glitches where, over a long period of
> time, we can enter a similar deadlock in XICS mode.
> Disabling the in-kernel IRQ chip in XIVE mode will also lead to the lockup
> with this device, since the userspace IRQ emulation cannot keep up with
> the rapid interrupt firing (measurements show around 100ms required for
> processing each interrupt in user mode).
>
> My understanding is the resample mechanism does some clever tricks with
> level IRQs, but that QEMU needs to check if the IRQ is still asserted by
> the device on guest EOI. Since a failure here would explain these symptoms
> I'm wondering if there is a bug in either QEMU or KVM for POWER / pSeries
> (sPAPR) where the IRQ is not resampled and therefore not re-fired in the
> guest?
>
> Unfortunately I lack the resources at the moment to dig through the QEMU
> codebase and try to find the bug. Any IBMers here that might be able to
> help out? I can provide access to a test setup if desired.
>
> Thanks!
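
To make the suspected failure mode concrete, here is a toy model of
mask-and-resample forwarding for a level-triggered INTx. It is only a sketch
under assumptions: the class and function names (Device, Host, run,
resample_works) are invented for illustration and are not QEMU, VFIO or KVM
code. The two bit positions are the standard PCI ones (Command bit 10,
shown by lspci as DisINTx, and Status bit 3, shown as INTx). The model assumes
the host masks the device with DisINTx when it forwards an interrupt and only
unmasks it when the guest EOI reaches the resample step.

```python
# Toy model of level-triggered INTx forwarding with mask-and-resample.
# All names below are hypothetical; this is not QEMU, VFIO or KVM code.

PCI_COMMAND_INTX_DISABLE = 1 << 10  # "DisINTx" in lspci (Command register bit 10)
PCI_STATUS_INTERRUPT = 1 << 3       # "INTx" in lspci (Status register bit 3)


class Device:
    """A card whose level interrupt stays pending until it is serviced."""

    def __init__(self):
        self.command = 0
        self.status = 0

    def raise_irq(self):
        self.status |= PCI_STATUS_INTERRUPT

    def service(self):
        # The guest driver acknowledges the card, clearing the pending state.
        self.status &= ~PCI_STATUS_INTERRUPT

    def line_asserted(self):
        # The INTx line is only driven when pending and not masked by DisINTx.
        pending = bool(self.status & PCI_STATUS_INTERRUPT)
        masked = bool(self.command & PCI_COMMAND_INTX_DISABLE)
        return pending and not masked


class Host:
    """Assumed host-side forwarding: mask on delivery, unmask on resample."""

    def __init__(self, device, resample_works=True):
        self.device = device
        self.resample_works = resample_works
        self.delivered = 0

    def poll_line(self):
        # Mask INTx at the device, then inject the interrupt into the guest.
        if self.device.line_asserted():
            self.device.command |= PCI_COMMAND_INTX_DISABLE
            self.delivered += 1
            return True
        return False

    def guest_eoi(self):
        # Resample step: unmask so a still-pending level interrupt fires again.
        # If this notification is lost, the device stays masked forever.
        if self.resample_works:
            self.device.command &= ~PCI_COMMAND_INTX_DISABLE


def run(ticks, resample_works):
    """Return (interrupts delivered to the guest, final DisINTx state)."""
    dev = Device()
    host = Host(dev, resample_works)
    for _ in range(ticks):
        dev.raise_irq()          # card fires once per tick (1ms in the report)
        if host.poll_line():
            dev.service()        # guest handler runs
            host.guest_eoi()     # guest signals end-of-interrupt
    disintx = bool(dev.command & PCI_COMMAND_INTX_DISABLE)
    return host.delivered, disintx
```

In this model, run(5, True) delivers all five interrupts and leaves DisINTx
clear, while run(5, False) delivers exactly one and leaves DisINTx set. That
is the same pattern as the host dump above (DisINTx+ on the host, a single
interrupt in the guest), which is why a lost resample on guest EOI looks like
a plausible explanation. It does not show where in QEMU or KVM the
notification would actually be lost.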