On Sun, Sep 18, 2016 at 01:33:27PM +0800, Peter Xu wrote: > Good to know that we have a solution. :) > > I think this is a good fix, however I still do not understand why this > is happening... Please see below comment. > > On Thu, Sep 15, 2016 at 04:11:48PM +1000, David Gibson wrote: > > d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup > > of kvmchip routing configuration, that was mostly intended for x86. > > However, it also contains a subtle change in behaviour which breaks EEH[1] > > error recovery on certain VFIO passthrough devices on spapr guests. So far > > it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be > > other hardware with the same problem. It's also possible there could be > > circumstances where it causes a bug on x86 as well, though I don't know of > > any obvious candidates. > > > > Prior to d1f6af6, both vfio_msix_vector_do_use() and > > vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this > > as the "dummy" vector used to make the host hardware state sync with the > > guest expected hardware state in terms of MSI configuration. > > > > Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op, > > meaning the dummy irq would always be delivered via qemu. > > AFAICT, QEMU is not delivering that *dummy* IRQ as well. IIUC here the > dummy IRQ refers to the one in vfio_msix_enable(): > > vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL); > vfio_msix_vector_release(&vdev->pdev, 0); > > Here we registered this dummy one to sync up the guest and host > hardware state for some reason. However, the handler is NULL here, > means that even QEMU won't handle it if it happens. And the only > difference is that during this extremely small period, kvm will handle > the interrupt with an uninitialized MSI message, but assuming that > would never happen since no one should start to use MSI yet. After > release(), interrupts will be dropped for all cases. > > So looks like it is not related to "whether QEMU or KVM will handle > the interrupt", but something else. > > Generally speaking, the process of registering the IRQ should be > something like (using QEMU with d1f6af6 change): > > 1. vfio_msix_enable(): when MSIX is enabled. this will trigger > vfio_add_kvm_msi_virq(), but quickly it is unregistered (virq will > be there, but VFIO will assign the VFIO handler to > vector->interrupt, rather than vector->kvm_interrupt, so that virq > won't take effect). > > 2. vfio_msix_vector_use(): when an MSIX entry is finally used and > setup. this will trigger vfio_update_kvm_msi_virq(), to update the > interrupt information to the "real one". > > IMHO we should always receive interrupts after step (2). And as we > have traced (with David), step (2) was updating the correct interrupt > information even with commit d1f6af6. But I think I might be wrong (or > say, the above assumption is not correct), because commit d1f6af6 > indeed caused problem with EHH and this specific hardware. I am just > do not know why it is triggering the issue. And that's why I want to > know whether there is anything special with that NIC's driver.
It's quite possible there's something unusual about the driver - this didn't happen for the other NIC I tried, or for an XHCI. Unfortunately, I don't know what it would be, and I really have no idea where one would start looking. > Again, this is totally a discussion only, and I am totally agree with > current change to fix the issue. Just in case someone knows the reason > behind this. > > Thanks, > > -- peterx > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature