Good to know that we have a solution. :)

I think this is a good fix, however I still do not understand why this
is happening... Please see below comment.

On Thu, Sep 15, 2016 at 04:11:48PM +1000, David Gibson wrote:
> d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup
> of kvmchip routing configuration, that was mostly intended for x86.
> However, it also contains a subtle change in behaviour which breaks EEH[1]
> error recovery on certain VFIO passthrough devices on spapr guests.  So far
> it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be
> other hardware with the same problem.  It's also possible there could be
> circumstances where it causes a bug on x86 as well, though I don't know of
> any obvious candidates.
> 
> Prior to d1f6af6, both vfio_msix_vector_do_use() and
> vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this
> as the "dummy" vector used to make the host hardware state sync with the
> guest expected hardware state in terms of MSI configuration.
> 
> Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op,
> meaning the dummy irq would always be delivered via qemu.

AFAICT, QEMU is not delivering that *dummy* IRQ as well. IIUC here the
dummy IRQ refers to the one in vfio_msix_enable():

    vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
    vfio_msix_vector_release(&vdev->pdev, 0);

Here we registered this dummy one to sync up the guest and host
hardware state for some reason. However, the handler is NULL here,
means that even QEMU won't handle it if it happens. And the only
difference is that during this extremely small period, kvm will handle
the interrupt with an uninitialized MSI message, but assuming that
would never happen since no one should start to use MSI yet. After
release(), interrupts will be dropped for all cases.

So looks like it is not related to "whether QEMU or KVM will handle
the interrupt", but something else.

Generally speaking, the process of registering the IRQ should be
something like (using QEMU with d1f6af6 change):

1. vfio_msix_enable(): when MSIX is enabled. this will trigger
   vfio_add_kvm_msi_virq(), but quickly it is unregistered (virq will
   be there, but VFIO will assign the VFIO handler to
   vector->interrupt, rather than vector->kvm_interrupt, so that virq
   won't take effect).

2. vfio_msix_vector_use(): when an MSIX entry is finally used and
   setup. this will trigger vfio_update_kvm_msi_virq(), to update the
   interrupt information to the "real one".

IMHO we should always receive interrupts after step (2). And as we
have traced (with David), step (2) was updating the correct interrupt
information even with commit d1f6af6. But I think I might be wrong (or
say, the above assumption is not correct), because commit d1f6af6
indeed caused problem with EHH and this specific hardware. I am just
do not know why it is triggering the issue. And that's why I want to
know whether there is anything special with that NIC's driver.

Again, this is totally a discussion only, and I am totally agree with
current change to fix the issue. Just in case someone knows the reason
behind this.

Thanks,

-- peterx

Reply via email to