Public bug reported:

We have reports of a kdump failure in Ubuntu (on an x86 machine) that was
narrowed down to an MSI IRQ storm coming from a PCI network device.

The bug manifests as a lack of progress in the boot process of the kdump
kernel, and a storm of kernel messages like:

[...]
[  342.265294] do_IRQ: 0.155 No irq handler for vector
[  342.266916] do_IRQ: 0.155 No irq handler for vector
[  347.258422] do_IRQ: 14053260 callbacks suppressed
[...]

The root cause of the issue is that the kexec process used to boot the
kdump kernel does not ensure that PCI devices are reset and/or that their
MSI capabilities are disabled. A PCI device can therefore keep raising a
huge number of interrupts, which consume all of the CPU's processing time
(especially since we restrict the kdump kernel to a single CPU).
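For illustration only, the "MSI capabilities are disabled" step can be
sketched outside the kernel. This is a hypothetical Python model, not kernel
code and not the fix for this bug: it walks the capability list of a fake
256-byte PCI configuration space held in a bytearray and clears the MSI
(capability ID 0x05, Message Control bit 0) and MSI-X (capability ID 0x11,
Message Control bit 15) enable bits. The constant names mirror those in the
kernel's pci_regs.h; the helper names (read16, write16, find_caps,
disable_msi) and the example register values are made up for this sketch.

```python
# Hypothetical model: a bytearray stands in for a device's 256-byte PCI
# config space; nothing here touches real hardware.
PCI_STATUS = 0x06             # Status register offset
PCI_STATUS_CAP_LIST = 0x10    # Status bit: capability list present
PCI_CAPABILITY_LIST = 0x34    # Offset of the first capability pointer
PCI_CAP_ID_MSI = 0x05
PCI_CAP_ID_MSIX = 0x11
PCI_MSI_FLAGS_ENABLE = 0x0001   # MSI Message Control: MSI Enable
PCI_MSIX_FLAGS_ENABLE = 0x8000  # MSI-X Message Control: MSI-X Enable


def read16(cfg, off):
    """Read a little-endian 16-bit register."""
    return cfg[off] | (cfg[off + 1] << 8)


def write16(cfg, off, val):
    """Write a little-endian 16-bit register."""
    cfg[off] = val & 0xFF
    cfg[off + 1] = (val >> 8) & 0xFF


def find_caps(cfg):
    """Yield (cap_id, offset) for each entry in the capability list."""
    if not read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST:
        return
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos and pos not in seen:  # stop on a malformed, looping list
        seen.add(pos)
        yield cfg[pos], pos
        pos = cfg[pos + 1] & 0xFC   # next-capability pointer


def disable_msi(cfg):
    """Clear MSI/MSI-X enable bits; return how many were cleared."""
    cleared = 0
    for cap_id, pos in find_caps(cfg):
        if cap_id == PCI_CAP_ID_MSI:
            bit = PCI_MSI_FLAGS_ENABLE
        elif cap_id == PCI_CAP_ID_MSIX:
            bit = PCI_MSIX_FLAGS_ENABLE
        else:
            continue
        ctrl = read16(cfg, pos + 2)  # Message Control is at cap + 2
        if ctrl & bit:
            write16(cfg, pos + 2, ctrl & ~bit)
            cleared += 1
    return cleared


# Example device with both MSI and MSI-X left enabled (made-up values).
cfg = bytearray(256)
write16(cfg, PCI_STATUS, PCI_STATUS_CAP_LIST)
cfg[PCI_CAPABILITY_LIST] = 0x50
cfg[0x50], cfg[0x51] = PCI_CAP_ID_MSI, 0x70   # MSI cap, next at 0x70
write16(cfg, 0x52, 0x0081)                     # enabled, 64-bit capable
cfg[0x70], cfg[0x71] = PCI_CAP_ID_MSIX, 0x00   # MSI-X cap, end of list
write16(cfg, 0x72, 0x803F)                     # enabled, 64 table entries

cleared = disable_msi(cfg)
print(cleared)  # 2
```

Only the enable bits are cleared; the other Message Control fields are left
untouched.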

This was tested with upstream kernel version 4.18, and the problem reproduces
there as well. In the specific test scenario, the PCI NIC was an "Intel 82599ES
10-Gigabit [8086:10fb]" used in SR-IOV PCI passthrough mode (vfio_pci) under
high load in the guest.

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Guilherme G. Piccoli (gpiccoli)
         Status: Confirmed


** Tags: sts

https://bugs.launchpad.net/bugs/1797990

Title:
  kdump fail due to an IRQ storm


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
