This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1758378

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1758378

Title:
  [Hyper-V] PCI: hv: Fix 2 hang issues in hv_compose_msi_msg

Status in linux package in Ubuntu:
  Incomplete
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure-edge package in Ubuntu:
  Invalid
Status in linux source package in Xenial:
  Invalid
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux-azure-edge source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  In Progress
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux-azure-edge source package in Bionic:
  Invalid

Bug description:

We've identified some issues in recent testing against upstream 4.15 SR-IOV and DPDK. The following commits are in Lorenzo's PCI tree on their way into 4.16 and stable:

Tree: https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/log/?h=pci/hv

PCI: hv: Only queue new work items in hv_pci_devices_present() if necessary

If there is pending work in hv_pci_devices_present() we just need to add the new dr entry into the dr_list. Add a check to detect pending work items and update the code to skip queuing work if pending work items are detected.
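
For illustration, the "skip queuing if work is already pending" check can be modelled in userspace C. This is only a sketch under assumptions, not the driver code: dr_bus_model, dr_entry, dr_devices_present() and dr_work_run() are hypothetical names, a plain counter stands in for queuing a work item, and the model assumes the work item drains every pending entry when it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not taken from the driver. */
struct dr_entry {
	struct dr_entry *next;
	int device_count;
};

struct dr_bus_model {
	struct dr_entry *head;    /* pending dr entries, oldest first */
	struct dr_entry *tail;
	int work_items_queued;    /* stands in for queuing a work item */
};

/*
 * Append a dr entry. A new work item is queued only when the list was
 * empty beforehand; otherwise the already-pending work item will see the
 * new entry. Returns true if a work item was queued.
 */
bool dr_devices_present(struct dr_bus_model *bus, struct dr_entry *dr)
{
	assert(bus != NULL && dr != NULL);

	bool pending = (bus->head != NULL);

	dr->next = NULL;
	if (bus->tail)
		bus->tail->next = dr;
	else
		bus->head = dr;
	bus->tail = dr;

	if (!pending)
		bus->work_items_queued++;
	return !pending;
}

/*
 * Model of the work item (an assumption of this sketch): drain every
 * pending entry and report the device_count of the newest one, or -1 if
 * nothing was pending.
 */
int dr_work_run(struct dr_bus_model *bus)
{
	int latest = -1;

	while (bus->head) {
		struct dr_entry *dr = bus->head;

		bus->head = dr->next;
		latest = dr->device_count;
	}
	bus->tail = NULL;
	return latest;
}
```

In this model, two back-to-back calls to dr_devices_present() queue a single work item, and that one run of dr_work_run() consumes both entries. The real driver also has to hold a lock around the list update, which the sketch leaves out.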

https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=948373b3ed1bcf05a237c24675b84804315aff14

PCI: hv: Remove the bogus test in hv_eject_device_work()

When the kernel is executing hv_eject_device_work(), the hpdev->state value must be hv_pcichild_ejecting; any other value would be a bug. Therefore replace the bogus check with an explicit WARN_ON() that fires when the condition fails.

https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=fca288c0153b2b97114b9081bc3c33c3735145b6

PCI: hv: Fix a comment typo in _hv_pcifront_read_config()

The comment in _hv_pcifront_read_config() contains a typo; fix it. No functional change.

https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=df3f2159f4e4146d40b244725ce79ed921530b99

PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()

1. With the patch "x86/vector/msi: Switch to global reservation mode", v4.15 and newer kernels always hang on a 1-vCPU Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg() by

   request_irq() -> request_threaded_irq() -> __setup_irq() -> irq_startup() -> __irq_startup() -> irq_domain_activate_irq() -> ... -> msi_domain_activate() -> ... -> hv_compose_msi_msg(),

   local irq is disabled in __setup_irq().

   Note: when we reach hv_compose_msi_msg() by another code path:

   pci_enable_msix_range() -> ... -> irq_domain_activate_irq() -> ... -> hv_compose_msi_msg(),

   local irq is not disabled.

   hv_compose_msi_msg() depends on an interrupt from the host. With interrupts disabled, a UP VM always hangs in the busy loop in the function, because the interrupt callback hv_pci_onchannelcallback() cannot be called.

   We can do nothing but work around it by polling the channel. This is ugly, but we don't have any other choice.

2. If the host is ejecting the VF device before we reach hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg() forever, because at this time the host doesn't respond to the CREATE_INTERRUPT request. This issue has existed since the first day the pci-hyperv driver appeared in the kernel.

   Luckily, this can also be worked around by polling the channel for the PCI_EJECT message and hpdev->state, and by checking the PCI vendor ID.

Note: actually the above 2 issues also happen to an SMP VM, if "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.

https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=de0aa7b2f97d348ba7d1e17a00744c989baa0cb6

PCI: hv: Serialize the present and eject work items

When we hot-remove the device, we first receive a PCI_EJECT message and then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.

The first message is offloaded to hv_eject_device_work(), and the second is offloaded to pci_devices_present_work(). Both paths can run list_del(&hpdev->list_entry), causing a general protection fault, because system_wq can run them concurrently.

The patch eliminates the race condition.

Since access to present/eject work items is serialized, we do not need the hbus->enum_sem anymore, so remove it.
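
To show why serializing the eject and present work items removes the double list_del(), here is a small userspace C model. It is a sketch under assumptions rather than the patch itself: the description above only says the two work items are serialized, and running them one at a time from a single FIFO is this sketch's assumption about how. All names (hotplug_model, ordered_queue, queue_work_ordered(), run_all(), eject_work(), present_work_empty()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

/* Hypothetical model: one device that is either linked on the bus list or not. */
struct hotplug_model {
	bool device_on_list;   /* stands in for hpdev being on the device list */
	int removals;          /* how many times the device was unlinked */
};

typedef void (*work_fn)(struct hotplug_model *);

/* A FIFO that runs one item at a time, in submission order. */
struct ordered_queue {
	work_fn items[MAX_WORK];
	size_t count;
};

bool queue_work_ordered(struct ordered_queue *q, work_fn fn)
{
	if (q->count == MAX_WORK)
		return false;
	q->items[q->count++] = fn;
	return true;
}

void run_all(struct ordered_queue *q, struct hotplug_model *m)
{
	for (size_t i = 0; i < q->count; i++)
		q->items[i](m);   /* item i finishes before item i + 1 starts */
	q->count = 0;
}

/*
 * Check-then-unlink. This is only safe because no other work item can
 * run between the check and the unlink; with concurrent execution both
 * callers could pass the check and unlink twice.
 */
static void unlink_device(struct hotplug_model *m)
{
	if (m->device_on_list) {
		assert(m->removals == 0);   /* a second unlink would be the double list_del() */
		m->device_on_list = false;
		m->removals++;
	}
}

/* Models the handler for the PCI_EJECT message. */
void eject_work(struct hotplug_model *m)
{
	unlink_device(m);
}

/* Models the handler for a PCI_BUS_RELATIONS message with device_count == 0. */
void present_work_empty(struct hotplug_model *m)
{
	unlink_device(m);
}
```

Queuing eject_work and then present_work_empty and calling run_all() unlinks the device exactly once: the second item sees that the device is already gone and does nothing.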

https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=021ad274d7dc31611d4f47f7dd4ac7a224526f30

All 4.15-based kernels need these fixes, as do any kernels that picked up:

Fixes: 4900be83602b ("x86/vector/msi: Switch to global reservation mode")

The race condition fixed by the serialization patch applies to all kernels with PCI passthrough on Hyper-V:

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") (the catch-all for PCI passthrough)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1758378/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp