** Description changed:

+ [Impact]
+ 
+ Attempts to hotplug devices shared to userspace (qemu) via vfio trigger
+ a deadlock in the kernel.  A reboot is required to resolve this.
+ 
+ [Test Case]
+ 
+ Set up a KVM instance with attached devices, then attempt to hotplug
+ those devices using ipmitool.
+ 
+ [Regression Potential]
+ 
+ The change is to an uncommonly used driver.  There are also common
+ code changes, but these are a no-op in the normal case, and basic
+ operation should be easy to confirm.
+ 
+ [Other Info]
+  
+ This fix has been verified by the reporter as fixing the deadlock.
+ 
+ ===
+ 
  We are seeing deadlocks during hotplug of devices under vfio.
- 
  
  As per the Linux kernel source code, there is a deadlock situation
  between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug events.
  This issue can be avoided either by skipping the PCIe reset
  functionality or by calling device_unlock() in vfio_pci_remove() before
  calling vfio_del_group_dev().
  
  Code flow on PCIe hotplug event:
  
  Execution flow 1:
+   device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
+     device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
+       device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
+       vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
+         vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
+           send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )
  
  Execution flow 2, triggered by the "send event request to user" step above:
+   vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
+     vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
+       vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
+         pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
+           pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
+             pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
  
+              DEADLOCK here, since PCI_DEVICE_LOCK is already held by the
+              PCI_DEVICE remove code path in dd.c
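The two execution flows above can be reproduced in miniature in userspace. This is a simplified model, not kernel code: a pthread mutex (the hypothetical dev_mutex) stands in for the device lock taken in device_release_driver_internal(), and a flag plus condition variable stand in for vfio_del_group_dev() waiting for vfio_pci_release(). All names in the sketch (dev_mutex, release_path(), remove_path_deadlocks()) are hypothetical, and timeouts replace the indefinite sleep so the program always terminates.

```c
/*
 * Simplified userspace model of the vfio hotplug deadlock (not kernel code).
 * Hypothetical stand-ins:
 *   dev_mutex           - the device lock taken by device_lock(dev)
 *   released/released_cv - "device released in vfio_pci_release()"
 * Timeouts replace the kernel's indefinite sleep so the model terminates.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t released_cv = PTHREAD_COND_INITIALIZER;
static bool released;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Flow 2: vfio_pci_release() -> ... -> pci_dev_lock(dev). */
static void *release_path(void *arg)
{
    (void)arg;
    struct timespec ts = deadline_ms(500);
    /* The kernel sleeps here forever; the model gives up after 500 ms. */
    if (pthread_mutex_timedlock(&dev_mutex, &ts) != 0)
        return NULL;                 /* stuck: never reports "released" */
    /* ... the bus reset would run here ... */
    pthread_mutex_unlock(&dev_mutex);

    pthread_mutex_lock(&state_lock);
    released = true;
    pthread_cond_signal(&released_cv);
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/*
 * Flow 1: take the device lock, ask "userspace" to release the device,
 * then wait for it. Returns true if the wait timed out (deadlock).
 */
static bool remove_path_deadlocks(void)
{
    pthread_t user;
    struct timespec ts;
    bool deadlocked;
    int rc;

    released = false;
    pthread_mutex_lock(&dev_mutex);              /* device_lock(dev) */

    /* "Send event request to user": start the release path. */
    rc = pthread_create(&user, NULL, release_path, NULL);
    assert(rc == 0);

    /* vfio_del_group_dev(): wait for the release, here for at most 1 s. */
    ts = deadline_ms(1000);
    pthread_mutex_lock(&state_lock);
    while (!released &&
           pthread_cond_timedwait(&released_cv, &state_lock, &ts) == 0)
        ;                                        /* retry on spurious wakeup */
    deadlocked = !released;
    pthread_mutex_unlock(&state_lock);

    pthread_mutex_unlock(&dev_mutex);            /* device_unlock(dev) */
    rc = pthread_join(user, NULL);
    assert(rc == 0);
    (void)rc;
    return deadlocked;
}
```

Called from main() (build with `cc -pthread`), remove_path_deadlocks() returns true after about one second: the release thread gives up on the lock after 500 ms without ever signalling, so the remove path's wait times out while it still holds the lock.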

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1792099

Title:
  device hotplug of vfio devices can lead to deadlock in
  vfio_pci_release

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  In Progress

Bug description:
  [Impact]

  Attempts to hotplug devices shared to userspace (qemu) via vfio
  trigger a deadlock in the kernel.  A reboot is required to resolve
  this.

  [Test Case]

  Set up a KVM instance with attached devices, then attempt to hotplug
  those devices using ipmitool.

  [Regression Potential]

  The change is to an uncommonly used driver.  There are also common
  code changes, but these are a no-op in the normal case, and basic
  operation should be easy to confirm.

  [Other Info]
   
  This fix has been verified by the reporter as fixing the deadlock.

  ===

  We are seeing deadlocks during hotplug of devices under vfio.

  As per the Linux kernel source code, there is a deadlock situation
  between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug
  events. This issue can be avoided either by skipping the PCIe reset
  functionality or by calling device_unlock() in vfio_pci_remove() before
  calling vfio_del_group_dev().

  Code flow on PCIe hotplug event:

  Execution flow 1:
    device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
      device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
        device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
        vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
          vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
            send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )

  Execution flow 2, triggered by the "send event request to user" step above:
    vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
      vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
        vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
          pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
            pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
              pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )

               DEADLOCK here, since PCI_DEVICE_LOCK is already held by the
               PCI_DEVICE remove code path in dd.c
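  The report names two ways to break the cycle: skip the PCIe reset, or
  call device_unlock() in vfio_pci_remove() before vfio_del_group_dev().
  A minimal userspace sketch of both follows, under the same modeling
  assumptions as a plain pthread model (a mutex for the device lock, a
  condition variable for the release notification; all names are
  hypothetical). For "skip the reset" it assumes a trylock-and-skip
  approach chosen purely for illustration; this is not necessarily how
  the verified kernel fix is implemented.

```c
/*
 * Userspace model of the two workarounds named in the report (not kernel code).
 * Hypothetical stand-ins: dev_mutex for the device lock, released/released_cv
 * for "device released in vfio_pci_release()".
 *   FIX_SKIP_RESET   - release path only try-locks and skips the reset if busy
 *   FIX_UNLOCK_FIRST - remove path drops the lock before waiting, then re-takes it
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

enum fix { FIX_SKIP_RESET, FIX_UNLOCK_FIRST };

static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t released_cv = PTHREAD_COND_INITIALIZER;
static bool released;
static bool reset_done;

/* Release path (models vfio_pci_release() -> ... -> bus reset). */
static void *release_path(void *arg)
{
    enum fix fix = *(const enum fix *)arg;
    bool locked;

    if (fix == FIX_SKIP_RESET)
        locked = pthread_mutex_trylock(&dev_mutex) == 0;  /* never blocks */
    else
        locked = pthread_mutex_lock(&dev_mutex) == 0;     /* remove path dropped it */

    if (locked) {
        reset_done = true;           /* the bus reset would run here */
        pthread_mutex_unlock(&dev_mutex);
    }

    pthread_mutex_lock(&state_lock);
    released = true;
    pthread_cond_signal(&released_cv);
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/*
 * Remove path. Returns true if the release completed within one second;
 * *did_reset reports whether the release path ran its reset.
 */
static bool remove_path_completes(enum fix fix, bool *did_reset)
{
    pthread_t user;
    struct timespec ts;
    bool ok;
    int rc;

    released = false;
    reset_done = false;
    pthread_mutex_lock(&dev_mutex);              /* device_lock(dev) */
    if (fix == FIX_UNLOCK_FIRST)
        pthread_mutex_unlock(&dev_mutex);        /* drop it before waiting */

    /* "Send event request to user": start the release path. */
    rc = pthread_create(&user, NULL, release_path, &fix);
    assert(rc == 0);

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 1;
    pthread_mutex_lock(&state_lock);
    while (!released &&
           pthread_cond_timedwait(&released_cv, &state_lock, &ts) == 0)
        ;                                        /* retry on spurious wakeup */
    ok = released;
    pthread_mutex_unlock(&state_lock);

    if (fix == FIX_UNLOCK_FIRST)
        pthread_mutex_lock(&dev_mutex);          /* re-take after the wait */
    pthread_mutex_unlock(&dev_mutex);            /* device_unlock(dev) */

    rc = pthread_join(user, NULL);
    assert(rc == 0);
    (void)rc;
    *did_reset = reset_done;
    return ok;
}
```

  In this model both variants let the remove path finish. With
  FIX_SKIP_RESET the release completes but the reset is skipped
  (did_reset is false); with FIX_UNLOCK_FIRST the release path takes the
  lock and performs the reset (did_reset is true).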

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792099/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
