On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
> virtio pci device").
>
> Virtio drivers and PCI devices have never fully supported true
> surprise (aka hot unplug) removal. Drivers historically continued
> processing and waiting for pending I/O and even continued synchronous
> device reset during surprise removal. Devices have also continued
> completing I/Os, doing DMA and allowing device reset after surprise
> removal to support such drivers.
>
> Supporting it correctly would require a new device capability

If a device is removed, it is removed. Windows drivers supported this
since forever and it's just a Linux bug that it does not handle all the
cases. This is not something you can handle with a capability.

> and
> driver negotiation in the virtio specification to safely stop
> I/O and free queue memory. Failure to do so either breaks all the
> existing drivers with call trace listed in the commit or crashes the
> host on continuing the DMA.

If the device is gone, then DMA does not continue.

IIUC what is going on for you, is that you have developed a surprise
removal emulation that pretends to remove the device but actually the
device is doing DMA. So of course things break then.

> Hence, until such specification and devices
> are invented, restore the previous behavior of treating surprise
> removal as graceful removal to avoid regressions and maintain system
> stability same as before the
> commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> device").
>
> As explained above, previous analysis of solving this only in driver
> was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
> 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> is still the best stand to restore failures of virtio net and
> block devices.
>
> [1]
> https://lore.kernel.org/virtualization/cy8pr12mb719506cc5613eb100bc6c638dc...@cy8pr12mb7195.namprd12.prod.outlook.com/#t

I can only repeat what I said then, this is not how we do kernel
development.

> [2]
> https://lore.kernel.org/virtualization/20250602024358.57114-1-pa...@nvidia.com/

What was missing here, is handling corner cases. So let us please try to
handle them.

Here is how I would try to do it:
- add a new driver callback
- start a periodic timer task in virtio core on remove
- in the timer, probe that the device is still present.
  if not, invoke a driver callback
- cancel the task on device reset

If you do not have the time, let me know and I will try to look into it.
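
To make the four steps listed above a bit more concrete, here is a rough
userspace model of the control flow. It is only a sketch, not kernel code
and not a patch: every name in it (model_dev, model_driver, device_gone,
model_remove_begin, model_watch_tick, model_reset_done) is hypothetical
and does not exist in the virtio core. The "present" flag stands in for a
check such as pci_device_is_present(), and "watch_active" stands in for
whatever periodic timer or delayed work the core would really use. Firing
the callback only once is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_dev;

struct model_driver {
	/* Step 1: the proposed new driver callback. */
	void (*device_gone)(struct model_dev *dev);
};

struct model_dev {
	const struct model_driver *drv;
	bool present;      /* stands in for pci_device_is_present() */
	bool watch_active; /* stands in for an armed periodic timer task */
	int gone_calls;    /* counts callback invocations, for illustration */
};

/* Step 2: arm the periodic presence check when remove starts. */
static void model_remove_begin(struct model_dev *dev)
{
	dev->watch_active = true;
}

/* Step 3: one timer tick; probe presence, notify the driver if gone. */
static void model_watch_tick(struct model_dev *dev)
{
	if (!dev->watch_active)
		return;
	if (dev->present)
		return; /* device still there, keep polling */

	/* Assumption: notify only once, then stop polling. */
	dev->watch_active = false;
	if (dev->drv && dev->drv->device_gone)
		dev->drv->device_gone(dev);
}

/* Step 4: device reset finished, so cancel the periodic check. */
static void model_reset_done(struct model_dev *dev)
{
	dev->watch_active = false;
}

/* Example driver callback: just record that it was invoked. */
static void model_count_gone(struct model_dev *dev)
{
	dev->gone_calls++;
}
```

In this model a driver would use device_gone to stop waiting for pending
I/O once the device has disappeared, while a remove that gets through
device reset normally never sees the callback at all.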

> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> device")
> Cc: sta...@vger.kernel.org
> Reported-by: lirongq...@baidu.com
> Closes:
> https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4...@baidu.com/
> Signed-off-by: Parav Pandit <pa...@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 7 -------
>  1 file changed, 7 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_common.c
> b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>  	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>  	struct device *dev = get_device(&vp_dev->vdev.dev);
>
> -	/*
> -	 * Device is marked broken on surprise removal so that virtio upper
> -	 * layers can abort any ongoing operation.
> -	 */
> -	if (!pci_device_is_present(pci_dev))
> -		virtio_break_device(&vp_dev->vdev);
> -
>  	pci_disable_sriov(pci_dev);
>
>  	unregister_virtio_device(&vp_dev->vdev);
> --
> 2.26.2