On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of 
> virtio pci device").
> 
> Virtio drivers and PCI devices have never fully supported true
> surprise (aka hot unplug) removal. Drivers historically continued
> processing and waiting for pending I/O and even continued synchronous
> device reset during surprise removal. Devices have also continued
> completing I/Os, doing DMA and allowing device reset after surprise
> removal to support such drivers.
> 
> Supporting it correctly would require a new device capability

If a device is removed, it is removed. Windows drivers supported
this since forever and it's just a Linux bug that it does not
handle all the cases. This is not something you can handle
with a capability.





> and
> driver negotiation in the virtio specification to safely stop
> I/O and free queue memory. Failure to do so either breaks all the
> existing drivers with call trace listed in the commit or crashes the
> host on continuing the DMA.

If the device is gone, then DMA does not continue.

IIUC what is going on for you, is that you have developed a surprise
removal emulation that pretends to remove the device but
actually the device is doing DMA. So of course things break then.

> Hence, until such specification and devices
> are invented, restore the previous behavior of treating surprise
> removal as graceful removal to avoid regressions and maintain system
> stability same as before the
> commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci 
> device").
> 
> As explained above, previous analysis of solving this only in driver
> was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
> 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> is still the best stand to restore failures of virtio net and
> block devices.
> 
> [1] 
> https://lore.kernel.org/virtualization/cy8pr12mb719506cc5613eb100bc6c638dc...@cy8pr12mb7195.namprd12.prod.outlook.com/#t


I can only repeat what I said then, this is not how we do kernel
development.

> [2] 
> https://lore.kernel.org/virtualization/20250602024358.57114-1-pa...@nvidia.com/

What was missing here, is handling corner cases. So let us please 
try to handle them.

Here is how I would try to do it:

- add a new driver callback
- start a periodic timer task in virtio core on remove
- in the timer, probe that the device is still present.
  if not, invoke a driver callback
- cancel the task on device reset

If you do not have the time, let me know and I will try to look into it.

> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci 
> device")
> Cc: sta...@vger.kernel.org
> Reported-by: lirongq...@baidu.com
> Closes: 
> https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4...@baidu.com/
> Signed-off-by: Parav Pandit <pa...@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>       struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>       struct device *dev = get_device(&vp_dev->vdev.dev);
>  
> -     /*
> -      * Device is marked broken on surprise removal so that virtio upper
> -      * layers can abort any ongoing operation.
> -      */
> -     if (!pci_device_is_present(pci_dev))
> -             virtio_break_device(&vp_dev->vdev);
> -
>       pci_disable_sriov(pci_dev);
>  
>       unregister_virtio_device(&vp_dev->vdev);
> -- 
> 2.26.2


Reply via email to