> From: Michael S. Tsirkin <m...@redhat.com> > Sent: 24 August 2025 08:03 PM > To: Parav Pandit <pa...@nvidia.com> > Cc: virtualization@lists.linux.dev; jasow...@redhat.com; > stefa...@redhat.com; pbonz...@redhat.com; xuanz...@linux.alibaba.com; > sta...@vger.kernel.org; Max Gurtovoy <mgurto...@nvidia.com>; NBU- > Contact-Li Rongqing (EXTERNAL) <lirongq...@baidu.com> > Subject: Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio > pci > device" > > On Sun, Aug 24, 2025 at 02:36:23AM +0000, Parav Pandit wrote: > > > > > From: Michael S. Tsirkin <m...@redhat.com> > > > Sent: 22 August 2025 07:30 PM > > > > > > On Fri, Aug 22, 2025 at 01:49:36PM +0000, Parav Pandit wrote: > > > > > > > > > From: Michael S. Tsirkin <m...@redhat.com> > > > > > Sent: 22 August 2025 06:34 PM > > > > > > > > > > On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin <m...@redhat.com> > > > > > > > Sent: 22 August 2025 03:52 PM > > > > > > > > > > > > > > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote: > > > > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support > > > > > > > > surprise removal of > > > > > > > virtio pci device"). > > > > > > > > > > > > > > > > Virtio drivers and PCI devices have never fully supported > > > > > > > > true surprise (aka hot unplug) removal. Drivers > > > > > > > > historically continued processing and waiting for pending > > > > > > > > I/O and even continued synchronous device reset during surprise > removal. > > > > > > > > Devices have also continued completing I/Os, doing DMA and > > > > > > > > allowing device reset after surprise removal to support such > drivers. > > > > > > > > > > > > > > > > Supporting it correctly would require a new device > > > > > > > > capability > > > > > > > > > > > > > > If a device is removed, it is removed. > > > > > > This is how it was implemented and none of the virtio drivers > supported it. > > > > > > So vendors had stepped away from such device implementation. > > > > > > (not just us). > > > > > > > > > > > > > > > If the slot does not have a mechanical interlock, I can pull the > > > > > device out. It's not up to a device implementation. > > > > > > > > Sure yes, stack is not there yet to support it. > > > > Each of the virtio device drivers are not there yet. > > > > Lets build that infra, let device indicate it and it will be > > > > smooth ride for driver > > > and device. > > > > > > There is simply no way for the device to "support" for surprise > > > removal, or lack such support thereof. > > > The support is up to the slot, not the device. Any pci compliant > > > device can be placed in a slot that allows surprise removal and that > > > is all. The user can then remove the device. > > > Software can then either recover gracefully - it should - or hang or > > > crash - it does sometimes, now. The patch you are trying to revert > > > is an attempt to move some use-cases from the 1st to the 2nd category. > > > > > It is the driver (and not the device) who needs to tell the device that it > > will > do sane cleanup and not wait infinitely. > > You can invent a way for driver to tell the device that it is not broken. But > even > if the driver does not do it, nothing at all prevents users from removing the > device. Sure. Vendors will implement the device accordingly based on the driver's config. If driver tells that it is ready for surprise removal, the device will implement functionality accordingly. And user can do anything it wants. But user will have indication and knowledge from the device, when not to break the system.
> > > > > But what is going on now, as far as I could tell, is that someone > > > developed a surprise removal emulation that does not actually remove > > > the device, and is using that for testing the code in linux that supports > surprise removal. > > Nop. Your analysis is incorrect. > > And I explained you that already. > > The device implementation supports correct implementation where device > stops all the dma and also does not support register access. > > And no single virtio driver supported that. > > > > On a surprised removed device, driver expects I/Os to complete and this is > beyond a 'bug fix' watermark. > > > > > That > > > weird emulation seems to lead to all kind of weird issues. You > > > answer is to remove the existing code and tell your testing team "we > > > do not support surprise removal". > > > > > He he, it is no the device, it is the driver that does not support surprise > removal as you can see in your proposed patches and other sw changes. > > Then fix the driver. Or don't, for that matter, if you lack the time. > I explained, it is not a fix. It is a completely new implementation of the infrastructure that spans, virtio, pci or core subsystem and each individual device types. > > > But just go ahead and tell this to them straight away. You do not > > > need this patch for this. > > > > > It is needed until infrastructure in multiple subsystem is built. > > What I do not understand, is what good does the revert do. Sorry. > Let me explain. It prevents the issue of vblk requests being stuck due to broken VQ. It prevents the vnet driver start_xmit() to be not stuck on skb completions. And virtio stack works exactly the way it was working before the commit. > > > > > > Or better still, let's fix the issues please. > > > > > The implementation is more than a fix category for stable kernels. > > Hence, what is asked is to do proper implementation for future kernels and > until that point restore the bad use experience. > > > > I am not at all interested in discussing ease of backporting fixes before they > are developed. Your attribution of "fixing a problem" for the missing new implementation that span across multiple different subsystem (virtio generic, block net) is not accurate. I understand that such confusion arises from your previous email regarding physical removal of the device... Hope I explained that its virtual system where user has not removed the card physically. Driver is expecting requests to complete and attempting device reset too... > Not how we do kernel development, sorry. > The ask is to revert a patch that broke the virtio stack. No one is asking to backport new implementation and new spec to old kernels.