> From: Michael S. Tsirkin <m...@redhat.com>
> Sent: 24 August 2025 08:00 PM
> 
> On Sun, Aug 24, 2025 at 02:36:11AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <m...@redhat.com>
> > > Sent: 22 August 2025 07:32 PM
> > >
> > > On Fri, Aug 22, 2025 at 01:53:02PM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin <m...@redhat.com>
> > > > > Sent: 22 August 2025 06:35 PM
> > > > >
> > > > > On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
> > > > > >
> > > > > > > From: Li,Rongqing <lirongq...@baidu.com>
> > > > > > > Sent: 22 August 2025 03:57 PM
> > > > > > >
> > > > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > > > surprise removal of virtio pci device").
> > > > > > > >
> > > > > > > > Virtio drivers and PCI devices have never fully supported
> > > > > > > > true surprise (aka hot
> > > > > > > > unplug) removal. Drivers historically continued processing
> > > > > > > > and waiting for pending I/O and even continued synchronous
> > > > > > > > device reset during surprise removal. Devices have also
> > > > > > > > continued completing I/Os, doing DMA and allowing device
> > > > > > > > reset after surprise
> > > > > removal to support such drivers.
> > > > > > > >
> > > > > > > > Supporting it correctly would require a new device
> > > > > > > > capability and driver negotiation in the virtio
> > > > > > > > specification to safely stop I/O and free queue
> > > > > > > memory.
> > > > > > > > Failure to do so either breaks all the existing drivers
> > > > > > > > with call trace listed in the commit or crashes the host
> > > > > > > > on continuing the
> > > DMA.
> > > > > > > > Hence, until such specification and devices are invented,
> > > > > > > > restore the previous behavior of treating surprise removal
> > > > > > > > as graceful removal to avoid regressions and maintain
> > > > > > > > system stability same as before the commit 43bb40c5b926
> ("virtio_pci:
> > > > > > > > Support surprise removal of virtio pci
> > > > > > > device").
> > > > > > > >
> > > > > > > > As explained above, previous analysis of solving this only
> > > > > > > > in driver was incomplete and non-reliable at [1] and at
> > > > > > > > [2]; Hence reverting commit
> > > > > > > > 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > > > > > > > virtio pci
> > > > > > > > device") is still the best stand to restore failures of
> > > > > > > > virtio net and block
> > > > > > > devices.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613
> > > > > > > EB10
> > > > > > > 0BC6
> > > > > > > C6
> > > > > > > > 38 dc...@cy8pr12mb7195.namprd12.prod.outlook.com/#t
> > > > > > > > [2]
> > > > > > > > https://lore.kernel.org/virtualization/20250602024358.5711
> > > > > > > > 4-1-
> > > > > > > > para
> > > > > > > > v@nv
> > > > > > > > idia.c
> > > > > > > > om/
> > > > > > > >
> > > > > > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal
> > > > > > > > of virtio pci device")
> > > > > > > > Cc: sta...@vger.kernel.org
> > > > > > > > Reported-by: lirongq...@baidu.com
> > > > > > > > Closes:
> > > > > > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c5
> > > > > > > > 5fb7
> > > > > > > > 3ca9
> > > > > > > > b474
> > > > > > > > 1@b
> > > > > > > > aidu.com/
> > > > > > > > Signed-off-by: Parav Pandit <pa...@nvidia.com>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Tested-by: Li RongQing <lirongq...@baidu.com>
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > -Li
> > > > > > >
> > > > > > Multiple users are blocked to have this fix in stable kernel.
> > > > >
> > > > > what are these users doing that is blocked by this fix?
> > > > >
> > > > Not sure I understand the question. Let me try to answer.
> > > > They are unable to dynamically add/remove the virtio net, block,
> > > > fs devices in
> > > their systems.
> > > > Users have their networking applications running over NS network
> > > > and
> > > database and file system through these devices.
> > > > Some of them keep reverting the patch. Some are unable to.
> > > > They are in search of stable kernel.
> > > >
> > > > Did I understand your question?
> > > >
> > >
> > > Not really, sorry.
> > >
> > > Does the system or does it not have a mechanical interlock?
> > >
> > It is modern system beyond mechanical interlock but has the ability for
> surprise removal.
> 
> I am not sure what does "beyond" mean. I guess that it does not have it?
> 
Right.

> > > If it does, how does a user run into surprise removal issues without
> > > the ability to remove the device?
> > >
> > User has the ability to surprise removal a device from the slot via the 
> > slot's
> pci registers.
> 
> I don't know what this means. Surprise removal is done by removing the
> device. Not via pci registers.
They are many ways to surprise remove a device from a slot.
Slots are capable to detect the device removal and when that occurs, the pcie 
switch/bridge updates the slot registers for the downstream port and indicate 
to the sw.
Some slots are physical in nature. Some are virtual slots in the pcie 
switch/bridge.
End result is, sw gets to know this sw PCIe registers in switch/bridge/port 
level.

> 
> > Yet the device is capable enough to fulfil the needs of broken drivers which
> are waiting for the pending requests to arrive.
> 
> I don't know what this means. A removed device can not do anything at all.
> 
If this was really physically removed, yes. 
but in virtual system, the device is still present on the slot until it gets 
indication from the OS.

> > > If it does not, and a user pull out the working device, how does
> > > your patch help?
> > >
> > A driver must tell that it will not follow broken ancient behaviour and at 
> > that
> point device would stop its ancient backward compatibility mode.
> 
> 
> 
> I don't know what is "ancient backward compatibility mode".
> 
Let me explain.
Sadly, CSPs virtio pci device implementation is done such a way that, it works 
with ancient Linux kernel which does not have commit 43bb40c5b9265.
Commit 43bb40c5b9265 was partial in nature.
Hence request to revert and do proper implementation like you mentioned using 
extra callback and/or with spec extension.

> 
> 
> 
> 
> > > --
> > > MST


Reply via email to