On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
> On Thu, Aug 28 2025, Parav Pandit <pa...@nvidia.com> wrote:
>
> >> From: Cornelia Huck <coh...@redhat.com>
> >> Sent: 27 August 2025 05:04 PM
> >>
> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> >>
> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> >> >> > What I do not understand, is what good does the revert do. Sorry.
> >> >> >
> >> >> Let me explain.
> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb
> >> >> completions.
> >> >
> >> > This is the part I don't get. In what scenario, before 43bb40c5b9265
> >> > start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
> >> >
> >> > Once the device is gone, it is not using any buffers at all.
> >>
> >> What I also don't understand: virtio-ccw does exactly the same thing
> >> (virtio_break_device(), added in 2014), and it supports surprise removal
> >> _only_, yet I don't remember seeing bug reports?
> >
> > I suspect that stress testing may not have happened for ccw with active
> > vblk Ios and outstanding transmit pkt and cvq commands.
> > Hard to say as we don't have ccw hw or systems.
>
> cc:ing linux-s390 list. I'd be surprised if nobody ever tested surprise
> removal on a loaded system in the last 11 years.

As became very clear from the follow-up discussion, the issue has nothing
to do with virtio. It is caused by a broken hypervisor that allows the
device to DMA into guest memory while also telling the guest that the
device has been removed.

I guess s390 is just not broken like this.

--
MST