On Thu, Aug 28 2025, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
>> On Thu, Aug 28 2025, Parav Pandit <pa...@nvidia.com> wrote:
>>
>> >> From: Cornelia Huck <coh...@redhat.com>
>> >> Sent: 27 August 2025 05:04 PM
>> >>
>> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <m...@redhat.com> wrote:
>> >>
>> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
>> >> >> > What I do not understand, is what good does the revert do. Sorry.
>> >> >> >
>> >> >> Let me explain.
>> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
>> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb
>> >> >> completions.
>> >> >
>> >> > This is the part I don't get. In what scenario, before 43bb40c5b9265
>> >> > start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
>> >> >
>> >> > Once the device is gone, it is not using any buffers at all.
>> >>
>> >> What I also don't understand: virtio-ccw does exactly the same thing
>> >> (virtio_break_device(), added in 2014), and it supports surprise removal
>> >> _only_, yet I don't remember seeing bug reports?
>> >
>> > I suspect that stress testing may not have happened for ccw with active
>> > vblk Ios and outstanding transmit pkt and cvq commands.
>> > Hard to say as we don't have ccw hw or systems.
>>
>> cc:ing linux-s390 list. I'd be surprised if nobody ever tested surprise
>> removal on a loaded system in the last 11 years.
>
>
> As it became very clear from follow up discussion, the issue is nothing
> to do with virtio, it is with a broken hypervisor that allows device to
> DMA into guest memory while also telling the guest that the device has
> been removed.
>
> I guess s390 is just not broken like this.

Ah good, I missed that -- that indeed sounds broken, and needs to be
fixed there.