Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue
On Thu, May 16, 2024 at 5:46 PM Catherine Redfield wrote:
>
> Feng,
>
> Thank you for providing your debugging steps; I used them on a gce image
> locally and was not able to replicate the issue. I also attempted to
> replicate in qemu/virsh using qemu-guest-agent to enable the S3 suspend
> state, also without success (that is S3 suspend state worked without any
> problems). I have brought this back to the cloud for further debugging of
> their config and guest agent to try and determine what the issue is.
>
> Thank you very much for all your help on this issue and time looking into it!
> Catherine

Does this fix the issue? I guess the reason is that GCE is using legacy virtio.

https://lore.kernel.org/kvm/cacgkmeth_9baewekq862ygzwuozwg96z3g6oyqhzycj2jpu...@mail.gmail.com/T/

Thanks

>
> On Thu, May 9, 2024 at 5:03 AM Feng Liu wrote:
>>
>>
>> On 2024-05-08 a.m.7:18, Catherine Redfield wrote:
>> > On a VM with the GCP kernel (where we first identified the problem), I see:
>> >
>> > 1. The full kernel log from `journalctl --system > kernlog` attached.
>> > The specific suspend section is here:
>> >
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > systemd[1]: Reached target sleep.target - Sleep.
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > systemd[1]: Starting systemd-suspend.service - System Suspend...
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > systemd-sleep[1413]: Performing sleep operation 'suspend'...
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: PM: suspend entry (deep)
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: Filesystems sync: 0.008 seconds
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: Freezing user space processes
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: Freezing user space processes completed (elapsed 0.001 seconds)
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: OOM killer disabled.
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: Freezing remaining freezable tasks
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: Freezing remaining freezable tasks completed (elapsed 0.000
>> > seconds)
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: printk: Suspending console(s) (use no_console_suspend to debug)
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: port 00:03:0.0: PM: dpm_run_callback():
>> > pm_runtime_force_suspend+0x0/0x130 returns -16
>> > May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal
>> > kernel: port 00:03:0.0: PM: failed to suspend: error -16
>>
>> Thanks to Joseph and Catherine for the help.
>>
>> Hi,
>>
>> I have already synced up with the Canonical team offline about this issue.
>>
>> I can run "suspend/resume" successfully on my local server and VM.
>> And "PM: failed to suspend: error -16" does not look like it is caused by
>> my previous virtio patch (fd27ef6b44be ("virtio-pci: Introduce admin
>> virtqueue")), whose only suspend-related change is in
>> "virtio_device_freeze".
>>
>> So I have provided my steps and debug patch to Joseph and Catherine.
>> I will also share the information here, as follows:
>>
>> I read the QEMU code and found a way to trigger "suspend/resume" on
>> my setup, and added some debug messages to the latest kernel.
>>
>> My steps are:
>> 1. Add the following to the QEMU command line:
>>
>> -global PIIX4_PM.disable_s3=0 \
>> -global PIIX4_PM.disable_s4=1 \
>>
>> -netdev type=tap,ifname=tap0,id=hostnet0,script=no,downscript=no \
>> -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=$SSH_MAC,bus=pci.0,addr=0x3 \
>> ..
>>
>> 2. In the VM, run "systemctl suspend" to suspend the VM to memory
>> 3. In the QEMU HMP shell, run "system_wakeup" to resume the VM
>>
>> My VM configuration:
>> NIC: 1 virtio nic emulated by QEMU
>> OS: Ubuntu 22.04.4 LTS
>> kernel: latest kernel, 6.9-rc7: ee5b455b0ada (kernel2/net-next-virito,
>> kernel2/master, master) Merge tag 'slab-for-6.9-rc7-fixes' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab)
>>
>>
>> I added some debug messages to the latest kernel and followed the steps
>> above to trigger "suspend/resume". Everything works: the VM suspends and
>> resumes successfully.
>> The kernel log follows:
>>
>>
>> May 6 15:59:52 feliu-vm kernel: [ 43.446737] PM: suspend entry (deep)
>> May 6 16:00:04 feliu-vm kernel: [ 43.467640] Filesystems sync: 0.020
>> seconds
>> May 6 16:00:04 feliu-vm kernel: [ 43.467923] Freezing user space
>> processes
>> May 6 16:00:04 feliu-vm kernel: [ 43.470294] Freezing user space
>>
Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue
On 2024-05-08 a.m.7:18, Catherine Redfield wrote:
> On a VM with the GCP kernel (where we first identified the problem), I see:
>
> 1. The full kernel log from `journalctl --system > kernlog` attached.
> The specific suspend section is here:
>
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd[1]: Reached target sleep.target - Sleep.
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd[1]: Starting systemd-suspend.service - System Suspend...
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd-sleep[1413]: Performing sleep operation 'suspend'...
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend entry (deep)
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Filesystems sync: 0.008 seconds
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing user space processes
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing user space processes completed (elapsed 0.001 seconds)
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: OOM killer disabled.
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing remaining freezable tasks
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: printk: Suspending console(s) (use no_console_suspend to debug)
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: port 00:03:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0x130 returns -16
> May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: port 00:03:0.0: PM: failed to suspend: error -16

Thanks to Joseph and Catherine for the help.

Hi,

I have already synced up with the Canonical team offline about this issue.

I can run "suspend/resume" successfully on my local server and VM.
And "PM: failed to suspend: error -16" does not look like it is caused by
my previous virtio patch (fd27ef6b44be ("virtio-pci: Introduce admin
virtqueue")), whose only suspend-related change is in
"virtio_device_freeze".

So I have provided my steps and debug patch to Joseph and Catherine.
I will also share the information here, as follows:

I read the QEMU code and found a way to trigger "suspend/resume" on
my setup, and added some debug messages to the latest kernel.

My steps are:
1. Add the following to the QEMU command line:

-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=1 \

-netdev type=tap,ifname=tap0,id=hostnet0,script=no,downscript=no \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=$SSH_MAC,bus=pci.0,addr=0x3 \
..

2. In the VM, run "systemctl suspend" to suspend the VM to memory
3. In the QEMU HMP shell, run "system_wakeup" to resume the VM

My VM configuration:
NIC: 1 virtio nic emulated by QEMU
OS: Ubuntu 22.04.4 LTS
kernel: latest kernel, 6.9-rc7: ee5b455b0ada (kernel2/net-next-virito,
kernel2/master, master) Merge tag 'slab-for-6.9-rc7-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab)

I added some debug messages to the latest kernel and followed the steps
above to trigger "suspend/resume". Everything works: the VM suspends and
resumes successfully.
The kernel log follows:

May 6 15:59:52 feliu-vm kernel: [ 43.446737] PM: suspend entry (deep)
May 6 16:00:04 feliu-vm kernel: [ 43.467640] Filesystems sync: 0.020 seconds
May 6 16:00:04 feliu-vm kernel: [ 43.467923] Freezing user space processes
May 6 16:00:04 feliu-vm kernel: [ 43.470294] Freezing user space processes completed (elapsed 0.002 seconds)
May 6 16:00:04 feliu-vm kernel: [ 43.470299] OOM killer disabled.
May 6 16:00:04 feliu-vm kernel: [ 43.470301] Freezing remaining freezable tasks
May 6 16:00:04 feliu-vm kernel: [ 43.471482] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
May 6 16:00:04 feliu-vm kernel: [ 43.471495] printk: Suspending console(s) (use no_console_suspend to debug)
May 6 16:00:04 feliu-vm kernel: [ 43.474034] virtio_net virtio0: godeng virtio device freeze
May 6 16:00:04 feliu-vm kernel: [ 43.475714] virtio_net virtio0 ens3: godfeng virtnet_freeze done
May 6 16:00:04 feliu-vm kernel: [ 43.475717] virtio_net virtio0: godfeng VIRTIO_F_ADMIN_VQ not enabled
May 6 16:00:04 feliu-vm kernel: [ 43.475719] virtio_net virtio0: godeng virtio device freeze done
May 6 16:00:04 feliu-vm kernel: [ 43.535382] smpboot: CPU 1 is now offline
May 6 16:00:04 feliu-vm kernel: [ 43.537283] IRQ fixup: irq 1 move in progress, old vector 32
May 6 16:00:04 feliu-vm kernel: [ 43.538504] smpboot: CPU 2 is now offline
May 6 16:00:04 feliu-vm kernel: [ 43.541392]
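
A side note on the failure quoted at the top of this message: kernel callbacks return negative errno values, so "error -16" corresponds to errno 16, which is EBUSY ("Device or resource busy") on Linux. A minimal Python sketch for decoding such return codes follows; the helper name describe_kernel_error is hypothetical and not from the thread.

```python
import errno
import os


def describe_kernel_error(ret: int) -> str:
    """Map a kernel-style return code such as -16 to its errno name and text.

    Hypothetical helper for illustration; kernel code returns -errno on failure.
    """
    code = -ret if ret < 0 else ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # The value reported by dpm_run_callback() in the log above.
    print(describe_kernel_error(-16))
```

This only names the error; it does not say which component refused to suspend. In the quoted log the device reporting it is "port 00:03:0.0", not a virtio device.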
Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue
On Sat, May 4, 2024 at 2:10 AM Joseph Salisbury wrote:
>
> Hi Feng,
>
> During testing, a kernel bug was identified with the suspend/resume
> functionality on instances running in a public cloud [0]. This bug is a
> regression introduced in v6.8-rc1. After a kernel bisect, the following
> commit was identified as the cause of the regression:
>
> fd27ef6b44be ("virtio-pci: Introduce admin virtqueue")

From a quick glance at the patch, it seems it should not break
freeze/restore, since it should behave as before.

But I found something interesting:

1) it assumes a single admin vq, which is not what the spec says
2) it has a special function for the admin virtqueue during
   freeze/restore, but that function does nothing beyond del_vq()
3) it lacks real users, but I guess e.g. destroy_avq() needs to be
   synchronized with whoever is using the admin virtqueue

>
> I was hoping to get your feedback, since you are the patch author. Do
> you think gathering any additional data will help diagnose this issue?

Yes, please show us

1) the kernel log here.
2) the features that the device has, e.g.
   /sys/bus/virtio/devices/virtio0/features

> This commit is depended upon by other virtio commits, so a revert test
> is not really straight forward without reverting all the dependencies.
> Any ideas you have would be greatly appreciated.

Thanks

>
>
> Thanks,
>
> Joe
>
> http://pad.lv/2063315
>
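
For anyone collecting the feature data requested above, here is a minimal sketch (not part of the original thread) of decoding the sysfs features string. It assumes the file holds a string of '0'/'1' characters in which the character at index i is feature bit i, and it uses the bit numbers VIRTIO_F_VERSION_1 = 32 and VIRTIO_F_ADMIN_VQ = 41 from the virtio specification. The helper names and the sample string are hypothetical.

```python
from pathlib import Path

# Feature bit numbers from the virtio specification.
VIRTIO_F_VERSION_1 = 32  # not negotiated on legacy (pre-1.0) devices
VIRTIO_F_ADMIN_VQ = 41   # device offers an admin virtqueue


def decode_features(bits: str) -> set[int]:
    """Return the feature bit numbers whose character is '1'."""
    return {i for i, ch in enumerate(bits.strip()) if ch == "1"}


def summarize(bits: str) -> dict[str, bool]:
    """Report the two bits most relevant to this thread."""
    feats = decode_features(bits)
    return {
        "version_1": VIRTIO_F_VERSION_1 in feats,
        "admin_vq": VIRTIO_F_ADMIN_VQ in feats,
    }


def read_device_features(device: str = "virtio0") -> str:
    # Path as given in the thread; only exists in a guest with that device.
    return Path(f"/sys/bus/virtio/devices/{device}/features").read_text()


if __name__ == "__main__":
    # Hypothetical sample string with only bits 0 and 32 set.
    sample = "".join("1" if i in (0, 32) else "0" for i in range(64))
    print(summarize(sample))
```

In a guest, summarize(read_device_features("virtio0")) would show whether the device negotiated VIRTIO_F_VERSION_1 (absent on legacy virtio) and whether the admin virtqueue feature is present at all.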