1) Can you post the guest kernel messages (dmesg)? If the guest is hanging then it may be easiest to configure a serial console so the kernel messages are sent to the host where you can see them.
Does the hang occur during the LTP code you linked or afterwards when the PCI device is bound to a virtio driver?

> I used the console; the hang occurred afterwards. dmesg shows that the tpci test finished without error.
> LTP test case: https://github.com/linux-test-project/ltp/blob/522d7fba4afc84e07b252aa4cd91b241e81d6613/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c#L428
> kernel 5.10, qemu 6.2
> Different guest configurations show different results. The guest does not crash if hung-task-panic=0; in my case I enabled hung-task-panic in order to get a trace.
> Test case 1: XML machine pc, virtio disk, virtio net: the guest's I/O hung and the network broke down. The console is still available, but I/O operations hang.
> #ps -aux| grep D
> root 7 0.0 0.0 0 0 ? D 14:37 0:00 [kworker/u16:0+flush-253:0]
> root 483 0.0 0.0 0 0 ? D 14:37 0:00 [jbd2/vda3-8]
> Test case 2: XML machine q35 with virtio, and q35 with scsi: the disk did not hang, but the network broke down. ping fails even though everything else looks OK: no crash and no kernel error.

2) I didn't see your original email so I missed the panic. I'd still like to see the earlier kernel messages before the panic in order to understand how the PCI device is bound. Is the vda device with hung I/O the same device that was accessed by the LTP test earlier? I guess the LTP test runs against the device and then the virtio driver binds to the device again afterwards?

> The test is:
> ```c
> // iterate all devices
> ……
> for (i = 0; i < 7; ++i) {
>     // iterate current device's resources
>     if (r->flags & IORESOURCE_MEM && r->flags & IORESOURCE_PREFETCH) {
>         pci_release_resource(dev, i);
>         ret = pci_assign_resource(dev, i);
>         prk_info("assign resource to '%d', ret '%d'", i, ret);
>         rc |= (ret < 0 && ret != -EBUSY) ? TFAIL : TPASS;
>     }
> }
> ```
> The test does not unbind and rebind the virtio device. I only noticed that the memory resource changed.
> See 'test-case 12':
> ```
> [ 88.905705] ltp_tpci: test-case 12
> [ 88.905706] ltp_tpci: assign resources
> [ 88.905706] ltp_tpci: assign resource #0
> [ 88.905707] ltp_tpci: name = 0000:00:07.0, flags = 262401, start 0xc080, end 0xc0ff
> [ 88.905707] ltp_tpci: assign resource #1
> [ 88.905708] ltp_tpci: name = 0000:00:07.0, flags = 262656, start 0xfebd4000, end 0xfebd4fff
> [ 88.905709] ltp_tpci: assign resource #2
> [ 88.905709] ltp_tpci: name = 0000:00:07.0, flags = 0, start 0x0, end 0x0
> [ 88.905710] ltp_tpci: assign resource #3
> [ 88.905710] ltp_tpci: name = 0000:00:07.0, flags = 0, start 0x0, end 0x0
> [ 88.905711] ltp_tpci: assign resource #4
> [ 88.905711] ltp_tpci: name = 0000:00:07.0, flags = 1319436, start 0xfe00c000, end 0xfe00ffff
> [ 88.905713] virtio-pci 0000:00:07.0: BAR 4: releasing [mem 0xfe00c000-0xfe00ffff 64bit pref]
> [ 88.905715] virtio-pci 0000:00:07.0: BAR 4: assigned [mem 0x24000c000-0x24000ffff 64bit pref]
> [ 88.906693] ltp_tpci: assign resource to '4', ret '0'
> [ 88.906694] ltp_tpci: assign resource #5
> [ 88.906694] ltp_tpci: name = (null), flags = 0, start 0x0, end 0x0
> [ 88.906695] ltp_tpci: assign resource #6
> [ 88.906695] ltp_tpci: name = 0000:00:07.0, flags = 0, start 0x0, end 0x0
> [ 88.906800] ltp_tpci: test-case 13
> ```

---- Replied Message ----
| From | Stefan Hajnoczi<stefa...@gmail.com> |
| Date | 08/10/2023 23:24 |
| To | Stefan Hajnoczi<stefa...@redhat.com> |
| Cc | longguang.yue<kvml...@163.com> , Michael Tokarev<m...@tls.msk.ru> , m...@redhat.com<m...@redhat.com> , qemu-devel<qemu-devel@nongnu.org> , linux-kernel<linux-ker...@vger.kernel.org> |
| Subject | Re: LTP test related to virtio releasing and reassigning resource leads to guest hung |

On Thu, 10 Aug 2023 at 10:14, Stefan Hajnoczi <stefa...@redhat.com> wrote:
On Thu, Aug 10, 2023 at 06:35:32PM +0800, longguang.yue wrote:
Could you please give me some tips to diagnose this? I could run tests on qemu 8.0, but the production environment cannot be updated.
I tested on different 5.10.0-X kernel versions; one of them behaves better, and the results suggest the problem lies more with the host kernel than with qemu. The test cases are different combinations of i440fx/q35, virtio/scsi, and kernel version.

Can you post the guest kernel messages (dmesg)? If the guest is hanging then it may be easiest to configure a serial console so the kernel messages are sent to the host where you can see them.

Does the hang occur during the LTP code you linked or afterwards when the PCI device is bound to a virtio driver?

I didn't see your original email so I missed the panic. I'd still like to see the earlier kernel messages before the panic in order to understand how the PCI device is bound.

Is the vda device with hung I/O the same device that was accessed by the LTP test earlier? I guess the LTP test runs against the device and then the virtio driver binds to the device again afterwards?

Which virtio device causes the problem? Can you describe the hang in more detail: is the guest still responsive (e.g. console or network)? Is the QEMU HMP/QMP monitor still responsive?

Thanks,
Stefan

thanks

---- Replied Message ----
| From | Michael Tokarev<m...@tls.msk.ru> |
| Date | 08/10/2023 17:08 |
| To | longguang.yue<kvml...@163.com> , qemu-devel<qemu-devel@nongnu.org> , linux-kernel<linux-ker...@vger.kernel.org> |
| Subject | Re: LTP test related to virtio releasing and reassigning resource leads to guest hung |

10.08.2023 11:57, longguang.yue wrote:
Hi, all:
An LTP test leads to a guest hang (I/O hang). The test releases a virtio device's resource and then reassigns it. I found that the device's 64-bit prefetchable memory resource changed.
ltp test: https://github.com/linux-test-project/ltp/blob/522d7fba4afc84e07b252aa4cd91b241e81d6613/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c#L428
Do you know what causes the problem? Thanks very much.
--------------------------
ENV: kernel 5.10.0, qemu 6.2

Current qemu is 8.1 (well, almost, to be released this month; previous release is 8.0 anyway).
This might be interesting to test in a current version before going any further.

Thanks,

/mjt