1)
Can you post the guest kernel messages (dmesg)? If the guest is hanging
then it may be easiest to configure a serial console so the kernel
messages are sent to the host where you can see them.

Does the hang occur during the LTP code you linked or afterwards when
the PCI device is bound to a virtio driver?




> I used the console; the hang occurred afterwards. dmesg shows that the tpci
> test finished without error.
LTP test case: 
https://github.com/linux-test-project/ltp/blob/522d7fba4afc84e07b252aa4cd91b241e81d6613/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c#L428
kernel 5.10, qemu 6.2



Different guest configurations give different results. The guest does not
crash if hung_task_panic=0; in my case I enabled hung_task_panic in order to
get a trace.


Test case 1:
XML machine type pc, virtio disk, virtio net: guest I/O hung and the network
broke down. The console is still available, but I/O operations hang.


# ps -aux | grep D

root           7  0.0  0.0      0     0 ?        D    14:37   0:00 [kworker/u16:0+flush-253:0]
root         483  0.0  0.0      0     0 ?        D    14:37   0:00 [jbd2/vda3-8]


Test case 2:
XML machine type q35 with virtio, or q35 with SCSI: the disk did not hang, but
the network broke down. Ping fails, although everything else looks OK: no
crash and no kernel errors.






2)
I didn't see your original email so I missed the panic. I'd still like
to see the earlier kernel messages before the panic in order to
understand how the PCI device is bound.

Is the vda device with hung I/O the same device that was accessed by
the LTP test earlier? I guess the LTP test runs against the device and
then the virtio driver binds to the device again afterwards?



> The test is:
```
// iterate all devices
……
for (i = 0; i < 7; ++i) {  // iterate current device's resources

	if (r->flags & IORESOURCE_MEM &&
	    r->flags & IORESOURCE_PREFETCH) {
		pci_release_resource(dev, i);
		ret = pci_assign_resource(dev, i);
		prk_info("assign resource to '%d', ret '%d'", i, ret);
		rc |= (ret < 0 && ret != -EBUSY) ? TFAIL : TPASS;
	}

}
```
The test does not unbind and rebind the virtio device.
The only change I noticed is in the memory resource; see 'test-case 12' below.


———————————
[   88.905705] ltp_tpci: test-case 12
[   88.905706] ltp_tpci: assign resources
[   88.905706] ltp_tpci: assign resource #0
[   88.905707] ltp_tpci: name = 0000:00:07.0, flags = 262401, start 0xc080, end 0xc0ff
[   88.905707] ltp_tpci: assign resource #1
[   88.905708] ltp_tpci: name = 0000:00:07.0, flags = 262656, start 0xfebd4000, end 0xfebd4fff
[   88.905709] ltp_tpci: assign resource #2
[   88.905709] ltp_tpci: name = 0000:00:07.0, flags = 0, start 0x0, end 0x0
[   88.905710] ltp_tpci: assign resource #3
[   88.905710] ltp_tpci: name = 0000:00:07.0, flags = 0, start 0x0, end 0x0
[   88.905711] ltp_tpci: assign resource #4
[   88.905711] ltp_tpci: name = 0000:00:07.0, flags = 1319436, start 0xfe00c000, end 0xfe00ffff
[   88.905713] virtio-pci 0000:00:07.0: BAR 4: releasing [mem 0xfe00c000-0xfe00ffff 64bit pref]
[   88.905715] virtio-pci 0000:00:07.0: BAR 4: assigned [mem 0x24000c000-0x24000ffff 64bit pref]
[   88.906693] ltp_tpci: assign resource to '4', ret '0'
[   88.906694] ltp_tpci: assign resource #5
[   88.906694] ltp_tpci: name = (null), flags = 0, start 0x0, end 0x0
[   88.906695] ltp_tpci: assign resource #6
[   88.906695] ltp_tpci: name = 0000:00:07.0, flags = 0, start 0x0, end 0x0

[   88.906800] ltp_tpci: test-case 13





---- Replied Message ----
| From | Stefan Hajnoczi<stefa...@gmail.com> |
| Date | 08/10/2023 23:24 |
| To | Stefan Hajnoczi<stefa...@redhat.com> |
| Cc | longguang.yue<kvml...@163.com> ,
Michael Tokarev<m...@tls.msk.ru> ,
m...@redhat.com<m...@redhat.com> ,
qemu-devel<qemu-devel@nongnu.org> ,
linux-kernel<linux-ker...@vger.kernel.org> |
| Subject | Re: LTP test related to virtio releasing and reassigning resource leads to guest hung |
On Thu, 10 Aug 2023 at 10:14, Stefan Hajnoczi <stefa...@redhat.com> wrote:

On Thu, Aug 10, 2023 at 06:35:32PM +0800, longguang.yue wrote:
Could you please give me some tips for diagnosing this? I could run tests on
QEMU 8.0, but the production environment cannot be updated.
I tested different 5.10.0-X kernel versions; one behaves better, and the
results suggest the problem has more to do with the host kernel than with QEMU.


The test cases are different combinations of i440fx/q35, virtio/SCSI, and kernel.

Can you post the guest kernel messages (dmesg)? If the guest is hanging
then it may be easiest to configure a serial console so the kernel
messages are sent to the host where you can see them.

Does the hang occur during the LTP code you linked or afterwards when
the PCI device is bound to a virtio driver?

I didn't see your original email so I missed the panic. I'd still like
to see the earlier kernel messages before the panic in order to
understand how the PCI device is bound.

Is the vda device with hung I/O the same device that was accessed by
the LTP test earlier? I guess the LTP test runs against the device and
then the virtio driver binds to the device again afterwards?


Which virtio device causes the problem?

Can you describe the hang in more detail: is the guest still responsive
(e.g. console or network)? Is the QEMU HMP/QMP monitor still responsive?

Thanks,
Stefan





thanks




---- Replied Message ----
| From | Michael Tokarev<m...@tls.msk.ru> |
| Date | 08/10/2023 17:08 |
| To | longguang.yue<kvml...@163.com> ,
qemu-devel<qemu-devel@nongnu.org> ,
linux-kernel<linux-ker...@vger.kernel.org> |
| Subject | Re: LTP test related to virtio releasing and reassigning resource leads to guest hung |
10.08.2023 11:57, longguang.yue wrote:
Hi, all:
An LTP test leads to a guest hang (I/O hang). The test releases a virtio
device's resources and then reassigns them.
I found that the device's 64-bit prefetchable memory resource is changed.

ltp
test: 
https://github.com/linux-test-project/ltp/blob/522d7fba4afc84e07b252aa4cd91b241e81d6613/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c#L428

Do you know what causes the problem?

Thanks very much.

--------------------------
ENV: kernel 5.10.0, qemu 6.2

Current qemu is 8.1 (well, almost, to be released this month;
previous release is 8.0 anyway).

This might be interesting to test in a current version before
going any further.

Thanks,

/mjt
