On 2025/11/04 16:41, Thomas Huth wrote:
On 04/11/2025 02.45, Akihiko Odaki wrote:
On 2025/11/03 22:59, Thomas Huth wrote:
On 28/10/2025 18.34, Paolo Bonzini wrote:
From: Akihiko Odaki <[email protected]>

Borrow the concept of force quiescent state from Linux to ensure readers
remain fast during normal operation and to avoid stalls.

  Hi Akihiko,

looks like this commit has introduced a regression with the "replay" functional test on the alpha target.
When I run something like:

  pyvenv/bin/meson test --no-rebuild -t 1 --setup thorough \
   --num-processes 1 --repeat 10 func-alpha-replay

in the build folder, approx. half of the test runs are failing for me now.

I bisected the issue to this patch: when I rebuild qemu-system-alpha with the commit right before this change, the above test runs work fine, so I'm pretty sure that the problem has been introduced by this commit.

Could you please have a look?

I cannot reproduce it with commit 55d98e3edeeb ("rcu: Unify force quiescent state").

Can you provide meson-logs/testlog-thorough.txt so that I can look into the failure you are facing? If you have anything else that may be useful for debugging, please share it with me too.

There's not much in that testlog-thorough.txt that could be helpful here,
it's basically just the information that the test has been killed due to
the timeout:

==================================== 1/1 ===================================
test:         qemu:func-thorough+func-alpha-thorough+thorough / func-alpha-replay
start time:   07:25:26
duration:     90.01s
result:       killed by signal 15 SIGTERM
command:      RUST_BACKTRACE=1 QEMU_TEST_QEMU_IMG=/tmp/qemu-rcu/qemu-img QEMU_TEST_GDB=/usr/bin/gdb MALLOC_PERTURB_=255 MESON_TEST_ITERATION=1 PYTHONPATH=/home/thuth/devel/qemu/python:/home/thuth/devel/qemu/tests/functional G_TEST_SLOW=1 SPEED=thorough QEMU_TEST_QEMU_BINARY=/tmp/qemu-rcu/qemu-system-alpha ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 LD_LIBRARY_PATH=/tmp/qemu-rcu/tests/tcg/plugins:/tmp/qemu-rcu/contrib/plugins UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 QEMU_BUILD_ROOT=/tmp/qemu-rcu /tmp/qemu-rcu/pyvenv/bin/python3 /home/thuth/devel/qemu/tests/functional/alpha/test_replay.py
=============================================================================

Summary of Failures:

1/1 qemu:func-thorough+func-alpha-thorough+thorough / func-alpha-replay TIMEOUT        90.01s   killed by signal 15 SIGTERM


There is also not that much helpful information in
tests/functional/alpha/test_replay.AlphaReplay.test_clipper, apart from
the console.log file. For a good run, the console log looks like this:

2025-11-04 08:16:46,148: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:16:46,149: PCI:   region 0: 0000c000
2025-11-04 08:16:46,149: PCI:   region 1: 0000c008
2025-11-04 08:16:46,149: PCI:   region 2: 0000c010
2025-11-04 08:16:46,149: PCI:   region 3: 0000c018
2025-11-04 08:16:46,149: PCI:   region 4: 0000c020
2025-11-04 08:16:46,149: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:16:48,149: [    0.000000] Initializing cgroup subsys cpu
2025-11-04 08:16:48,149: [    0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) ([email protected]) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:16:48,150: [    0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:16:48,150: [    0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:16:48,150: [    0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:16:48,150: [    0.000000] memcluster 0, usage 1, start        0, end       15
2025-11-04 08:16:48,150: [    0.000000] memcluster 1, usage 0, start       15, end    16384
2025-11-04 08:16:48,150: [    0.000000] freeing pages 15:2048
2025-11-04 08:16:48,150: [    0.000000] freeing pages 2987:16384
2025-11-04 08:16:48,151: [    0.000000] reserving pages 2987:2988
2025-11-04 08:16:48,151: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16272
2025-11-04 08:16:48,151: [    0.000000] Kernel command line: printk.time=0 console=ttyS0
2025-11-04 08:16:57,358: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:16:57,358: PCI:   region 0: 0000c000
2025-11-04 08:16:57,358: PCI:   region 1: 0000c008
2025-11-04 08:16:57,359: PCI:   region 2: 0000c010
2025-11-04 08:16:57,359: PCI:   region 3: 0000c018
2025-11-04 08:16:57,359: PCI:   region 4: 0000c020
2025-11-04 08:16:57,360: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:17:08,468: [    0.000000] Initializing cgroup subsys cpu
2025-11-04 08:17:08,470: [    0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) ([email protected]) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:17:08,471: [    0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:17:08,471: [    0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:17:08,472: [    0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:17:08,472: [    0.000000] memcluster 0, usage 1, start        0, end       15
2025-11-04 08:17:08,473: [    0.000000] memcluster 1, usage 0, start       15, end    16384
2025-11-04 08:17:08,473: [    0.000000] freeing pages 15:2048
2025-11-04 08:17:08,474: [    0.000000] freeing pages 2987:16384
2025-11-04 08:17:08,474: [    0.000000] reserving pages 2987:2988
2025-11-04 08:17:08,475: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16272
2025-11-04 08:17:08,476: [    0.000000] Kernel command line: printk.time=0 console=ttyS0

I.e. the replay worked as expected. When it fails, console.log only contains:

2025-11-04 08:25:26,601: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:25:26,601: PCI:   region 0: 0000c000
2025-11-04 08:25:26,601: PCI:   region 1: 0000c008
2025-11-04 08:25:26,601: PCI:   region 2: 0000c010
2025-11-04 08:25:26,601: PCI:   region 3: 0000c018
2025-11-04 08:25:26,601: PCI:   region 4: 0000c020
2025-11-04 08:25:26,602: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:25:28,601: [    0.000000] Initializing cgroup subsys cpu
2025-11-04 08:25:28,602: [    0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) ([email protected]) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:25:28,602: [    0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:25:28,602: [    0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:25:28,602: [    0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:25:28,602: [    0.000000] memcluster 0, usage 1, start        0, end       15
2025-11-04 08:25:28,602: [    0.000000] memcluster 1, usage 0, start       15, end    16384
2025-11-04 08:25:28,602: [    0.000000] freeing pages 15:2048
2025-11-04 08:25:28,603: [    0.000000] freeing pages 2987:16384
2025-11-04 08:25:28,603: [    0.000000] reserving pages 2987:2988
2025-11-04 08:25:28,603: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16272
2025-11-04 08:25:28,603: [    0.000000] Kernel command line: printk.time=0 console=ttyS0

I.e. the replay did not work.
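
The difference between the two logs can be checked mechanically. The following is only a sketch: it assumes that the boot messages appear once for the recording pass and once more for the replay pass, as in the good log above, and the function name count_boot_passes is made up for illustration.

```shell
# Count how many times the guest kernel printed its command line.
# Assumption: a good run shows it twice (record + replay), a hanging
# replay shows it only once.
count_boot_passes() {
    # grep -c prints the number of matching lines (0 if none)
    grep -c 'Kernel command line:' "$1"
}

# Example (the path to console.log depends on the local test output dir):
#   count_boot_passes console.log
```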

Could this RCU stuff somehow influence the replay mechanism in QEMU?

I don't know (yet).

Can you attach gdb and show a backtrace for each thread? It often reveals deadlock among threads.
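
One possible way to collect this while the test is hanging, as a rough sketch: the helper name dump_all_threads and the GDB override variable are my own assumptions, and the PID has to be looked up by hand (for example with pgrep).

```shell
# Attach gdb to a running process, print a backtrace of every thread,
# then detach and exit. "thread apply all bt" is a standard gdb command.
# GDB is a hypothetical override for picking a different gdb binary.
dump_all_threads() {
    "${GDB:-gdb}" -batch -p "$1" -ex 'thread apply all bt'
}

# Example, while the replay run is stuck:
#   dump_all_threads "$(pgrep -n -f qemu-system-alpha)"
```

Attaching may require root or a relaxed ptrace_scope setting, depending on the system.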

Regards,
Akihiko Odaki
