On 04/11/2025 02.45, Akihiko Odaki wrote:
On 2025/11/03 22:59, Thomas Huth wrote:
On 28/10/2025 18.34, Paolo Bonzini wrote:
From: Akihiko Odaki <[email protected]>
Borrow the concept of force quiescent state from Linux to ensure readers
remain fast during normal operation and to avoid stalls.
Hi Akihiko,
looks like this commit has introduced a regression with the "replay"
functional test on the alpha target.
When I run something like:
pyvenv/bin/meson test --no-rebuild -t 1 --setup thorough \
--num-processes 1 --repeat 10 func-alpha-replay
in the build folder, approx. half of the test runs are failing for me now.
I bisected the issue to this patch: when I rebuild qemu-system-alpha
with the commit right before this change, the above test runs work
fine, so I'm pretty sure that the problem has been introduced by this
commit.
Could you please have a look?
I cannot reproduce it with commit 55d98e3edeeb ("rcu: Unify force
quiescent state").
Can you provide meson-logs/testlog-thorough.txt so that I can look
into the failure you are facing? If you have anything else that might
be useful for debugging, please share it with me too.
There's not much in that testlog-thorough.txt that could be helpful here,
it's basically just the information that the test has been killed due to
the timeout:
==================================== 1/1 ===================================
test: qemu:func-thorough+func-alpha-thorough+thorough / func-alpha-replay
start time: 07:25:26
duration: 90.01s
result: killed by signal 15 SIGTERM
command: RUST_BACKTRACE=1 QEMU_TEST_QEMU_IMG=/tmp/qemu-rcu/qemu-img QEMU_TEST_GDB=/usr/bin/gdb MALLOC_PERTURB_=255 MESON_TEST_ITERATION=1 PYTHONPATH=/home/thuth/devel/qemu/python:/home/thuth/devel/qemu/tests/functional G_TEST_SLOW=1 SPEED=thorough QEMU_TEST_QEMU_BINARY=/tmp/qemu-rcu/qemu-system-alpha ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 LD_LIBRARY_PATH=/tmp/qemu-rcu/tests/tcg/plugins:/tmp/qemu-rcu/contrib/plugins UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 QEMU_BUILD_ROOT=/tmp/qemu-rcu /tmp/qemu-rcu/pyvenv/bin/python3 /home/thuth/devel/qemu/tests/functional/alpha/test_replay.py
=============================================================================
Summary of Failures:
1/1 qemu:func-thorough+func-alpha-thorough+thorough / func-alpha-replay TIMEOUT 90.01s killed by signal 15 SIGTERM
There is also not that much helpful information in
tests/functional/alpha/test_replay.AlphaReplay.test_clipper, apart from
the console.log file. For a good run, the console log looks like this:
2025-11-04 08:16:46,148: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:16:46,149: PCI: region 0: 0000c000
2025-11-04 08:16:46,149: PCI: region 1: 0000c008
2025-11-04 08:16:46,149: PCI: region 2: 0000c010
2025-11-04 08:16:46,149: PCI: region 3: 0000c018
2025-11-04 08:16:46,149: PCI: region 4: 0000c020
2025-11-04 08:16:46,149: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:16:48,149: [ 0.000000] Initializing cgroup subsys cpu
2025-11-04 08:16:48,149: [ 0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) ([email protected]) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:16:48,150: [ 0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:16:48,150: [ 0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:16:48,150: [ 0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:16:48,150: [ 0.000000] memcluster 0, usage 1, start 0, end 15
2025-11-04 08:16:48,150: [ 0.000000] memcluster 1, usage 0, start 15, end 16384
2025-11-04 08:16:48,150: [ 0.000000] freeing pages 15:2048
2025-11-04 08:16:48,150: [ 0.000000] freeing pages 2987:16384
2025-11-04 08:16:48,151: [ 0.000000] reserving pages 2987:2988
2025-11-04 08:16:48,151: [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 16272
2025-11-04 08:16:48,151: [ 0.000000] Kernel command line: printk.time=0 console=ttyS0
2025-11-04 08:16:57,358: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:16:57,358: PCI: region 0: 0000c000
2025-11-04 08:16:57,358: PCI: region 1: 0000c008
2025-11-04 08:16:57,359: PCI: region 2: 0000c010
2025-11-04 08:16:57,359: PCI: region 3: 0000c018
2025-11-04 08:16:57,359: PCI: region 4: 0000c020
2025-11-04 08:16:57,360: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:17:08,468: [ 0.000000] Initializing cgroup subsys cpu
2025-11-04 08:17:08,470: [ 0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) ([email protected]) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:17:08,471: [ 0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:17:08,471: [ 0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:17:08,472: [ 0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:17:08,472: [ 0.000000] memcluster 0, usage 1, start 0, end 15
2025-11-04 08:17:08,473: [ 0.000000] memcluster 1, usage 0, start 15, end 16384
2025-11-04 08:17:08,473: [ 0.000000] freeing pages 15:2048
2025-11-04 08:17:08,474: [ 0.000000] freeing pages 2987:16384
2025-11-04 08:17:08,474: [ 0.000000] reserving pages 2987:2988
2025-11-04 08:17:08,475: [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 16272
2025-11-04 08:17:08,476: [ 0.000000] Kernel command line: printk.time=0 console=ttyS0
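I.e. the boot messages show up twice (presumably once for the record
run and once for the replay run), so the replay worked as expected.
When it fails, console.log only contains: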
2025-11-04 08:25:26,601: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:25:26,601: PCI: region 0: 0000c000
2025-11-04 08:25:26,601: PCI: region 1: 0000c008
2025-11-04 08:25:26,601: PCI: region 2: 0000c010
2025-11-04 08:25:26,601: PCI: region 3: 0000c018
2025-11-04 08:25:26,601: PCI: region 4: 0000c020
2025-11-04 08:25:26,602: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:25:28,601: [ 0.000000] Initializing cgroup subsys cpu
2025-11-04 08:25:28,602: [ 0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) ([email protected]) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:25:28,602: [ 0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:25:28,602: [ 0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:25:28,602: [ 0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:25:28,602: [ 0.000000] memcluster 0, usage 1, start 0, end 15
2025-11-04 08:25:28,602: [ 0.000000] memcluster 1, usage 0, start 15, end 16384
2025-11-04 08:25:28,602: [ 0.000000] freeing pages 15:2048
2025-11-04 08:25:28,603: [ 0.000000] freeing pages 2987:16384
2025-11-04 08:25:28,603: [ 0.000000] reserving pages 2987:2988
2025-11-04 08:25:28,603: [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 16272
2025-11-04 08:25:28,603: [ 0.000000] Kernel command line: printk.time=0 console=ttyS0
I.e. the boot messages show up only once, so the replay did not work.
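
In case it helps when looking at many runs, here is a rough sketch of how the two cases could be told apart from console.log automatically. It assumes that console.log collects the output of both the record and the replay run, as the good-run excerpt above suggests, and that every entry starts with a "date time,ms:" prefix. The function names and the choice of "Kernel command line:" as the marker are only made up for this illustration, they are not part of the QEMU test framework:

```python
import re

# Marker taken from the last boot line quoted in the excerpts above
# (an arbitrary choice for this sketch).
BOOT_MARKER = "Kernel command line:"

# Assumed console.log layout: "<date> <time>,<ms>: <console text>"
LINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}: (.*)$")


def count_boot_markers(log_text, marker=BOOT_MARKER):
    """Count timestamped console lines whose payload contains the marker."""
    count = 0
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and marker in match.group(1):
            count += 1
    return count


def replay_produced_output(log_text):
    """Heuristic: the marker shows up twice if record and replay both booted."""
    return count_boot_markers(log_text) >= 2
```

Feeding it the contents of a console.log, e.g. replay_produced_output(open(path).read()) with path pointing to the file in the test's output folder, should give True for a log like the good one above and False for a log like the failing one. It is only a heuristic and says nothing about why the replay got stuck.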
Could this RCU stuff somehow influence the replay mechanism in QEMU?