On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
> On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
> > From: Thomas Huth <[email protected]>
> >
> > We just have to make sure that we can set the endianness to big endian,
> > then we can also run this test on s390x.
> >
> > Signed-off-by: Thomas Huth <[email protected]>
> > ---
> > Marked as RFC since it depends on the fix for this bug (so it cannot
> > be merged yet):
> >
> > https://lore.kernel.org/qemu-devel/[email protected]/
> >
> >  tests/functional/reverse_debugging.py        |  4 +++-
> >  tests/functional/s390x/meson.build           |  1 +
> >  tests/functional/s390x/test_reverse_debug.py | 21 ++++++++++++++++++++
> >  3 files changed, 25 insertions(+), 1 deletion(-)
> >  create mode 100755 tests/functional/s390x/test_reverse_debug.py
>
> Reviewed-by: Ilya Leoshkevich <[email protected]>
>
>
> I have a simple fix which helps with your original report, but not
> with this test. I'm still investigating.
>
> --- a/target/s390x/machine.c
> +++ b/target/s390x/machine.c
> @@ -52,6 +52,14 @@ static int cpu_pre_save(void *opaque)
>          kvm_s390_vcpu_interrupt_pre_save(cpu);
>      }
>
> +    if (tcg_enabled()) {
> +        /*
> +         * Ensure symmetry with cpu_post_load() with respect to
> +         * CHECKPOINT_CLOCK_VIRTUAL.
> +         */
> +        tcg_s390_tod_updated(CPU(cpu), RUN_ON_CPU_NULL);
> +    }
> +
>      return 0;
>  }

Interestingly enough, with this patch the test fails only under load,
e.g., if I run make check -j"$(nproc)", or if I run your test in
isolation but with the stress-ng cpu stressor running in the
background. The culprit appears to be:

    s390_tod_load()
      qemu_s390_tod_set()
        async_run_on_cpu(tcg_s390_tod_updated)

Depending on the system load, this additional tcg_s390_tod_updated()
may or may not end up being called during handle_backward(). If it
does, we get an infinite loop again, because now we need two
checkpoints.
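
To make the failure mode concrete, here is a toy model in C. All names are hypothetical and this is not QEMU code. It assumes that replay only makes progress when the events of a replayed window match the recorded log one-to-one. One TOY_CHECKPOINT_CLOCK_VIRTUAL event stands for the synchronous tcg_s390_tod_updated() call. An optional second one stands for the call queued via async_run_on_cpu(), which may or may not land inside the window depending on load:

```c
/* Toy model (hypothetical, not QEMU code): replay must see exactly the
 * events that were recorded for a window, in the same order. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; only the clock checkpoint matters here. */
enum toy_event { TOY_CHECKPOINT_CLOCK_VIRTUAL, TOY_INSN };

/* Fill `out` (room for at least 3 events) with what one window emits.
 * The synchronous update always emits one checkpoint; the queued update
 * emits a second one only if it happened to run inside the window. */
static size_t toy_run_window(enum toy_event *out, bool async_ran_in_window)
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = TOY_CHECKPOINT_CLOCK_VIRTUAL;     /* synchronous update */
    if (async_ran_in_window) {
        out[n++] = TOY_CHECKPOINT_CLOCK_VIRTUAL; /* queued update */
    }
    out[n++] = TOY_INSN;                         /* ordinary guest work */
    return n;
}

/* Replay (in this model) succeeds only on an exact match with the log. */
static bool toy_replay_matches(const enum toy_event *rec, size_t rec_len,
                               const enum toy_event *run, size_t run_len)
{
    if (rec_len != run_len) {
        return false;
    }
    for (size_t i = 0; i < rec_len; i++) {
        if (rec[i] != run[i]) {
            return false;
        }
    }
    return true;
}
```

If the queued callback falls outside the window in both the recorded and the replayed run, both produce the same two events and match. If it falls inside the window in only one of them, the event counts differ and, in this model, replay cannot proceed.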

I have a feeling that this code may be violating some record-replay
requirement, but I can't quite put my finger on it. For example,
async_run_on_cpu() does not sound deterministic, but then again it
just queues work for rr_cpu_thread_fn(), which is supposed to be
deterministic.
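
The distinction in the last paragraph can be sketched with another toy model. It is again hypothetical and is not rr_cpu_thread_fn()'s actual logic. A single consumer that drains queued work only at fixed guest steps (every `period` steps, an assumed parameter) is itself deterministic. Yet the guest step at which an item runs still depends on when the producer enqueued it. If that enqueue point follows host timing rather than a guest-visible event, the outcome can vary under load even though the consumer never changes behavior. Whether the s390_tod_load() path falls into this category is exactly the open question:

```c
/* Toy model (hypothetical, not QEMU code): a deterministic consumer that
 * drains its work queue only at multiples of `period` guest steps. */
#include <assert.h>

/* Return the first drain point at or after `enqueue_step`, i.e. the guest
 * step at which work enqueued at `enqueue_step` would actually run. */
static unsigned toy_step_work_runs_at(unsigned enqueue_step, unsigned period)
{
    assert(period > 0);
    /* Round enqueue_step up to the next multiple of period. */
    return ((enqueue_step + period - 1) / period) * period;
}
```

For example, with a drain every 10 steps, work enqueued at step 3 runs at step 10, while the same work enqueued at step 11 (say, because the producer thread was delayed by host load) runs at step 20.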