On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
> On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
> > From: Thomas Huth <[email protected]>
> > 
> > We just have to make sure that we can set the endianness to big
> > endian, then we can also run this test on s390x.
> > 
> > Signed-off-by: Thomas Huth <[email protected]>
> > ---
> >  Marked as RFC since it depends on the fix for this bug (so it
> >  cannot be merged yet):
> >  
> > https://lore.kernel.org/qemu-devel/[email protected]/
> > 
> >  tests/functional/reverse_debugging.py        |  4 +++-
> >  tests/functional/s390x/meson.build           |  1 +
> >  tests/functional/s390x/test_reverse_debug.py | 21 ++++++++++++++++++++
> >  3 files changed, 25 insertions(+), 1 deletion(-)
> >  create mode 100755 tests/functional/s390x/test_reverse_debug.py
> 
> Reviewed-by: Ilya Leoshkevich <[email protected]>
> 
> 
> I have a simple fix which helps with your original report, but not
> with this test. I'm still investigating.
> 
> --- a/target/s390x/machine.c
> +++ b/target/s390x/machine.c
> @@ -52,6 +52,14 @@ static int cpu_pre_save(void *opaque)
>          kvm_s390_vcpu_interrupt_pre_save(cpu);
>      }
>  
> +    if (tcg_enabled()) {
> +        /*
> +         * Ensure symmetry with cpu_post_load() with respect to
> +         * CHECKPOINT_CLOCK_VIRTUAL.
> +         */
> +        tcg_s390_tod_updated(CPU(cpu), RUN_ON_CPU_NULL);
> +    }
> +
>      return 0;
>  }

Interestingly enough, this patch fails only under load, e.g., if I run
make check -j"$(nproc)", or if I run your test in isolation but with
stress-ng cpu running in the background. The culprit appears to be:

s390_tod_load()
  qemu_s390_tod_set()
    async_run_on_cpu(tcg_s390_tod_updated)

Depending on the system load, this additional tcg_s390_tod_updated()
may or may not end up being called during handle_backward(). If it
does, we get an infinite loop again, because now we need two
checkpoints.

I have a feeling that this code may be violating some record-replay
requirement, but I can't quite put my finger on it. For example,
async_run_on_cpu() does not sound like something deterministic, but
then again it just queues work for rr_cpu_thread_fn(), which is
supposed to be deterministic.
