On 01/12/2025 22.49, Ilya Leoshkevich wrote:
Hi,

Here is my attempt to fix [1] based on the discussion in [2].

I'm sending this as an RFC, because I have definitely misunderstood a
thing or two about record-replay, missed some timer bookkeeping
intricacies, and haven't split arch-dependent and independent parts
into different patches.

This survives "make check" and "make check-tcg" with the test from [2],
both with and without extra load in the background.

Please let me know what you think about the approach.

Best regards,
Ilya

[1] https://lore.kernel.org/qemu-devel/[email protected]/
[2] https://lore.kernel.org/qemu-devel/[email protected]/

---

Replaying even trivial s390x kernels hangs, because:

- cpu_post_load() fires the TOD timer immediately.

- s390_tod_load() schedules work for firing the TOD timer.

- If the rr loop sees the work first and then the timer, we get one timer
   expiration.

- If the rr loop sees the timer first and then the work, we get two timer
   expirations.

- Record and replay may diverge due to this race.

- In this particular case, the divergence makes the replay loop spin: it
   sees that the TOD timer has expired, but cannot invoke its callback,
   because there is no recorded CHECKPOINT_CLOCK_VIRTUAL.

- The order in which the rr loop sees the work and the timer depends on
   whether and when the rr loop wakes up during load_snapshot().

- The rr loop may wake up after the main thread kicks the CPU and drops
   the BQL, which may happen if it calls, e.g., qemu_cond_wait_bql().
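The race described in the list above can be illustrated with a toy model. This is a hypothetical sketch, not QEMU code: struct toy_state, toy_load(), toy_run_work(), toy_run_timers() and toy_replay() are made-up names, and the model assumes that the work queued by s390_tod_load() simply re-arms the already-expired timer. Depending on which source the loop checks first, the timer callback runs once or twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not QEMU code) of what the rr loop sees
 * right after load_snapshot(). */
struct toy_state {
    bool timer_armed;  /* TOD timer armed with an already-expired deadline */
    bool work_pending; /* work queued by the TOD load path */
    int expirations;   /* how many times the timer callback ran */
};

static void toy_load(struct toy_state *s)
{
    s->timer_armed = true;  /* cpu_post_load(): timer fires immediately */
    s->work_pending = true; /* s390_tod_load(): work to fire the timer */
    s->expirations = 0;
}

/* Assumption: the queued work just re-arms the (still expired) timer. */
static void toy_run_work(struct toy_state *s)
{
    if (s->work_pending) {
        s->work_pending = false;
        s->timer_armed = true;
    }
}

/* Run the timer callback if the timer is armed (it is always expired here). */
static void toy_run_timers(struct toy_state *s)
{
    if (s->timer_armed) {
        s->timer_armed = false;
        s->expirations++;
    }
}

/* Iterate the loop until nothing is pending; work_first selects the order
 * in which the two sources are checked on each iteration. */
static int toy_replay(bool work_first)
{
    struct toy_state s;
    int iterations = 0;

    toy_load(&s);
    while (s.work_pending || s.timer_armed) {
        iterations++;
        assert(iterations < 10); /* the toy loop must terminate quickly */
        if (work_first) {
            toy_run_work(&s);
            toy_run_timers(&s);
        } else {
            toy_run_timers(&s);
            toy_run_work(&s);
        }
    }
    return s.expirations;
}
```

In this model toy_replay(true) returns 1 and toy_replay(false) returns 2, which is the kind of difference that makes a recording and its replay diverge.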

Firing the TOD timer twice is duplicate work, but it was introduced
intentionally in commit 7c12f710bad6 ("s390x/tcg: rearm the CKC timer
during migration") in order to avoid depending on the migration order.

The key culprits here are timers that are already expired when they are
armed. They break the ordering between timers and CPU work, because they
are not constrained by instruction execution, thus introducing
non-determinism and record-replay divergence.

Fix by converting such timer callbacks to CPU work. Also add TOD clock
updates to the save path, mirroring the load path, in order to have the
same CHECKPOINT_CLOCK_VIRTUAL during recording and replaying.
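A minimal sketch of the idea behind the fix, under stated assumptions: everything prefixed with toy_ is hypothetical and models only a single CPU with a FIFO work queue; it is not the code in util/qemu-timer.c. A timer whose deadline has already passed is not put on the timer list but queued as CPU work, so its callback is ordered relative to the work from the load path purely by enqueue order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy, not QEMU's util/qemu-timer.c: a timer armed with a
 * deadline that has already passed is turned into queued CPU work. */

#define TOY_QUEUE_MAX 16

typedef void (*toy_cb)(void *opaque);

struct toy_work { toy_cb cb; void *opaque; };

struct toy_cpu {
    struct toy_work queue[TOY_QUEUE_MAX]; /* FIFO of pending CPU work */
    int head, tail;
    int64_t now;                          /* current virtual clock */
};

static void toy_queue_work(struct toy_cpu *cpu, toy_cb cb, void *opaque)
{
    assert(cpu->tail < TOY_QUEUE_MAX);
    cpu->queue[cpu->tail++] = (struct toy_work){ cb, opaque };
}

/* Returns 1 if the callback was deferred to CPU work, 0 if it would go
 * on the timer list (not modelled here). */
static int toy_timer_mod(struct toy_cpu *cpu, int64_t deadline,
                         toy_cb cb, void *opaque)
{
    if (deadline <= cpu->now) {
        /* Already expired: run it as CPU work, ordered with other work. */
        toy_queue_work(cpu, cb, opaque);
        return 1;
    }
    return 0;
}

/* Drain the FIFO; callbacks may enqueue more work, which also runs. */
static void toy_run_work(struct toy_cpu *cpu)
{
    while (cpu->head < cpu->tail) {
        struct toy_work w = cpu->queue[cpu->head++];
        w.cb(w.opaque);
    }
}

struct toy_tod {
    struct toy_cpu *cpu;
    int expirations;
};

static void toy_tod_timer_cb(void *opaque)
{
    struct toy_tod *tod = opaque;
    tod->expirations++;
}

/* Work scheduled by the load path: re-arm the (already expired) timer. */
static void toy_tod_rearm_work(void *opaque)
{
    struct toy_tod *tod = opaque;
    toy_timer_mod(tod->cpu, tod->cpu->now, toy_tod_timer_cb, tod);
}

static int toy_load_and_run(void)
{
    struct toy_cpu cpu = { .head = 0, .tail = 0, .now = 100 };
    struct toy_tod tod = { .cpu = &cpu, .expirations = 0 };

    toy_timer_mod(&cpu, cpu.now, toy_tod_timer_cb, &tod); /* post_load path */
    toy_queue_work(&cpu, toy_tod_rearm_work, &tod);       /* tod_load path */
    toy_run_work(&cpu);
    return tod.expirations;
}
```

In this toy model toy_load_and_run() always reports two expirations: the duplicate firing is still there, but the result no longer depends on when the loop wakes up, because both sources now go through the same queue.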

Signed-off-by: Ilya Leoshkevich <[email protected]>
---
  hw/s390x/tod.c           |  5 +++++
  stubs/async-run-on-cpu.c |  7 +++++++
  stubs/cpus-queue.c       |  4 ++++
  stubs/meson.build        |  2 ++
  target/s390x/machine.c   |  4 ++++
  util/qemu-timer.c        | 30 ++++++++++++++++++++++++++++++
  6 files changed, 52 insertions(+)
  create mode 100644 stubs/async-run-on-cpu.c
  create mode 100644 stubs/cpus-queue.c

Thanks, this indeed fixes the test for me, so:

Tested-by: Thomas Huth <[email protected]>

