Hi, It seems, that this approach is not always correct. Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including external ones). qemu_start_warp_timer uses the deadline for setting warp timer (which should be deterministic). Therefore warp timer may become non-deterministic and replay may behave differently compared to recording phase.
We have to rollback these or improve somehow to avoid non-determinism. Pavel Dovgalyuk > -----Original Message----- > From: Artem Pisarenko [mailto:artem.k.pisare...@gmail.com] > Sent: Thursday, October 18, 2018 2:04 PM > To: qemu-devel@nongnu.org > Cc: Pavel Dovgalyuk; Paolo Bonzini; Artem Pisarenko > Subject: [PATCH v3 0/4] Introduce attributes for timers subsystem and remove > QEMU_CLOCK_VIRTUAL_EXT clock type > > Recent patches from series [PATCH v6] "Fixing record/replay and adding > reverse debugging" > introduced new clock type QEMU_CLOCK_VIRTUAL_EXT and replaced virtual timers > in some external > subsystems with it. > This resulted in small change to existing behavior, which I consider to be > unacceptable. > Processing of virtual timers, belonging to new clock type, was kicked off to > main loop, which > made them asynchronous with vCPU thread and, in icount mode, with whole guest > execution. This > breaks expected determinism in non-record/replay icount mode of emulation > where these > "external subsystems" are isolated from host (i.e. they external only to > guest core, not to > emulation environment). > > Example for slirp ("user" backend for network device): > User runs qemu in icount mode with rtc clock=vm without any external > communication interfaces > but with "-netdev user,restrict=on". It expects deterministic execution, > because network > services are emulated inside qemu and isolated from host. There are no > reasons to get reply > from DHCP server with different delay or something like that. > > These series of patches revert those commits and reimplement their > modifications in a better > way. > > Current implementation of timers/clock processing is confusing (at least for > me) because of > exceptions from design concept behind them, which already introduced by > icount mode (which > adds QEMU_CLOCK_VIRTUAL_RT). Adding QEMU_CLOCK_VIRTUAL_EXT just made things > even more > complicated. I consider these "alternative" virtual clocks to be some kind of > hacks being > convinient only to authors of relevant qemu features. Lets don't touch > fundamental clock types > and keep them orthogonal to special cases of timers handling. > > As far as I understand, original intention of author was just to make > optimization in replay > log by avoiding storing extra events which don't change guest state directly. > Indeed, for > example, ipv6 icmp timer in slirp does things which external to guest core > and ends with > sending network packet to guest, but record/replay will anyway catch event, > corresponding to > packet reception in guest network frontend, and store it to replay log, so > there are no need > in making checkpoint for corresponding clock when that timer fires. > If so, then we just need to skip checkpoints for clock values, when only > these specific timers > are run. It is individual timers which are specific, not clock. > Adding some kind of attribute/flag/property to individual timer allows any > special qemu > feature (such as record/replay) to inspect it and handle as needed. This > design achieves less > dependencies, more transparency and has more intuitive and clear sense. For > record/replay > feature it's achieved with 'EXTERNAL' attribute. The only drawback is that it > required to add > one extra mutex unlock/lock pair for virtual clock type in > timerlist_run_timers() function > (see patch 3), but it's being added only when qemu runs in record/replay mode. > > Finally, this patch series optimizes checkpointing for all other clocks it > applies to. > > > v3 changes: > - fixed failed build in last patch > - attributes has been refactored and restricted to be used only within > qemu-timer (as > suggested by Stefan Hajnoczi and Paolo Bonzini) > > v2 changes: > - timers/aio interface refactored and improved, addition to couroutines > removed (as Paolo > Bonzini suggested) > - fixed wrong reimplementation in qemu-timer.c caused race conditions > - added bonus patch which optimizes checkpointing for other clocks > > P.S. I've tried to test record/replay with slirp, but in replay mode qemu > stucks at guest > linux boot after "Configuring network interfaces..." message, where DHCP > communication takes > place. It's broken in a same way both in master and master with reverted > commits being fixed. > > P.S.2. I wasn't able to test these patches on purely clean master branch > because of bugs > https://bugs.launchpad.net/qemu/+bug/1790460 and > https://bugs.launchpad.net/qemu/+bug/1795369, > which workarounded by several not-accepted (yet?) modifications. > > > Artem Pisarenko (4): > Revert some patches from recent [PATCH v6] "Fixing record/replay and > adding reverse debugging" > Introduce attributes to qemu timer subsystem > Restores record/replay behavior related to special virtual clock > processing for timers used in external subsystems. > Optimize record/replay checkpointing for all clocks it applies to > > include/block/aio.h | 59 ++++++++++++++++++--- > include/qemu/timer.h | 128 > +++++++++++++++++++++++----------------------- > slirp/ip6_icmp.c | 9 ++-- > tests/ptimer-test-stubs.c | 13 +++-- > ui/input.c | 9 ++-- > util/qemu-timer.c | 95 +++++++++++++++++++++++----------- > 6 files changed, 201 insertions(+), 112 deletions(-) > > -- > 2.7.4