> From: Paolo Bonzini [mailto:pbonz...@redhat.com] > On 17/10/2018 13:38, Pavel Dovgalyuk wrote: > >> From: Paolo Bonzini [mailto:pbonz...@redhat.com] > >> On 17/10/2018 11:53, Artem Pisarenko wrote: > >>> See my last comment in bug report. This kind of modification, even > >>> adapted to changed function name, doesn't solve issue. > >>> I thought long time that it does, but once I catched qemu with a hang. > >>> And of course, I wasn't able to reproduce it. So it just better hides > >>> issue. > >>> Take a look at alternative solution from > >>> QBox: > >>> https://git.greensocs.com/qemu/qbox/commit/a8ed106032e375e715a531d6e93e4d9ec295dbdb > >>> I didn't catched fail with it (yet). > > > > Tried to test it, but rr seems to be broken again. > > I'll try to bisect now. > > Can we add a test that runs with "make check" and covers the basics of > record/replay's cpus.c bits? > > rr is very cool, and we fixed/understood a lot of stuff when getting it > ready for inclusion. But now it's constantly broken and every time we > change rr we also risk breaking icount.
I found the source of the bug. As QEMU becomes more multi-threaded and non-synchronized, checkpoints move from thread to thread. And the event queue that processed at checkpoints should belong to the same thread in both record and replay executions. Current problem was with the checkpoint for virtual timers. They are processed from different threads: from vCPU and from aio_dispatch function. Therefore the following patch fixes the problem, but I think that this part has to be refactored. There should be nailed-to-thread events that process the event queue. Then checkpoints can become just synchronization events and therefore omitted for empty timer lists, for example. diff --git a/replay/replay-events.c b/replay/replay-events.c index 83a7d81..60e8c21 100644 --- a/replay/replay-events.c +++ b/replay/replay-events.c @@ -205,6 +205,7 @@ void replay_save_events(int checkpoint) { g_assert(replay_mutex_locked()); g_assert(checkpoint != CHECKPOINT_CLOCK_WARP_START); + g_assert(checkpoint != CHECKPOINT_CLOCK_VIRTUAL); while (!QTAILQ_EMPTY(&events_list)) { Event *event = QTAILQ_FIRST(&events_list); replay_save_event(event, checkpoint); diff --git a/replay/replay.c b/replay/replay.c index 93d2573..79b8485 100644 --- a/replay/replay.c +++ b/replay/replay.c @@ -231,7 +231,8 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint) /* This checkpoint belongs to several threads. Processing events from different threads is non-deterministic */ - if (checkpoint != CHECKPOINT_CLOCK_WARP_START) { + if (checkpoint != CHECKPOINT_CLOCK_WARP_START + && checkpoint != CHECKPOINT_CLOCK_VIRTUAL) { replay_save_events(checkpoint); } res = true; Pavel Dovgalyuk