On 27/02/2024 07:41, Peter Xu wrote: > On Thu, Feb 22, 2024 at 05:56:27PM +0200, Avihai Horon wrote: >> This bug was observed in several VFIO migration scenarios where some >> workload on the VM prevented RAM from ever reaching a hard zero, not >> allowing VFIO initial pre-copy data to be sent, and thus destination >> could not ack switchover. Note that the same scenario, but without >> switchover-ack, would converge. >> >> Fix it by not serializing device data sending during pre-copy iterative >> phase if switchover was not acked yet. > > I am still not fully convinced that it's even legal that one device can > consume all iterator's bandwidth, ignoring the rest.. Though again it's > not about this patch, but about commit 90697be889. > > I'm thinking whether we should allow each device to have its own portion of > chance to push data for each call to qemu_savevm_state_iterate(), > irrelevant of vfio's switchover-ack capability. >
I guess that this means the only change needed is (...) >> diff --git a/migration/savevm.c b/migration/savevm.c >> index d612c8a9020..3a012796375 100644 >> --- a/migration/savevm.c >> +++ b/migration/savevm.c >> @@ -1386,7 +1386,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s) >> * 0 : We haven't finished, caller have to go again >> * 1 : We have finished, we can go to complete phase >> */ >> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy) >> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool >> can_switchover) >> { >> SaveStateEntry *se; >> int ret = 1; >> @@ -1430,12 +1430,20 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool >> postcopy) >> "%d(%s): %d", >> se->section_id, se->idstr, ret); >> qemu_file_set_error(f, ret); >> + return ret; >> } >> - if (ret <= 0) { >> - /* Do not proceed to the next vmstate before this one reported >> - completion of the current stage. This serializes the >> migration >> - and reduces the probability that a faster changing state is >> - synchronized over and over again. */ >> + >> + if (ret == 0 && can_switchover) { >> + /* >> + * Do not proceed to the next vmstate before this one reported >> + * completion of the current stage. This serializes the >> migration >> + * and reduces the probability that a faster changing state is >> + * synchronized over and over again. >> + * Do it only if migration can switchover. If migration can't >> + * switchover yet, do proceed to let other devices send their >> data >> + * too, as this may be required for switchover to be acked and >> + * migration to converge. >> + */ >> break; >> } >> } (...) is here to have: if (ret < 0) { break; } ? Or you were thinking in some heuristic? Joao