On 2/22/2024 23:56, Avihai Horon wrote:
> Currently, migration code serializes device data sending during pre-copy
> iterative phase. As noted in the code comment, this is done to prevent
> faster changing device from sending its data over and over.
>
> However, with switchover-ack capability enabled, this behavior can be
> problematic and may prevent migration from converging. The problem lies
> in the fact that an earlier device may never finish sending its data and
> thus block other devices from sending theirs.
>
> This bug was observed in several VFIO migration scenarios where some
> workload on the VM prevented RAM from ever reaching a hard zero, not
> allowing VFIO initial pre-copy data to be sent, and thus destination
> could not ack switchover. Note that the same scenario, but without
> switchover-ack, would converge.
>
> Fix it by not serializing device data sending during pre-copy iterative
> phase if switchover was not acked yet.

Hi Avihai,

Can this bug be solved by ordering the priority of the different devices'
handlers?

>
> Fixes: 1b4adb10f898 ("migration: Implement switchover ack logic")
> Signed-off-by: Avihai Horon <avih...@nvidia.com>
> ---
>  migration/savevm.h    |  2 +-
>  migration/migration.c |  4 ++--
>  migration/savevm.c    | 22 +++++++++++++++-------
>  3 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 74669733dd6..d4a368b522b 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -36,7 +36,7 @@ void qemu_savevm_state_setup(QEMUFile *f);
>  bool qemu_savevm_state_guest_unplug_pending(void);
>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool can_switchover);
>  void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
> diff --git a/migration/migration.c b/migration/migration.c
> index ab21de2cadb..d8bfe1fb1b9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3133,7 +3133,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      }
>
>      /* Just another iteration step */
> -    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
> +    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy, can_switchover);
>      return MIG_ITERATE_RESUME;
>  }
>
> @@ -3216,7 +3216,7 @@ static MigIterateState bg_migration_iteration_run(MigrationState *s)
>  {
>      int res;
>
> -    res = qemu_savevm_state_iterate(s->to_dst_file, false);
> +    res = qemu_savevm_state_iterate(s->to_dst_file, false, true);
>      if (res > 0) {
>          bg_migration_completion(s);
>          return MIG_ITERATE_BREAK;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d612c8a9020..3a012796375 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1386,7 +1386,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
>   *   0 : We haven't finished, caller have to go again
>   *   1 : We have finished, we can go to complete phase
>   */
> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool can_switchover)
>  {
>      SaveStateEntry *se;
>      int ret = 1;
> @@ -1430,12 +1430,20 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
>                           "%d(%s): %d",
>                           se->section_id, se->idstr, ret);
>              qemu_file_set_error(f, ret);
> +            return ret;
>          }
> -        if (ret <= 0) {
> -            /* Do not proceed to the next vmstate before this one reported
> -               completion of the current stage. This serializes the migration
> -               and reduces the probability that a faster changing state is
> -               synchronized over and over again. */
> +
> +        if (ret == 0 && can_switchover) {
> +            /*
> +             * Do not proceed to the next vmstate before this one reported
> +             * completion of the current stage. This serializes the migration
> +             * and reduces the probability that a faster changing state is
> +             * synchronized over and over again.
> +             * Do it only if migration can switchover. If migration can't
> +             * switchover yet, do proceed to let other devices send their data
> +             * too, as this may be required for switchover to be acked and
> +             * migration to converge.
> +             */
>              break;
>          }
>      }
> @@ -1724,7 +1732,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      qemu_savevm_state_setup(f);
>
>      while (qemu_file_get_error(f) == 0) {
> -        if (qemu_savevm_state_iterate(f, false) > 0) {
> +        if (qemu_savevm_state_iterate(f, false, true) > 0) {
>              break;
>          }
>      }
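
To make the starvation concrete, here is a rough standalone model of the patched loop. It is only a sketch and not QEMU code: ToyDevice, toy_iterate(), toy_state_iterate() and toy_second_device_sends() are hypothetical names made up for illustration, the "ram" device is assumed to never report completion, and the error path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SaveStateEntry: only what the sketch needs. */
typedef struct {
    const char *name;
    bool never_finishes;   /* e.g. RAM that the guest keeps dirtying */
    int sends;             /* how many times this device got to send data */
} ToyDevice;

/* Mimics an iterate handler: 0 = more data pending, 1 = done for now. */
static int toy_iterate(ToyDevice *d)
{
    d->sends++;
    return d->never_finishes ? 0 : 1;
}

/*
 * Simplified model of the loop after the patch: stop at the first
 * unfinished device (serialize) only when switchover is already possible.
 */
static void toy_state_iterate(ToyDevice *devs, size_t n, bool can_switchover)
{
    assert(devs != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        int ret = toy_iterate(&devs[i]);

        if (ret == 0 && can_switchover) {
            break;   /* later devices do not get a turn in this round */
        }
    }
}

/* Runs `rounds` iterations and reports how often the second device sent. */
int toy_second_device_sends(bool can_switchover, int rounds)
{
    ToyDevice devs[] = {
        { "ram",  true,  0 },   /* assumed to never reach zero pending */
        { "vfio", false, 0 },   /* only needs a turn to send its data */
    };

    for (int i = 0; i < rounds; i++) {
        toy_state_iterate(devs, 2, can_switchover);
    }
    return devs[1].sends;
}
```

In this model, toy_second_device_sends(true, 10) returns 0, because the loop always breaks on "ram" and "vfio" never gets a turn. toy_second_device_sends(false, 10) returns 10, which is the behaviour the patch wants while switchover has not been acked yet.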