Peter Xu <pet...@redhat.com> writes: > On Thu, Jun 13, 2024 at 11:51:58AM -0300, Fabiano Rosas wrote: >> Peter Xu <pet...@redhat.com> writes: >> >> > This patch adds a migration state on src called "postcopy-recover-setup". >> > The new state will describe the intermediate step starting from when the >> > src QEMU started an postcopy recovery request, until the migration channels >> > are properly established, but before the recovery process take place. >> > >> > The request came from Libvirt where Libvirt currently rely on the migration >> > state events to detect migration state changes. That works for most of the >> > migration process but except postcopy recovery failures at the beginning. >> > >> > Currently postcopy recovery only has two major states: >> > >> > - postcopy-paused: this is the state that both sides of QEMU will be in >> > for a long time as long as the migration channel was interrupted. >> > >> > - postcopy-recover: this is the state where both sides of QEMU handshake >> > with each other, preparing for a continuation of postcopy which used to >> > be interrupted. >> > >> > The issue here is when the recovery port is invalid, the src QEMU will take >> > the URI/channels, noticing the ports are not valid, and it'll silently keep >> > in the postcopy-paused state, with no event sent to Libvirt. In this case, >> > the only thing Libvirt can do is to poll the migration status with a proper >> > interval, however that's less optimal. >> > >> > Considering that this is the only case where Libvirt won't get a >> > notification from QEMU on such events, let's add postcopy-recover-setup >> > state to mimic what we used to have with the "setup" state of a newly >> >> s/used to // > > Fixed. > >> >> > initialized migration, describing the phase of connection establishment. >> > >> > With that, postcopy recovery will have two paths to go now, and either path >> > will guarantee an event generated. Now the events will look like this >> > during a recovery process on src QEMU: >> > >> > - Initially when the recovery is initiated on src, QEMU will go from >> > "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have >> > this event. >> > >> > - Depending on whether the channel re-establishment is succeeded: >> > >> > - In succeeded case, src QEMU will move from "postcopy-recover-setup" >> > to "postcopy-recover". Old QEMUs also have this event. >> > >> > - In failure case, src QEMU will move from "postcopy-recover-setup" to >> > "postcopy-paused" again. Old QEMUs don't have this event. >> > >> > This guarantees that Libvirt will always receive a notification for >> > recovery process properly. >> > >> > One thing to mention is, such new status is only needed on src QEMU not >> > both. On dest QEMU, the state machine doesn't change. Hence the events >> > don't change either. It's done like so because dest QEMU may not have an >> > explicit point of setup start. E.g., it can happen that when dest QEMUs >> > doesn't use migrate-recover command to use a new URI/channel, but the old >> > URI/channels can be reused in recovery, in which case the old ports simply >> > can work again after the network routes are fixed up. >> > >> > The patch has some touch-ups in the dest path too, but it's because there's >> > some unclearness on using migrate_set_state(), so the change should make it >> > crystal clear now by checking current status always. The next step from >> >> Can we get a separate patch for these cleanups? > > We can, and I'll do. > >> >> > that POV would be making migrate_set_state() not using cmpxchg() but always >> > update the status, but that's for later. >> > >> > Cc: Jiri Denemark <jdene...@redhat.com> >> > Cc: Fabiano Rosas <faro...@suse.de> >> > Cc: Prasad Pandit <ppan...@redhat.com> >> > Buglink: https://issues.redhat.com/browse/RHEL-38485 >> > Signed-off-by: Peter Xu <pet...@redhat.com> >> > --- >> > qapi/migration.json | 4 +++ >> > migration/postcopy-ram.h | 3 ++ >> > migration/migration.c | 66 +++++++++++++++++++++++++++++++++++----- >> > migration/postcopy-ram.c | 6 ++++ >> > migration/savevm.c | 4 +-- >> > 5 files changed, 73 insertions(+), 10 deletions(-) >> > >> > diff --git a/qapi/migration.json b/qapi/migration.json >> > index a351fd3714..a135bbcd96 100644 >> > --- a/qapi/migration.json >> > +++ b/qapi/migration.json >> > @@ -142,6 +142,9 @@ >> > # >> > # @postcopy-paused: during postcopy but paused. (since 3.0) >> > # >> > +# @postcopy-recover-setup: setup phase for a postcopy recover process, >> > +# preparing for a recover phase to start. (since 9.1) >> >> recover*y* process >> recover*y* phase > > OK. > >> >> > +# >> > # @postcopy-recover: trying to recover from a paused postcopy. (since >> > # 3.0) >> > # >> > @@ -166,6 +169,7 @@ >> > { 'enum': 'MigrationStatus', >> > 'data': [ 'none', 'setup', 'cancelling', 'cancelled', >> > 'active', 'postcopy-active', 'postcopy-paused', >> > + 'postcopy-recover-setup', >> > 'postcopy-recover', 'completed', 'failed', 'colo', >> > 'pre-switchover', 'device', 'wait-unplug' ] } >> > ## >> > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h >> > index ecae941211..a6df1b2811 100644 >> > --- a/migration/postcopy-ram.h >> > +++ b/migration/postcopy-ram.h >> > @@ -13,6 +13,8 @@ >> > #ifndef QEMU_POSTCOPY_RAM_H >> > #define QEMU_POSTCOPY_RAM_H >> > >> > +#include "qapi/qapi-types-migration.h" >> > + >> > /* Return true if the host supports everything we need to do postcopy-ram >> > */ >> > bool postcopy_ram_supported_by_host(MigrationIncomingState *mis, >> > Error **errp); >> > @@ -193,5 +195,6 @@ enum PostcopyChannels { >> > void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile >> > *file); >> > void postcopy_preempt_setup(MigrationState *s); >> > int postcopy_preempt_establish_channel(MigrationState *s); >> > +bool postcopy_is_paused(MigrationStatus status); >> > >> > #endif >> > diff --git a/migration/migration.c b/migration/migration.c >> > index bfbd657035..9475dce7dc 100644 >> > --- a/migration/migration.c >> > +++ b/migration/migration.c >> > @@ -595,6 +595,26 @@ bool migrate_uri_parse(const char *uri, >> > MigrationChannel **channel, >> > return true; >> > } >> > >> > +static bool >> > +migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp) >> > +{ >> > + MigrationStatus current = mis->state; >> > + >> > + if (current == MIGRATION_STATUS_POSTCOPY_PAUSED) { >> > + /* Postcopy paused state doesn't change when setup new ports */ >> >> The "setup new ports" part is a bit vague. Maybe: >> >> /* >> * The SETUP state only happens at the start of migration. A postcopy >> * migration recovery migration stays in POSTCOPY_PAUSED. >> */ > > Mentioning SETUP in this if clause might be slightly confusing? How about > I only keep the latter sentence (and change it a bit; there's one extra > "migration")? > > /* > * Postcopy migration stays in POSTCOPY_PAUSED even if reconnection > * happens. > */
Yep. > >> >> > + return true; >> > + } >> > + >> > + if (current != MIGRATION_STATUS_NONE) { >> > + error_setg(errp, "Illegal migration incoming state: %s", >> > + MigrationStatus_str(current)); >> > + return false; >> > + } >> >> This is a good candidate for a separate patch due to the extra change in >> behavior not necessarily related to postcopy. > > I'll split that, but just to mention, I think there should have no > functional change, or we're doomed. Right, but it's technically not impossible, so better to have it separate anyway. > > The old code worked the same by ignoring migrate_set_state() when PAUSED in > the cmpxchg(), which is implicit and unclear. > Agreed. >> >> > + >> > + migrate_set_state(&mis->state, current, MIGRATION_STATUS_SETUP); >> > + return true; >> > +} >> > + >> > static void qemu_start_incoming_migration(const char *uri, bool >> > has_channels, >> > MigrationChannelList *channels, >> > Error **errp) >> > @@ -633,8 +653,9 @@ static void qemu_start_incoming_migration(const char >> > *uri, bool has_channels, >> > return; >> > } >> > >> > - migrate_set_state(&mis->state, MIGRATION_STATUS_NONE, >> > - MIGRATION_STATUS_SETUP); >> > + if (!migration_incoming_state_setup(mis, errp)) { >> > + return; >> > + } >> > >> > if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) { >> > SocketAddress *saddr = &addr->u.socket; >> > @@ -1070,6 +1091,7 @@ bool migration_is_setup_or_active(void) >> > case MIGRATION_STATUS_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_PAUSED: >> > + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: >> > case MIGRATION_STATUS_POSTCOPY_RECOVER: >> > case MIGRATION_STATUS_SETUP: >> > case MIGRATION_STATUS_PRE_SWITCHOVER: >> > @@ -1092,6 +1114,7 @@ bool migration_is_running(void) >> > case MIGRATION_STATUS_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_PAUSED: >> > + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: >> > case MIGRATION_STATUS_POSTCOPY_RECOVER: >> > case MIGRATION_STATUS_SETUP: >> > case MIGRATION_STATUS_PRE_SWITCHOVER: >> > @@ -1229,6 +1252,7 @@ static void fill_source_migration_info(MigrationInfo >> > *info) >> > case MIGRATION_STATUS_PRE_SWITCHOVER: >> > case MIGRATION_STATUS_DEVICE: >> > case MIGRATION_STATUS_POSTCOPY_PAUSED: >> > + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: >> > case MIGRATION_STATUS_POSTCOPY_RECOVER: >> > /* TODO add some postcopy stats */ >> > populate_time_info(info, s); >> > @@ -1279,6 +1303,7 @@ static void >> > fill_destination_migration_info(MigrationInfo *info) >> > case MIGRATION_STATUS_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_PAUSED: >> > + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: >> >> Does this need to be here? We don't reach this state on destination, right? > > Right, it won't hurt having that here, but let me drop it. > >> >> > case MIGRATION_STATUS_POSTCOPY_RECOVER: >> > case MIGRATION_STATUS_FAILED: >> > case MIGRATION_STATUS_COLO: >> > @@ -1435,9 +1460,30 @@ static void migrate_error_free(MigrationState *s) >> > >> > static void migrate_fd_error(MigrationState *s, const Error *error) >> > { >> >> The default case of the swtich below is a bit surprising to me. Why >> wouldn't we allow this function to be called from other places to set >> STATUS_FAILED? >> >> ...unless this is only mean for the connection phase, so: >> >> just to check your understanding here because it seems we've drifted a >> bit from the original definition on those, specially with >> migrate_fd_cleanup(), but does this _fd_ in the function name implies >> something like a "connection phase"? As in, "connect to the fd", "the fd >> connection errored out" and "cleanup the fd connection". Maybe it's time >> to switch this "fd" to something clearer... > > IIUC it's just that many functions around this area used to be called > migrate_fd_*(). Let me know if you have some suggestions, though. > At some other point I think we need to rename those. Very little going on in these functions that relates directly to an fd. It needs a closer inspection, however. >> >> > + MigrationStatus current = s->state; >> > + MigrationStatus next; >> > + >> > assert(s->to_dst_file == NULL); >> > - migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, >> > - MIGRATION_STATUS_FAILED); >> > + >> > + switch (current) { >> > + case MIGRATION_STATUS_SETUP: >> > + next = MIGRATION_STATUS_FAILED; >> > + break; >> > + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: >> > + /* Never fail a postcopy migration; switch back to PAUSED instead >> > */ >> > + next = MIGRATION_STATUS_POSTCOPY_PAUSED; >> >> So presumably we can keep recovering the migration indefinitely? > > Yes, that's by design. > ok >> >> > + break; >> > + default: >> > + /* >> > + * This really shouldn't happen. Just be careful to not crash a VM >> > + * just for this. Instead, dump something. >> > + */ >> > + error_report("%s: Illegal migration status (%s) detected", >> > + __func__, MigrationStatus_str(current)); >> > + return; >> > + } >> > + >> > + migrate_set_state(&s->state, current, next); >> > migrate_set_error(s, error); >> > } >> > >> > @@ -1538,6 +1584,7 @@ bool migration_in_postcopy(void) >> > switch (s->state) { >> > case MIGRATION_STATUS_POSTCOPY_ACTIVE: >> > case MIGRATION_STATUS_POSTCOPY_PAUSED: >> > + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: >> > case MIGRATION_STATUS_POSTCOPY_RECOVER: >> > return true; >> > default: >> > @@ -1936,6 +1983,9 @@ static bool migrate_prepare(MigrationState *s, bool >> > resume, Error **errp) >> > return false; >> > } >> > >> > + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED, >> > + MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP); >> > + >> > /* This is a resume, skip init status */ >> > return true; >> > } >> > @@ -2968,9 +3018,9 @@ static MigThrError postcopy_pause(MigrationState *s) >> > * We wait until things fixed up. Then someone will setup the >> > * status back for us. >> > */ >> > - while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { >> > + do { >> > qemu_sem_wait(&s->postcopy_pause_sem); >> > - } >> > + } while (postcopy_is_paused(s->state)); >> >> Is there a particular reason to go from while() to do{}while()? > > It'll be the same AFAICT, the 1st check should mostly be meaningless as the > sem must be posted at least once. Yes I should have had a comment > somewhere but I didn't.. if you want me to keep the old way it's also fine, > isn't a big deal here to check first either, just save some cycles when I > worked on these lines. > That's fine. Just making sure there wasn't some hidden purpose here. >> >> > >> > if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { >> > /* Woken up by a recover procedure. Give it a shot */ >> > @@ -3666,7 +3716,7 @@ void migrate_fd_connect(MigrationState *s, Error >> > *error_in) >> > { >> > Error *local_err = NULL; >> > uint64_t rate_limit; >> > - bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; >> > + bool resume = migration_in_postcopy(); >> >> Here you're expecting just PAUSED or RECOVER_SETUP, right? We'll not >> reach here in any of the other postcopy states. > > I think here it must be RECOVER_SETUP after this patch. I changed it to > use the helper as I think that's cleaner (precopy doesn't allow resume), > and we don't need such change if the state machine trivially changes again. > Intent matters for anyone reading the code in the future. I would use the state explicitly, but I don't have a strong opinion, feel free to ignore. >> >> > int ret; >> > >> > /* >> > @@ -3733,7 +3783,7 @@ void migrate_fd_connect(MigrationState *s, Error >> > *error_in) >> > >> > if (resume) { >> > /* Wakeup the main migration thread to do the recovery */ >> > - migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED, >> > + migrate_set_state(&s->state, >> > MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP, >> > MIGRATION_STATUS_POSTCOPY_RECOVER); >> > qemu_sem_post(&s->postcopy_pause_sem); >> > return; >> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c >> > index 97701e6bb2..1c374b7ea1 100644 >> > --- a/migration/postcopy-ram.c >> > +++ b/migration/postcopy-ram.c >> > @@ -1770,3 +1770,9 @@ void *postcopy_preempt_thread(void *opaque) >> > >> > return NULL; >> > } >> > + >> > +bool postcopy_is_paused(MigrationStatus status) >> > +{ >> > + return status == MIGRATION_STATUS_POSTCOPY_PAUSED || >> > + status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP; >> > +} >> > diff --git a/migration/savevm.c b/migration/savevm.c >> > index e71410d8c1..deb57833f8 100644 >> > --- a/migration/savevm.c >> > +++ b/migration/savevm.c >> > @@ -2864,9 +2864,9 @@ static bool >> > postcopy_pause_incoming(MigrationIncomingState *mis) >> > error_report("Detected IO failure for postcopy. " >> > "Migration paused."); >> > >> > - while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { >> > + do { >> > qemu_sem_wait(&mis->postcopy_pause_sem_dst); >> > - } >> > + } while (postcopy_is_paused(mis->state)); >> > >> > trace_postcopy_pause_incoming_continued(); >>