Hi Fabiano On 2025-08-08 16:08, Fabiano Rosas wrote: > Peter Xu <pet...@redhat.com> writes: > > > On Thu, Aug 07, 2025 at 01:49:10PM +0200, Juraj Marcin wrote: > >> From: Juraj Marcin <jmar...@redhat.com> > >> > >> Depending on where an error during postcopy_start() happens, the state > >> can be either "active", "device" or "cancelling", but never > >> "postcopy-active". Migration state is transitioned to "postcopy-active" > >> only just before a successful return from the function. > >> > >> Accept any state except "cancelling" when transitioning to "failed" > >> state. > >> > >> Signed-off-by: Juraj Marcin <jmar...@redhat.com> > >> --- > >> migration/migration.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/migration/migration.c b/migration/migration.c > >> index 10c216d25d..e5ce2940d5 100644 > >> --- a/migration/migration.c > >> +++ b/migration/migration.c > >> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error > >> **errp) > >> fail_closefb: > >> qemu_fclose(fb); > >> fail: > >> - migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > >> - MIGRATION_STATUS_FAILED); > >> + if ( ms->state != MIGRATION_STATUS_CANCELLING) { > >> + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED); > >> + } > > > > Hmm, this might have been overlooked from my commit 48814111366b. Maybe > > worth a Fixes and copy stable? > > > > For example, I would expect the old code (prior of 48814111366b) still be > > able to fail postcopy and resume src QEMU if qemu_savevm_send_packaged() > > failed. Now, looks like it'll be stuck at "device" state.. > > > > The other thing is it also looks like a common pattern to set FAILED > > meanwhile not messing with a CANCELLING stage. It's not easy to always > > remember this, so maybe we should consider having a helper function? > > > > migrate_set_failure(MigrationState *, Error *err); > > > > We didn't do it back then because at there would be some logical > conflict with this series: > > https://lore.kernel.org/r/20250110100707.4805-1-shivam.kum...@nutanix.com > > But I don't remember the details. If it works this time I'm all for it.
Thanks! I will look into that. Best regards, Juraj Marcin > > > Which could set err with migrate_set_error() (likely we could also > > error_report() the error), and update FAILED iff it's not CANCELLING. > > > > I saw three of such occurances that such helper may apply, but worth double > > check: > > > > postcopy_start[2725] if (ms->state != > > MIGRATION_STATUS_CANCELLING) { > > migration_completion[3069] if (s->state != MIGRATION_STATUS_CANCELLING) > > { > > igration_connect[4064] if (s->state != MIGRATION_STATUS_CANCELLING) { > > > > If the cleanup looks worthwhile, and if the Fixes apply, we could have the > > cleanup patch on top of the fixes patch so patch 1 is easier to backport. > > > > Thanks, > > > >> migration_block_activate(NULL); > >> migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL); > >> bql_unlock(); > >> -- > >> 2.50.1 > >> >