Peter Xu <pet...@redhat.com> writes: > On Tue, Aug 26, 2025 at 01:51:40PM +0200, Juraj Marcin wrote: >> From: Juraj Marcin <jmar...@redhat.com> >> >> Commit 48814111366b ("migration: Always set DEVICE state") introduced >> DEVICE state to postcopy, which moved the actual state transition that >> leads to POSTCOPY_ACTIVE. >> >> However, the error handling part of the postcopy_start() function still >> expects the state POSTCOPY_ACTIVE, but depending on where an error >> happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but >> never POSTCOPY_ACTIVE, as this transition now happens just before a >> successful return from the function. >> >> Instead, accept any state except CANCELLING when transitioning to FAILED >> state. >> >> Cc: qemu-sta...@nongnu.org >> Fixes: 48814111366b ("migration: Always set DEVICE state") >> Signed-off-by: Juraj Marcin <jmar...@redhat.com> > > Thanks, Juraj! > > Reviewed-by: Peter Xu <pet...@redhat.com> >
Reviewed-by: Fabiano Rosas <faro...@suse.de> >> >> --- >> In the RFC[1] where this patch was discussed, there was also a >> suggestion for a helper function migrate_set_failure() that would check >> if the state is not CANCELLING and then set migration error and FAILED >> state. I discussed the implementation with Peter, and we came to a >> conclusion that instead of patching such clean-up on top of the current >> error handling code, it might be more useful to do a larger refactor and >> clean-up of all error handling in the migration code. >> >> Such clean-up should reduce the number of places where we need to >> explicitly transition to a FAILED state (ideally to one, or only a >> couple of places), and instead only set an appropriate migration error >> using migrate_set_error(). Additionally, it would also refactor >> inappropriate uses of QEMUFile errors where the error is not really an >> error of the underlying channel and migrate_set_error() should be used >> instead. > > Fabiano: we discussed something around the FAILED status before as well. > If you started working on something in this area, please shoot! > I don't have anything planned, it's just the thread that I already linked in the previous version of this patch. Juraj is aware. >> >> [1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmar...@redhat.com/ >> --- >> migration/migration.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 10c216d25d..32b8ce5613 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error >> **errp) >> fail_closefb: >> qemu_fclose(fb); >> fail: >> - migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, >> - MIGRATION_STATUS_FAILED); >> + if (ms->state != MIGRATION_STATUS_CANCELLING) { >> + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED); >> + } >> migration_block_activate(NULL); >> migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL); >> bql_unlock(); >> -- >> 2.50.1 >>