On Thu, Aug 07, 2025 at 01:49:10PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin <jmar...@redhat.com>
> 
> Depending on where an error during postcopy_start() happens, the state
> can be either "active", "device" or "cancelling", but never
> "postcopy-active". Migration state is transitioned to "postcopy-active"
> only just before a successful return from the function.
> 
> Accept any state except "cancelling" when transitioning to "failed"
> state.
> 
> Signed-off-by: Juraj Marcin <jmar...@redhat.com>
> ---
>  migration/migration.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 10c216d25d..e5ce2940d5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error 
> **errp)
>  fail_closefb:
>      qemu_fclose(fb);
>  fail:
> -    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> -                          MIGRATION_STATUS_FAILED);
> +    if ( ms->state != MIGRATION_STATUS_CANCELLING) {
> +        migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> +    }

Hmm, this might have been overlooked from my commit 48814111366b.  Maybe
worth a Fixes and copy stable?

For example, I would expect the old code (prior of 48814111366b) still be
able to fail postcopy and resume src QEMU if qemu_savevm_send_packaged()
failed.  Now, looks like it'll be stuck at "device" state..

The other thing is it also looks like a common pattern to set FAILED
meanwhile not messing with a CANCELLING stage.  It's not easy to always
remember this, so maybe we should consider having a helper function?

  migrate_set_failure(MigrationState *, Error *err);

Which could set err with migrate_set_error() (likely we could also
error_report() the error), and update FAILED iff it's not CANCELLING.

I saw three of such occurances that such helper may apply, but worth double
check:

postcopy_start[2725]           if (ms->state != MIGRATION_STATUS_CANCELLING) {
migration_completion[3069]     if (s->state != MIGRATION_STATUS_CANCELLING) {
igration_connect[4064]        if (s->state != MIGRATION_STATUS_CANCELLING) {

If the cleanup looks worthwhile, and if the Fixes apply, we could have the
cleanup patch on top of the fixes patch so patch 1 is easier to backport.

Thanks,

>      migration_block_activate(NULL);
>      migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>      bql_unlock();
> -- 
> 2.50.1
> 

-- 
Peter Xu


Reply via email to