Peter Xu <pet...@redhat.com> writes:

> On Tue, Aug 26, 2025 at 01:51:40PM +0200, Juraj Marcin wrote:
>> From: Juraj Marcin <jmar...@redhat.com>
>> 
>> Commit 48814111366b ("migration: Always set DEVICE state") introduced
>> DEVICE state to postcopy, which moved the actual state transition that
>> leads to POSTCOPY_ACTIVE.
>> 
>> However, the error handling part of the postcopy_start() function still
>> expects the state POSTCOPY_ACTIVE, but depending on where an error
>> happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
>> never POSTCOPY_ACTIVE, as this transition now happens just before a
>> successful return from the function.
>> 
>> Instead, accept any state except CANCELLING when transitioning to FAILED
>> state.
>> 
>> Cc: qemu-sta...@nongnu.org
>> Fixes: 48814111366b ("migration: Always set DEVICE state")
>> Signed-off-by: Juraj Marcin <jmar...@redhat.com>
>
> Thanks, Juraj!
>
> Reviewed-by: Peter Xu <pet...@redhat.com>
>

Reviewed-by: Fabiano Rosas <faro...@suse.de>

>> 
>> ---
>> In the RFC[1] where this patch was discussed, there was also a
>> suggestion for a helper function migrate_set_failure() that would check
>> if the state is not CANCELLING and then set migration error and FAILED
>> state. I discussed the implementation with Peter, and we came to a
>> conclusion that instead of patching such clean-up on top of the current
>> error handling code, it might be more useful to do a larger refactor and
>> clean-up of all error handling in the migration code.
>> 
>> Such clean-up should reduce the number of places where we need to
>> explicitly transition to a FAILED state (ideally to one, or only a
>> couple of places), and instead only set an appropriate migration error
>> using migrate_set_error(). Additionally, it would also refactor
>> inappropriate uses of QEMUFile errors where the error is not really an
>> error of the underlying channel and migrate_set_error() should be used
>> instead.
>
> Fabiano: we discussed something around the FAILED status before as well.
> If you started working on something in this area, please shoot!
>

I don't have anything planned, it's just the thread that I already
linked in the previous version of this patch. Juraj is aware.

>> 
>> [1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmar...@redhat.com/
>> ---
>>  migration/migration.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 10c216d25d..32b8ce5613 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error 
>> **errp)
>>  fail_closefb:
>>      qemu_fclose(fb);
>>  fail:
>> -    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> -                          MIGRATION_STATUS_FAILED);
>> +    if (ms->state != MIGRATION_STATUS_CANCELLING) {
>> +        migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
>> +    }
>>      migration_block_activate(NULL);
>>      migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>>      bql_unlock();
>> -- 
>> 2.50.1
>> 

Reply via email to