This series is a continuation of the following RFC series and its
discussion [1].

[1]: https://lore.kernel.org/all/20250807114922.1013286-1-jmar...@redhat.com/

This series takes a different approach to source-side recoverability
than the original RFC series: it uses the existing PING/PONG message
types. Although this approach has some theoretical race conditions, the
discussion concluded that in practice the chance of hitting them is very
slim, if it exists at all. On the other hand, this approach requires no
changes to the migration protocol, nor to the destination-side QEMU
instance, for the feature to work.

In preparation for introducing the new state, the series first makes a
few other changes.

First, it includes a patch suggested by Peter that adds a check on
block device activation, so that the source side does not try to start
the VM if disk activation fails while resuming after a failed migration.
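
The intended behaviour of that check can be sketched as below. This is
only an illustration under stated assumptions: SourceVM and
hypothetical_activate_disks() are made-up stand-ins, not QEMU code; the
real logic lives in the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of the source VM; not a QEMU type. */
typedef struct {
    bool disks_active;
    bool vm_running;
} SourceVM;

/* Hypothetical stand-in for disk activation; should_fail simulates an error. */
static bool hypothetical_activate_disks(SourceVM *vm, bool should_fail)
{
    if (should_fail) {
        return false;
    }
    vm->disks_active = true;
    return true;
}

/*
 * Resume the source VM after a failed migration. If the disks cannot be
 * reactivated, leave the VM stopped instead of running it without its
 * block devices. Returns true only if the VM was started.
 */
static bool resume_source_after_failed_migration(SourceVM *vm, bool disk_error)
{
    assert(vm != NULL);
    if (!hypothetical_activate_disks(vm, disk_error)) {
        fprintf(stderr, "disk activation failed, not starting VM\n");
        return false;
    }
    vm->vm_running = true;
    return true;
}
```

Keeping the VM stopped on failure leaves the decision of what to do next
to the management layer rather than running a guest whose disks are not
usable.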

Next, it refactors cleanup and error handling on the destination side.
This change is not strictly necessary for the feature to work. Without
this patch, if loading the device state fails, the destination QEMU
either exits with an error code from the listen thread, or it may crash
if the main thread performs some cleanup before the listen thread exits
the process. However, the source side can recover regardless of how the
destination side fails.

Finally, the last patch contains the main feature, the POSTCOPY_DEVICE
state. Compared to the approach discussed in the RFC, it uses a new PING
message with a custom PING number. The reason is that the PING 3 message
is currently sent only when postcopy-ram is active, but there might be
postcopy scenarios where this is not the case. The destination side can
respond to this new PING message without any changes.
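
The source-side transitions around the new state can be sketched as a
small state machine. This assumes the source treats the PONG matching its
custom PING as confirmation that the destination has loaded the device
state. The enum values and HYPOTHETICAL_DEVICE_PING_ID are illustrative
only; the actual PING number and state handling are defined in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified subset of migration states; not QEMU's full MigrationStatus. */
typedef enum {
    ST_POSTCOPY_DEVICE,   /* device state sent, destination not confirmed yet */
    ST_POSTCOPY_ACTIVE,   /* destination confirmed, source can no longer resume */
    ST_FAILED,            /* migration failed, source keeps the guest */
} SrcState;

typedef enum { EV_PONG, EV_CHANNEL_ERROR } SrcEvent;

/* Hypothetical PING number; the value used by the patch is not shown here. */
#define HYPOTHETICAL_DEVICE_PING_ID 0x1234u

/*
 * Source-side transition function: a PONG carrying the custom number moves
 * the source to POSTCOPY_ACTIVE; an error before that point fails the
 * migration while the source still owns the guest.
 */
static SrcState src_on_event(SrcState s, SrcEvent ev, uint32_t pong_id)
{
    assert(ev == EV_PONG || ev == EV_CHANNEL_ERROR);
    if (s != ST_POSTCOPY_DEVICE) {
        return s;   /* only POSTCOPY_DEVICE is modelled in this sketch */
    }
    if (ev == EV_PONG && pong_id == HYPOTHETICAL_DEVICE_PING_ID) {
        return ST_POSTCOPY_ACTIVE;
    }
    if (ev == EV_CHANNEL_ERROR) {
        return ST_FAILED;
    }
    return s;       /* PONGs for other PING numbers are ignored */
}
```

Because the destination already answers every PING with a PONG carrying
the same number, the source can use a new number without any
destination-side change.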

As this change introduces a new migration state, I have also tested it
with libvirt. Apart from a warning about an unknown migration state
received in an event, migration finishes without any issues.

Juraj Marcin (3):
  migration: Accept MigrationStatus in migration_has_failed()
  migration: Refactor incoming cleanup into migration_incoming_finish()
  migration: Introduce POSTCOPY_DEVICE state

Peter Xu (1):
  migration: Do not try to start VM if disk activation fails

 migration/migration.c                 | 124 +++++++++++++++++---------
 migration/migration.h                 |   3 +-
 migration/multifd.c                   |   2 +-
 migration/savevm.c                    |  48 ++++------
 migration/savevm.h                    |   2 +
 migration/trace-events                |   1 +
 qapi/migration.json                   |   8 +-
 tests/qtest/migration/precopy-tests.c |   3 +-
 8 files changed, 112 insertions(+), 79 deletions(-)

-- 
2.51.0

