This series is a continuation of the following RFC series and its discussion [1].
[1]: https://lore.kernel.org/all/20250807114922.1013286-1-jmar...@redhat.com/

This series takes a different approach to source side recoverability than the original RFC series: it uses the existing PING/PONG message types. Although this approach has some theoretical race conditions, the discussion concluded that in practice the chance of hitting them is very slim, if it exists at all. On the other hand, this approach requires no changes to the migration protocol or to the destination side QEMU instance.

In preparation for introducing the new state, this series contains a few changes. First, it includes a patch suggested by Peter, which checks the result of block device activation when the source side tries to resume after a failed migration, and does not try to start the VM if activation fails.

Next, it refactors cleanup and error handling on the destination side. This change is not strictly necessary for the feature to work. Without this patch, if the device state load fails, the destination QEMU either exits with an error exit code from the listen thread, or it might crash if the main thread does some cleanup before the listen thread exits the process. However, the source side can recover regardless of how the destination side fails.

Finally, the last patch contains the main feature, the POSTCOPY_DEVICE state. Compared to the approach discussed in the RFC, it uses a new PING message with a custom PING number. The reason is that the PING 3 message is currently sent only when postcopy-ram is active, but there might be postcopy scenarios where this isn't true. The destination side can respond to this new PING message without any changes.

As this change introduces a new migration state, I have also tested it with libvirt. Apart from a warning about an unknown migration state received in an event, migration finishes without any issues.
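
For readers unfamiliar with the handshake, the source side logic described above can be modelled roughly as follows. This is a toy sketch, not the code from the patches: every identifier and the PING number value are hypothetical, and it assumes that the source leaves POSTCOPY_DEVICE once the matching PONG arrives and that a failure before that point leaves the source able to resume.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: all names below are hypothetical, not QEMU identifiers. */
typedef enum {
    TOY_POSTCOPY_DEVICE,   /* device state sent, destination not yet confirmed */
    TOY_POSTCOPY_ACTIVE,   /* destination confirmed the device state load */
    TOY_FAILED_RECOVERABLE /* source side may still resume the VM */
} ToySrcState;

/* Assumed value, chosen only for illustration. */
#define TOY_PING_DEVICE_LOADED 0x42u

/* Destination: answers any PING by echoing its number back as a PONG,
 * which is why no destination side change is needed for a new number. */
static uint32_t toy_dst_handle_ping(uint32_t ping_number)
{
    return ping_number;
}

/* Source: only the PONG matching the custom PING number advances the state. */
static ToySrcState toy_src_handle_pong(ToySrcState s, uint32_t pong_number)
{
    if (s == TOY_POSTCOPY_DEVICE && pong_number == TOY_PING_DEVICE_LOADED) {
        return TOY_POSTCOPY_ACTIVE;
    }
    return s;
}

/* Source: a failure while still in the device state is treated as recoverable.
 * What happens after POSTCOPY_ACTIVE is not modelled here. */
static ToySrcState toy_src_handle_error(ToySrcState s)
{
    if (s == TOY_POSTCOPY_DEVICE) {
        return TOY_FAILED_RECOVERABLE;
    }
    return s;
}

/* One successful round trip: PING goes out, PONG comes back. */
static ToySrcState toy_run_handshake(void)
{
    uint32_t pong = toy_dst_handle_ping(TOY_PING_DEVICE_LOADED);
    return toy_src_handle_pong(TOY_POSTCOPY_DEVICE, pong);
}
```

In this model, a PONG carrying any other number (for example 3) leaves the source in the device state, and an error in that state ends in the recoverable outcome rather than in a lost VM.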

Juraj Marcin (3):
  migration: Accept MigrationStatus in migration_has_failed()
  migration: Refactor incoming cleanup into migration_incoming_finish()
  migration: Introduce POSTCOPY_DEVICE state

Peter Xu (1):
  migration: Do not try to start VM if disk activation fails

 migration/migration.c                 | 124 +++++++++++++++++---------
 migration/migration.h                 |   3 +-
 migration/multifd.c                   |   2 +-
 migration/savevm.c                    |  48 ++++------
 migration/savevm.h                    |   2 +
 migration/trace-events                |   1 +
 qapi/migration.json                   |   8 +-
 tests/qtest/migration/precopy-tests.c |   3 +-
 8 files changed, 112 insertions(+), 79 deletions(-)

-- 
2.51.0