* Peter Xu (pet...@redhat.com) wrote:
> Tree is pushed here for better reference and testing (the online tree
> includes the monitor OOB series):
>
>   https://github.com/xzpeter/qemu/tree/postcopy-recover-all
>
> This version removed quite a few patches related to migrate-incoming;
> instead I introduced a new command "migrate-recover" to trigger the
> recovery channel on the destination side, to simplify the code.
>
> To test these two series together, please check out the above tree
> and build. Note: to test on a small, single host, one needs to
> disable full-bandwidth postcopy migration, otherwise it will complete
> very fast. Basically, a simple patch like this would help:
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4de3b551fe..c0206023d7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * will notice we're in POSTCOPY_ACTIVE and not actually
>       * wrap their state up here
>       */
> -    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> +    // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>      if (migrate_postcopy_ram()) {
>          /* Ping just for debugging, helps line traces up */
>          qemu_savevm_send_ping(ms->to_dst_file, 2);
>
> This patch is already included in the above github tree. Please feel
> free to drop it when you want to test on big machines and between
> real hosts.
>
> Detailed Test Procedures (QMP only)
> ===================================
>
> 1. Start the source QEMU:
>
>    $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
>         -smp 4 -m 1G -qmp stdio \
>         -name peter-vm,debug-threads=on \
>         -netdev user,id=net0 \
>         -device e1000,netdev=net0 \
>         -global migration.x-max-bandwidth=4096 \
>         -global migration.x-postcopy-ram=on \
>         /images/fedora-25.qcow2

I suspect -snapshot isn't doing the right thing to the storage when
combined with the migration - I'm assuming the destination isn't using
the same temporary file.  (Also, any reason for specifying split
irqchip?)

> 2. Start the destination QEMU:
>
>    $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
>         -smp 4 -m 1G -qmp stdio \
>         -name peter-vm,debug-threads=on \
>         -netdev user,id=net0 \
>         -device e1000,netdev=net0 \
>         -global migration.x-max-bandwidth=4096 \
>         -global migration.x-postcopy-ram=on \
>         -incoming tcp:0.0.0.0:5555 \
>         /images/fedora-25.qcow2
>
> 3. On the source, do the QMP handshake as normal:
>
>    {"execute": "qmp_capabilities"}
>    {"return": {}}
>
> 4. On the destination, do the QMP handshake to enable OOB:
>
>    {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
>    {"return": {}}
>
> 5. On the source, trigger the initial migrate command, then switch to
>    postcopy:
>
>    {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } }
>    {"return": {}}
>    {"execute": "query-migrate"}
>    {"return": {"expected-downtime": 300, "status": "active", ...}}
>    {"execute": "migrate-start-postcopy"}
>    {"return": {}}
>    {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"}
>    {"execute": "query-migrate"}
>    {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}}
>
> 6. On the source, manually trigger a "fake network down" using the
>    "migrate-cancel" command:
>
>    {"execute": "migrate_cancel"}
>    {"return": {}}
>
>    During postcopy this will not actually cancel the migration, but
>    pause it. On both sides, we should see this on stderr:
>
>    qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
>
>    It means both sides are now in the postcopy-pause state.
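
My reading of the pause path here, as a sketch rather than the code
from the series: qemu_file_get_error() and migrate_set_state() are
existing helpers, but where exactly this check sits in the patches is
an assumption on my part.

  /* Sketch only: on a stream error during postcopy, pause the
   * migration instead of failing it outright. */
  static void postcopy_pause_on_error(MigrationState *ms)
  {
      if (qemu_file_get_error(ms->to_dst_file)) {
          migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                            MIGRATION_STATUS_POSTCOPY_PAUSED);
          error_report("Detected IO failure for postcopy. Migration paused.");
      }
  }
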
> 7. (Optional) On the destination side, let's try to hang the main
>    thread using the new x-oob-test command, providing a "lock=true"
>    parameter:
>
>    {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
>     "arguments": { "lock": true } }
>
>    After sending this command, we should not see any "return",
>    because the main thread is already blocked. But we can still use
>    the monitor, since the monitor now has a dedicated IOThread.
>
> 8. On the destination side, provide a new incoming port using the new
>    "migrate-recover" command (note that if step 7 was carried out, we
>    _must_ use the OOB form, otherwise the command will hang; with
>    OOB, the command returns immediately):
>
>    {"execute": "migrate-recover", "id": "recover-cmd",
>     "arguments": { "uri": "tcp:localhost:5556" },
>     "control": { "run-oob": true } }
>    {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
>     "event": "MIGRATION", "data": {"status": "setup"}}
>    {"return": {}, "id": "recover-cmd"}
>
>    We can see that the command succeeds even if the main thread is
>    locked up.
>
> 9. (Optional) This step is only needed if step 7 was carried out. On
>    the destination, let's unlock the main thread before resuming the
>    migration, this time with "lock=false" (since a running system
>    needs the main thread). Note that we _must_ use an OOB command
>    here too:
>
>    {"execute": "x-oob-test", "id": "unlock-dispatcher",
>     "arguments": { "lock": false }, "control": { "run-oob": true } }
>    {"return": {}, "id": "unlock-dispatcher"}
>    {"return": {}, "id": "lock-dispatcher-cmd"}
>
>    Here the first "return" is the reply to the unlock command, the
>    second is the reply to the lock command. After this command, the
>    main thread is released.
>
> 10. On the source, resume the postcopy migration:
>
>    {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556",
>     "resume": true }}
>    {"return": {}}
>    {"execute": "query-migrate"}
>    {"return": {"status": "completed", ...}}

The use of x-oob-test to lock things is a bit different to reality and
that means the ordering is different.  When the destination is blocked
by a page request, that page won't become unstuck until sometime after
(10) happens and delivers the page to the target.  You could try an
'info cpu' on the destination at (7) - although it's not guaranteed to
lock, depending on whether the page needed has arrived.

Dave

> Here's the changelog:
>
> v5:
> - add some more r-bs
> - fix the error path in ram_load_postcopy to always check on "ret"
>   [Dave]
> - move init/destroy of the three new sems into the migration object
>   init/finalize functions
> - dropped the patch "migration: delay the postcopy-active state
>   switch"; meanwhile touched up patch 6 to check against the
>   POSTCOPY_INCOMING_RUNNING state when trying to switch to the
>   postcopy-pause state [Dave]
> - dropped the two patches that introduce qmp/hmp migrate-pause;
>   instead reuse migrate-cancel to manually trigger postcopy recovery
> - add a new patch to let migrate_cancel pause the migration if it's
>   already in the postcopy phase
> - add a new command "migrate-recover" to reassign the incoming port,
>   instead of reusing migrate-incoming
> - since I now use the migrate-recover command instead of
>   migrate-incoming itself, I dropped quite a few patches that are no
>   longer relevant, so the series got smaller:
>     migration: return incoming task tag for sockets
>     migration: return incoming task tag for exec
>     migration: return incoming task tag for fd
>     migration: store listen task tag
>     migration: allow migrate_incoming for paused VM
>
> v4:
> - fix two compile errors that patchew reported
> - for QMP: do s/2.11/2.12/g
> - fix the migrate-incoming logic to be more strict
>
> v3:
> - add r-bs correspondingly
> - in ram_load_postcopy() capture the error if postcopy_place_page()
>   fails [Dave]
> - remove "break" when there is a "goto" before it [Dave]
> - ram_dirty_bitmap_reload(): use PRIx64 where needed, print some more
>   sizes [Dave]
> - remove RAMState.ramblock_to_sync; use a local counter instead
>   [Dave]
> - init the tag in tcp_start_incoming_migration() [Dave]
> - more traces when transmitting the receive bitmap [Dave]
> - postcopy_pause_incoming(): do the shutdown before taking the rp
>   lock [Dave]
> - add one more patch to postpone the state switch of postcopy-active
>   [Dave]
> - refactor the migrate_incoming handling according to the email
>   discussion [Dave]
> - add a manual trigger to pause postcopy (two new patches added to
>   introduce the "migrate-pause" command for QMP/HMP) [Dave]
>
> v2:
> - rebased to Alexey's received bitmap v9
> - add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21
> - patch 1: use the target page size to calculate the bitmap [Dave]
> - patch 3: move trace_*() after the EINTR check [Dave]
> - patch 4: dropped, since I can use bitmap_complement() [Dave]
> - patch 7: check the file error right after data is read in both
>   qemu_loadvm_section_start_full() and
>   qemu_loadvm_section_part_end(); meanwhile also check in
>   check_section_footer() [Dave]
> - patch 8/9: fix error_report/commit message in both patches [Dave]
> - patch 10: dropped (new parameter "x-postcopy-fast")
> - patch 11: split the "postcopy-paused" patch into two, one to
>   introduce the new state, the other to implement the logic. Also,
>   print something when paused [Dave]
> - patch 17: removed the do_resume label, introduced
>   migration_prepare() [Dave]
> - patch 18: removed the do_pause label using a new loop [Dave]
> - patch 20: removed an incorrect comment [Dave]
> - patch 21: use a 256B buffer in qemu_savevm_send_recv_bitmap(), add
>   a trace in loadvm_handle_recv_bitmap() [Dave]
> - patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianness and (2)
>   32/64-bit machines. More info is in the updated commit message.
> - patch 23: add one check on the migration state [Dave]
> - patch 24: use a macro instead of the magic 1 [Dave]
> - patch 26: use more trace_*() calls instead of one, and use one sem
>   to replace the mutex+cond [Dave]
> - move sem init/destroy into migration_instance_init() and
>   migration_instance_finalize() (new function after the rebase)
> - patch 29: squashed most of this patch into "migration: implement
>   "postcopy-pause" src logic" [Dave]
> - split the two fix patches out of the series
> - fixed two places where I misused "wake/woke/woken" [Dave]
> - added a new patch "bitmap: provide to_le/from_le helpers" to solve
>   the bitmap endianness issue [Dave]
> - appended the migrate_incoming series to this series, since that one
>   depends on the paused state. Use explicit g_source_remove() for
>   listening ports [Dan]
>
> FUTURE TODO LIST
> - support migrate_cancel during the PAUSED/RECOVER states
> - when anything goes wrong during PAUSED/RECOVER, switch back to the
>   PAUSED state on both sides
>
> As we all know, postcopy migration has a potential risk of losing the
> VM if the network breaks during the migration. This series tries to
> solve the problem by allowing the migration to pause at the failure
> point, and to recover after the link is reconnected.
>
> There was existing work on this issue from Md Haris Iqbal:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
>
> This series is a total rework of the issue, based on Alexey
> Perevalov's received bitmap v8 series:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html
>
> Two new states are added to support the migration (used on both
> sides):
>
>   MIGRATION_STATUS_POSTCOPY_PAUSED
>   MIGRATION_STATUS_POSTCOPY_RECOVER
>
> The MIGRATION_STATUS_POSTCOPY_PAUSED state is entered when a network
> failure is detected. It is a phase we may stay in for a long time
> after the failure, until a recovery is triggered. In this state, all
> the migration threads (on the source: the send thread and return-path
> thread; on the destination: the ram-load thread and page-fault
> thread) are halted.
>
> The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. When a recovery
> is triggered, both the source and destination VM jump into this state
> and do whatever is needed to prepare for the recovery (e.g.,
> currently the most important thing is to synchronize the dirty
> bitmap; please see the commit messages for more information). After
> the preparation is done, the source does a final handshake with the
> destination, and then both sides switch back to
> MIGRATION_STATUS_POSTCOPY_ACTIVE again.
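
To spell out what "halted" means mechanically - a sketch of my
reading, not the series' code; the v2 changelog mentions replacing a
mutex+cond with one semaphore, so presumably each thread parks roughly
like this (the postcopy_pause_sem field name is made up):

  /* Sketch only: park the calling migration thread until recovery
   * posts the semaphore and the state has left postcopy-paused. */
  static void postcopy_pause_wait(MigrationState *s)
  {
      while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
          qemu_sem_wait(&s->postcopy_pause_sem);
      }
  }
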
> New commands/messages are defined as well to satisfy the need:
>
>   MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
>   delivering received bitmaps
>
>   MIG_CMD_POSTCOPY_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to
>   do the final handshake of postcopy recovery.
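
To make the wire additions concrete, here is how I picture the new
enums - a sketch only, with the placement inside the enums assumed;
qemu_vm_cmd and mig_rp_message_type are the enums savevm.c and
migration.c already use:

  /* Commands travel on the migration stream, src -> dst */
  enum qemu_vm_cmd {
      /* ... existing MIG_CMD_* entries ... */
      MIG_CMD_RECV_BITMAP,     /* ask dst for a RAMBlock's received bitmap */
      MIG_CMD_POSTCOPY_RESUME, /* tell dst that src is ready to resume */
  };

  /* Return-path messages travel dst -> src */
  enum mig_rp_message_type {
      /* ... existing MIG_RP_MSG_* entries ... */
      MIG_RP_MSG_RECV_BITMAP,  /* return the requested received bitmap */
      MIG_RP_MSG_RESUME_ACK,   /* tell src that dst is ready to resume */
  };
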
> Here is some more detail on how the whole failure/recovery routine
> happens:
>
> - start migration
> - ... (switch from precopy to postcopy)
> - both sides are in the "postcopy-active" state
> - ... (failure happens, e.g., the network is unplugged)
> - both sides switch to the "postcopy-paused" state
> - all the migration threads are stopped on both sides
> - ... (both VMs hang)
> - ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
>   the source side; "-r" means "recover")
> - both sides switch to the "postcopy-recover" state
> - on the source: the send thread and return-path thread are woken up
> - on the destination: the ram-load thread is woken up, while the
>   fault thread stays paused
> - the source calls the new SaveVMHandlers hook resume_prepare()
>   (currently only RAM provides the hook):
>   - ram_resume_prepare(): for each RAMBlock, fetch the received
>     bitmap by:
>     - src sends MIG_CMD_RECV_BITMAP to dst
>     - dst replies with MIG_RP_MSG_RECV_BITMAP to src, carrying the
>       bitmap data
>     - src uses the received bitmap to rebuild its dirty bitmap (see
>       the sketch after this list)
> - the source does the final handshake with the destination:
>   - src sends MIG_CMD_POSTCOPY_RESUME to dst, telling it "src is
>     ready"
>   - when dst receives the command, the fault thread is woken up, and
>     meanwhile dst switches back to "postcopy-active"
>   - dst sends MIG_RP_MSG_RESUME_ACK to src, telling it "dst is ready"
>   - when src receives the ack, its state switches to
>     "postcopy-active"
> - postcopy migration continues
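
The bitmap rebuild referenced above is the crux of the resume
preparation. A minimal sketch of the idea, not the patch itself:
bitmap_complement() is the existing helper from include/qemu/bitmap.h,
while the function and parameter names here are made up.

  #include "qemu/bitmap.h"

  /*
   * Once the source has a RAMBlock's received bitmap back from the
   * destination, every page the destination has NOT yet received must
   * be marked dirty again so it is resent after the resume, i.e.
   * dirty = ~received.
   */
  static void ram_rebuild_dirty_bitmap(unsigned long *dirty_bitmap,
                                       const unsigned long *received_bitmap,
                                       unsigned long nr_pages)
  {
      bitmap_complement(dirty_bitmap, received_bitmap, nr_pages);
  }
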
> Testing:
>
> As I said, it's still an extremely simple test. I used socat to
> create a socket bridge:
>
>   socat tcp-listen:6666 tcp-connect:localhost:5555 &
>
> Then I did the migration via the bridge. I emulated the network
> failure by killing the socat process (bridge down), then tried to
> recover the migration using the other channel (the default dst
> channel). It looks like this:
>
>     port:6666  +------------------+
>  +-----------> | socat bridge [1] |----+
>  |             +------------------+    |
>  | (Original channel)                  |
>  |                                     | port: 5555
> +---------+  (Recovery channel)        +--->+---------+
> | src VM  |--------------------------------->| dst VM  |
> +---------+                                  +---------+
>
> Known issues/notes:
>
> - currently the destination listening port still cannot change; i.e.,
>   for simplicity, the recovery has to use the same port on the
>   destination (on the source we can specify a new URI)
>
> - the patch "migration: let dst listen on port always" is still
>   hacky; it just keeps the incoming accept open forever for now...
>
> - some migration numbers might still be inaccurate, like the total
>   migration time, etc. (but I don't really think that matters much
>   now)
>
> - the patches are very lightly tested
>
> - Dave reported one problem that may hang the destination main loop
>   thread (one vcpu thread holds the BQL) and the rest. I haven't
>   encountered it yet, but that does not mean this series can survive
>   it.
>
> - other potential issues that I may have forgotten or not noticed...
>
> Anyway, the work is still at a preliminary stage. Any suggestions and
> comments are greatly welcome. Thanks.
>
> Peter Xu (28):
>   migration: better error handling with QEMUFile
>   migration: reuse mis->userfault_quit_fd
>   migration: provide postcopy_fault_thread_notify()
>   migration: new postcopy-pause state
>   migration: implement "postcopy-pause" src logic
>   migration: allow dst vm pause on postcopy
>   migration: allow src return path to pause
>   migration: allow send_rq to fail
>   migration: allow fault thread to pause
>   qmp: hmp: add migrate "resume" option
>   migration: pass MigrationState to migrate_init()
>   migration: rebuild channel on source
>   migration: new state "postcopy-recover"
>   migration: wakeup dst ram-load-thread for recover
>   migration: new cmd MIG_CMD_RECV_BITMAP
>   migration: new message MIG_RP_MSG_RECV_BITMAP
>   migration: new cmd MIG_CMD_POSTCOPY_RESUME
>   migration: new message MIG_RP_MSG_RESUME_ACK
>   migration: introduce SaveVMHandlers.resume_prepare
>   migration: synchronize dirty bitmap for resume
>   migration: setup ramstate for resume
>   migration: final handshake for the resume
>   migration: free SocketAddress where allocated
>   migration: init dst in migration_object_init too
>   io: let watcher of the channel run in same ctx
>   migration: allow migrate_cancel to pause postcopy
>   qmp/migration: new command migrate-recover
>   hmp/migration: add migrate_recover command
>
>  hmp-commands.hx              |  28 ++-
>  hmp.c                        |  14 +-
>  hmp.h                        |   1 +
>  include/migration/register.h |   2 +
>  io/channel.c                 |   2 +-
>  migration/migration.c        | 549 ++++++++++++++++++++++++++++++++++++++-----
>  migration/migration.h        |  24 +-
>  migration/postcopy-ram.c     | 110 +++++++--
>  migration/postcopy-ram.h     |   2 +
>  migration/ram.c              | 247 ++++++++++++++++++-
>  migration/ram.h              |   3 +
>  migration/savevm.c           | 233 +++++++++++++++++-
>  migration/savevm.h           |   3 +
>  migration/socket.c           |   4 +-
>  migration/trace-events       |  21 ++
>  qapi/migration.json          |  35 ++-
>  16 files changed, 1172 insertions(+), 106 deletions(-)
>
> --
> 2.14.3

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK