* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> This is the 8th version of COLO.

I'm seeing an occasional error on the secondary:

  pcibus_reset: Assertion `bus->irq_count[i] == 0' failed.

Have you seen that? In my backtrace bus->irq_count[4] is -1, and the call
chain is:

  colo_process_incoming_checkpoints->qemu_devices_reset->qbus_walk_children->qbus_reset_one->pcibus_reset

Dave

> Here is only the COLO frame part, which includes: VM checkpoint,
> failover, proxy API, block replication API; it does not include block
> replication.
> The block part is treated as a separate series.
>
> As usual, we provide 'basic' and 'developing' branches in github:
> https://github.com/coloft/qemu/commits/colo-v1.5-basic
> https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)
>
> The 'basic' branch is exactly the same as this patch series.
> We will keep this series as simple as possible, just for easy review.
>
> The extra features in colo-v1.5-developing branch:
> 1) Separate ram and device save/load process to reduce size of extra memory
>    used during checkpoint
> 2) Live migrate part of dirty pages to slave during sleep time.
> 3) You can get statistics about checkpoints with the command 'info migrate'
>
> Please refer to the following link to test COLO:
> http://wiki.qemu.org/Features/COLO
>
> COLO is a totally new feature which is still in an early stage,
> your comments and feedback are warmly welcomed.
>
> NOTE:
> We have decided to re-implement the colo proxy in userspace (in qemu, to be
> exact). You can find the discussion about why & how to realize the colo
> proxy in qemu at the following link:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
>
> TODO:
> 1. COLO function switch on/off
> 2. The capability of continuous FT
> 3. Optimize the performance.
>
> v8:
> - Move some global variables into MigrationIncomingState and MigrationState
> - Move some cleanup work from colo thread and colo incoming thread into the
>   failover BH function and also fix the code logic for the cleanup work.
> - Fix the bug that colo thread and colo incoming thread possibly block in the
>   socket 'recv' call when doing failover work.
> - Optimize colo_flush_ram_cache()
> - Add migration state for incoming side, we use the state to verify if
>   migration incoming side is in COLO state or not (Patch 5).
> - Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode', since
>   it is not correct.
>
> zhanghailiang (34):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'colo' to migration
>   COLO: migrate colo related info to slave
>   colo-comm/migration: skip colo info section for special cases
>   migration: Add state records for migration incoming
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save VM state to slave when do checkpoint
>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>   COLO VMstate: Load VM state into qsb before restore it
>   arch_init: Start to trace dirty pages of SVM
>   COLO RAM: Flush cached RAM into SVM's memory
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Introduce state to record failover process
>   COLO failover: Implement COLO primary/secondary vm failover work
>   qmp event: Add event notification for COLO error
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Add new command parameter 'forward_nic' 'colo_script' for net
>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>   tap: Make launch_script() public
>   COLO NIC: Implement colo nic device interface configure()
>   colo-nic: Handle secondary VM's original net device configure
>   COLO NIC: Implement colo nic init/destroy function
>   COLO NIC: Some init work related with proxy module
>   COLO: Handle nfnetlink message from proxy module
>   COLO: Do checkpoint according to the result of packets comparation
>   COLO: Improve checkpoint efficiency by do additional periodic
>     checkpoint
>   COLO: Add colo-set-checkpoint-period command
>   COLO NIC: Implement NIC checkpoint and failover
>   COLO: Implement shutdown checkpoint
>   COLO: Add block replication into colo process
>
>  configure                     |  33 +-
>  docs/qmp/qmp-events.txt       |  16 +
>  hmp-commands.hx               |  30 ++
>  hmp.c                         |  15 +
>  hmp.h                         |   2 +
>  include/exec/cpu-all.h        |   1 +
>  include/migration/colo.h      |  45 +++
>  include/migration/failover.h  |  33 ++
>  include/migration/migration.h |  19 +
>  include/migration/qemu-file.h |   3 +-
>  include/net/colo-nic.h        |  37 ++
>  include/net/net.h             |   2 +
>  include/net/tap.h             |  19 +
>  include/sysemu/sysemu.h       |   3 +
>  migration/Makefile.objs       |   2 +
>  migration/colo-comm.c         |  75 ++++
>  migration/colo-failover.c     |  83 +++++
>  migration/colo.c              | 805 ++++++++++++++++++++++++++++++++++++++++++
>  migration/migration.c         | 116 ++++--
>  migration/qemu-file-buf.c     |  58 +++
>  migration/ram.c               | 242 ++++++++++++-
>  migration/savevm.c            |   2 +-
>  net/Makefile.objs             |   1 +
>  net/colo-nic.c                | 457 ++++++++++++++++++++++++
>  net/net.c                     |   2 +
>  net/tap.c                     |  90 +++--
>  qapi-schema.json              |  58 ++-
>  qapi/event.json               |  15 +
>  qemu-options.hx               |   7 +
>  qmp-commands.hx               |  42 +++
>  scripts/colo-proxy-script.sh  | 145 ++++++++
>  stubs/Makefile.objs           |   1 +
>  stubs/migration-colo.c        |  58 +++
>  trace-events                  |  10 +
>  vl.c                          |  37 +-
>  35 files changed, 2474 insertions(+), 90 deletions(-)
>  create mode 100644 include/migration/colo.h
>  create mode 100644 include/migration/failover.h
>  create mode 100644 include/net/colo-nic.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 net/colo-nic.c
>  create mode 100755 scripts/colo-proxy-script.sh
>  create mode 100644 stubs/migration-colo.c
>
> --
> 1.8.3.1
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
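
P.S. For anyone trying to picture how the counter in the assertion reported
at the top of this mail could end up at -1, here is a toy model. It is only
a sketch, not QEMU code and not a diagnosis: ToyBus, ToyDev and every toy_*
function are made-up names, and the scenario (a device whose loaded state
says "IRQ asserted" while the bus counter was never raised to match) is an
assumption about one way such an imbalance might arise.

```c
#include <assert.h>

#define TOY_NUM_IRQ 8   /* hypothetical number of IRQ lines on the toy bus */

/* Toy bus: counts how many sources currently assert each line. */
typedef struct ToyBus {
    int irq_count[TOY_NUM_IRQ];
} ToyBus;

/* Toy device: remembers which line it drives and the level it last set. */
typedef struct ToyDev {
    int line;
    int irq_state;      /* 0 = deasserted, 1 = asserted */
} ToyDev;

/* Apply a level change to the bus as a delta against the device's own state. */
static void toy_set_irq(ToyBus *bus, ToyDev *dev, int level)
{
    assert(dev->line >= 0 && dev->line < TOY_NUM_IRQ);
    int change = level - dev->irq_state;
    dev->irq_state = level;
    bus->irq_count[dev->line] += change;
}

/* Return the first line whose counter is non-zero, or -1 if all are zero.
 * A reset-time check like the one in the report would require -1 here. */
static int toy_first_unbalanced_line(const ToyBus *bus)
{
    for (int i = 0; i < TOY_NUM_IRQ; i++) {
        if (bus->irq_count[i] != 0) {
            return i;
        }
    }
    return -1;
}

/* Assumed scenario: device state is loaded as "asserted" without the bus
 * counter being updated, then a reset deasserts the line. */
static int toy_reset_after_partial_load(ToyBus *bus, ToyDev *dev)
{
    dev->irq_state = 1;         /* loaded device state only (assumption) */
    toy_set_irq(bus, dev, 0);   /* reset path deasserts: counter goes 0 -> -1 */
    return toy_first_unbalanced_line(bus);
}
```

With a zeroed ToyBus and a ToyDev on line 4, toy_reset_after_partial_load()
leaves irq_count[4] at -1, whereas a balanced assert/deassert pair through
toy_set_irq() leaves every counter at zero. Whether anything like this is
what happens on the secondary is exactly the open question above.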