On Tue, Oct 21, 2025 at 6:09 AM Lukas Straub <[email protected]> wrote:
>
> On Mon, 20 Oct 2025 17:41:30 -0400
> Peter Xu <[email protected]> wrote:
>
> > On Wed, Oct 08, 2025 at 05:26:13PM -0400, Peter Xu wrote:
> > > On Thu, Sep 04, 2025 at 04:27:39PM +0800, Zhang Chen wrote:
> > > > > I confess I didn't test anything on COLO but only from code
> > > > > observations and analysis. COLO maintainers: could you add some
> > > > > unit tests to QEMU's qtests?
> > > >
> > > > For the COLO part, I think removing the coroutine-related code is
> > > > OK with me, because the original coroutine still needs to call
> > > > "colo_process_incoming_thread".
> > >
> > > Chen, thanks for the comment. It's still reassuring.
> > >
> > > >
> > > > Hi Hailiang, any comments for this part?
> > >
> > > Any further comment on this series would always be helpful.
> > >
> > > It would also be great if anyone could come up with a selftest for COLO.
> > > Now any new migration feature needs both a unit test and documentation
> > > to get merged. COLO was merged earlier so it didn't need to, but these
> > > would certainly help make sure COLO won't be easily broken.
> >
> > Chen/Hailiang:
> >
> > I may use some help from COLO side.
> >
> > Just now, I did give it a shot with the current docs/COLO-FT.txt and it
> > didn't really work for me.
> >
> > The cmdlines I used almost followed the doc, but I changed a few
> > things. For example, on the secondary VM I added "file.locking=off" for
> > drive "parent0" because otherwise the "nbd-server-add" command fails to
> > take the lock and the VM never boots. I also switched from a tap netdev
> > to a socket netdev, since in my case I only plan to run the COLO main
> > routine. I hope that's harmless too, but let me know if it is a problem.
> >
> > So below are the final cmdlines I used..
> >
> > For primary:
> >
> > bin=~/git/qemu/bin/qemu-system-x86_64
> > $bin -enable-kvm -cpu qemu64,kvmclock=on \
> > -m 512 -smp 1 -qmp stdio \
> > -device piix3-usb-uhci -device usb-tablet -name primary \
> > -netdev socket,id=hn0,listen=127.0.0.1:10000 \
> > -device rtl8139,id=e0,netdev=hn0 \
> > -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
> > -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
> > -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
> > -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> > -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
> > -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> > -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> > -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> > -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> > -object iothread,id=iothread1 \
> > -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,iothread=iothread1 \
> > -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=./primary.qcow2,children.0.driver=qcow2
> >
> > For secondary (testing locally, hence using 127.0.0.1 as primary_ip):
> >
> > bin=~/git/qemu/bin/qemu-system-x86_64
> > primary_ip=127.0.0.1
> > $bin -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
> > -device piix3-usb-uhci -device usb-tablet -name secondary \
> > -netdev socket,id=hn0,connect=127.0.0.1:10000 \
> > -device rtl8139,id=e0,netdev=hn0 \
> > -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
> > -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
> > -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
> > -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
> > -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
> > -drive if=none,id=parent0,file.filename=primary.qcow2,driver=qcow2,file.locking=off \
> > -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,top-id=colo-disk0,file.file.filename=secondary-active.qcow2,file.backing.driver=qcow2,file.backing.file.filename=secondary-hidden.qcow2,file.backing.backing=parent0 \
> > -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=childs0 \
> > -incoming tcp:0.0.0.0:9998
> >
>
> Hi Peter,
> You have to use -incoming defer and enable x-colo on the
> secondary side before starting migration.
>
> And primary.qcow2 should be a separate image (with same content) for
> each qemu instance.

Yes, Lukas is right. QEMU does not allow two VMs to use the same image.
You can try "cp primary.qcow2 secondary.qcow2", then change the
secondary side to:

-drive if=none,id=parent0,file.filename=secondary.qcow2,driver=qcow2,file.locking=off \
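
For Lukas's first point, here is a minimal sketch of what the secondary
side might look like. It assumes the secondary is started with
"-incoming defer" in place of "-incoming tcp:0.0.0.0:9998". This is
untested here; the port is simply the one from Peter's cmdline, and the
nbd-server commands stay as they were:

```json
{"execute": "qmp_capabilities"}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:0.0.0.0:9998"}}
```

The only point of the sketch is the ordering: x-colo is enabled before
the incoming side starts listening, and the primary's "migrate" command
is issued after that.
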
Thanks
Chen
>
> Regards,
> Lukas
>
>
> > I started the secondary, then the primary, then ran the suggested QMP
> > commands on the secondary first, followed by the QMP commands on the
> > primary. I got the error below:
> >
> > x1:colo $ ./primary.sh
> > qemu-system-x86_64: -chardev
> > socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on: info: QEMU
> > waiting for connection on: disconnected:tcp:0.0.0.0:9004,server=on
> > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 10},
> > "package": "v10.1.0-1513-g94586867df"}, "capabilities": ["oob"]}}
> > VNC server running on ::1:5900
> > {"execute":"qmp_capabilities"}
> > {"return": {}}
> > {"execute": "human-monitor-command", "arguments": {"command-line":
> > "drive_add -n buddy
> > driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
> > {"return": ""}
> > {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0",
> > "node": "replication0" } }
> > {"return": {}}
> > {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [
> > {"capability": "x-colo", "state": true } ] } }
> > {"return": {}}
> > {"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
> > {"return": {}}
> > {"timestamp": {"seconds": 1760996025, "microseconds": 483349}, "event":
> > "STOP"}
> >
> > x1:colo $ ./secondary.sh
> > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 10},
> > "package": "v10.1.0-1513-g94586867df"}, "capabilities": ["oob"]}}
> > VNC server running on ::1:5901
> > {"execute":"qmp_capabilities"}
> > {"return": {}}
> > {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [
> > {"capability": "x-colo", "state": true } ] } }
> > {"return": {}}
> > {"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet",
> > "data": {"host": "0.0.0.0", "port": "9999"} } } }
> > {"return": {}}
> > {"execute": "nbd-server-add", "arguments": {"device": "parent0",
> > "writable": true } }
> > {"return": {}}
> > {"timestamp": {"seconds": 1760996025, "microseconds": 695059}, "event":
> > "RESUME"}
> > qemu-system-x86_64: Can't receive COLO message: Input/output error
> > {"timestamp": {"seconds": 1760996025, "microseconds": 695369}, "event":
> > "COLO_EXIT", "data": {"mode": "secondary", "reason": "error"}}
> >
> > Do you know what I missed? Or does it mean that COLO is broken?
> >
> > Meanwhile, do you know if COLO is still being used by anyone? I'm pretty
> > sure neither Fabiano nor I am looking after it. I remember Dave used to
> > try it, but that might have been a long time ago too.
> >
> > Thanks,
> >
>