On Thu, Nov 6, 2025 at 11:21 AM Zhang Chen <[email protected]> wrote:
>
> On Thu, Nov 6, 2025 at 9:10 AM Zhijian Li (Fujitsu)
> <[email protected]> wrote:
> >
> >
> >
> > On 06/11/2025 04:58, Peter Xu wrote:
> > > On Tue, Nov 04, 2025 at 09:36:06AM +0800, Li Zhijian wrote:
> > >> Commit 4881411136 ("migration: Always set DEVICE state") set a new DEVICE
> > >> state before completed during migration, which broke the original
> > >> transition to COLO. The migration flow for precopy has changed to:
> > >> active -> pre-switchover -> device -> completed.
> > >>
> > >> This patch updates the transition state to ensure that the Pre-COLO
> > >> state corresponds to DEVICE state correctly.
> > >>
> > >> Fixes: 4881411136 ("migration: Always set DEVICE state")
> > >> Signed-off-by: Li Zhijian <[email protected]>
> > >> ---
> > >>  migration/migration.c | 4 ++--
> > >>  1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/migration/migration.c b/migration/migration.c
> > >> index a63b46bbef..6ec7f3cec8 100644
> > >> --- a/migration/migration.c
> > >> +++ b/migration/migration.c
> > >> @@ -3095,9 +3095,9 @@ static void migration_completion(MigrationState *s)
> > >>          goto fail;
> > >>      }
> > >>
> > >> -    if (migrate_colo() && s->state == MIGRATION_STATUS_ACTIVE) {
> > >> +    if (migrate_colo() && s->state == MIGRATION_STATUS_DEVICE) {
> > >>          /* COLO does not support postcopy */
> > >> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> > >> +        migrate_set_state(&s->state, MIGRATION_STATUS_DEVICE,
> > >>                            MIGRATION_STATUS_COLO);
> > >>      } else {
> > >>          migration_completion_end(s);
> > >
> > > Thanks a lot for fixing it, Zhijian. It means I broke COLO already for
> > > 10.0/10.1..
> > >
> > > Hailiang/Chen, do you still know anyone who is using COLO, especially in
> > > enterprise? I don't expect any individual using it.. It definitely
> > > complicates migration logics all over the places.
> > > Fabiano and I discussed a few times on removing legacy code and COLO was
> > > always in the list.
> > >
> > > We used to discuss RDMA obsoletion too, that's when Huawei developers at
> > > least tried to re-implement the whole RDMA using rsocket, that didn't land
> > > only because of a perf regression. Meanwhile, Zhijian also provided a
> > > unit test, which we rely on recently to not break RDMA at the minimum.
> > >
> > > If we do not have known users, I sincerely want to discuss with you on
> > > obsoletion and removal of COLO from qemu codebase. Do you see feasible?
> > >
> > > Zhijian, do you have any input here?
> >
> >
> > If we don't have any known users, I personally have no objection to
> > removing COLO.
> >
> > From my previous understanding, its use cases are rather limited, and the
> > checkpointing overhead is significant.
> > Moreover, with the continuous development of Cloud Native over the past
> > decade, service-based FT/HA solutions have become very mature, which
> > shrinks the use cases for VM-based FT solutions even further.
> >
> > I think it's worth keeping if we have:
> >
> > - Active users who depend on it.
> > - A unit test for the COLO framework.
> >
> > Thanks
> > Zhijian
> >
> >
>
> Add CC Lukas.
>
> From a technical point of view, I agree with Zhijian's comments. We can
> probably do this gradually.
> On my side, I know some local companies build their HA/FT products based on
> COLO.
> In this case, I think most of them have already forked the QEMU upstream
> code to a private repo that they maintain internally.
> It may cause some upgrade issues in the future.
>
> Another part is that Lukas covered the pacemaker project integration with
> COLO, and I don't know the user status for pacemaker.
> Maybe Lukas can add some comments?
>
> For the implementation, COLO not only has the migration part of the code (it
> is the core of COLO), it also includes network and block replication
> that work together with it.
> If we remove the migration-related code, we need to consider how to handle
> the other parts: should the network part change to a general QEMU
> netfilter? What about block replication?
>
> For the COLO framework unit test, I think it needs some "#if
> defined(qtest)" in the migration code for testing (the COLO proxy/netfilter
> already has an independent qtest).
>
> Thanks
> Chen
>
Add pacemaker/corosync related details for COLO:
https://wiki.qemu.org/Features/COLO/Managed_HOWTO

> > >
> > > Thanks,
> > >
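
For anyone following the quoted patch without the QEMU tree at hand, here is
a rough toy model of why the COLO check has to move from ACTIVE to DEVICE.
It is only a sketch: ToyState, toy_set_state() and toy_precopy_completion()
are made-up names, not QEMU identifiers. It assumes the state update behaves
as a compare-and-set (the state changes only if it still equals the expected
old value) and that the precopy flow is the one quoted in the commit message
(active -> pre-switchover -> device -> completed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not QEMU's real status enum. */
typedef enum {
    ST_ACTIVE,
    ST_PRE_SWITCHOVER,
    ST_DEVICE,
    ST_COMPLETED,
    ST_COLO,
} ToyState;

/* Move *state to new_state only if it currently equals old_state. */
static bool toy_set_state(ToyState *state, ToyState old_state,
                          ToyState new_state)
{
    assert(state != NULL);
    if (*state != old_state) {
        return false;   /* unexpected current state: leave it untouched */
    }
    *state = new_state;
    return true;
}

/*
 * Walk the precopy flow from the commit message, then try to enter COLO
 * from `expected`, mirroring the shape of the check in the quoted hunk.
 * The else branch stands in for normal completion (assumed here to be
 * device -> completed).
 */
static ToyState toy_precopy_completion(bool colo_enabled, ToyState expected)
{
    ToyState s = ST_ACTIVE;

    toy_set_state(&s, ST_ACTIVE, ST_PRE_SWITCHOVER);
    toy_set_state(&s, ST_PRE_SWITCHOVER, ST_DEVICE);

    if (colo_enabled && s == expected) {
        toy_set_state(&s, expected, ST_COLO);
    } else {
        toy_set_state(&s, ST_DEVICE, ST_COMPLETED);
    }
    return s;
}
```

In this model, toy_precopy_completion(true, ST_ACTIVE) never reaches ST_COLO
because the state is already ST_DEVICE by the time the check runs, while
toy_precopy_completion(true, ST_DEVICE) does. That is the behaviour the
patch restores.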
