On Thu, 6 Nov 2025 11:21:56 +0800
Zhang Chen <[email protected]> wrote:

> On Thu, Nov 6, 2025 at 9:10 AM Zhijian Li (Fujitsu)
> <[email protected]> wrote:
> >
> >
> >
> > On 06/11/2025 04:58, Peter Xu wrote:  
> > > On Tue, Nov 04, 2025 at 09:36:06AM +0800, Li Zhijian wrote:  
> > >> Commit 4881411136 ("migration: Always set DEVICE state") set a new DEVICE
> > >> state before completed during migration, which broke the original
> > >> transition to COLO. The migration flow for precopy has changed to:
> > >> active -> pre-switchover -> device -> completed.
> > >>
> > >> This patch updates the transition state to ensure that the Pre-COLO
> > >> state corresponds to DEVICE state correctly.
> > >>
> > >> Fixes: 4881411136 ("migration: Always set DEVICE state")
> > >> Signed-off-by: Li Zhijian <[email protected]>
> > >> ---
> > >> [...]
> > >
> > > Thanks a lot for fixing it, Zhijian.  It means I broke COLO already for
> > > 10.0/10.1..
> > >
> > > Hailiang/Chen, do you still know anyone who is using COLO, especially in
> > > enterprise?  I don't expect any individual using it.. It definitely
> > > complicates migration logics all over the places.  Fabiano and I discussed
> > > a few times on removing legacy code and COLO was always in the list.
> > >
> > > We used to discuss RDMA obsoletion too, that's when Huawei developers at
> > > least tried to re-implement the whole RDMA using rsocket, that didn't land
> > > only because of a perf regression.  Meanwhile, Zhijian also provided an
> > > unit test, which we rely on recently to not break RDMA at the minimum.
> > >
> > > If we do not have known users, I sincerely want to discuss with you on
> > > obsoletion and removal of COLO from qemu codebase.  Do you see feasible?
> > >
> > > Zhijian, do you have any input here?  
> >
> >
> > If we don't have any known users, I personally have no objection to
> > removing COLO.
> >
> > From my previous understanding, its use cases are rather limited, and the
> > checkpointing overhead is significant.
> > Moreover, with the continuous development of Cloud Native over the past
> > decade, service-based FT/HA solutions have become very mature, which
> > shrinks the use cases for VM-based FT solutions even further.
> >
> > I think it's worth keeping if we have:
> >
> > - Active users who depend on it.
> > - A unit test for the COLO framework.
> >
> > Thanks
> > Zhijian
> >
> >  
> 
> Add CC Lukas.
> 
> [...]

Hello Everyone,

Thanks for bringing this to my attention.

I will write a migration unit test and take maintainership of the COLO
components.

Peter, what is your plan for refactoring the migration code, and where
is the COLO code blocking you?

I have quite a few cleanup patches lying around. Are you open to taking
these in advance, before the next merge window opens?

Best regards,
Lukas Straub
