On Thu, 6 Nov 2025 11:21:56 +0800 Zhang Chen <[email protected]> wrote:
> On Thu, Nov 6, 2025 at 9:10 AM Zhijian Li (Fujitsu) <[email protected]> wrote:
> >
> > On 06/11/2025 04:58, Peter Xu wrote:
> > > On Tue, Nov 04, 2025 at 09:36:06AM +0800, Li Zhijian wrote:
> > >> Commit 4881411136 ("migration: Always set DEVICE state") set a new DEVICE
> > >> state before completed during migration, which broke the original
> > >> transition to COLO. The migration flow for precopy has changed to:
> > >> active -> pre-switchover -> device -> completed.
> > >>
> > >> This patch updates the transition state to ensure that the Pre-COLO
> > >> state corresponds to DEVICE state correctly.
> > >>
> > >> Fixes: 4881411136 ("migration: Always set DEVICE state")
> > >> Signed-off-by: Li Zhijian <[email protected]>
> > >> ---
> > >> [...]
> > >
> > > Thanks a lot for fixing it, Zhijian. It means I broke COLO already for
> > > 10.0/10.1..
> > >
> > > Hailiang/Chen, do you still know anyone who is using COLO, especially in
> > > enterprise? I don't expect any individual using it.. It definitely
> > > complicates migration logics all over the places. Fabiano and I discussed
> > > a few times on removing legacy code and COLO was always in the list.
> > >
> > > We used to discuss RDMA obsoletion too, that's when Huawei developers at
> > > least tried to re-implement the whole RDMA using rsocket, that didn't land
> > > only because of a perf regression. Meanwhile, Zhijian also provided an
> > > unit test, which we rely on recently to not break RDMA at the minimum.
> > >
> > > If we do not have known users, I sincerely want to discuss with you on
> > > obsoletion and removal of COLO from qemu codebase. Do you see feasible?
> > >
> > > Zhijian, do you have any input here?
> >
> > If we don't have any known users, I personally have no objection to
> > removing COLO.
> >
> > From my previous understanding, its use cases are rather limited, and the
> > checkpointing overhead is significant.
> > Moreover, with the continuous development of Cloud Native over the past
> > decade, service-based FT/HA solutions have become very mature, which
> > shrinks the use cases for VM-based FT solutions even further.
> >
> > I think it's worth keeping if we have:
> >
> > - Active users who depend on it.
> > - A unit test for the COLO framework.
> >
> > Thanks
> > Zhijian
>
> Add CC Lukas.
>
> [...]

Hello Everyone,

Thanks for bringing this to my attention. I will write a migration
unit-test and take maintainership of the colo components.

Peter, what is your plan with refactoring the migration code and where is
the colo code blocking you?

I have quite a few cleanup patches lying around. Are you open to take these
in advance before the next merge window opens?

Best regards,
Lukas Straub
