On (Wed) 26 Oct 2016 [23:52:48], Hailiang Zhang wrote:
> Hi Amit,
>
> On 2016/10/26 16:26, Amit Shah wrote:
> >On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
> >>Hi Amit,
> >>
> >>On 2016/10/26 14:09, Amit Shah wrote:
> >>>Hello,
> >>>
> >>>On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
> >>>>This is the 21st version of the COLO frame series.
> >>>>
> >>>>Rebased to the latest master.
> >>>
> >>>I've reviewed the patchset and have some minor comments, but overall
> >>>it looks good. The changes are contained, and common code / existing
> >>>code paths are not affected much. We can still target merging this
> >>>for 2.8.
> >>>
> >>
> >>I really appreciate your help ;) I will fix all the issues later
> >>and send v22. I hope we can still catch the deadline for 2.8.
> >>
> >>>Do you have any tests on how much the VM slows down / how much
> >>>downtime is incurred during checkpoints?
> >>>
> >>
> >>Yes, we tested that a long time ago; it all depends.
> >>The downtime is determined by the time spent transferring the dirty
> >>pages and the time spent flushing RAM from the RAM buffer.
> >>But we do have methods to reduce the downtime.
> >>
> >>One method is to reduce the amount of data (mainly dirty pages)
> >>transferred at checkpoint time by sending dirty pages asynchronously
> >>while the PVM and SVM are running (i.e. not while a checkpoint is
> >>being taken). Besides, we can re-use the capabilities of migration,
> >>such as compression, etc.
> >>Another method is to reduce the time of flushing RAM by using the
> >>userfaultfd API to convert copying RAM into marking a bitmap. We can
> >>also flush the RAM buffer with multiple threads, as advised by Dave ...
> >
> >Yes, I understand that, as with any migration numbers, this too depends
> >on what the guest is doing. However, can you just pick some standard
> >workload - a kernel compile or something like that - and post a few
> >observations?
> >
>
> Li Zhijian has sent some test results based on the kernel COLO proxy.
> After switching to the userspace COLO proxy, there may be some
> degradation, but for the old scenario some optimizations are not
> implemented. For the new userspace COLO proxy scenario, we didn't test
> it overall, because it is still WIP; we will start that work after this
> frame is merged.
OK.

> >>>Also, can you tell how you arrived at the default checkpoint
> >>>interval?
> >>>
> >>
> >>Er, for this value, we referred to Remus on the Xen platform. ;)
> >>But after we implement COLO with the COLO proxy, this interval will
> >>be changed to a bigger value (10s), and we will make it configurable
> >>too. Besides, we will add another configurable value to control the
> >>minimum interval between checkpoints.
> >
> >OK - any typical value that is a good mix between COLO keeping the
> >network too busy / the guest paused vs. the guest making progress?
> >Again, this is something that's workload-dependent, but I guess you
> >have typical numbers from a network-bound workload?
> >
>
> Yes, you can refer to Zhijian's email for details.
> I think it is necessary to add some test/performance results to COLO's
> wiki. We will do that later.

Yes, please. Also, in your next iteration, please add the colo files to
the MAINTAINERS entry so you get CC'ed on future patches (and bugs :-)

		Amit
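For reference, the checkpoint interval discussed above is exposed in QEMU as the experimental `x-checkpoint-delay` migration parameter, in milliseconds. Assuming that parameter name and unit (check the qapi schema of your QEMU version), a 10-second interval could be set over QMP along these lines:

{ "execute": "migrate-set-parameters",
  "arguments": { "x-checkpoint-delay": 10000 } }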