On Tue, Jul 30, 2013 at 09:04:56AM +0000, Zhanghaoyu (A) wrote:
> >> >> hi all,
> >> >>
> >> >> I met a problem similar to these while performing live migration or
> >> >> save-restore tests on the kvm platform (qemu:1.4.0, host:suse11sp2,
> >> >> guest:suse11sp2), running a tele-communication software suite in the
> >> >> guest:
> >> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >> >>
> >> >> After live migration or virsh restore [savefile], one process's CPU
> >> >> utilization went up by about 30%, resulting in throughput
> >> >> degradation of this process.
> >> >>
> >> >> If EPT is disabled, this problem is gone.
> >> >>
> >> >> I suspect that the kvm hypervisor is involved in this problem.
> >> >> Based on this suspicion, I want to find the two adjacent versions of
> >> >> kvm-kmod, one of which triggers this problem and one of which does
> >> >> not (e.g. 2.6.39, 3.0-rc1), and then either analyze the differences
> >> >> between these two versions, or apply the patches between them by the
> >> >> bisection method, to finally find the key patches.
> >> >>
> >> >> Any better ideas?
> >> >>
> >> >> Thanks,
> >> >> Zhang Haoyu
> >> >
> >> >I've attempted to duplicate this on a number of machines that are as
> >> >similar to yours as I am able to get my hands on, and so far have not
> >> >been able to see any performance degradation. And from what I've read
> >> >in the above links, huge pages do not seem to be part of the problem.
> >> >
> >> >So, if you are in a position to bisect the kernel changes, that would
> >> >probably be the best avenue to pursue, in my opinion.
> >> >
> >> >Bruce
> >>
> >> I found the first bad commit ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4]
> >> KVM: propagate fault r/w information to gup(), allow read-only memory),
> >> which triggers this problem, by git-bisecting the kvm kernel changes
> >> (downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).
> >>
> >> And:
> >> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> >> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
> >>
> >> Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, and came to the
> >> conclusion that all of the differences between
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
> >> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by no commit
> >> other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is
> >> the culprit which directly or indirectly causes the degradation.
> >>
> >> Does the map_writable flag passed to the mmu_set_spte() function affect
> >> the PTE's PAT flag, or increase the number of VM exits induced by the
> >> guest trying to write read-only memory?
> >>
> >> Thanks,
> >> Zhang Haoyu
> >>
> >
> >There should be no read-only memory maps backing guest RAM.
> >
> >Can you confirm map_writable = false is being passed to __direct_map?
> >(this should not happen, for guest RAM)
> >And if it is false, please capture the associated GFN.
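[Editor's note: the bisection workflow described above can be sketched with
`git bisect run`. Everything in this sketch is invented for illustration --
a throwaway repository, a file name, and a grep-based pass/fail check. In
the real case the repository would be kvm.git and the test script would
measure the guest workload's throughput after a save/restore cycle.]

```shell
#!/bin/sh
# Toy demonstration of `git bisect run` mechanics. The repo, file, and
# the grep-based "regression detector" are all hypothetical stand-ins.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Bisect Demo"

# Create five commits; the simulated regression appears at commit 4.
for i in 1 2 3 4 5; do
    echo "change $i" >> file.txt
    if [ "$i" -ge 4 ]; then
        echo "regression" >> file.txt
    fi
    git add file.txt
    git commit -qm "commit $i"
done

# HEAD (commit 5) is known bad; HEAD~4 (commit 1) is known good.
git bisect start HEAD HEAD~4 > /dev/null

# Exit 0 = good, non-zero = bad; grep stands in for the real benchmark.
bisect_out=$(git bisect run sh -c '! grep -q regression file.txt')

# git prints the offending commit when bisection converges.
echo "$bisect_out" | grep "first bad commit"
```

With a real throughput benchmark in place of the grep, the same loop
narrows thousands of kernel commits down to one in O(log n) test runs,
which is how a single commit such as 612819c3 can be isolated.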
> >
> I added the below check and printk at the start of __direct_map() at the
> first bad commit version:
>
> --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c	2013-07-26 18:44:05.000000000 +0800
> +++ kvm-612819/arch/x86/kvm/mmu.c	2013-07-31 00:05:48.000000000 +0800
> @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
>  	int pt_write = 0;
>  	gfn_t pseudo_gfn;
>  
> +	if (!map_writable)
> +		printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
> +
>  	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>  		if (iterator.level == level) {
>  			unsigned pte_access = ACC_ALL;
>
> I virsh-saved the VM and then virsh-restored it; so many GFNs were
> printed that you could absolutely describe it as flooding.
>
The flooding you see happens during the migrate-to-file stage because of
dirty page tracking. If you clear dmesg after virsh-save, you should not
see any flooding after virsh-restore. I just checked with the latest
tree, and I do not.
--
			Gleb.