On 23/09/19 20:32, Jintack Lim wrote:
> On Mon, Sep 23, 2019 at 4:48 AM Paolo Bonzini <pbonz...@redhat.com> wrote:
>>
>> On 23/09/19 12:42, Dr. David Alan Gilbert wrote:
>>>
>>> With those two clues, I guess maybe some dirty pages made by L2 are
>>> not transferred to the destination correctly, but I'm not really sure.
>>>
>>> 3) It happens on Intel(R) Xeon(R) Silver 4114 CPU, but it does not on
>>> Intel(R) Xeon(R) CPU E5-2630 v3 CPU.
>>
>> Hmm, try disabling pml (kvm_intel.pml=0). This would be the main
>> difference, memory-management wise, between those two machines.
>>
>
> Thank you, Paolo.
>
> This makes migration work successfully over 20 times in a row on
> Intel(R) Xeon(R) Silver 4114 CPU where migration failed almost always
> without disabling pml.
>
> I guess there's a problem in KVM pml code? I'm fine with disabling
> pml. But if you have patches to fix the issue, I'm willing to test it
> on the CPU.

Yes, it's a known bug in the PML code (that I thought was not an issue
for migration, but I was wrong). I'll try to get you a patch this week.

Paolo