Hi,

> * Gonglei (Arei) (arei.gong...@huawei.com) wrote:
> > Hi Dave,
> >
> > We discussed some live migration fallback scenarios in this year's KVM forum,
> > and now I can provide another scenario, perhaps the upstream should consider
> > rolling back for this situation.
> >
> > Environments information:
> >
> > host A: cpu E5620(model WestmereEP without flag xsave)
> > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> >
> > The reproduce steps is :
> > 1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
>
> Well we don't guarantee migration across -cpu host - does this problem
> go away if both qemu's are started with matching CPU flags
> (corresponding to the Westmere) ?
>
Sorry, we didn't test other CPU model scenarios, since we need to ensure
that live migration is supported from lower-generation CPUs to
higher-generation CPUs. :(

> > 2. Migrate the vm to host B when cr4.OSXSAVE=0.
> > 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> > 4. Then migrate the vm to host A successfully, but vm was paused, and qemu
> > printed log as followed:
> >
> > KVM: entry failed, hardware error 0x80000021
> >
> > If you're running a guest on an Intel machine without unrestricted mode
> > support, the failure can be most likely due to the guest entering an invalid
> > state for Intel VT. For example, the guest maybe running in big real mode
> > which is not supported on less recent Intel processors.
> >
> > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300
> > CS =f000 ffff0000 0000ffff 00009b00
> > SS =0000 00000000 0000ffff 00009300
> > DS =0000 00000000 0000ffff 00009300
> > FS =0000 00000000 0000ffff 00009300
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT= 00000000 0000ffff
> > IDT= 00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> > Problem happened when kvm_put_sregs returns err -22(called by
> > kvm_arch_put_registers(qemu)).
> >
> > Because kvm_arch_vcpu_ioctl_set_sregs(kvm module) checked that
> > guest_cpuid_has no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
> > We should cancel migration if kvm_arch_put_registers returns error.
>
> Do you have a backtrace of when the kvm_arch_put_registers is called
> when it fails?

The main backtrace is below:

qemu_loadvm_state
  cpu_synchronize_all_post_init               --> w/o return value
    cpu_synchronize_post_init                 --> w/o return value
      kvm_cpu_synchronize_post_init           --> w/o return value
        run_on_cpu                            --> w/o return value
          do_kvm_cpu_synchronize_post_init    --> w/o return value
            kvm_arch_put_registers            --> w/ return value

The root cause is that several functions in this chain have no return
value, so the migration thread can't detect the failure.

Paolo?

> If it's called during the loading of the device state then we should be
> able to detect it and fail the migration; however if it's only failing
> after the CPU is restarted after the migration then it's a bit too late.
>
Actually, the CPUs haven't started yet in this scenario.

Thanks,
-Gonglei
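
P.S. To make the root cause concrete, here is a small standalone C sketch
of what propagating the error up the call chain could look like. It is only
a model, not a patch: struct vcpu_model and the model_* functions are
hypothetical names invented for illustration and are not QEMU or KVM APIs.
The check in model_put_registers only imitates the condition described
above (cr4.OSXSAVE=1 while the guest CPUID has no XSAVE).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_OSXSAVE (1u << 18)  /* CR4 bit 18 on x86 */

/* Hypothetical stand-in for a vCPU; not a QEMU or KVM structure. */
struct vcpu_model {
    bool cpuid_has_xsave;  /* does the destination expose XSAVE to the guest? */
    uint32_t cr4;          /* CR4 value carried in the migration stream */
};

/* Imitates the rejected case: CR4.OSXSAVE set without XSAVE in CPUID. */
static int model_put_registers(const struct vcpu_model *vcpu)
{
    assert(vcpu != NULL);
    if ((vcpu->cr4 & CR4_OSXSAVE) && !vcpu->cpuid_has_xsave) {
        return -EINVAL;  /* corresponds to the err -22 in the report */
    }
    return 0;
}

/* Returns the first failure instead of discarding it in a void function. */
static int model_synchronize_all_post_init(const struct vcpu_model *vcpus,
                                           size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = model_put_registers(&vcpus[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Models the end of the incoming load: report failure to the caller. */
static int model_loadvm_state(const struct vcpu_model *vcpus, size_t n)
{
    int ret = model_synchronize_all_post_init(vcpus, n);
    if (ret < 0) {
        return ret;  /* the caller can now treat the incoming migration as failed */
    }
    return 0;
}
```

With a return value at every level, the caller of model_loadvm_state sees
-EINVAL for the Westmere destination case and could fail the migration
before any vCPU starts running. In the real code the run_on_cpu step would
also need a way to hand the result back, which this sketch leaves out.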