On Tue, Jun 27, 2017 at 10:30:01PM +0800, Haozhong Zhang wrote:
> On 06/26/17 13:56 +0100, Stefan Hajnoczi wrote:
> > On Mon, Jun 26, 2017 at 10:05:01AM +0800, Haozhong Zhang wrote:
> > > On 06/23/17 10:55 +0100, Stefan Hajnoczi wrote:
> > > > On Fri, Jun 23, 2017 at 08:13:13AM +0800, haozhong.zh...@intel.com wrote:
> > > > > On 06/22/17 15:08 +0100, Stefan Hajnoczi wrote:
> > > > > > I tried live migrating a guest with NVDIMM on qemu.git/master
> > > > > > (edf8bc984):
> > > > > >
> > > > > >   $ qemu -M accel=kvm,nvdimm=on -m 1G,slots=4,maxmem=8G -cpu host \
> > > > > >       -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> > > > > >       -device nvdimm,id=nvdimm1,memdev=mem1 \
> > > > > >       -drive if=virtio,file=test.img,format=raw
> > > > > >
> > > > > >   $ qemu -M accel=kvm,nvdimm=on -m 1G,slots=4,maxmem=8G -cpu host \
> > > > > >       -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> > > > > >       -device nvdimm,id=nvdimm1,memdev=mem1 \
> > > > > >       -drive if=virtio,file=test.img,format=raw \
> > > > > >       -incoming tcp::1234
> > > > > >
> > > > > >   (qemu) migrate tcp:127.0.0.1:1234
> > > > > >
> > > > > > The guest kernel panics or hangs every time on the destination. It
> > > > > > happens as long as the nvdimm device is present - I didn't even
> > > > > > mount it inside the guest.
> > > > > >
> > > > > > Is migration expected to work?
> > > > >
> > > > > Yes, I tested on QEMU 2.8.0 several months ago and it worked. I'll
> > > > > have a look at this issue.
> > > >
> > > > Great, thanks!
> > > >
> > > > David Gilbert suggested the following on IRC, it sounds like a good
> > > > starting point for debugging:
> > > >
> > > > Launch the destination QEMU with -S (vCPUs will be paused) and after
> > > > migration has completed, compare the NVDIMM contents on source and
> > > > destination.
> > >
> > > Which host and guest kernels are you testing? Is any workload running
> > > in the guest during migration?
> > >
> > > I just tested QEMU commit edf8bc984 with host/guest kernel 4.8.0 and
> > > could not reproduce the issue.
> >
> > I can still reproduce the problem on qemu.git edf8bc984.
> >
> > My guest kernel is fairly close to yours. The host kernel is newer.
> >
> > Host kernel: 4.11.6-201.fc25.x86_64
> > Guest kernel: 4.8.8-300.fc25.x86_64
> >
> > Command line:
> >
> >   qemu-system-x86_64 \
> >       -enable-kvm \
> >       -cpu host \
> >       -machine pc,nvdimm \
> >       -m 1G,slots=4,maxmem=8G \
> >       -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> >       -device nvdimm,id=nvdimm1,memdev=mem1 \
> >       -drive if=virtio,file=test.img,format=raw \
> >       -display none \
> >       -serial stdio \
> >       -monitor unix:/tmp/monitor.sock,server,nowait
> >
> > Start migration at the guest login prompt. You don't need to log in or
> > do anything inside the guest.
> >
> > There seems to be guest RAM corruption, because I get different
> > backtraces inside the guest every time.
> >
> > The problem goes away if I remove -device nvdimm.
>
> I managed to reproduce this bug. After bisecting between good v2.8.0 and
> bad edf8bc984, it looks like a regression introduced by
>
>   6b6712efccd "ram: Split dirty bitmap by RAMBlock"
>
> This commit may result in a guest crash after migration if any host
> memory backend is used.
>
> Could you test whether the attached draft patch fixes this bug? If so,
> I will make a formal patch later.
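For anyone retracing the bisect described above, the workflow is roughly
the following (a sketch; the good/bad endpoints are the ones named in the
thread):

  $ git bisect start
  $ git bisect bad edf8bc984    # guest crashes after migration
  $ git bisect good v2.8.0      # migration worked on this release
  # Build, run the reproducer above, then mark each step with
  # 'git bisect good' or 'git bisect bad' until git reports the
  # first bad commit (6b6712efccd in this case).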
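For reference, one way to do the content comparison David Gilbert
suggested (a sketch; the guest-physical base 0x100000000 and the output
filenames are assumptions, so read the real NVDIMM range from the guest's
NFIT/e820 tables; 0x40000000 matches the 1G backend above):

  # The destination was started with -S, so its vCPUs stay paused and
  # cannot dirty memory before the comparison. After migration completes,
  # dump the NVDIMM guest-physical range from each side's monitor:
  (qemu) pmemsave 0x100000000 0x40000000 nvdimm-src.raw    # source
  (qemu) pmemsave 0x100000000 0x40000000 nvdimm-dst.raw    # destination

  # Then compare the dumps on the host:
  $ md5sum nvdimm-src.raw nvdimm-dst.raw

A mismatch would localize which pages were migrated incorrectly.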
Thanks for the fix! I tested and replied to your v2 patch.

Stefan