On Tue, Jun 27, 2017 at 10:30:01PM +0800, Haozhong Zhang wrote:
> On 06/26/17 13:56 +0100, Stefan Hajnoczi wrote:
> > On Mon, Jun 26, 2017 at 10:05:01AM +0800, Haozhong Zhang wrote:
> > > On 06/23/17 10:55 +0100, Stefan Hajnoczi wrote:
> > > > On Fri, Jun 23, 2017 at 08:13:13AM +0800, haozhong.zh...@intel.com wrote:
> > > > > On 06/22/17 15:08 +0100, Stefan Hajnoczi wrote:
> > > > > > I tried live migrating a guest with NVDIMM on qemu.git/master
> > > > > > (edf8bc984):
> > > > > >
> > > > > >   $ qemu -M accel=kvm,nvdimm=on -m 1G,slots=4,maxmem=8G -cpu host \
> > > > > >       -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> > > > > >       -device nvdimm,id=nvdimm1,memdev=mem1 \
> > > > > >       -drive if=virtio,file=test.img,format=raw
> > > > > >
> > > > > >   $ qemu -M accel=kvm,nvdimm=on -m 1G,slots=4,maxmem=8G -cpu host \
> > > > > >       -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> > > > > >       -device nvdimm,id=nvdimm1,memdev=mem1 \
> > > > > >       -drive if=virtio,file=test.img,format=raw \
> > > > > >       -incoming tcp::1234
> > > > > >
> > > > > >   (qemu) migrate tcp:127.0.0.1:1234
> > > > > >
> > > > > > The guest kernel panics or hangs every time on the destination. It
> > > > > > happens as long as the nvdimm device is present - I didn't even
> > > > > > mount it inside the guest.
> > > > > >
> > > > > > Is migration expected to work?
> > > > >
> > > > > Yes, I tested on QEMU 2.8.0 several months ago and it worked. I'll
> > > > > have a look at this issue.
> > > >
> > > > Great, thanks!
> > > >
> > > > David Gilbert suggested the following on IRC, it sounds like a good
> > > > starting point for debugging:
> > > >
> > > > Launch the destination QEMU with -S (vCPUs will be paused) and after
> > > > migration has completed, compare the NVDIMM contents on source and
> > > > destination.
> > >
> > > Which host and guest kernels are you testing? Is any workload running
> > > in the guest during migration?
> > >
> > > I just tested QEMU commit edf8bc984 with host/guest kernel 4.8.0 and
> > > could not reproduce the issue.
> >
> > I can still reproduce the problem on qemu.git edf8bc984.
> >
> > My guest kernel is fairly close to yours. The host kernel is newer.
> >
> > Host kernel: 4.11.6-201.fc25.x86_64
> > Guest kernel: 4.8.8-300.fc25.x86_64
> >
> > Command line:
> >
> >   qemu-system-x86_64 \
> >       -enable-kvm \
> >       -cpu host \
> >       -machine pc,nvdimm \
> >       -m 1G,slots=4,maxmem=8G \
> >       -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> >       -device nvdimm,id=nvdimm1,memdev=mem1 \
> >       -drive if=virtio,file=test.img,format=raw \
> >       -display none \
> >       -serial stdio \
> >       -monitor unix:/tmp/monitor.sock,server,nowait
> >
> > Start migration at the guest login prompt. You don't need to log in or
> > do anything inside the guest.
> >
> > There seems to be guest RAM corruption, because I get different
> > backtraces inside the guest every time.
> >
> > The problem goes away if I remove -device nvdimm.
>
> I managed to reproduce this bug. After bisecting between good v2.8.0 and
> bad edf8bc984, it looks like a regression introduced by
>
>   6b6712efccd "ram: Split dirty bitmap by RAMBlock"
>
> This commit may result in a guest crash after migration if any host
> memory backend is used.
>
> Could you test whether the attached draft patch fixes this bug? If so,
> I will make a formal patch later.
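For anyone retracing the bisect described above, the workflow is roughly
the following (a sketch; the good/bad endpoints are the ones named in the
thread):

  $ git bisect start
  $ git bisect bad edf8bc984    # guest crashes after migration
  $ git bisect good v2.8.0      # migration worked on this release
  # Build, run the reproducer above, then mark each step with
  # 'git bisect good' or 'git bisect bad' until git reports the
  # first bad commit (6b6712efccd in this case).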
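For reference, one way to do the content comparison David Gilbert
suggested (a sketch; the guest-physical base 0x100000000 and the output
filenames are assumptions, so read the real NVDIMM range from the guest's
NFIT/e820 tables; 0x40000000 matches the 1G backend above):

  # The destination was started with -S, so its vCPUs stay paused and
  # cannot dirty memory before the comparison. After migration completes,
  # dump the NVDIMM guest-physical range from each side's monitor:
  (qemu) pmemsave 0x100000000 0x40000000 nvdimm-src.raw    # source
  (qemu) pmemsave 0x100000000 0x40000000 nvdimm-dst.raw    # destination

  # Then compare the dumps on the host:
  $ md5sum nvdimm-src.raw nvdimm-dst.raw

A mismatch would localize which pages were migrated incorrectly.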
Thanks for the fix! I tested and replied to your v2 patch.

Stefan