Nikunj A Dadhania <nik...@linux.vnet.ibm.com> writes:

> Greg Kurz <gr...@kaod.org> writes:
>
>> On Sun, 11 Jun 2017 17:38:42 +0800
>> David Gibson <da...@gibson.dropbear.id.au> wrote:
>>
>>> On Fri, Jun 09, 2017 at 05:09:13PM +0200, Greg Kurz wrote:
>>> > On Fri, 9 Jun 2017 20:28:32 +1000
>>> > David Gibson <da...@gibson.dropbear.id.au> wrote:
>>> >
>>> > > On Fri, Jun 09, 2017 at 11:36:31AM +0200, Greg Kurz wrote:
>>> > > > On Fri, 9 Jun 2017 12:28:13 +1000
>>> > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
>>> > > >
>>> > 1) start guest
>>> >
>>> > qemu-system-ppc64 \
>>> >  -nodefaults -nographic -snapshot -no-shutdown -serial mon:stdio \
>>> >  -device virtio-net,netdev=netdev0,id=net0 \
>>> >  -netdev bridge,id=netdev0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \
>>> >  -device virtio-blk,drive=drive0,id=blk0 \
>>> >  -drive file=/home/greg/images/sle12-sp1-ppc64le.qcow2,id=drive0,if=none \
>>> >  -machine type=pseries,accel=tcg -cpu POWER8
>
> Strangely, your command line does not have multiple threads. Need to see
> what is the side effect of enabling MTTCG by default here.
>
>>> >
>>> > 2) migrate
>>> >
>>> > 3) destination crashes (immediately or after very short delay) or
>>> > hangs
>>>
>>> Ok.  I'll bisect it when I can, but you might well get to it first.
>>>
>>>
>>
>> Heh, maybe you didn't see in my mail but I did bisect:
>>
>> f0b0685d6694a28c66018f438e822596243b1250 is the first bad commit
>> commit f0b0685d6694a28c66018f438e822596243b1250
>> Author: Nikunj A Dadhania <nik...@linux.vnet.ibm.com>
>> Date:   Thu Apr 27 10:48:23 2017 +0530
>>
>>     tcg: enable MTTCG by default for PPC64 on x86
>
> Let me have a look at it.

Interesting problem here: when the migration completes on the source,
the destination crashes:

[   56.185314] Unable to handle kernel paging request for data at address 0x5deadbeef0000108
[   56.185401] Faulting instruction address: 0xc000000000277bc8

0xc000000000277bb8 <+168>:   ld      r7,8(r4)
0xc000000000277bbc <+172>:   ld      r6,0(r4)   <========
0xc000000000277bc0 <+176>:   ori     r8,r8,56302
0xc000000000277bc4 <+180>:   rldicr  r8,r8,32,31
0xc000000000277bc8 <+184>:   std     r7,8(r6)

r4 = 0xf0000000000107a0
r6 = 0x5deadbeef0000100

The load at 0xc000000000277bbc <+172> put a junk value in r6, which
leads to the guest crash.

When I inspect the memory on source and destination in the qemu monitor,
I get the following differences:

diff -u s.txt d.txt
--- s.txt       2017-06-16 10:34:39.657221125 +0530
+++ d.txt       2017-06-16 10:34:18.452238305 +0530
@@ -8,8 +8,8 @@
 f000000000010760: 0x20de0b00 0x000000f0 0x60040100 0x000000f0
 f000000000010770: 0x00000000 0x00000000 0x0004036d 0x000000c0
 f000000000010780: 0x6c000100 0xf8ff3f00 0x7817f977 0x000000c0
-f000000000010790: 0x15000000 0x00000000 0xffffffff 0x01000000
-f0000000000107a0: 0x3090a96d 0x000000c0 0x3090a96d 0x000000c0
+f000000000010790: 0x01000000 0x00000000 0xffffffff 0x01000000
+f0000000000107a0: 0x000100f0 0xeedbea5d 0x000200f0 0xeedbea5d
 f0000000000107b0: 0x00000000 0x00000000 0x00d0a96d 0x000000c0
 f0000000000107c0: 0x28000000 0xf8ff3f00 0x8852cc77 0x000000c0
 f0000000000107d0: 0x00000000 0x00000000 0xffffffff 0x01000000

The source has a valid address at 0xf0000000000107a0, while the
destination has garbage there.

Some observations:

* The source updates the memory location (probably via atomic_cmpxchg),
  but the updated page didn't get transferred to the destination.

* Getting rid of the atomic_cmpxchg tcg ops in ldarx/stdcx makes
  migration work fine.

This is MTTCG running with 1 cpu. While I continue debugging, any hints
would help.

Regards
Nikunj
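
For anyone cross-checking the monitor dump against the registers, here
is a minimal sketch in Python. It assumes the monitor prints each 32-bit
word in big-endian order while the ppc64le guest reads memory
little-endian; decode_le64 is a hypothetical helper name, not anything
from QEMU. Under that assumption, the first two words at
0xf0000000000107a0 on the destination decode to the junk value seen in
r6:

```python
import struct


def decode_le64(first_word: int, second_word: int) -> int:
    """Combine two adjacent 32-bit dump words into one guest 64-bit value.

    Assumption: each dump word is shown big-endian, so packing with ">II"
    recovers the bytes in memory order; the little-endian guest then reads
    those eight bytes as "<Q".
    """
    raw = struct.pack(">II", first_word, second_word)
    return struct.unpack("<Q", raw)[0]


# First two words at f0000000000107a0, taken from the diff above.
src = decode_le64(0x3090A96D, 0x000000C0)
dst = decode_le64(0x000100F0, 0xEEDBEA5D)

print(f"source:      {src:#018x}")
print(f"destination: {dst:#018x}")
```

With these inputs the destination value comes out as 0x5deadbeef0000100,
the same as r6 at the time of the fault, and the source value comes out
as 0xc00000006da99030.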