xics: fix migration of older machine types

Nikunj A Dadhania Fri, 16 Jun 2017 03:54:52 -0700

Nikunj A Dadhania <nik...@linux.vnet.ibm.com> writes:

> Greg Kurz <gr...@kaod.org> writes:
>
>> On Sun, 11 Jun 2017 17:38:42 +0800
>> David Gibson <da...@gibson.dropbear.id.au> wrote:
>>
>>> On Fri, Jun 09, 2017 at 05:09:13PM +0200, Greg Kurz wrote:
>>> > On Fri, 9 Jun 2017 20:28:32 +1000
>>> > David Gibson <da...@gibson.dropbear.id.au> wrote:
>>> >   
>>> > > On Fri, Jun 09, 2017 at 11:36:31AM +0200, Greg Kurz wrote:  
>>> > > > On Fri, 9 Jun 2017 12:28:13 +1000
>>> > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
>>> > > >     
>>> > 1) start guest
>>> > 
>>> > qemu-system-ppc64 \
>>> >  -nodefaults -nographic -snapshot -no-shutdown -serial mon:stdio \
>>> >  -device virtio-net,netdev=netdev0,id=net0 \
>>> >  -netdev bridge,id=netdev0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \
>>> >  -device virtio-blk,drive=drive0,id=blk0 \
>>> >  -drive file=/home/greg/images/sle12-sp1-ppc64le.qcow2,id=drive0,if=none \
>>> >  -machine type=pseries,accel=tcg -cpu POWER8
>
> Strangely, your command line does not have multiple threads. Need to see
> what is the side effect of enabling MTTCG by default here.
>
>>> > 
>>> > 2) migrate
>>> > 
>>> > 3) destination crashes (immediately or after very short delay) or
>>> > hangs  
>>> 
>>> Ok.  I'll bisect it when I can, but you might well get to it first.
>>> 
>>> 
>>
>> Heh, maybe you didn't see in my mail but I did bisect:
>>
>> f0b0685d6694a28c66018f438e822596243b1250 is the first bad commit
>> commit f0b0685d6694a28c66018f438e822596243b1250
>> Author: Nikunj A Dadhania <nik...@linux.vnet.ibm.com>
>> Date:   Thu Apr 27 10:48:23 2017 +0530
>>
>>     tcg: enable MTTCG by default for PPC64 on x86
>
> Let me have a look at it.


Interesting problem here. Migration completes on the source, but the
guest then crashes on the destination:

[   56.185314] Unable to handle kernel paging request for data at address 0x5deadbeef0000108
[   56.185401] Faulting instruction address: 0xc000000000277bc8

   0xc000000000277bb8 <+168>:   ld      r7,8(r4)
   0xc000000000277bbc <+172>:   ld      r6,0(r4)                  <========
   0xc000000000277bc0 <+176>:   ori     r8,r8,56302
   0xc000000000277bc4 <+180>:   rldicr  r8,r8,32,31
   0xc000000000277bc8 <+184>:   std     r7,8(r6)

r4 = 0xf0000000000107a0
r6 = 0x5deadbeef0000100

The load at 0xc000000000277bbc <+172> puts a junk value in r6, and the
store through r6 at <+184> then crashes the guest. When I inspect the
memory on source and destination in the qemu monitor, I get the following
differences:

diff -u s.txt d.txt 
--- s.txt       2017-06-16 10:34:39.657221125 +0530
+++ d.txt       2017-06-16 10:34:18.452238305 +0530
@@ -8,8 +8,8 @@
 f000000000010760: 0x20de0b00 0x000000f0 0x60040100 0x000000f0
 f000000000010770: 0x00000000 0x00000000 0x0004036d 0x000000c0
 f000000000010780: 0x6c000100 0xf8ff3f00 0x7817f977 0x000000c0
-f000000000010790: 0x15000000 0x00000000 0xffffffff 0x01000000
-f0000000000107a0: 0x3090a96d 0x000000c0 0x3090a96d 0x000000c0
+f000000000010790: 0x01000000 0x00000000 0xffffffff 0x01000000
+f0000000000107a0: 0x000100f0 0xeedbea5d 0x000200f0 0xeedbea5d
 f0000000000107b0: 0x00000000 0x00000000 0x00d0a96d 0x000000c0
 f0000000000107c0: 0x28000000 0xf8ff3f00 0x8852cc77 0x000000c0
 f0000000000107d0: 0x00000000 0x00000000 0xffffffff 0x01000000

The source has a valid address at 0xf0000000000107a0, while the
destination has garbage there.
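
To tie the dump back to the register values, here is a small Python
sketch that rebuilds the 64-bit value the guest would load from
0xf0000000000107a0. It assumes each 32-bit word in the dump shows four
guest bytes in ascending address order and that the guest is
little-endian (ppc64le); the helper name dump_words_to_u64 is made up
for illustration.

```python
import struct

def dump_words_to_u64(first_word: int, second_word: int) -> int:
    """Rebuild a 64-bit guest value from two adjacent 32-bit dump words.

    Assumption: each dump word lists four guest bytes in ascending
    address order, and the guest loads them as little-endian.
    """
    raw = struct.pack(">II", first_word, second_word)  # bytes in memory order
    return struct.unpack("<Q", raw)[0]                 # value a 64-bit load sees

# Words at f0000000000107a0 taken from s.txt (source) and d.txt (destination).
src_value = dump_words_to_u64(0x3090a96d, 0x000000c0)
dst_value = dump_words_to_u64(0x000100f0, 0xeedbea5d)

print(hex(src_value))      # 0xc00000006da99030
print(hex(dst_value))      # 0x5deadbeef0000100
print(hex(dst_value + 8))  # 0x5deadbeef0000108
```

Under those assumptions the destination value equals r6, and r6 + 8 is
the faulting data address from the oops, which fits the std r7,8(r6) at
<+184>.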

Some observations:

* The source updates the memory location (probably via atomic_cmpxchg),
  but the updated page didn't get transferred to the destination.

* Getting rid of the atomic_cmpxchg tcg ops in ldarx/stdcx makes
  migration work fine. MTTCG running with 1 cpu.
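
The first observation can be pictured with a toy model of dirty-page
tracking during migration. This is only a sketch of the hypothesis, not
QEMU code: ToyGuestRam, store_untracked and copy_pages are hypothetical
names, and it assumes that a final pass copies only pages marked dirty.

```python
PAGE_SIZE = 4096

class ToyGuestRam:
    """Toy guest memory with a per-page dirty set (illustration only)."""

    def __init__(self, n_pages: int) -> None:
        self.mem = bytearray(n_pages * PAGE_SIZE)
        self.dirty = set()

    def store(self, addr: int, data: bytes) -> None:
        # Normal store path: write the bytes and mark the page dirty.
        # (Writes here are assumed to stay within a single page.)
        self.mem[addr:addr + len(data)] = data
        self.dirty.add(addr // PAGE_SIZE)

    def store_untracked(self, addr: int, data: bytes) -> None:
        # Hypothetical buggy path: the write lands in memory but the
        # page is never marked dirty.
        self.mem[addr:addr + len(data)] = data

def copy_pages(src: ToyGuestRam, dst: bytearray, pages) -> None:
    """Copy the given page numbers from source RAM to the destination."""
    for page in pages:
        lo = page * PAGE_SIZE
        dst[lo:lo + PAGE_SIZE] = src.mem[lo:lo + PAGE_SIZE]

n_pages = 4
src = ToyGuestRam(n_pages)
dst = bytearray(len(src.mem))

# First pass: send everything, then start tracking from a clean state.
copy_pages(src, dst, range(n_pages))
src.dirty.clear()

# The guest keeps running: one tracked store, one untracked store.
src.store(0x10, b"\xaa")
src.store_untracked(PAGE_SIZE + 0x10, b"\xbb")

# Final pass: only pages marked dirty are sent.
copy_pages(src, dst, sorted(src.dirty))
src.dirty.clear()

stale_pages = [
    page for page in range(n_pages)
    if src.mem[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
    != dst[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
]
print(stale_pages)  # [1]
```

In this model the page hit by the untracked store is the only one that
differs after the final pass, which is the kind of source/destination
mismatch the memory diff shows.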

While I continue debugging, any hints would help.

Regards
Nikunj

Re: [Qemu-devel] [PATCH v4 0/6] spapr/xics: fix migration of older machine types
