Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-20 Thread Aneesh Kumar K.V

On 5/20/19 8:25 PM, Nicholas Piggin wrote:
> Bharata B Rao's on May 21, 2019 12:29 am:
>> On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
>>> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
>>>> Bharata B Rao's on May 20, 2019 3:56 pm:
>>>>> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>>>>>>>>> git bisect points to
>>>>>>>>>
>>>>>>>>> commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>>>>>>>>> Author: Nicholas Piggin
>>>>>>>>> Date:   Fri Jul 27 21:48:17 2018 +1000
>>>>>>>>>
>>>>>>>>> powerpc/64s: Fix page table fragment refcount race vs speculative references
>>>>>>>>>
>>>>>>>>> The page table fragment allocator uses the main page refcount racily
>>>>>>>>> with respect to speculative references. A customer observed a BUG due
>>>>>>>>> to page table page refcount underflow in the fragment allocator. This
>>>>>>>>> can be caused by the fragment allocator set_page_count stomping on a
>>>>>>>>> speculative reference, and then the speculative failure handler
>>>>>>>>> decrements the new reference, and the underflow eventually pops when
>>>>>>>>> the page tables are freed.
>>>>>>>>>
>>>>>>>>> Fix this by using a dedicated field in the struct page for the page
>>>>>>>>> table fragment allocator.
>>>>>>>>>
>>>>>>>>> Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>>>>>>>>> Cc: sta...@vger.kernel.org # v3.10+
>>>>>>>>
>>>>>>>> That's the commit that added the BUG_ON(), so prior to that you won't
>>>>>>>> see the crash.
>>>>>>>
>>>>>>> Right, but the commit says it fixes page table page refcount underflow by
>>>>>>> introducing a new field pt_frag_refcount. Now we are hitting the underflow
>>>>>>> for this pt_frag_refcount.
>>>>>>
>>>>>> The fixed underflow is caused by a bug (race on page count) that got
>>>>>> fixed by that patch. You are hitting a different underflow here. It's
>>>>>> not certain my patch caused it, I'm just trying to reproduce now.
>>>>>
>>>>> Ok.
>>>>
>>>> Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
>>>> 4GB guest (via host adding / removing memory device), and it just works.
>>>
>>> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>>>
>>>> It's likely to be an edge case like an off by one or rounding error
>>>> that just happens to trigger in your config. Might be easiest if you
>>>> could test with a debug patch.
>>>
>>> Sure, I will continue debugging.
>>
>> When the guest is rebooted after hotplug, the entire memory (which includes
>> the hotplugged memory) gets remapped again freshly. However at this time
>> since no slab is available yet, pt_frag_refcount never gets initialized as we
>> never do pte_fragment_alloc() for these mappings. So we right away hit the
>> underflow during the first unplug itself, it looks like.
>
> Nice catch, good debugging work.
>
>> I will check how this can be fixed.
>
> Tricky problem. What do you think? You might be able to make the early
> page table allocations in the same pattern as the frag allocations, and
> then fill in the struct page metadata when you have those.



I guess we need to do something similar to what x86 does. We need to 
walk the init_mm page table again and re-init struct page and other data 
structures backing the tables?
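
For illustration only (not code from this thread, and not the kernel's data structures): the idea of walking the already-built tables and rebuilding the per-page bookkeeping can be sketched in plain C. Everything named toy_* is hypothetical. The sketch assumes a directory whose entries are either TOY_NO_TABLE or the id of a lower-level table fragment, with TOY_FRAGS_PER_PAGE fragments carved from each backing page.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FRAGS_PER_PAGE 4   /* hypothetical number of table fragments per page */
#define TOY_NO_TABLE (-1)      /* directory slot with no lower-level table */

/*
 * Walk a toy directory and rebuild the per-page fragment refcounts from
 * scratch. dir[i] is either TOY_NO_TABLE or a fragment id, where
 * id / TOY_FRAGS_PER_PAGE selects the backing page.
 * Returns the number of tables that were accounted for.
 */
static size_t toy_reinit_refcounts(const int dir[], size_t ndir,
                                   int refcount[], size_t npages)
{
    size_t i, found = 0;

    assert(dir != NULL && refcount != NULL);

    /* Forget whatever the counters held before the walk. */
    for (i = 0; i < npages; i++)
        refcount[i] = 0;

    for (i = 0; i < ndir; i++) {
        size_t page;

        if (dir[i] < 0)
            continue;               /* nothing mapped in this slot */
        page = (size_t)dir[i] / TOY_FRAGS_PER_PAGE;
        if (page >= npages)
            continue;               /* id outside the pages we track */
        refcount[page]++;           /* one live fragment on that page */
        found++;
    }
    return found;
}
```

After such a walk every page that backs a live table carries a positive count, so a later free of one of its fragments would no longer start from zero.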


-aneesh



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-20 Thread Bharata B Rao
On Tue, May 21, 2019 at 12:55:49AM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on May 21, 2019 12:29 am:
> > On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> >> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> >> > Bharata B Rao's on May 20, 2019 3:56 pm:
> >> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > >> >> > git bisect points to
> >> > >> >> >
> >> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > >> >> > Author: Nicholas Piggin 
> >> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> > >> >> >
> >> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> > >> >> >
> >> > >> >> > The page table fragment allocator uses the main page refcount racily
> >> > >> >> > with respect to speculative references. A customer observed a BUG due
> >> > >> >> > to page table page refcount underflow in the fragment allocator. This
> >> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
> >> > >> >> > speculative reference, and then the speculative failure handler
> >> > >> >> > decrements the new reference, and the underflow eventually pops when
> >> > >> >> > the page tables are freed.
> >> > >> >> >
> >> > >> >> > Fix this by using a dedicated field in the struct page for the page
> >> > >> >> > table fragment allocator.
> >> > >> >> >
> >> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
> >> > >> >>
> >> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> > >> >> see the crash.
> >> > >> >
> >> > >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > >> > introducing a new field pt_frag_refcount. Now we are hitting the underflow
> >> > >> > for this pt_frag_refcount.
> >> > >> 
> >> > >> The fixed underflow is caused by a bug (race on page count) that got 
> >> > >> fixed by that patch. You are hitting a different underflow here. It's
> >> > >> not certain my patch caused it, I'm just trying to reproduce now.
> >> > > 
> >> > > Ok.
> >> > 
> >> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> >> > 4GB guest (via host adding / removing memory device), and it just works.
> >> 
> >> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> >> 
> >> > 
> >> > It's likely to be an edge case like an off by one or rounding error
> >> > that just happens to trigger in your config. Might be easiest if you
> >> > could test with a debug patch.
> >> 
> >> Sure, I will continue debugging.
> > 
> > When the guest is rebooted after hotplug, the entire memory (which includes
> > the hotplugged memory) gets remapped again freshly. However at this time
> > since no slab is available yet, pt_frag_refcount never gets initialized as we
> > never do pte_fragment_alloc() for these mappings. So we right away hit the
> > underflow during the first unplug itself, it looks like.
> 
> Nice catch, good debugging work.

Thanks, with help from Aneesh.

> 
> > I will check how this can be fixed.
> 
> Tricky problem. What do you think? You might be able to make the early 
> page table allocations in the same pattern as the frag allocations, and 
> then fill in the struct page metadata when you have those.

Will explore.

> 
> Another option may be to create a new set of page tables after mm comes up
> to replace the early page tables. That's a bigger hammer though.

Will also check if similar scenario exists on x86 and if so, how and when
pte frag data is fixed there.

Regards,
Bharata.



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-20 Thread Nicholas Piggin
Bharata B Rao's on May 21, 2019 12:29 am:
> On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
>> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
>> > Bharata B Rao's on May 20, 2019 3:56 pm:
>> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> > >> >> > git bisect points to
>> > >> >> >
>> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> > >> >> > Author: Nicholas Piggin 
>> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> > >> >> >
>> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
>> > >> >> >
>> > >> >> > The page table fragment allocator uses the main page refcount racily
>> > >> >> > with respect to speculative references. A customer observed a BUG due
>> > >> >> > to page table page refcount underflow in the fragment allocator. This
>> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
>> > >> >> > speculative reference, and then the speculative failure handler
>> > >> >> > decrements the new reference, and the underflow eventually pops when
>> > >> >> > the page tables are freed.
>> > >> >> >
>> > >> >> > Fix this by using a dedicated field in the struct page for the page
>> > >> >> > table fragment allocator.
>> > >> >> >
>> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
>> > >> >>
>> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> > >> >> see the crash.
>> > >> >
>> > >> > Right, but the commit says it fixes page table page refcount underflow by
>> > >> > introducing a new field pt_frag_refcount. Now we are hitting the underflow
>> > >> > for this pt_frag_refcount.
>> > >> 
>> > >> The fixed underflow is caused by a bug (race on page count) that got 
>> > >> fixed by that patch. You are hitting a different underflow here. It's
>> > >> not certain my patch caused it, I'm just trying to reproduce now.
>> > > 
>> > > Ok.
>> > 
>> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
>> > 4GB guest (via host adding / removing memory device), and it just works.
>> 
>> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>> 
>> > 
>> > It's likely to be an edge case like an off by one or rounding error
>> > that just happens to trigger in your config. Might be easiest if you
>> > could test with a debug patch.
>> 
>> Sure, I will continue debugging.
> 
> When the guest is rebooted after hotplug, the entire memory (which includes
> the hotplugged memory) gets remapped again freshly. However at this time
> since no slab is available yet, pt_frag_refcount never gets initialized as we
> never do pte_fragment_alloc() for these mappings. So we right away hit the
> underflow during the first unplug itself, it looks like.

Nice catch, good debugging work.

> I will check how this can be fixed.

Tricky problem. What do you think? You might be able to make the early 
page table allocations in the same pattern as the frag allocations, and 
then fill in the struct page metadata when you have those.

Another option may be to create a new set of page tables after mm comes up
to replace the early page tables. That's a bigger hammer though.
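
For illustration only (not code from this thread): one way to read the first suggestion is that early table allocations get recorded in the same per-page, per-fragment pattern the fragment allocator uses, and the counters are filled in once the struct page array is usable. The sketch below models that with a small log; all toy_* names are hypothetical and none of this is kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_EARLY_PAGES 16   /* hypothetical cap on pages used for early tables */

/* Log of early table allocations: one entry per backing page. */
struct toy_early_log {
    size_t page_index[TOY_MAX_EARLY_PAGES]; /* which page backs the tables */
    int frags_used[TOY_MAX_EARLY_PAGES];    /* fragments handed out from it */
    size_t count;                           /* number of valid entries */
};

/* Record one early fragment taken from page_index. Returns 0, or -1 if the log is full. */
static int toy_early_frag_alloc(struct toy_early_log *log, size_t page_index)
{
    size_t i;

    assert(log != NULL);
    for (i = 0; i < log->count; i++) {
        if (log->page_index[i] == page_index) {
            log->frags_used[i]++;           /* another fragment from a known page */
            return 0;
        }
    }
    if (log->count == TOY_MAX_EARLY_PAGES)
        return -1;
    log->page_index[log->count] = page_index;
    log->frags_used[log->count] = 1;
    log->count++;
    return 0;
}

/* Later, once per-page metadata exists, copy the logged counts into it. */
static size_t toy_fill_refcounts(const struct toy_early_log *log,
                                 int refcount[], size_t npages)
{
    size_t i, fixed = 0;

    for (i = 0; i < log->count; i++) {
        if (log->page_index[i] >= npages)
            continue;                       /* page not covered by this array */
        refcount[log->page_index[i]] = log->frags_used[i];
        fixed++;
    }
    return fixed;
}
```

The log is deliberately a fixed-size array, since a boot-time path of this kind could not rely on a dynamic allocator being available yet.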

Thanks,
Nick



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-20 Thread Bharata B Rao
On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> > Bharata B Rao's on May 20, 2019 3:56 pm:
> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> > >> >> > git bisect points to
> > >> >> >
> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > >> >> > Author: Nicholas Piggin 
> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> > >> >> >
> > >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> > >> >> >
> > >> >> > The page table fragment allocator uses the main page refcount racily
> > >> >> > with respect to speculative references. A customer observed a BUG due
> > >> >> > to page table page refcount underflow in the fragment allocator. This
> > >> >> > can be caused by the fragment allocator set_page_count stomping on a
> > >> >> > speculative reference, and then the speculative failure handler
> > >> >> > decrements the new reference, and the underflow eventually pops when
> > >> >> > the page tables are freed.
> > >> >> >
> > >> >> > Fix this by using a dedicated field in the struct page for the page
> > >> >> > table fragment allocator.
> > >> >> >
> > >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> > >> >> > Cc: sta...@vger.kernel.org # v3.10+
> > >> >>
> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> > >> >> see the crash.
> > >> >
> > >> > Right, but the commit says it fixes page table page refcount underflow by
> > >> > introducing a new field pt_frag_refcount. Now we are hitting the underflow
> > >> > for this pt_frag_refcount.
> > >> 
> > >> The fixed underflow is caused by a bug (race on page count) that got 
> > >> fixed by that patch. You are hitting a different underflow here. It's
> > >> not certain my patch caused it, I'm just trying to reproduce now.
> > > 
> > > Ok.
> > 
> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> > 4GB guest (via host adding / removing memory device), and it just works.
> 
> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> 
> > 
> > It's likely to be an edge case like an off by one or rounding error
> > that just happens to trigger in your config. Might be easiest if you
> > could test with a debug patch.
> 
> Sure, I will continue debugging.

When the guest is rebooted after hotplug, the entire memory (which includes
the hotplugged memory) gets remapped again freshly. However at this time
since no slab is available yet, pt_frag_refcount never gets initialized as we
never do pte_fragment_alloc() for these mappings. So we right away hit the
underflow during the first unplug itself, it looks like.

I will check how this can be fixed.
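
For illustration only (a userspace toy model, not the code in arch/powerpc/mm/pgtable-frag.c): the failure described above comes down to freeing a fragment from a page whose counter was never set up. All toy_* names and the fragment count are hypothetical.

```c
#include <assert.h>

/* Hypothetical fragment count per page; the real value depends on page and table size. */
#define TOY_FRAGS_PER_PAGE 4

/* Toy stand-in for struct page: only the fragment refcount is modelled. */
struct toy_page {
    int pt_frag_refcount;
};

enum toy_free_result {
    TOY_STILL_IN_USE,   /* other fragments of the page are still live */
    TOY_PAGE_RELEASED,  /* last fragment gone, backing page could be freed */
    TOY_UNDERFLOW       /* count was never set up: where a BUG would fire */
};

/* Runtime path: the fragment allocator sets up the count for a new page. */
static void toy_frag_page_init(struct toy_page *page)
{
    page->pt_frag_refcount = TOY_FRAGS_PER_PAGE;
}

/* Early-boot path: the table comes from an early allocator, the count stays 0. */
static void toy_early_page_init(struct toy_page *page)
{
    page->pt_frag_refcount = 0;
}

/* Free one fragment, reporting the bad case instead of crashing. */
static enum toy_free_result toy_frag_free(struct toy_page *page)
{
    if (page->pt_frag_refcount <= 0)
        return TOY_UNDERFLOW;
    if (--page->pt_frag_refcount == 0)
        return TOY_PAGE_RELEASED;
    return TOY_STILL_IN_USE;
}

/* Walk through both paths: runtime pages free cleanly, boot pages do not. */
static void toy_demo(void)
{
    struct toy_page runtime_page, boot_page;
    int i;

    toy_frag_page_init(&runtime_page);
    for (i = 0; i < TOY_FRAGS_PER_PAGE - 1; i++)
        assert(toy_frag_free(&runtime_page) == TOY_STILL_IN_USE);
    assert(toy_frag_free(&runtime_page) == TOY_PAGE_RELEASED);

    toy_early_page_init(&boot_page);
    assert(toy_frag_free(&boot_page) == TOY_UNDERFLOW);
}
```

In this model, memory hot-added at runtime takes the first path and unplugs cleanly, while the same memory mapped at boot after a reboot takes the second path and trips on the very first free.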

> 
> Regards,
> Bharata.



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-20 Thread Bharata B Rao
On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on May 20, 2019 3:56 pm:
> > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> >> > git bisect points to
> >> >> >
> >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> >> > Author: Nicholas Piggin 
> >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> >> >
> >> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> >> >
> >> >> > The page table fragment allocator uses the main page refcount racily
> >> >> > with respect to speculative references. A customer observed a BUG due
> >> >> > to page table page refcount underflow in the fragment allocator. This
> >> >> > can be caused by the fragment allocator set_page_count stomping on a
> >> >> > speculative reference, and then the speculative failure handler
> >> >> > decrements the new reference, and the underflow eventually pops when
> >> >> > the page tables are freed.
> >> >> >
> >> >> > Fix this by using a dedicated field in the struct page for the page
> >> >> > table fragment allocator.
> >> >> >
> >> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> >> > Cc: sta...@vger.kernel.org # v3.10+
> >> >>
> >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> >> see the crash.
> >> >
> >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > introducing a new field pt_frag_refcount. Now we are hitting the underflow
> >> > for this pt_frag_refcount.
> >> 
> >> The fixed underflow is caused by a bug (race on page count) that got 
> >> fixed by that patch. You are hitting a different underflow here. It's
> >> not certain my patch caused it, I'm just trying to reproduce now.
> > 
> > Ok.
> 
> Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> 4GB guest (via host adding / removing memory device), and it just works.

Boot, add 8G, reboot, remove 8G is the sequence to reproduce.

> 
> It's likely to be an edge case like an off by one or rounding error
> that just happens to trigger in your config. Might be easiest if you
> could test with a debug patch.

Sure, I will continue debugging.

Regards,
Bharata.



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-20 Thread Nicholas Piggin
Bharata B Rao's on May 20, 2019 3:56 pm:
> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> >> > git bisect points to
>> >> >
>> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> >> > Author: Nicholas Piggin 
>> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> >> >
>> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
>> >> >
>> >> > The page table fragment allocator uses the main page refcount racily
>> >> > with respect to speculative references. A customer observed a BUG due
>> >> > to page table page refcount underflow in the fragment allocator. This
>> >> > can be caused by the fragment allocator set_page_count stomping on a
>> >> > speculative reference, and then the speculative failure handler
>> >> > decrements the new reference, and the underflow eventually pops when
>> >> > the page tables are freed.
>> >> >
>> >> > Fix this by using a dedicated field in the struct page for the page
>> >> > table fragment allocator.
>> >> >
>> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >> > Cc: sta...@vger.kernel.org # v3.10+
>> >>
>> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> >> see the crash.
>> >
>> > Right, but the commit says it fixes page table page refcount underflow by
>> > introducing a new field pt_frag_refcount. Now we are hitting the underflow
>> > for this pt_frag_refcount.
>> 
>> The fixed underflow is caused by a bug (race on page count) that got 
>> fixed by that patch. You are hitting a different underflow here. It's
>> not certain my patch caused it, I'm just trying to reproduce now.
> 
> Ok.

Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
4GB guest (via host adding / removing memory device), and it just works.

It's likely to be an edge case like an off by one or rounding error
that just happens to trigger in your config. Might be easiest if you
could test with a debug patch.

Thanks,
Nick



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-19 Thread Bharata B Rao
On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > git bisect points to
> >> >
> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > Author: Nicholas Piggin 
> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> >
> >> > powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> >
> >> > The page table fragment allocator uses the main page refcount racily
> >> > with respect to speculative references. A customer observed a BUG due
> >> > to page table page refcount underflow in the fragment allocator. This
> >> > can be caused by the fragment allocator set_page_count stomping on a
> >> > speculative reference, and then the speculative failure handler
> >> > decrements the new reference, and the underflow eventually pops when
> >> > the page tables are freed.
> >> >
> >> > Fix this by using a dedicated field in the struct page for the page
> >> > table fragment allocator.
> >> >
> >> > Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> > Cc: sta...@vger.kernel.org # v3.10+
> >> 
> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> see the crash.
> > 
> > Right, but the commit says it fixes page table page refcount underflow by
> > introducing a new field pt_frag_refcount. Now we are hitting the underflow
> > for this pt_frag_refcount.
> 
> The fixed underflow is caused by a bug (race on page count) that got 
> fixed by that patch. You are hitting a different underflow here. It's
> not certain my patch caused it, I'm just trying to reproduce now.

Ok.

> 
> > 
> > BTW, if I go below this commit, I don't hit the pagecount
> > 
> > VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> > 
> > which is in pte_fragment_free() path.
> 
> Do you have CONFIG_DEBUG_VM=y?

Yes.

Regards,
Bharata.



Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-19 Thread Nicholas Piggin
Bharata B Rao's on May 20, 2019 2:25 pm:
> On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote:
>> Bharata B Rao  writes:
>> > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
>> >> Hello,
>> >> 
>> >> On power9 host, performing memory hotunplug from ppc64le guest results in
>> >> kernel oops.
>> >> 
>> >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
>> >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>> >> 
>> >> Recreation steps:
>> >> 
>> >> 1. Boot a guest with below mem configuration:
>> >>   33554432
>> >>   8388608
>> >>   4194304
>> >>   
>> >>     
>> >>   
>> >>     
>> >>   
>> >> 
>> >> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
>> >> reboot guest -> once guest comes back try to unplug 8G memory
>> >> 
>> >> mem.xml used:
>> >> 
>> >> 
>> >> 8
>> >> 0
>> >> 
>> >> 
>> >> 
>> >> Memory attach and detach commands used:
>> >>     virsh attach-device vm1 ./mem.xml --live
>> >>     virsh detach-device vm1 ./mem.xml --live
>> >> 
>> >> Trace seen inside guest after unplug, guest just hangs there forever:
>> >> 
>> >> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>> >> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
>> >> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
>> >> pSeries
>> >> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
>> >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi 
>> >> scsi_transport_iscsi
>> >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress 
>> >> lzo_compress
>> >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> >> xor raid6_pq multipath crc32c_vpmsum
>> >> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
>> >> tainted 5.1.0-dirty #2
>> >> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
>> >> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR:
>> >> 8000
>> >> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted 
>> >> (5.1.0-dirty)
>> >> [   21.963422] MSR:  8282b033   
>> >> CR:
>> >> 28002884  XER: 2004
>> >> [   21.963470] CFAR: c0c79304 IRQMASK: 0
>> >> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000
>> >> 00fff8c0
>> >> [   21.963470] GPR04: 0001 ffe30005 0005
>> >> 0020
>> >> [   21.963470] GPR08:  0001 c00a00fff8e0
>> >> c16d21a0
>> >> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0
>> >> c003ffe30100
>> >> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f
>> >> c16d21b0
>> >> [   21.963470] GPR20: c14de588 0001 c16d21b8
>> >> c00a00a0
>> >> [   21.963470] GPR24:   c00a00a0
>> >> c003ffe96000
>> >> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000
>> >> c00a00fff8c0
>> >> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
>> >> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
>> >> [   21.963873] Call Trace:
>> >> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0
>> >> (unreliable)
>> >> [   21.963933] [c003f88037b0] [] (null)
>> >> [   21.963969] [c003f88038c0] [c006f038]
>> >> vmemmap_free+0x218/0x2e0
>> >> [   21.964006] [c003f8803940] [c036f100]
>> >> sparse_remove_one_section+0xd0/0x138
>> >> [   21.964050] [c003f8803980] [c0383a50]
>> >> __remove_pages+0x410/0x560
>> >> [   21.964093] [c003f8803a90] [c0c784d8]
>> >> arch_remove_memory+0x68/0xdc
>> >> [   21.964136] [c003f8803ad0] [c0385d74]
>> >> __remove_memory+0xc4/0x110
>> >> [   21.964180] [c003f8803b10] [c00d44e4]
>> >> dlpar_remove_lmb+0x94/0x140
>> >> [   21.964223] [c003f8803b50] [c00d52b4]
>> >> dlpar_memory+0x464/0xd00
>> >> [   21.964259] [c003f8803be0] [c00cd5c0]
>> >> handle_dlpar_errorlog+0xc0/0x190
>> >> [   21.964303] [c003f8803c50] [c00cd6bc]
>> >> pseries_hp_work_fn+0x2c/0x60
>> >> [   21.964346] [c003f8803c80] [c013a4a0]
>> >> process_one_work+0x2b0/0x5a0
>> >> [   21.964388] [c003f8803d10] [c013a818]
>> >> worker_thread+0x88/0x610
>> >> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
>> >> [   21.964468] [c003f8803e20] [c000bdc4]
>> >> ret_from_kernel_thread+0x5c/0x78
>> >> [   21.964506] Instruction dump:
>> >> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14
>> >> 395f0020 813f0020
>> >> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac
>> >> 7d205028 3129
>> >> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
>> >> [   21.966349]
>> >> [   21.966383] Sending IPI to other CPUs
>> >> [   

Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-19 Thread Bharata B Rao
On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote:
> Bharata B Rao  writes:
> > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
> >> Hello,
> >> 
> >> On power9 host, performing memory hotunplug from ppc64le guest results in
> >> kernel oops.
> >> 
> >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
> >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
> >> 
> >> Recreation steps:
> >> 
> >> 1. Boot a guest with below mem configuration:
> >>   33554432
> >>   8388608
> >>   4194304
> >>   
> >>     
> >>   
> >>     
> >>   
> >> 
> >> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
> >> reboot guest -> once guest comes back try to unplug 8G memory
> >> 
> >> mem.xml used:
> >> 
> >> 
> >> 8
> >> 0
> >> 
> >> 
> >> 
> >> Memory attach and detach commands used:
> >>     virsh attach-device vm1 ./mem.xml --live
> >>     virsh detach-device vm1 ./mem.xml --live
> >> 
> >> Trace seen inside guest after unplug, guest just hangs there forever:
> >> 
> >> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> >> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> >> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
> >> pSeries
> >> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
> >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi 
> >> scsi_transport_iscsi
> >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
> >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> >> xor raid6_pq multipath crc32c_vpmsum
> >> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
> >> tainted 5.1.0-dirty #2
> >> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> >> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR:
> >> 8000
> >> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted 
> >> (5.1.0-dirty)
> >> [   21.963422] MSR:  8282b033   
> >> CR:
> >> 28002884  XER: 2004
> >> [   21.963470] CFAR: c0c79304 IRQMASK: 0
> >> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000
> >> 00fff8c0
> >> [   21.963470] GPR04: 0001 ffe30005 0005
> >> 0020
> >> [   21.963470] GPR08:  0001 c00a00fff8e0
> >> c16d21a0
> >> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0
> >> c003ffe30100
> >> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f
> >> c16d21b0
> >> [   21.963470] GPR20: c14de588 0001 c16d21b8
> >> c00a00a0
> >> [   21.963470] GPR24:   c00a00a0
> >> c003ffe96000
> >> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000
> >> c00a00fff8c0
> >> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
> >> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
> >> [   21.963873] Call Trace:
> >> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0
> >> (unreliable)
> >> [   21.963933] [c003f88037b0] [] (null)
> >> [   21.963969] [c003f88038c0] [c006f038]
> >> vmemmap_free+0x218/0x2e0
> >> [   21.964006] [c003f8803940] [c036f100]
> >> sparse_remove_one_section+0xd0/0x138
> >> [   21.964050] [c003f8803980] [c0383a50]
> >> __remove_pages+0x410/0x560
> >> [   21.964093] [c003f8803a90] [c0c784d8]
> >> arch_remove_memory+0x68/0xdc
> >> [   21.964136] [c003f8803ad0] [c0385d74]
> >> __remove_memory+0xc4/0x110
> >> [   21.964180] [c003f8803b10] [c00d44e4]
> >> dlpar_remove_lmb+0x94/0x140
> >> [   21.964223] [c003f8803b50] [c00d52b4]
> >> dlpar_memory+0x464/0xd00
> >> [   21.964259] [c003f8803be0] [c00cd5c0]
> >> handle_dlpar_errorlog+0xc0/0x190
> >> [   21.964303] [c003f8803c50] [c00cd6bc]
> >> pseries_hp_work_fn+0x2c/0x60
> >> [   21.964346] [c003f8803c80] [c013a4a0]
> >> process_one_work+0x2b0/0x5a0
> >> [   21.964388] [c003f8803d10] [c013a818]
> >> worker_thread+0x88/0x610
> >> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
> >> [   21.964468] [c003f8803e20] [c000bdc4]
> >> ret_from_kernel_thread+0x5c/0x78
> >> [   21.964506] Instruction dump:
> >> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14
> >> 395f0020 813f0020
> >> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac
> >> 7d205028 3129
> >> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> >> [   21.966349]
> >> [   21.966383] Sending IPI to other CPUs
> >> [   21.978335] IPI complete
> >> [   21.981354] kexec: Starting switchover sequence.
> >> I'm in purgatory
> >
> > git bisect points to
> >
> > commit 

Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-19 Thread Michael Ellerman
Bharata B Rao  writes:
> On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
>> Hello,
>> 
>> On power9 host, performing memory hotunplug from ppc64le guest results in
>> kernel oops.
>> 
>> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
>> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>> 
>> Recreation steps:
>> 
>> 1. Boot a guest with below mem configuration:
>>   33554432
>>   8388608
>>   4194304
>>   
>>     
>>   
>>     
>>   
>> 
>> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
>> reboot guest -> once guest comes back try to unplug 8G memory
>> 
>> mem.xml used:
>> 
>> 
>> 8
>> 0
>> 
>> 
>> 
>> Memory attach and detach commands used:
>>     virsh attach-device vm1 ./mem.xml --live
>>     virsh detach-device vm1 ./mem.xml --live
>> 
>> Trace seen inside guest after unplug, guest just hangs there forever:
>> 
>> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
>> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
>> pSeries
>> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
>> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
>> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
>> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq multipath crc32c_vpmsum
>> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
>> tainted 5.1.0-dirty #2
>> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
>> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR:
>> 8000
>> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
>> [   21.963422] MSR:  8282b033   CR:
>> 28002884  XER: 2004
>> [   21.963470] CFAR: c0c79304 IRQMASK: 0
>> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000
>> 00fff8c0
>> [   21.963470] GPR04: 0001 ffe30005 0005
>> 0020
>> [   21.963470] GPR08:  0001 c00a00fff8e0
>> c16d21a0
>> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0
>> c003ffe30100
>> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f
>> c16d21b0
>> [   21.963470] GPR20: c14de588 0001 c16d21b8
>> c00a00a0
>> [   21.963470] GPR24:   c00a00a0
>> c003ffe96000
>> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000
>> c00a00fff8c0
>> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
>> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
>> [   21.963873] Call Trace:
>> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0
>> (unreliable)
>> [   21.963933] [c003f88037b0] [] (null)
>> [   21.963969] [c003f88038c0] [c006f038]
>> vmemmap_free+0x218/0x2e0
>> [   21.964006] [c003f8803940] [c036f100]
>> sparse_remove_one_section+0xd0/0x138
>> [   21.964050] [c003f8803980] [c0383a50]
>> __remove_pages+0x410/0x560
>> [   21.964093] [c003f8803a90] [c0c784d8]
>> arch_remove_memory+0x68/0xdc
>> [   21.964136] [c003f8803ad0] [c0385d74]
>> __remove_memory+0xc4/0x110
>> [   21.964180] [c003f8803b10] [c00d44e4]
>> dlpar_remove_lmb+0x94/0x140
>> [   21.964223] [c003f8803b50] [c00d52b4]
>> dlpar_memory+0x464/0xd00
>> [   21.964259] [c003f8803be0] [c00cd5c0]
>> handle_dlpar_errorlog+0xc0/0x190
>> [   21.964303] [c003f8803c50] [c00cd6bc]
>> pseries_hp_work_fn+0x2c/0x60
>> [   21.964346] [c003f8803c80] [c013a4a0]
>> process_one_work+0x2b0/0x5a0
>> [   21.964388] [c003f8803d10] [c013a818]
>> worker_thread+0x88/0x610
>> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
>> [   21.964468] [c003f8803e20] [c000bdc4]
>> ret_from_kernel_thread+0x5c/0x78
>> [   21.964506] Instruction dump:
>> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14
>> 395f0020 813f0020
>> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac
>> 7d205028 3129
>> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
>> [   21.966349]
>> [   21.966383] Sending IPI to other CPUs
>> [   21.978335] IPI complete
>> [   21.981354] kexec: Starting switchover sequence.
>> I'm in purgatory
>
> git bisect points to
>
> commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> Author: Nicholas Piggin 
> Date:   Fri Jul 27 21:48:17 2018 +1000
>
> powerpc/64s: Fix page table fragment refcount race vs speculative 
> references
>
> The page table fragment allocator uses the main page refcount racily
> with respect to speculative references. A customer observed a BUG due

Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-18 Thread Bharata B Rao
On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
> Hello,
> 
> On power9 host, performing memory hotunplug from ppc64le guest results in
> kernel oops.
> 
> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
> 
> Recreation steps:
> 
> 1. Boot a guest with below mem configuration:
>   33554432
>   8388608
>   4194304
>   
>     
>   
>     
>   
> 
> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
> reboot guest -> once guest comes back try to unplug 8G memory
> 
> mem.xml used:
> 
> 
> 8
> 0
> 
> 
> 
> Memory attach and detach commands used:
>     virsh attach-device vm1 ./mem.xml --live
>     virsh detach-device vm1 ./mem.xml --live
> 
> Trace seen inside guest after unplug, guest just hangs there forever:
> 
> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
> pSeries
> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> xor raid6_pq multipath crc32c_vpmsum
> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
> tainted 5.1.0-dirty #2
> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR:
> 8000
> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
> [   21.963422] MSR:  8282b033   CR:
> 28002884  XER: 2004
> [   21.963470] CFAR: c0c79304 IRQMASK: 0
> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000
> 00fff8c0
> [   21.963470] GPR04: 0001 ffe30005 0005
> 0020
> [   21.963470] GPR08:  0001 c00a00fff8e0
> c16d21a0
> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0
> c003ffe30100
> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f
> c16d21b0
> [   21.963470] GPR20: c14de588 0001 c16d21b8
> c00a00a0
> [   21.963470] GPR24:   c00a00a0
> c003ffe96000
> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000
> c00a00fff8c0
> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
> [   21.963873] Call Trace:
> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0
> (unreliable)
> [   21.963933] [c003f88037b0] [] (null)
> [   21.963969] [c003f88038c0] [c006f038]
> vmemmap_free+0x218/0x2e0
> [   21.964006] [c003f8803940] [c036f100]
> sparse_remove_one_section+0xd0/0x138
> [   21.964050] [c003f8803980] [c0383a50]
> __remove_pages+0x410/0x560
> [   21.964093] [c003f8803a90] [c0c784d8]
> arch_remove_memory+0x68/0xdc
> [   21.964136] [c003f8803ad0] [c0385d74]
> __remove_memory+0xc4/0x110
> [   21.964180] [c003f8803b10] [c00d44e4]
> dlpar_remove_lmb+0x94/0x140
> [   21.964223] [c003f8803b50] [c00d52b4]
> dlpar_memory+0x464/0xd00
> [   21.964259] [c003f8803be0] [c00cd5c0]
> handle_dlpar_errorlog+0xc0/0x190
> [   21.964303] [c003f8803c50] [c00cd6bc]
> pseries_hp_work_fn+0x2c/0x60
> [   21.964346] [c003f8803c80] [c013a4a0]
> process_one_work+0x2b0/0x5a0
> [   21.964388] [c003f8803d10] [c013a818]
> worker_thread+0x88/0x610
> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
> [   21.964468] [c003f8803e20] [c000bdc4]
> ret_from_kernel_thread+0x5c/0x78
> [   21.964506] Instruction dump:
> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14
> 395f0020 813f0020
> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac
> 7d205028 3129
> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> [   21.966349]
> [   21.966383] Sending IPI to other CPUs
> [   21.978335] IPI complete
> [   21.981354] kexec: Starting switchover sequence.
> I'm in purgatory

git bisect points to

commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
Author: Nicholas Piggin 
Date:   Fri Jul 27 21:48:17 2018 +1000

powerpc/64s: Fix page table fragment refcount race vs speculative references

The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a

Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-17 Thread Michael Ellerman
srikanth  writes:
> Hello,
>
> On power9 host, performing memory hotunplug from ppc64le guest results 
> in kernel oops.

Thanks for the report.

Did this used to work in the past? If so what is the last version that
worked?

> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using 
> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>
> Recreation steps:
>
> 1. Boot a guest with below mem configuration:
>    33554432
>    8388608
>    4194304
>    
>      
>    
>      
>    
>
> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> 
> now reboot guest -> once guest comes back try to unplug 8G memory

I assume the reboot is required to trigger the bug? i.e. if you unplug
without rebooting it doesn't crash?

> mem.xml used:
> 
> 
> 8
> 0
> 
> 
>
> Memory attach and detach commands used:
>      virsh attach-device vm1 ./mem.xml --live
>      virsh detach-device vm1 ./mem.xml --live
>
> Trace seen inside guest after unplug, guest just hangs there forever:
>
> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA 
> pSeries
> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse 
> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi 
> scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress 
> zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy 
> async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum
> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not 
> tainted 5.1.0-dirty #2
> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> [   21.963355] NIP:  c0079e18 LR: c0c79308 CTR: 
> 8000
> [   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
> [   21.963422] MSR:  8282b033   
> CR: 28002884  XER: 2004
> [   21.963470] CFAR: c0c79304 IRQMASK: 0
> [   21.963470] GPR00: c0c79308 c003f8803780 c1521000 
> 00fff8c0

Can you try not to word wrap these? It makes them much harder to read.

There are some instructions here on configuring Thunderbird:

  https://www.kernel.org/doc/html/latest/process/email-clients.html#thunderbird-gui
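For a trace that has already been wrapped, the lines can usually be rejoined after the fact. The following is a minimal sketch, not something from this thread: the function name unwrap_oops is made up for illustration, and it assumes every genuine log line begins with a "[ seconds.micros]" timestamp, so anything else is glued onto the previous line. Untimestamped output such as "I'm in purgatory" therefore ends up appended to the preceding line.

```shell
#!/bin/sh
# Rejoin kernel log lines that a mail client wrapped.
# Assumption: each real log line starts with a "[ seconds.micros]" timestamp;
# every other non-empty line is treated as a continuation.
unwrap_oops() {
    awk '
        {
            sub(/^(> ?)+/, "")            # drop mail quote markers, if present
            gsub(/^[ \t]+|[ \t]+$/, "")   # trim surrounding whitespace
        }
        $0 == "" { next }                 # skip blank lines left by wrapping
        /^\[ *[0-9]+\.[0-9]+\]/ {         # timestamp: start a new record
            if (buf != "") print buf
            buf = $0
            next
        }
        { buf = (buf == "" ? $0 : buf " " $0) }
        END { if (buf != "") print buf }
    '
}
```

Typical use would be something like `unwrap_oops < wrapped-trace.txt`, where the file name is only a placeholder.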

> [   21.963470] GPR04: 0001 ffe30005 0005 
> 0020
> [   21.963470] GPR08:  0001 c00a00fff8e0 
> c16d21a0
> [   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 
> c003ffe30100
> [   21.963470] GPR16: c003ffe3 c14aa4de c00a009f 
> c16d21b0
> [   21.963470] GPR20: c14de588 0001 c16d21b8 
> c00a00a0
> [   21.963470] GPR24:   c00a00a0 
> c003ffe96000
> [   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 
> c00a00fff8c0
> [   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
> [   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
> [   21.963873] Call Trace:
> [   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 
> (unreliable)
> [   21.963933] [c003f88037b0] [] (null)
> [   21.963969] [c003f88038c0] [c006f038] 
> vmemmap_free+0x218/0x2e0
> [   21.964006] [c003f8803940] [c036f100] 
> sparse_remove_one_section+0xd0/0x138
> [   21.964050] [c003f8803980] [c0383a50] 
> __remove_pages+0x410/0x560
> [   21.964093] [c003f8803a90] [c0c784d8] 
> arch_remove_memory+0x68/0xdc
> [   21.964136] [c003f8803ad0] [c0385d74] 
> __remove_memory+0xc4/0x110
> [   21.964180] [c003f8803b10] [c00d44e4] 
> dlpar_remove_lmb+0x94/0x140
> [   21.964223] [c003f8803b50] [c00d52b4] 
> dlpar_memory+0x464/0xd00
> [   21.964259] [c003f8803be0] [c00cd5c0] 
> handle_dlpar_errorlog+0xc0/0x190
> [   21.964303] [c003f8803c50] [c00cd6bc] 
> pseries_hp_work_fn+0x2c/0x60
> [   21.964346] [c003f8803c80] [c013a4a0] 
> process_one_work+0x2b0/0x5a0
> [   21.964388] [c003f8803d10] [c013a818] 
> worker_thread+0x88/0x610
> [   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
> [   21.964468] [c003f8803e20] [c000bdc4] 
> ret_from_kernel_thread+0x5c/0x78
> [   21.964506] Instruction dump:
> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 
> 395f0020 813f0020
> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac 
> 7d205028 3129
> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> [   21.966349]
> [   21.966383] Sending IPI to other CPUs
> [   21.978335] IPI complete
> [   21.981354] kexec: Starting switchover sequence.
> I'm in purgatory

It's not hung here, it's just not executing what we want it to :)

If you break into the qemu monitor 

PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest

2019-05-16 Thread srikanth

Hello,

On power9 host, performing memory hotunplug from ppc64le guest results 
in kernel oops.


Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using 
ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.


Recreation steps:

1. Boot a guest with below mem configuration:
  33554432
  8388608
  4194304
  
    
  
    
  

2. From host hotplug 8G memory -> verify memory hotadded successfully -> 
now reboot guest -> once guest comes back try to unplug 8G memory


mem.xml used:


8
0



Memory attach and detach commands used:
    virsh attach-device vm1 ./mem.xml --live
    virsh detach-device vm1 ./mem.xml --live
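
The whole sequence (hotplug, reboot, unplug) could be scripted roughly as below. This is only a sketch under some assumptions: the function name, the VIRSH override and the two sleep durations are illustrative and not part of the report, and it uses "virsh reboot" from the host, whereas the reboot in the report may equally have been issued inside the guest.

```shell
#!/bin/sh
# Sketch of the reported sequence: hotplug, reboot, then hot-unplug.
# DOMAIN and MEM_XML default to the names used in the report; the sleep
# durations are placeholder assumptions, not values from the report.
DOMAIN=${DOMAIN:-vm1}
MEM_XML=${MEM_XML:-./mem.xml}
VIRSH=${VIRSH:-virsh}              # set VIRSH="echo virsh" for a dry run

reproduce_unplug_oops() {
    $VIRSH attach-device "$DOMAIN" "$MEM_XML" --live || return 1
    sleep "${SETTLE_SECS:-10}"     # time to check in the guest that memory was added
    $VIRSH reboot "$DOMAIN" || return 1
    sleep "${BOOT_SECS:-60}"       # crude wait for the guest to come back up
    $VIRSH detach-device "$DOMAIN" "$MEM_XML" --live
}
```

Running it with VIRSH="echo virsh" prints the three commands instead of executing them, which is a cheap way to check the order before touching a real guest.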

Trace seen inside guest after unplug, guest just hangs there forever:

[   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
[   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
[   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA 
pSeries
[   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse 
vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi 
scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress 
zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum
[   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not 
tainted 5.1.0-dirty #2

[   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
[   21.963355] NIP:  c0079e18 LR: c0c79308 CTR: 
8000

[   21.963392] REGS: c003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
[   21.963422] MSR:  8282b033   
CR: 28002884  XER: 2004

[   21.963470] CFAR: c0c79304 IRQMASK: 0
[   21.963470] GPR00: c0c79308 c003f8803780 c1521000 
00fff8c0
[   21.963470] GPR04: 0001 ffe30005 0005 
0020
[   21.963470] GPR08:  0001 c00a00fff8e0 
c16d21a0
[   21.963470] GPR12: c16e7b90 c7ff2700 c00a00a0 
c003ffe30100
[   21.963470] GPR16: c003ffe3 c14aa4de c00a009f 
c16d21b0
[   21.963470] GPR20: c14de588 0001 c16d21b8 
c00a00a0
[   21.963470] GPR24:   c00a00a0 
c003ffe96000
[   21.963470] GPR28: c00a00a0 c00a00a0 c003fffec000 
c00a00fff8c0

[   21.963802] NIP [c0079e18] pte_fragment_free+0x48/0xd0
[   21.963838] LR [c0c79308] remove_pagetable+0x49c/0x5b4
[   21.963873] Call Trace:
[   21.963890] [c003f8803780] [c003ffe997f0] 0xc003ffe997f0 
(unreliable)

[   21.963933] [c003f88037b0] [] (null)
[   21.963969] [c003f88038c0] [c006f038] 
vmemmap_free+0x218/0x2e0
[   21.964006] [c003f8803940] [c036f100] 
sparse_remove_one_section+0xd0/0x138
[   21.964050] [c003f8803980] [c0383a50] 
__remove_pages+0x410/0x560
[   21.964093] [c003f8803a90] [c0c784d8] 
arch_remove_memory+0x68/0xdc
[   21.964136] [c003f8803ad0] [c0385d74] 
__remove_memory+0xc4/0x110
[   21.964180] [c003f8803b10] [c00d44e4] 
dlpar_remove_lmb+0x94/0x140
[   21.964223] [c003f8803b50] [c00d52b4] 
dlpar_memory+0x464/0xd00
[   21.964259] [c003f8803be0] [c00cd5c0] 
handle_dlpar_errorlog+0xc0/0x190
[   21.964303] [c003f8803c50] [c00cd6bc] 
pseries_hp_work_fn+0x2c/0x60
[   21.964346] [c003f8803c80] [c013a4a0] 
process_one_work+0x2b0/0x5a0
[   21.964388] [c003f8803d10] [c013a818] 
worker_thread+0x88/0x610

[   21.964434] [c003f8803db0] [c0143884] kthread+0x1a4/0x1b0
[   21.964468] [c003f8803e20] [c000bdc4] 
ret_from_kernel_thread+0x5c/0x78

[   21.964506] Instruction dump:
[   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe9 7fff1a14 
395f0020 813f0020
[   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b09> 7c0004ac 
7d205028 3129

[   21.964613] ---[ end trace aaa571aa1636fee6 ]---
[   21.966349]
[   21.966383] Sending IPI to other CPUs
[   21.978335] IPI complete
[   21.981354] kexec: Starting switchover sequence.
I'm in purgatory
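
To map the NIP and call-trace entries back to source lines, the symbol+offset/size tokens can be pulled out of the trace and handed to scripts/faddr2line from the kernel tree. A small sketch follows; the function name oops_symbols is made up here, and it relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Print each distinct "symbol+0xoffset/0xsize" token found on stdin,
# in order of first appearance. Works on wrapped traces too, since the
# match is per token rather than per log line.
oops_symbols() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' |
        awk '!seen[$0]++'
}
```

With the trace saved to a file and a vmlinux with debug info that matches the crashed kernel (both assumptions), something like `oops_symbols < oops.txt | xargs ./scripts/faddr2line vmlinux` should list the corresponding source locations.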