Re: [5.6.0-rc7] Kernel crash while running ndctl tests

2020-03-24 Thread Baoquan He
On 03/24/20 at 03:06pm, Sachin Sant wrote:
> 
> 
> > On 24-Mar-2020, at 2:45 PM, Aneesh Kumar K.V wrote:
> > 
> > Sachin Sant  writes:
> > 
> >> While running ndctl[1] tests against 5.6.0-rc7 following crash is 
> >> encountered.
> >> 
> >> Bisect leads me to  commit d41e2f3bd546 
> >> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
> >> 
> >> Reverting this commit helps and the tests complete without any crash.
> > 
> > 
> > Can you try this change?
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index aadb7298dcef..3012d1f3771a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >  			ms->usage = NULL;
> >  		}
> >  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > +		/* Mark the section invalid */
> > +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> >  	}
> >  
> >  	if (section_is_early && memmap)
> > 
> 
> This patch works for me. The test ran successfully without any crash/failure.

Hi Aneesh,

Could you make a formal patch to post, since Sachin has tested and
confirmed it works?

> 
> Thanks
> -Sachin
> 
> > A pfn_valid() check involves a pfn_section_valid() check if the section
> > has MEM_MAP. In this case we ended up setting ms->usage = NULL.
> > So when we do that, update the section to not have MEM_MAP.
> > 
> > -aneesh
> 
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [5.6.0-rc7] Kernel crash while running ndctl tests

2020-03-24 Thread Sachin Sant



> On 24-Mar-2020, at 2:45 PM, Aneesh Kumar K.V wrote:
> 
> Sachin Sant  writes:
> 
>> While running ndctl[1] tests against 5.6.0-rc7 following crash is 
>> encountered.
>> 
>> Bisect leads me to  commit d41e2f3bd546 
>> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
>> 
>> Reverting this commit helps and the tests complete without any crash.
> 
> 
> Can you try this change?
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		/* Mark the section invalid */
> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>  	}
>  
>  	if (section_is_early && memmap)
> 

This patch works for me. The test ran successfully without any crash/failure.

Thanks
-Sachin

> A pfn_valid() check involves a pfn_section_valid() check if the section
> has MEM_MAP. In this case we ended up setting ms->usage = NULL.
> So when we do that, update the section to not have MEM_MAP.
> 
> -aneesh


Re: [5.6.0-rc7] Kernel crash while running ndctl tests

2020-03-24 Thread Aneesh Kumar K.V
Sachin Sant  writes:

> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
>
> Bisect leads me to  commit d41e2f3bd546 
> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
>
> Reverting this commit helps and the tests complete without any crash.
>
> pmem0: detected capacity change from 0 to 10720641024
> BUG: Kernel NULL pointer dereference on read at 0x
> Faulting instruction address: 0xc0c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in: dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
> libcrc32c ip6_tables nft_compat ip_set rfkill nf_tables nfnetlink sunrpc sg 
> pseries_rng papr_scm uio_pdrv_genirq uio sch_fq_codel ip_tables sd_mod t10_pi 
> ibmvscsi scsi_transport_srp ibmveth
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> NIP:  c0c3447c LR: c0088354 CTR: c018e990
> REGS: c006223fb630 TRAP: 0300   Not tainted  (5.6.0-rc7-autotest)
> MSR:  8280b033   CR: 2404  XER: 
> 
> CFAR: c000dec4 DAR:  DSISR: 4000 IRQMASK: 0 
> GPR00: c03c5820 c006223fb8c0 c1684900 0400 
> GPR04: c00c00010100 07ff c0067ff20900 c00c 
> GPR08:  c00c0001  c3f0 
> GPR12: 8000 c0001ec70200 7fffc102f9e8 1002e088 
> GPR16:  10050d88 1002f778 1002f770 
> GPR20:  0100 0001 1000 
> GPR24: 0008  0400 c00c00014000 
> GPR28: c3101aa0 c00c0001 0100 04000100 
> NIP [c0c3447c] vmemmap_populated+0x98/0xc0
> LR [c0088354] vmemmap_free+0x144/0x320
> Call Trace:
> [c006223fb8c0] [c006223fb960] 0xc006223fb960 (unreliable)
> [c006223fb980] [c03c5820] section_deactivate+0x220/0x240
> [c006223fba30] [c03dc1d8] __remove_pages+0x118/0x170
> [c006223fba80] [c0086e5c] arch_remove_memory+0x3c/0x150
> [c006223fbb00] [c041a3bc] memunmap_pages+0x1cc/0x2f0
> [c006223fbb80] [c07d6d00] devm_action_release+0x30/0x50
> [c006223fbba0] [c07d7de8] release_nodes+0x2f8/0x3e0
> [c006223fbc50] [c07d0b38] device_release_driver_internal+0x168/0x270
> [c006223fbc90] [c07ccf50] unbind_store+0x130/0x170
> [c006223fbcd0] [c07cc0b4] drv_attr_store+0x44/0x60
> [c006223fbcf0] [c051fdb8] sysfs_kf_write+0x68/0x80
> [c006223fbd10] [c051f200] kernfs_fop_write+0x100/0x290
> [c006223fbd60] [c042037c] __vfs_write+0x3c/0x70
> [c006223fbd80] [c042404c] vfs_write+0xcc/0x240
> [c006223fbdd0] [c042442c] ksys_write+0x7c/0x140
> [c006223fbe20] [c000b278] system_call+0x5c/0x68
> Instruction dump:
> 2ea8 4196003c 794a2428 7d685215 41820030 7d48502a 71480002 41820024 
> 714a0008 4082002c e90b0008 786adf62  7c635436 70630001 4c820020 
> ---[ end trace 579b48162da1b890 ]---


Can you try this change?

diff --git a/mm/sparse.c b/mm/sparse.c
index aadb7298dcef..3012d1f3771a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		/* Mark the section invalid */
+		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
 	}
 
 	if (section_is_early && memmap)

A pfn_valid() check involves a pfn_section_valid() check if the section
has MEM_MAP. In this case we ended up setting ms->usage = NULL.
So when we do that, update the section to not have MEM_MAP.
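
To illustrate the reasoning above, here is a minimal userspace sketch of
the check ordering. It is a simplified model, not the kernel code: the
struct layouts, the one-word subsection bitmap, MODEL_SUBSECTIONS and the
model_*/would_deref_null helpers are assumptions made for illustration.
Only the idea is taken from the patch: once ms->usage is NULL, the
SECTION_HAS_MEM_MAP bit must be cleared so that the valid_section() test
fails before pfn_section_valid() dereferences ms->usage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel flag bits (model only). */
#define SECTION_MARKED_PRESENT	(1UL << 0)
#define SECTION_HAS_MEM_MAP	(1UL << 1)
#define MODEL_SUBSECTIONS	16	/* hypothetical subsection count */

struct mem_section_usage {
	unsigned long subsection_map;	/* one bit per subsection */
};

struct mem_section {
	unsigned long section_mem_map;	/* flag bits live in the low bits */
	struct mem_section_usage *usage;
};

/* A section counts as valid only while it carries SECTION_HAS_MEM_MAP. */
static bool valid_section(const struct mem_section *ms)
{
	return ms && (ms->section_mem_map & SECTION_HAS_MEM_MAP);
}

/* Dereferences ms->usage unconditionally, like the check described above. */
static bool pfn_section_valid(const struct mem_section *ms, unsigned int idx)
{
	assert(idx < MODEL_SUBSECTIONS);
	return (ms->usage->subsection_map >> idx) & 1UL;
}

/* Model of pfn_valid(): the flag test guards the usage dereference. */
static bool model_pfn_valid(const struct mem_section *ms, unsigned int idx)
{
	if (!valid_section(ms))
		return false;
	return pfn_section_valid(ms, idx);
}

/* True if model_pfn_valid() would reach a NULL ms->usage for this section. */
static bool would_deref_null(const struct mem_section *ms)
{
	return valid_section(ms) && ms->usage == NULL;
}

/*
 * Model of the tail of section_deactivate(): the usage map is freed and
 * set to NULL. With clear_flag set, the section is also marked invalid,
 * which is what the proposed two-line change does.
 */
static void model_section_deactivate(struct mem_section *ms, bool clear_flag)
{
	free(ms->usage);
	ms->usage = NULL;
	if (clear_flag)
		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
```

In this model, deactivating with clear_flag == false leaves a section for
which would_deref_null() is true, which corresponds to the state behind
the oops; with clear_flag == true, model_pfn_valid() returns false without
touching ms->usage.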

-aneesh


Re: [5.6.0-rc7] Kernel crash while running ndctl tests

2020-03-24 Thread Sachin Sant


> On 24-Mar-2020, at 12:37 PM, Baoquan He  wrote:
> 
> Hi Sachin,
> 
> On 03/24/20 at 11:25am, Sachin Sant wrote:
>> While running ndctl[1] tests against 5.6.0-rc7 following crash is 
>> encountered.
>> 
>> Bisect leads me to  commit d41e2f3bd546 
>> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
>> 
>> Reverting this commit helps and the tests complete without any crash.
> 
> Could you paste your kernel config and the boot log?
> 

I have attached boot.log as well as kernel config.

Thanks
-Sachin



Re: [5.6.0-rc7] Kernel crash while running ndctl tests

2020-03-24 Thread Baoquan He
Hi Sachin,

On 03/24/20 at 11:25am, Sachin Sant wrote:
> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
> 
> Bisect leads me to  commit d41e2f3bd546 
> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
> 
> Reverting this commit helps and the tests complete without any crash.

Could you paste your kernel config and the boot log?

If it's confidential, private attachment is also OK.

Thanks
Baoquan

> 
> pmem0: detected capacity change from 0 to 10720641024
> BUG: Kernel NULL pointer dereference on read at 0x
> Faulting instruction address: 0xc0c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in: dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
> libcrc32c ip6_tables nft_compat ip_set rfkill nf_tables nfnetlink sunrpc sg 
> pseries_rng papr_scm uio_pdrv_genirq uio sch_fq_codel ip_tables sd_mod t10_pi 
> ibmvscsi scsi_transport_srp ibmveth
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> NIP:  c0c3447c LR: c0088354 CTR: c018e990
> REGS: c006223fb630 TRAP: 0300   Not tainted  (5.6.0-rc7-autotest)
> MSR:  8280b033   CR: 2404  XER: 
> 
> CFAR: c000dec4 DAR:  DSISR: 4000 IRQMASK: 0 
> GPR00: c03c5820 c006223fb8c0 c1684900 0400 
> GPR04: c00c00010100 07ff c0067ff20900 c00c 
> GPR08:  c00c0001  c3f0 
> GPR12: 8000 c0001ec70200 7fffc102f9e8 1002e088 
> GPR16:  10050d88 1002f778 1002f770 
> GPR20:  0100 0001 1000 
> GPR24: 0008  0400 c00c00014000 
> GPR28: c3101aa0 c00c0001 0100 04000100 
> NIP [c0c3447c] vmemmap_populated+0x98/0xc0
> LR [c0088354] vmemmap_free+0x144/0x320
> Call Trace:
> [c006223fb8c0] [c006223fb960] 0xc006223fb960 (unreliable)
> [c006223fb980] [c03c5820] section_deactivate+0x220/0x240
> [c006223fba30] [c03dc1d8] __remove_pages+0x118/0x170
> [c006223fba80] [c0086e5c] arch_remove_memory+0x3c/0x150
> [c006223fbb00] [c041a3bc] memunmap_pages+0x1cc/0x2f0
> [c006223fbb80] [c07d6d00] devm_action_release+0x30/0x50
> [c006223fbba0] [c07d7de8] release_nodes+0x2f8/0x3e0
> [c006223fbc50] [c07d0b38] device_release_driver_internal+0x168/0x270
> [c006223fbc90] [c07ccf50] unbind_store+0x130/0x170
> [c006223fbcd0] [c07cc0b4] drv_attr_store+0x44/0x60
> [c006223fbcf0] [c051fdb8] sysfs_kf_write+0x68/0x80
> [c006223fbd10] [c051f200] kernfs_fop_write+0x100/0x290
> [c006223fbd60] [c042037c] __vfs_write+0x3c/0x70
> [c006223fbd80] [c042404c] vfs_write+0xcc/0x240
> [c006223fbdd0] [c042442c] ksys_write+0x7c/0x140
> [c006223fbe20] [c000b278] system_call+0x5c/0x68
> Instruction dump:
> 2ea8 4196003c 794a2428 7d685215 41820030 7d48502a 71480002 41820024 
> 714a0008 4082002c e90b0008 786adf62  7c635436 70630001 4c820020 
> ---[ end trace 579b48162da1b890 ]---
> 
> Thanks
> -Sachin
> 
> [1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py
> 