Re: [mainline][Memory off/on][83e3c48] kernel Oops with memory hot-unplug on ppc

2018-02-19 Thread Kirill A. Shutemov
On Mon, Feb 19, 2018 at 02:47:35PM +0100, Michal Hocko wrote:
> [CC Kirill - I have a vague recollection that there were some follow ups
>  for 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for
>  CONFIG_SPARSEMEM_EXTREME=y"). Does any of them apply to this issue?]

All fixups are in v4.15.

> On Mon 05-02-18 19:54:24, Abdul Haleem wrote:
> > 
> > Greetings,
> > 
> > Kernel Oops seen when memory hot-unplug on powerpc mainline kernel.
> > 
> > Machine: Power6 PowerVM ppc64
> > Kernel: 4.15.0
> > Config: attached
> > gcc: 4.8.2
> > Test: Memory hot-unplug of a memory block
> > echo offline > /sys/devices/system/memory/memory/state
> > 
> > The faulty instruction address points to the code path:
> > 
> > # gdb -batch vmlinux -ex 'list *(0xc0238330)'
> > 0xc0238330 is in get_pfnblock_flags_mask
> > (./include/linux/mmzone.h:1157).
> > 1152  #endif
> > 1153
> > 1154  static inline struct mem_section *__nr_to_section(unsigned long nr)
> > 1155  {
> > 1156  #ifdef CONFIG_SPARSEMEM_EXTREME
> > 1157          if (!mem_section)
> > 1158                  return NULL;
> > 1159  #endif
> > 1160          if (!mem_section[SECTION_NR_TO_ROOT(nr)])
> > 1161                  return NULL;
> > 
> > 
> > The code was first introduced with commit( 83e3c48: mm/sparsemem:
> > Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y)
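
For anyone following along, here is a rough user-space sketch of the
two-level lookup that the quoted listing belongs to. It is not the kernel
code: the table sizes are made up, sparse_init_roots() and
sparse_populate_root() are hypothetical helpers, the bounds check is an
addition for the sketch, and the final return statement is my reading of
how the function continues past the point where the listing stops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative sizes only; the kernel derives its values from PAGE_SIZE
 * and the architecture's section size. */
#define SECTIONS_PER_ROOT	4UL	/* power of two, so the mask works */
#define NR_SECTION_ROOTS	8UL
#define SECTION_NR_TO_ROOT(nr)	((nr) / SECTIONS_PER_ROOT)
#define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)

struct mem_section {
	unsigned long section_mem_map;	/* stand-in payload */
};

/* Root table. It stays NULL until sparse_init_roots() runs, mirroring the
 * runtime allocation described in the commit subject. */
static struct mem_section **mem_section;

/* Hypothetical helper: allocate the (empty) root table. */
static int sparse_init_roots(void)
{
	mem_section = calloc(NR_SECTION_ROOTS, sizeof(*mem_section));
	return mem_section ? 0 : -1;
}

/* Hypothetical helper: allocate the sections behind one root entry. */
static int sparse_populate_root(unsigned long root)
{
	assert(mem_section && root < NR_SECTION_ROOTS);
	if (!mem_section[root])
		mem_section[root] = calloc(SECTIONS_PER_ROOT,
					   sizeof(struct mem_section));
	return mem_section[root] ? 0 : -1;
}

/* Same shape as the quoted __nr_to_section(): check the root table, then
 * the root entry, then index into the root. */
static struct mem_section *nr_to_section(unsigned long nr)
{
	if (!mem_section)
		return NULL;
	/* Bounds check added for the sketch; not in the quoted listing. */
	if (SECTION_NR_TO_ROOT(nr) >= NR_SECTION_ROOTS)
		return NULL;
	if (!mem_section[SECTION_NR_TO_ROOT(nr)])
		return NULL;
	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
}
```

With these sizes, section 5 maps to root 1, slot 1. The lookup returns NULL
until both sparse_init_roots() and sparse_populate_root(1) have run. The
sketch only shows the shape of the lookup and says nothing about why the
reported access faults.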

Any chance to bisect it?

Could you check if the commit just before 83e3c48729d9 is fine?

-- 
 Kirill A. Shutemov


Re: [mainline][Memory off/on][83e3c48] kernel Oops with memory hot-unplug on ppc

2018-02-19 Thread Michal Hocko
[CC Kirill - I have a vague recollection that there were some follow ups
 for 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for
 CONFIG_SPARSEMEM_EXTREME=y"). Does any of them apply to this issue?]

On Mon 05-02-18 19:54:24, Abdul Haleem wrote:
> 
> Greetings,
> 
> Kernel Oops seen when memory hot-unplug on powerpc mainline kernel.
> 
> Machine: Power6 PowerVM ppc64
> Kernel: 4.15.0
> Config: attached
> gcc: 4.8.2
> Test: Memory hot-unplug of a memory block
> echo offline > /sys/devices/system/memory/memory/state
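
The reproduction step above can also be driven from a small C helper, for
example inside a test harness. This is only a sketch: the function name
set_memory_block_state() is made up, the caller has to supply the full
path of the memory block's state file (the block number is not given in
the report), and accepting only "offline" and "online" is a simplification
made for the sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write `state` ("offline" or "online") to the state file at `state_path`.
 * Returns 0 on success, -1 on any failure.
 */
static int set_memory_block_state(const char *state_path, const char *state)
{
	FILE *f;
	int ok;

	/* Only the two states used in this thread are accepted here. */
	if (strcmp(state, "offline") != 0 && strcmp(state, "online") != 0)
		return -1;

	f = fopen(state_path, "w");
	if (!f)
		return -1;

	ok = fputs(state, f) >= 0;

	/* stdio buffers the write, so a rejected request may only show up
	 * when the stream is flushed by fclose(). */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the path of one memory block's state file and check
the return value; a failed offline request shows up as -1 rather than as
shell output.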
> 
> The faulty instruction address points to the code path:
> 
> # gdb -batch vmlinux -ex 'list *(0xc0238330)'
> 0xc0238330 is in get_pfnblock_flags_mask
> (./include/linux/mmzone.h:1157).
> 1152  #endif
> 1153
> 1154  static inline struct mem_section *__nr_to_section(unsigned long nr)
> 1155  {
> 1156  #ifdef CONFIG_SPARSEMEM_EXTREME
> 1157          if (!mem_section)
> 1158                  return NULL;
> 1159  #endif
> 1160          if (!mem_section[SECTION_NR_TO_ROOT(nr)])
> 1161                  return NULL;
> 
> 
> The code was first introduced with commit( 83e3c48: mm/sparsemem:
> Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y)
> 
> Trace messages:
> ---
> Offlined Pages 1024
> ehea: memory is going online
> ehea: LPAR memory changed - re-initializing driver
> ehea: re-initializing driver complete
> ehea: memory is going online
> ehea: LPAR memory changed - re-initializing driver
> ehea: re-initializing driver complete
> ehea: memory is going offline
> ehea: LPAR memory changed - re-initializing driver
> ehea: re-initializing driver complete
> Offlined Pages 1024
> Unable to handle kernel paging request for data at address
> 0xc0005b706ad88178
> Faulting instruction address: 0xc0238330
> Oops: Kernel access of bad area, sig: 11 [#1]
> BE SMP NR_CPUS=1024 NUMA pSeries
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in: rpadlpar_io(E) rpaphp(E) xt_CHECKSUM(E) bnep(E)
> bluetooth(E) ecdh_generic(E) nf_conntrack_netbios_ns(E)
> nf_conntrack_broadcast(E) ip6t_REJECT(E) nf_reject_ipv6(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E)
> nf_conntrack_ipv4(E) nf_defrag_ipv4(E) cfg80211(E) xt_conntrack(E)
> rfkill(E) nf_conntrack(E) libcrc32c(E) ebtable_nat(E) ebtable_broute(E)
> bridge(E) stp(E) llc(E) ebtable_filter(E) ebtables(E) ip6table_mangle(E)
> ip6table_security(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E)
> iptable_mangle(E) iptable_security(E) iptable_raw(E) iptable_filter(E)
> ip_tables(E) ses(E) enclosure(E) osst(E) scsi_transport_sas(E) st(E)
> nfsd(E) auth_rpcgss(E) ehea(E) uio_pdrv_genirq(E) nfs_acl(E) uio(E)
> lockd(E) sunrpc(E) grace(E) ipv6(E) crc_ccitt(E) autofs4(E)
>  ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 12 PID: 6981 Comm: avocado Tainted: GW   E4.15.0-autotest #1
> NIP:  c0238330 LR: c02c5dcc CTR: c0122f80
> REGS: c002aef63370 TRAP: 0300   Tainted: GW   E 
> (4.15.0-autotest)
> MSR:  8200b032   CR: 24242488  XER: 
> CFAR: c000883c DAR: c0005b706ad88178 DSISR: 4000 SOFTE: 1 
> GPR00: c02c5b70 c002aef635f0 c1101a00 c002b1563800 
> GPR04: b6db6db6e3f3e100 0002 0007 0010a000 
> GPR08: 5b6db6db71f8 c002b3fd0f80 00b6db6db6e3f3e1 0040 
> GPR12: 24242488 ced43c00 01001054b518 00803cdf37a8 
> GPR16:  c002b3f3ce20 0001 0001 
> GPR20: 0001  00026400 003f 
> GPR24: c113caa0 0008  0001 
> GPR28: c12a1620 c002b3f3ce20 c002b3f3ca00 c002b156 
> NIP [c0238330] .get_pfnblock_flags_mask+0x20/0xd0
> LR [c02c5dcc] .unset_migratetype_isolate+0x2bc/0x340
> Call Trace:
> [c002aef635f0] [c02c5b70] .unset_migratetype_isolate+0x60/0x340 (unreliable)
> [c002aef636a0] [c02c60e0] .start_isolate_page_range+0x290/0x450
> [c002aef637a0] [c02c0164] .__offline_pages+0x114/0xaa0
> [c002aef638f0] [c058a9b8] .memory_subsys_offline+0x58/0xe0
> [c002aef63970] [c0567638] .device_offline+0xe8/0x130
> [c002aef63a00] [c058a71c] .store_mem_state+0x15c/0x180
> [c002aef63a90] [c0562710] .dev_attr_store+0x30/0x60
> [c002aef63b00] [c03789e0] .sysfs_kf_write+0x60/0xa0
> [c002aef63b70] [c03777a4] .kernfs_fop_write+0x184/0x260
> [c002aef63c10] [c02cce8c] .__vfs_write+0x3c/0x1e0
> [c002aef63cf0] [c02cd240] .vfs_write+0xc0/0x230
> [c002aef63d90] [c02cd558] .SyS_write+0x58/0x100
> [c002aef63e30] [c000b858] system_call+0x58/0x6c
> Instruction dump:
> 4e800020 6000 6000 6000 3d02001a 788ac202 3928fc20