Re: [mainline][Memory off/on][83e3c48] kernel Oops with memory hot-unplug on ppc
On Mon, Feb 19, 2018 at 02:47:35PM +0100, Michal Hocko wrote:
> [CC Kirill - I have a vague recollection that there were some follow ups
> for 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for
> CONFIG_SPARSEMEM_EXTREME=y"). Does any of them apply to this issue?]

All fixups are in v4.15.

> On Mon 05-02-18 19:54:24, Abdul Haleem wrote:
> >
> > Greetings,
> >
> > Kernel Oops seen when memory hot-unplug on powerpc mainline kernel.
> >
> > Machine: Power6 PowerVM ppc64
> > Kernel: 4.15.0
> > Config: attached
> > gcc: 4.8.2
> > Test: Memory hot-unplug of a memory block
> > echo offline > /sys/devices/system/memory/memory/state
> >
> > The faulty instruction address points to the code path:
> >
> > # gdb -batch vmlinux -ex 'list *(0xc0238330)'
> > 0xc0238330 is in get_pfnblock_flags_mask
> > (./include/linux/mmzone.h:1157).
> > 1152 #endif
> > 1153
> > 1154 static inline struct mem_section *__nr_to_section(unsigned long nr)
> > 1155 {
> > 1156 #ifdef CONFIG_SPARSEMEM_EXTREME
> > 1157         if (!mem_section)
> > 1158                 return NULL;
> > 1159 #endif
> > 1160         if (!mem_section[SECTION_NR_TO_ROOT(nr)])
> > 1161                 return NULL;
> >
> >
> > The code was first introduced with commit( 83e3c48: mm/sparsemem:
> > Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y)

Any chance to bisect it?
Could you check if the commit just before 83e3c48729d9 is fine?

-- 
 Kirill A. Shutemov
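
For readers following the quoted listing, the two-level lookup done by `__nr_to_section()` can be modelled in user space. This is only a sketch and not the kernel source. The values of `SECTIONS_PER_ROOT` and `NR_SECTION_ROOTS` are made up, and the names `nr_to_section_model()`, `model_init()` and `model_add_root()` are hypothetical. The bounds check exists only to keep the model safe. The final `return` line lies beyond the quoted listing (which stops at line 1161) and is written from memory of the v4.15 header, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative values only; the kernel derives these per architecture. */
#define SECTIONS_PER_ROOT      4UL
#define NR_SECTION_ROOTS       8UL
#define SECTION_NR_TO_ROOT(nr) ((nr) / SECTIONS_PER_ROOT)
#define SECTION_ROOT_MASK      (SECTIONS_PER_ROOT - 1)

struct mem_section {
	unsigned long section_mem_map;
};

/* With 83e3c48729d9 the root table becomes a pointer allocated at runtime. */
static struct mem_section **mem_section;

/* Hypothetical helper: allocate the (empty) root table. */
int model_init(void)
{
	/* The mask trick below only works for a power-of-two root size. */
	assert((SECTIONS_PER_ROOT & SECTION_ROOT_MASK) == 0);
	mem_section = calloc(NR_SECTION_ROOTS, sizeof(*mem_section));
	return mem_section ? 0 : -1;
}

/* Hypothetical helper: populate one root with zeroed sections. */
int model_add_root(unsigned long root)
{
	if (!mem_section || root >= NR_SECTION_ROOTS)
		return -1;
	if (!mem_section[root])
		mem_section[root] = calloc(SECTIONS_PER_ROOT,
					   sizeof(**mem_section));
	return mem_section[root] ? 0 : -1;
}

/* Model of the lookup quoted at mmzone.h:1154-1161. */
struct mem_section *nr_to_section_model(unsigned long nr)
{
	unsigned long root = SECTION_NR_TO_ROOT(nr);

	if (!mem_section)		/* root table not allocated yet */
		return NULL;
	if (root >= NR_SECTION_ROOTS)	/* model-only guard, not in the listing */
		return NULL;
	if (!mem_section[root])		/* this root was never populated */
		return NULL;
	/* Assumed continuation of the listing: index within the root. */
	return &mem_section[root][nr & SECTION_ROOT_MASK];
}
```

In this model, a lookup before `model_init()` returns NULL, a lookup into an unpopulated root returns NULL, and only a populated root yields a section pointer. Without the model-only guard, a section number far outside the table would index past the end of the root array. The model makes no claim about whether that is what happened in the reported oops.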
Re: [mainline][Memory off/on][83e3c48] kernel Oops with memory hot-unplug on ppc
[CC Kirill - I have a vague recollection that there were some follow ups
for 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for
CONFIG_SPARSEMEM_EXTREME=y"). Does any of them apply to this issue?]

On Mon 05-02-18 19:54:24, Abdul Haleem wrote:
>
> Greetings,
>
> Kernel Oops seen when memory hot-unplug on powerpc mainline kernel.
>
> Machine: Power6 PowerVM ppc64
> Kernel: 4.15.0
> Config: attached
> gcc: 4.8.2
> Test: Memory hot-unplug of a memory block
> echo offline > /sys/devices/system/memory/memory/state
>
> The faulty instruction address points to the code path:
>
> # gdb -batch vmlinux -ex 'list *(0xc0238330)'
> 0xc0238330 is in get_pfnblock_flags_mask
> (./include/linux/mmzone.h:1157).
> 1152 #endif
> 1153
> 1154 static inline struct mem_section *__nr_to_section(unsigned long nr)
> 1155 {
> 1156 #ifdef CONFIG_SPARSEMEM_EXTREME
> 1157         if (!mem_section)
> 1158                 return NULL;
> 1159 #endif
> 1160         if (!mem_section[SECTION_NR_TO_ROOT(nr)])
> 1161                 return NULL;
>
>
> The code was first introduced with commit( 83e3c48: mm/sparsemem:
> Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y)
>
> Trace messages:
> ---
> Offlined Pages 1024
> ehea: memory is going online
> ehea: LPAR memory changed - re-initializing driver
> ehea: re-initializing driver complete
> ehea: memory is going online
> ehea: LPAR memory changed - re-initializing driver
> ehea: re-initializing driver complete
> ehea: memory is going offline
> ehea: LPAR memory changed - re-initializing driver
> ehea: re-initializing driver complete
> Offlined Pages 1024
> Unable to handle kernel paging request for data at address
> 0xc0005b706ad88178
> Faulting instruction address: 0xc0238330
> Oops: Kernel access of bad area, sig: 11 [#1]
> BE SMP NR_CPUS=1024 NUMA pSeries
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in: rpadlpar_io(E) rpaphp(E) xt_CHECKSUM(E) bnep(E)
> bluetooth(E) ecdh_generic(E) nf_conntrack_netbios_ns(E)
> nf_conntrack_broadcast(E) ip6t_REJECT(E) nf_reject_ipv6(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E)
> nf_conntrack_ipv4(E) nf_defrag_ipv4(E) cfg80211(E) xt_conntrack(E)
> rfkill(E) nf_conntrack(E) libcrc32c(E) ebtable_nat(E) ebtable_broute(E)
> bridge(E) stp(E) llc(E) ebtable_filter(E) ebtables(E) ip6table_mangle(E)
> ip6table_security(E) ip6table_raw(E) ip6table_filter(E) ip6_tables(E)
> iptable_mangle(E) iptable_security(E) iptable_raw(E) iptable_filter(E)
> ip_tables(E) ses(E) enclosure(E) osst(E) scsi_transport_sas(E) st(E)
> nfsd(E) auth_rpcgss(E) ehea(E) uio_pdrv_genirq(E) nfs_acl(E) uio(E)
> lockd(E) sunrpc(E) grace(E) ipv6(E) crc_ccitt(E) autofs4(E)
> ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 12 PID: 6981 Comm: avocado Tainted: GW E 4.15.0-autotest #1
> NIP: c0238330 LR: c02c5dcc CTR: c0122f80
> REGS: c002aef63370 TRAP: 0300 Tainted: GW E
> (4.15.0-autotest)
> MSR: 8200b032 CR: 24242488 XER:
> CFAR: c000883c DAR: c0005b706ad88178 DSISR: 4000 SOFTE: 1
> GPR00: c02c5b70 c002aef635f0 c1101a00 c002b1563800
> GPR04: b6db6db6e3f3e100 0002 0007 0010a000
> GPR08: 5b6db6db71f8 c002b3fd0f80 00b6db6db6e3f3e1 0040
> GPR12: 24242488 ced43c00 01001054b518 00803cdf37a8
> GPR16: c002b3f3ce20 0001 0001
> GPR20: 0001 00026400 003f
> GPR24: c113caa0 0008 0001
> GPR28: c12a1620 c002b3f3ce20 c002b3f3ca00 c002b156
> NIP [c0238330] .get_pfnblock_flags_mask+0x20/0xd0
> LR [c02c5dcc] .unset_migratetype_isolate+0x2bc/0x340
> Call Trace:
> [c002aef635f0] [c02c5b70] .unset_migratetype_isolate+0x60/0x340
> (unreliable)
> [c002aef636a0] [c02c60e0] .start_isolate_page_range+0x290/0x450
> [c002aef637a0] [c02c0164] .__offline_pages+0x114/0xaa0
> [c002aef638f0] [c058a9b8] .memory_subsys_offline+0x58/0xe0
> [c002aef63970] [c0567638] .device_offline+0xe8/0x130
> [c002aef63a00] [c058a71c] .store_mem_state+0x15c/0x180
> [c002aef63a90] [c0562710] .dev_attr_store+0x30/0x60
> [c002aef63b00] [c03789e0] .sysfs_kf_write+0x60/0xa0
> [c002aef63b70] [c03777a4] .kernfs_fop_write+0x184/0x260
> [c002aef63c10] [c02cce8c] .__vfs_write+0x3c/0x1e0
> [c002aef63cf0] [c02cd240] .vfs_write+0xc0/0x230
> [c002aef63d90] [c02cd558] .SyS_write+0x58/0x100
> [c002aef63e30] [c000b858] system_call+0x58/0x6c
> Instruction dump:
> 4e800020 6000 6000 6000 3d02001a 788ac202 3928fc20
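
The call trace starts at a `write()` system call on a memory block's sysfs `state` file, which is what the `echo offline > ...` test step does. A minimal C sketch of that step follows, for anyone scripting the reproduction without a shell. The function name `set_memory_block_state()` is hypothetical, and the path is passed in by the caller because the block number in the reported path is not given.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `state` (for example "offline" or "online") to the sysfs state
 * file at `state_path`. Returns 0 on success or a negative errno value.
 * Hypothetical helper for illustration; not part of any kernel or libc API.
 */
int set_memory_block_state(const char *state_path, const char *state)
{
	size_t len;
	ssize_t written;
	int fd, err = 0;

	assert(state_path != NULL && state != NULL);

	fd = open(state_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	len = strlen(state);
	written = write(fd, state, len);
	if (written < 0)
		err = -errno;		/* e.g. the kernel refused the request */
	else if ((size_t)written != len)
		err = -EIO;		/* short write: treat as a failure */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

A caller would pass the full path of one memory block's `state` file and the string `"offline"`, and needs sufficient privileges to write to sysfs. Note that `echo` also appends a newline; this sketch writes exactly the bytes it is given.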