Re: [powerpc] Kernel oops while running xfstests w/ext4 (5.18-rc6-next-20220510)

2022-05-11 Thread Sachin Sant



> On 11-May-2022, at 2:56 PM, Sachin Sant wrote:
> 
> While running xfstests (specifically ext4/032) w/ext4 on a POWER9 LPAR running
> linux-next version 5.18.0-rc6-next-20220510, the following crash is seen:
> 
> [  472.486440] EXT4-fs (loop0): resized filesystem to 41943040
> [  472.760888] BUG: Kernel NULL pointer dereference at 0x002c
> [  472.760891] Faulting instruction address: 0xc07729f4
> [  472.760894] Oops: Kernel access of bad area, sig: 11 [#1]
> [  472.760913] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [  472.760921] Modules linked in: loop(E) dm_mod(E) nft_fib_inet(E) 
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) 
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) 
> nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) 
> bonding(E) rfkill(E) tls(E) nf_tables(E) libcrc32c(E) nfnetlink(E) sunrpc(E) 
> pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) 
> sd_mod(E) sg(E) lpfc(E) nvmet_fc(E) nvmet(E) ibmvscsi(E) 
> scsi_transport_srp(E) ibmveth(E) nvme_fc(E) nvme(E) nvme_fabrics(E) 
> nvme_core(E) t10_pi(E) scsi_transport_fc(E) crc64_rocksoft(E) crc64(E) tg3(E) 
> ipmi_devintf(E) ipmi_msghandler(E) fuse(E)
> [  472.761006] CPU: 8 PID: 5139 Comm: kworker/u193:0 Tainted: GE  
>5.18.0-rc6-next-20220510 #2
> [  472.761013] Workqueue: loop0 loop_rootcg_workfn [loop]
> [  472.761027] NIP:  c07729f4 LR: c077331c CTR: 
> c09e9ac0
> [  472.761032] REGS: c0002d95b3a0 TRAP: 0380   Tainted: GE
>   (5.18.0-rc6-next-20220510)
> [  472.761038] MSR:  8280b033   CR: 
> 24008822  XER: 
> [  472.761057] CFAR: c0772b80 IRQMASK: 0 
> [  472.761057] GPR00: c077331c c0002d95b640 c2a7cf00 
> c0002d95b8e0 
> [  472.761057] GPR04: c0006fd58200 0001 0010 
> 0040 
> [  472.761057] GPR08: 0020  0001 
> c008089570f8 
> [  472.761057] GPR12: 8000 c0001ec46300  
> c00054e32200 
> [  472.761057] GPR16: 5deadbeef100   
>  
> [  472.761057] GPR20: 7fff c009fc817a00 c0002d95b748 
> c0002d95b8e0 
> [  472.761057] GPR24: 0001  c000842b1c00 
>  
> [  472.761057] GPR28:   c0006fd58200 
> c0002d95b8e0 
> [  472.761126] NIP [c07729f4] blk_add_rq_to_plug+0x74/0x1d0
> [  472.761135] LR [c077331c] blk_mq_try_issue_list_directly+0x18c/0x1d0
> [  472.761141] Call Trace:
> [  472.761144] [c0002d95b640] [c000842b1c00] 0xc000842b1c00 (unreliable)
> [  472.761153] [c0002d95b680] [c0773244] blk_mq_try_issue_list_directly+0xb4/0x1d0
> [  472.761160] [c0002d95b6d0] [c077b38c] blk_mq_sched_insert_requests+0x13c/0x240
> [  472.761168] [c0002d95b720] [c0772658] blk_mq_flush_plug_list+0x118/0x440
> [  472.761175] [c0002d95b7c0] [c075ecbc] __blk_flush_plug+0x17c/0x200
> [  472.761183] [c0002d95b840] [c075efe0] blk_finish_plug+0x50/0x70
> [  472.761190] [c0002d95b870] [c061a2a4] __iomap_dio_rw+0x444/0x960
> [  472.761200] [c0002d95ba60] [c061a7e0] iomap_dio_rw+0x20/0x90
> [  472.761208] [c0002d95ba80] [c00808c56424] ext4_file_read_iter+0x17c/0x2d0 [ext4]
> [  472.761237] [c0002d95bac0] [c00809822aa8] lo_rw_aio.isra.36+0x260/0x320 [loop]
> [  472.761245] [c0002d95bb40] [c00809824030] loop_process_work+0x448/0xb70 [loop]
> [  472.761253] [c0002d95bc90] [c0183744] process_one_work+0x2b4/0x5b0
> [  472.761262] [c0002d95bd30] [c0183ab8] worker_thread+0x78/0x600
> [  472.761269] [c0002d95bdc0] [c01901d4] kthread+0x124/0x130
> [  472.761276] [c0002d95be10] [c000ce04] ret_from_kernel_thread+0x5c/0x64
> [  472.761284] Instruction dump:
> [  472.761288] 893f0014 38e00040 3920 2fa9 7d283f9e 7e8a4840 409400b4 
> e93e 
> [  472.761300] e9290068 71290008 40820024 3d41 <813d002c> 614a 
> 7e895040 41950090 
> [  472.761314] ---[ end trace  ]---
> [  472.769088] 
> [  473.769091] Kernel panic - not syncing: Fatal exception
> 
> The 5.18.0-rc6-next-20220509 build did not exhibit this problem.
> I will try git bisect and report back with the results.
> 
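
A side note on the oops quoted above: the word marked <813d002c> in the
instruction dump is the faulting instruction. A minimal sketch, assuming the
standard Power ISA D-form layout (6-bit primary opcode, 5-bit RT, 5-bit RA,
16-bit signed displacement), decodes it by hand. The helper names
decode_dform and disasm are hypothetical, and only a handful of D-form
opcodes are covered.

```python
# Hypothetical helpers for decoding a single Power ISA D-form instruction word.
# Only a few load/store opcodes are listed; this is not a full disassembler.

DFORM_MNEMONICS = {
    32: "lwz",  # load word and zero
    34: "lbz",  # load byte and zero
    36: "stw",  # store word
    40: "lhz",  # load halfword and zero
}


def decode_dform(word):
    """Split a 32-bit instruction word into (opcode, rt, ra, displacement)."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:  # sign-extend the 16-bit displacement
        disp -= 0x10000
    return opcode, rt, ra, disp


def disasm(word):
    """Render a known D-form load/store as 'mnemonic rT,D(rA)'."""
    opcode, rt, ra, disp = decode_dform(word)
    mnemonic = DFORM_MNEMONICS.get(opcode)
    if mnemonic is None:
        return "unknown (primary opcode %d)" % opcode
    return "%s r%d,%d(r%d)" % (mnemonic, rt, disp, ra)


if __name__ == "__main__":
    # The word marked with <...> in the instruction dump of the oops.
    print(disasm(0x813D002C))
```

This prints lwz r9,44(r29), a 32-bit load from offset 0x2c of r29, which
would match the reported fault address 0x002c if r29 held NULL at that point.
The register dump as archived does not show r29 legibly, so that last step is
an inference rather than something the log confirms.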

Unfortunately, git bisect does not seem to help: it points at a merge commit.
first bad commit: [3aedd17333a514c6f2542ed305d940e7a970a6f2]
  Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git

# git bisect log
git bisect start
# bad: [3bf222d317a20170ee17f082626c1e0f83537e13] Add linux-next specific files for 20220510
git bisect bad 3bf222d317a20170ee17f082626c1e0f83537e13
# good: [c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a] Linux 5.18-rc6
git bisect good 

[powerpc] Kernel oops while running xfstests w/ext4 (5.18-rc6-next-20220510)

2022-05-11 Thread Sachin Sant
While running xfstests (specifically ext4/032) w/ext4 on a POWER9 LPAR running
linux-next version 5.18.0-rc6-next-20220510, the following crash is seen:

[  472.486440] EXT4-fs (loop0): resized filesystem to 41943040
[  472.760888] BUG: Kernel NULL pointer dereference at 0x002c
[  472.760891] Faulting instruction address: 0xc07729f4
[  472.760894] Oops: Kernel access of bad area, sig: 11 [#1]
[  472.760913] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[  472.760921] Modules linked in: loop(E) dm_mod(E) nft_fib_inet(E) 
nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) 
nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) 
nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) bonding(E) 
rfkill(E) tls(E) nf_tables(E) libcrc32c(E) nfnetlink(E) sunrpc(E) 
pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) 
sd_mod(E) sg(E) lpfc(E) nvmet_fc(E) nvmet(E) ibmvscsi(E) scsi_transport_srp(E) 
ibmveth(E) nvme_fc(E) nvme(E) nvme_fabrics(E) nvme_core(E) t10_pi(E) 
scsi_transport_fc(E) crc64_rocksoft(E) crc64(E) tg3(E) ipmi_devintf(E) 
ipmi_msghandler(E) fuse(E)
[  472.761006] CPU: 8 PID: 5139 Comm: kworker/u193:0 Tainted: GE
 5.18.0-rc6-next-20220510 #2
[  472.761013] Workqueue: loop0 loop_rootcg_workfn [loop]
[  472.761027] NIP:  c07729f4 LR: c077331c CTR: c09e9ac0
[  472.761032] REGS: c0002d95b3a0 TRAP: 0380   Tainted: GE  
(5.18.0-rc6-next-20220510)
[  472.761038] MSR:  8280b033   CR: 
24008822  XER: 
[  472.761057] CFAR: c0772b80 IRQMASK: 0 
[  472.761057] GPR00: c077331c c0002d95b640 c2a7cf00 
c0002d95b8e0 
[  472.761057] GPR04: c0006fd58200 0001 0010 
0040 
[  472.761057] GPR08: 0020  0001 
c008089570f8 
[  472.761057] GPR12: 8000 c0001ec46300  
c00054e32200 
[  472.761057] GPR16: 5deadbeef100   
 
[  472.761057] GPR20: 7fff c009fc817a00 c0002d95b748 
c0002d95b8e0 
[  472.761057] GPR24: 0001  c000842b1c00 
 
[  472.761057] GPR28:   c0006fd58200 
c0002d95b8e0 
[  472.761126] NIP [c07729f4] blk_add_rq_to_plug+0x74/0x1d0
[  472.761135] LR [c077331c] blk_mq_try_issue_list_directly+0x18c/0x1d0
[  472.761141] Call Trace:
[  472.761144] [c0002d95b640] [c000842b1c00] 0xc000842b1c00 (unreliable)
[  472.761153] [c0002d95b680] [c0773244] blk_mq_try_issue_list_directly+0xb4/0x1d0
[  472.761160] [c0002d95b6d0] [c077b38c] blk_mq_sched_insert_requests+0x13c/0x240
[  472.761168] [c0002d95b720] [c0772658] blk_mq_flush_plug_list+0x118/0x440
[  472.761175] [c0002d95b7c0] [c075ecbc] __blk_flush_plug+0x17c/0x200
[  472.761183] [c0002d95b840] [c075efe0] blk_finish_plug+0x50/0x70
[  472.761190] [c0002d95b870] [c061a2a4] __iomap_dio_rw+0x444/0x960
[  472.761200] [c0002d95ba60] [c061a7e0] iomap_dio_rw+0x20/0x90
[  472.761208] [c0002d95ba80] [c00808c56424] ext4_file_read_iter+0x17c/0x2d0 [ext4]
[  472.761237] [c0002d95bac0] [c00809822aa8] lo_rw_aio.isra.36+0x260/0x320 [loop]
[  472.761245] [c0002d95bb40] [c00809824030] loop_process_work+0x448/0xb70 [loop]
[  472.761253] [c0002d95bc90] [c0183744] process_one_work+0x2b4/0x5b0
[  472.761262] [c0002d95bd30] [c0183ab8] worker_thread+0x78/0x600
[  472.761269] [c0002d95bdc0] [c01901d4] kthread+0x124/0x130
[  472.761276] [c0002d95be10] [c000ce04] ret_from_kernel_thread+0x5c/0x64
[  472.761284] Instruction dump:
[  472.761288] 893f0014 38e00040 3920 2fa9 7d283f9e 7e8a4840 409400b4 
e93e 
[  472.761300] e9290068 71290008 40820024 3d41 <813d002c> 614a 7e895040 
41950090 
[  472.761314] ---[ end trace  ]---
[  472.769088] 
[  473.769091] Kernel panic - not syncing: Fatal exception

The 5.18.0-rc6-next-20220509 build did not exhibit this problem.
I will try git bisect and report back with the results.

- Sachin