[Bug 1929591] Re: MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault

2021-06-22 Thread Jake Staehle
Another one today on 5.8.0-55:

[  OK  ] Started Hostname Service.
[  OK  ] Started User Login Management.
[  OK  ] Started Docker Application Container Engine.

Ubuntu 20.10 babylon ttyS0

babylon login: [   43.284962] cloud-init[6278]: Cloud-init v. 
21.2-3-g899bfaa9-0ubuntu2~20.10.1 running 'modules:config' at Thu, 17 Jun 2021 
04:50:23 +. Up 43.23 seconds.
[   43.555449] cloud-init[6294]: Cloud-init v. 
21.2-3-g899bfaa9-0ubuntu2~20.10.1 running 'modules:final' at Thu, 17 Jun 2021 
04:50:23 +. Up 43.48 seconds.
[   43.59] cloud-init[6294]: Cloud-init v. 
21.2-3-g899bfaa9-0ubuntu2~20.10.1 finished at Thu, 17 Jun 2021 04:50:23 +. 
Datasource DataSourceNone.  Up 43.55 seconds
[   43.98] cloud-init[6294]: 2021-06-17 04:50:23,906 - 
cc_final_message.py[WARNING]: Used fallback datasource
[470667.791418] BUG: stack guard page was hit at 6cd7c52c (stack is 
b38fb7cf..d2b542d2)
[470667.791418] kernel stack overflow (double-fault):  [#1] SMP NOPTI
[470667.791418] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P   OE 
5.8.0-55-generic #62-Ubuntu
[470667.791419] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS 
PRO/B550M AORUS PRO, BIOS F13h 04/23/2021
[470667.791419] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[470667.791419] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 
8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 
d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[470667.791419] RSP: 0018:9b13808b3ff8 EFLAGS: 00010246
[470667.791420] RAX: 8c43bd86d9c0 RBX: 8c459b407800 RCX: 
0001
[470667.791420] RDX: 9b13808b4040 RSI: 9b13808b4038 RDI: 
8c459b407800
[470667.791420] RBP: 9b13808b4028 R08: 0001 R09: 
ae641900
[470667.791421] R10: 8c43bd86d1e0 R11: 0001 R12: 
9b13808b4038
[470667.791421] R13: 9b13808b4040 R14: 8c43bd86d9c0 R15: 
8c4585ea1070
[470667.791421] FS:  () GS:8c459edc() 
knlGS:
[470667.791421] CS:  0010 DS:  ES:  CR0: 80050033
[470667.791422] CR2: 9b13808b3fe8 CR3: 0007948a8000 CR4: 
00340ee0
[470667.791422] Call Trace:
[470667.791422]  ? mempool_kfree+0xe/0x10
[470667.791422]  ? kfree+0xb8/0x220
[470667.791422]  ? mempool_kfree+0xe/0x10
[470667.791422]  ? mempool_free+0x2f/0x80
[470667.791422]  ? md_end_io+0x4b/0x70
[470667.791423]  ? bio_endio+0xe6/0x150
[470667.791423]  ? bio_chain_endio+0x2d/0x40
[470667.791423]  ? md_end_io+0x5d/0x70
[470667.791423]  ? bio_endio+0xe6/0x150
[470667.791423]  ? bio_chain_endio+0x2d/0x40
[470667.791423]  ? md_end_io+0x5d/0x70
[470667.791423]  ? bio_endio+0xe6/0x150
[470667.791424]  ? bio_chain_endio+0x2d/0x40
[470667.791424]  ? md_end_io+0x5d/0x70
[470667.791424]  ? bio_endio+0xe6/0x150
[470667.791424]  ? bio_chain_endio+0x2d/0x40
[470667.791424]  ? md_end_io+0x5d/0x70
[470667.791424]  ? bio_endio+0xe6/0x150
[470667.791424]  ? bio_chain_endio+0x2d/0x40
[470667.791424]  ? md_end_io+0x5d/0x70
[470667.791425]  ? bio_endio+0xe6/0x150
[470667.791425]  ? bio_chain_endio+0x2d/0x40
[470667.791425]  ? md_end_io+0x5d/0x70
[470667.791425]  ? bio_endio+0xe6/0x150
[470667.791425]  ? bio_chain_endio+0x2d/0x40
[470667.791425]  ? md_end_io+0x5d/0x70
[470667.791425]  ? bio_endio+0xe6/0x150
[470667.791425]  ? bio_chain_endio+0x2d/0x40
[470667.791426]  ? md_end_io+0x5d/0x70
[470667.791426]  ? bio_endio+0xe6/0x150
[470667.791426]  ? bio_chain_endio+0x2d/0x40
[470667.791426]  ? md_end_io+0x5d/0x70
[470667.791426]  ? bio_endio+0xe6/0x150
[470667.791426]  ? bio_chain_endio+0x2d/0x40
[470667.791426]  ? md_end_io+0x5d/0x70
[470667.791427]  ? bio_endio+0xe6/0x150
[470667.791427]  ? bio_chain_endio+0x2d/0x40
[470667.791427]  ? md_end_io+0x5d/0x70
[470667.791427]  ? bio_endio+0xe6/0x150
[470667.791427]  ? bio_chain_endio+0x2d/0x40
[470667.791427]  ? md_end_io+0x5d/0x70
[470667.791427]  ? bio_endio+0xe6/0x150
[470667.791427]  ? bio_chain_endio+0x2d/0x40
[470667.791428]  ? md_end_io+0x5d/0x70
[470667.791428]  ? bio_endio+0xe6/0x150
[470667.791428]  ? bio_chain_endio+0x2d/0x40
[470667.791428]  ? md_end_io+0x5d/0x70
[470667.791428]  ? bio_endio+0xe6/0x150
[470667.791428]  ? bio_chain_endio+0x2d/0x40
[470667.791428]  ? md_end_io+0x5d/0x70
[470667.791429]  ? bio_endio+0xe6/0x150
[470667.791429]  ? bio_chain_endio+0x2d/0x40
[470667.791429]  ? md_end_io+0x5d/0x70
[470667.791429]  ? bio_endio+0xe6/0x150
[470667.791429]  ? bio_chain_endio+0x2d/0x40
[470667.791429]  ? md_end_io+0x5d/0x70
[470667.791429]  ? bio_endio+0xe6/0x150
[470667.791429]  ? bio_chain_endio+0x2d/0x40
[470667.791430]  ? md_end_io+0x5d/0x70
[470667.791430]  ? bio_endio+0xe6/0x150
[470667.791430]  ? bio_chain_endio+0x2d/0x40
[470667.791430]  ? md_end_io+0x5d/0x70
[470667.791430]  ? bio_endio+0xe6/0x150
[470667.791430]  ? bio_chain_endio+0x2d/0x40
[470667.791430]  ? md_end_io+0x5d/0x70
[470667.791430]  ? bio_endio+0xe6/0x150
[470667.791431]  ? bio_chain

[Bug 1929591] Re: MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault

2021-06-16 Thread Jake Staehle
Hey, so this is still happening on kernel 5.8.0-53. I just got this
serial console capture:


babylon login: [1457468.880947] BUG: stack guard page was hit at 
7aef1a4a (stack is af9c61cd..7ccda653)
[1457468.880948] kernel stack overflow (double-fault):  [#1] SMP NOPTI
[1457468.880948] CPU: 3 PID: 512 Comm: md0_raid6 Tainted: P   OE 
5.8.0-53-generic #60-Ubuntu
[1457468.880949] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS 
PRO/B550M AORUS PRO, BIOS F13h 04/23/2021
[1457468.880949] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[1457468.880950] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 
8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 
d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[1457468.880951] RSP: 0018:bcda805efff8 EFLAGS: 00010246
[1457468.880952] RAX: 9bfb8ccc42a0 RBX: 9bfcdb407800 RCX: 
0001
[1457468.880952] RDX: bcda805f0040 RSI: bcda805f0038 RDI: 
9bfcdb407800
[1457468.880953] RBP: bcda805f0028 R08: 0001 R09: 
90841600
[1457468.880953] R10: 9bfb8ccc4f40 R11: 0001 R12: 
bcda805f0038
[1457468.880953] R13: bcda805f0040 R14: 9bfb8ccc42a0 R15: 
9bf766967940
[1457468.880954] FS:  () GS:9bfcdeac() 
knlGS:
[1457468.880954] CS:  0010 DS:  ES:  CR0: 80050033
[1457468.880955] CR2: bcda805effe8 CR3: 0003a65ee000 CR4: 
00340ee0
[1457468.880955] Call Trace:
[1457468.880955]  ? mempool_kfree+0xe/0x10
[1457468.880956]  ? kfree+0xb8/0x220
[1457468.880956]  ? mempool_kfree+0xe/0x10
[1457468.880956]  ? mempool_free+0x2f/0x80
[1457468.880956]  ? md_end_io+0x4b/0x70
[1457468.880957]  ? bio_endio+0xe6/0x150
[1457468.880957]  ? bio_chain_endio+0x2d/0x40
[1457468.880957]  ? md_end_io+0x5d/0x70
[1457468.880958]  ? bio_endio+0xe6/0x150
[1457468.880958]  ? bio_chain_endio+0x2d/0x40
[1457468.880958]  ? md_end_io+0x5d/0x70
[1457468.880959]  ? bio_endio+0xe6/0x150
[1457468.880959]  ? bio_chain_endio+0x2d/0x40
[1457468.880959]  ? md_end_io+0x5d/0x70
[1457468.880959]  ? bio_endio+0xe6/0x150
[1457468.880960]  ? bio_chain_endio+0x2d/0x40
[1457468.880960]  ? md_end_io+0x5d/0x70
[1457468.880960]  ? bio_endio+0xe6/0x150
[1457468.880960]  ? bio_chain_endio+0x2d/0x40
[1457468.880961]  ? md_end_io+0x5d/0x70
[1457468.880961]  ? bio_endio+0xe6/0x150
[1457468.880961]  ? bio_chain_endio+0x2d/0x40
[1457468.880962]  ? md_end_io+0x5d/0x70
[1457468.880962]  ? bio_endio+0xe6/0x150
[1457468.880962]  ? bio_chain_endio+0x2d/0x40
[1457468.880962]  ? md_end_io+0x5d/0x70
[1457468.880963]  ? bio_endio+0xe6/0x150
[1457468.880963]  ? bio_chain_endio+0x2d/0x40
[1457468.880963]  ? md_end_io+0x5d/0x70
[1457468.880963]  ? bio_endio+0xe6/0x150
[1457468.880964]  ? bio_chain_endio+0x2d/0x40
[1457468.880964]  ? md_end_io+0x5d/0x70
[1457468.880964]  ? bio_endio+0xe6/0x150
[1457468.880965]  ? bio_chain_endio+0x2d/0x40
[1457468.880965]  ? md_end_io+0x5d/0x70
[1457468.880965]  ? bio_endio+0xe6/0x150
[1457468.880965]  ? bio_chain_endio+0x2d/0x40
[1457468.880966]  ? md_end_io+0x5d/0x70
[1457468.880966]  ? bio_endio+0xe6/0x150
[1457468.880966]  ? bio_chain_endio+0x2d/0x40
[1457468.880966]  ? md_end_io+0x5d/0x70
[1457468.880967]  ? bio_endio+0xe6/0x150
[1457468.880967]  ? bio_chain_endio+0x2d/0x40
[1457468.880967]  ? md_end_io+0x5d/0x70
[1457468.880968]  ? bio_endio+0xe6/0x150
[1457468.880968]  ? bio_chain_endio+0x2d/0x40
[1457468.880968]  ? md_end_io+0x5d/0x70
[1457468.880968]  ? bio_endio+0xe6/0x150
[1457468.880969]  ? bio_chain_endio+0x2d/0x40
[1457468.880969]  ? md_end_io+0x5d/0x70
[1457468.880969]  ? bio_endio+0xe6/0x150
[1457468.880969]  ? bio_chain_endio+0x2d/0x40
[1457468.880970]  ? md_end_io+0x5d/0x70
[1457468.880970]  ? bio_endio+0xe6/0x150
[1457468.880970]  ? bio_chain_endio+0x2d/0x40
[1457468.880971]  ? md_end_io+0x5d/0x70
[1457468.880971]  ? bio_endio+0xe6/0x150
[1457468.880971]  ? bio_chain_endio+0x2d/0x40
[1457468.880971]  ? md_end_io+0x5d/0x70
[1457468.880972]  ? bio_endio+0xe6/0x150
[1457468.880972]  ? bio_chain_endio+0x2d/0x40
[1457468.880972]  ? md_end_io+0x5d/0x70
[1457468.880972]  ? bio_endio+0xe6/0x150
[1457468.880973]  ? bio_chain_endio+0x2d/0x40
[1457468.880973]  ? md_end_io+0x5d/0x70
[1457468.880973]  ? bio_endio+0xe6/0x150
[1457468.880973]  ? bio_chain_endio+0x2d/0x40
[1457468.880974]  ? md_end_io+0x5d/0x70
[1457468.880974]  ? bio_endio+0xe6/0x150
[1457468.880974]  ? bio_chain_endio+0x2d/0x40
[1457468.880975]  ? md_end_io+0x5d/0x70
[1457468.880975]  ? bio_endio+0xe6/0x150
[1457468.880975]  ? bio_chain_endio+0x2d/0x40
[1457468.880975]  ? md_end_io+0x5d/0x70
[1457468.880976]  ? bio_endio+0xe6/0x150
[1457468.880976]  ? bio_chain_endio+0x2d/0x40
[1457468.880976]  ? md_end_io+0x5d/0x70
[1457468.880976]  ? bio_endio+0xe6/0x150
[1457468.880977]  ? bio_chain_endio+0x2d/0x40
[1457468.880977]  ? md_end_io+0x5d/0x70
[1457468.880977]  ? bio_endio+0xe6/0x150
[1457468.88097

[Bug 1929591] [NEW] MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault

2021-05-25 Thread Jake Staehle
Public bug reported:

Hello:
Every few days I get a kernel panic on my Ubuntu Server 20.10 box, which was
recently upgraded to a Ryzen 3700X. I have 7 WD Red Pro HDDs in a RAID 6 array
with Linux MD, all attached to an LSI 9211-8ik PCIe card. The motherboard is
currently a Gigabyte B550M Aorus Pro. My Ubuntu install is running the latest
5.8.0-53 kernel.

This is the second hardware configuration to produce the exact same kernel
panic text. Previously I had these HDDs connected directly to the SATA
controller of an ASRock X570 Pro4 ATX mobo with the same 3700X. I was also
previously running Ubuntu Server 20.04 LTS; I upgraded to 20.10 in hopes that
the newer kernel would fix it, but it did not.
 
I posted the whole story on Super User, if you're interested:
https://superuser.com/questions/1615400/md-raid-6-periodic-kernel-panic-possible-kernel-bug
 

However, I am now convinced this is a Linux kernel bug in the MD driver.

Example 1 kernel panic:

[406005.583315] BUG: stack guard page was hit at 7cbff150 (stack is 
3b7072a2..dac5ed08)
[406005.583315] kernel stack overflow (double-fault):  [#1] SMP NOPTI
[406005.583315] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P   OE 
5.8.0-36-generic #40-Ubuntu
[406005.583316] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS 
PRO/B550M AORUS PRO, BIOS F1 05/19/2020
[406005.583316] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[406005.583316] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 
8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 
d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[406005.583316] RSP: 0018:a620c06e3ff8 EFLAGS: 00010246
[406005.583317] RAX: 9aaf36f54720 RBX: 9ab34b407800 RCX: 
0001
[406005.583317] RDX: a620c06e4040 RSI: a620c06e4038 RDI: 
9ab34b407800
[406005.583317] RBP: a620c06e4028 R08: 0001 R09: 
b9c54500
[406005.583318] R10: 9aaf36f54fe0 R11: 0001 R12: 
a620c06e4038
[406005.583318] R13: a620c06e4040 R14: 9aaf36f54720 R15: 
9ab2925cbd10
[406005.583318] FS:  () GS:9ab34edc() 
knlGS:
[406005.583318] CS:  0010 DS:  ES:  CR0: 80050033
[406005.583318] CR2: a620c06e3fe8 CR3: 0005d52ac000 CR4: 
00340ee0
[406005.583319] Call Trace:
[406005.583319]  ? mempool_kfree+0xe/0x10
[406005.583319]  ? kfree+0xb8/0x220
[406005.583319]  ? mempool_kfree+0xe/0x10
[406005.583319]  ? mempool_free+0x2f/0x80
[406005.583319]  ? md_end_io+0x4b/0x70
[406005.583319]  ? bio_endio+0xe6/0x150


Example 2 kernel panic with old mobo:

[161342.301305] BUG: stack guard page was hit at fc60f228 (stack is 
875efe77..3f38a379)
[161342.301306] kernel stack overflow (double-fault):  [#1] SMP NOPTI
[161342.301306] CPU: 10 PID: 465 Comm: md0_raid6 Tainted: P   OE 
5.8.0-33-generic #36-Ubuntu
[161342.301307] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./X570 Pro4, BIOS P3.60 12/01/2020
[161342.301307] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[161342.301308] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 
8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 
d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[161342.301308] RSP: 0018:a86b00c6fff8 EFLAGS: 00010246
[161342.301309] RAX: 98edc21cac40 RBX: 98ef0b407800 RCX: 
0001
[161342.301310] RDX: a86b00c70040 RSI: a86b00c70038 RDI: 
98ef0b407800
[161342.301310] RBP: a86b00c70028 R08: 0001 R09: 
85854500
[161342.301311] R10: 98edc21ca100 R11: 0001 R12: 
a86b00c70038
[161342.301311] R13: a86b00c70040 R14: 98edc21cac40 R15: 
98e9b53d74d8
[161342.301311] FS:  () GS:98ef0ec8() 
knlGS:
[161342.301312] CS:  0010 DS:  ES:  CR0: 80050033
[161342.301312] CR2: a86b00c6ffe8 CR3: 0007fa766000 CR4: 
00340ee0
[161342.301312] Call Trace:
[161342.301313]  ? mempool_kfree+0xe/0x10
[161342.301313]  ? kfree+0xb8/0x220
[161342.301313]  ? mempool_kfree+0xe/0x10
[161342.301313]  ? mempool_free+0x2f/0x80
[161342.301314]  ? md_end_io+0x4b/0x70
[161342.301314]  ? bio_endio+0xe6/0x150
[161342.301314]  ? bio_chain_endio+0x2d/0x40
[161342.301315]  ? md_end_io+0x5d/0x70
[161342.301315]  ? bio_endio+0xe6/0x150
[161342.301315]  ? bio_chain_endio+0x2d/0x40
[161342.301315]  ? md_end_io+0x5d/0x70
[161342.301316]  ? bio_endio+0xe6/0x150
[161342.301316]  ? bio_chain_endio+0x2d/0x40
[161342.301316]  ? md_end_io+0x5d/0x70
[161342.301316]  ? bio_endio+0xe6/0x150
[161342.301317]  ? bio_chain_endio+0x2d/0x40
[161342.301317]  ? md_end_io+0x5d/0x70
[161342.301317]  ? bio_endio+0xe6/0x150
[161342.301317]  ? bio_chain_endio+0x2d/0x40
...
[161342.301379]  ? md_end_io+0x5d/0x70
[161342.301379]  ? bio_endio+0xe6
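
Every trace in this report ends in the same three frames (md_end_io, bio_endio, bio_chain_endio) repeated until the stack guard page is hit. A minimal sketch of how one might confirm that pattern mechanically from a saved console capture is below. The names FRAME_RE, frames and repeating_cycle are hypothetical helpers made up for illustration, not part of any kernel tooling, and the sample text is a few lines copied from the Example 2 trace above. Truncated or wrapped lines that do not match the frame pattern are simply skipped.

```python
import re

# Matches a complete call-trace line such as
# "[161342.301314]  ? bio_endio+0xe6/0x150" and captures the function name.
# Incomplete (truncated) lines will not match and are ignored.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\?\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def frames(log_text):
    """Return the function names of all complete call-trace lines, in order."""
    names = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def repeating_cycle(names, max_period=8):
    """Find the shortest frame sequence that repeats back-to-back at the end
    of the trace. Returns (cycle, repeats), or None if nothing repeats."""
    for period in range(1, max_period + 1):
        if len(names) < 2 * period:
            break
        cycle = names[-period:]
        repeats = 0
        end = len(names)
        # Walk backwards from the end while each window equals the cycle.
        while end - period >= 0 and names[end - period:end] == cycle:
            repeats += 1
            end -= period
        if repeats >= 2:
            return cycle, repeats
    return None


# A few lines taken from the Example 2 trace in this report.
sample = """\
[161342.301313]  ? mempool_free+0x2f/0x80
[161342.301314]  ? md_end_io+0x4b/0x70
[161342.301314]  ? bio_endio+0xe6/0x150
[161342.301314]  ? bio_chain_endio+0x2d/0x40
[161342.301315]  ? md_end_io+0x5d/0x70
[161342.301315]  ? bio_endio+0xe6/0x150
[161342.301315]  ? bio_chain_endio+0x2d/0x40
[161342.301315]  ? md_end_io+0x5d/0x70
[161342.301316]  ? bio_endio+0xe6/0x150
[161342.301316]  ? bio_chain_endio+0x2d/0x40
"""

if __name__ == "__main__":
    print(repeating_cycle(frames(sample)))
```

The comparison is on function names only, so md_end_io+0x4b and md_end_io+0x5d count as the same frame. That is a deliberate simplification: it is enough to show the completion path re-entering itself, but it does not tell you which call site inside md_end_io is involved.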