[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3
https://bugzilla.kernel.org/show_bug.cgi?id=213079

Michael Ellerman (mich...@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |mich...@ellerman.id.au

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #19 from Erhard F. (erhar...@mailbox.org) ---
(Luckily) I am no longer able to reproduce this. Re-tested on 5.19-rc5. Perhaps the problem was also specific to this particular NVMe SSD. I swapped it for another one and have not seen the issue since. I'll keep an eye on it and will close this bug if it stays that way for the next few stable kernels.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #18 from Erhard F. (erhar...@mailbox.org) ---
The 'hackfix for MSI init' patch also applies on top of v5.14-rc6. However, as before, the G5 later runs into bug #213837.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #297439|0                           |1
        is obsolete|                            |

--- Comment #17 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 298373
  --> https://bugzilla.kernel.org/attachment.cgi?id=298373=edit
kernel .config (5.14-rc6, PowerMac G5 11,2)

https://bugzilla.kernel.org/show_bug.cgi?id=213079

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #297473|0                           |1
        is obsolete|                            |

--- Comment #16 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 298371
  --> https://bugzilla.kernel.org/attachment.cgi?id=298371=edit
dmesg (5.14-rc6, PowerMac G5 11,2)

Now that there is a fix for bug #213803, I was able to build v5.14-rc6 and give it a test ride. It looks like the issue persists:

[...]
irq 63: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 10732 Comm: emerge Tainted: GW 5.14.0-rc6-PowerMacG5+ #2
Call Trace:
[cfff7af0] [c054de24] .dump_stack_lvl+0x98/0xe0 (unreliable)
[cfff7b80] [c00e1724] .__report_bad_irq+0x34/0xf0
[cfff7c20] [c00e160c] .note_interrupt+0x258/0x300
[cfff7ce0] [c00dd840] .handle_irq_event_percpu+0x5c/0x88
[cfff7d70] [c00dd8b0] .handle_irq_event+0x44/0x70
[cfff7e00] [c00e2d34] .handle_fasteoi_irq+0xac/0x158
[cfff7ea0] [c00dc8bc] .handle_irq_desc+0x34/0x54
[cfff7f10] [c0012058] .__do_irq+0x15c/0x238
[cfff7f90] [c0012978] .__do_IRQ+0xac/0xb4
[c0001e9cfcf0] [c0001e9cfd90] 0xc0001e9cfd90
[c0001e9cfd90] [c0012ac4] .do_IRQ+0x144/0x194
[c0001e9cfe10] [c0008050] hardware_interrupt_common_virt+0x210/0x220
--- interrupt: 500 at 0x3fffb9b25d9c
NIP: 3fffb9b25d9c LR: 3fffb9b2811c CTR: 3fffb9b25d9c
REGS: c0001e9cfe80 TRAP: 0500 Tainted: GW (5.14.0-rc6-PowerMacG5+)
MSR: 9000f032 CR: 22482822 XER: 2000
IRQMASK: 0
GPR00: 3fffb9b28100 3d4e7550 3fffb9ef6200 3fffb7977790
GPR04: 3fffb7977790 3fffb55e8b80 3fffb9eccac0
GPR08: 3fffb9b25d9c 000f
GPR12: 3fffb9b7eeb0 3fffb9fc8890 3d4e7658 3fffb395c548
GPR16: 3d4e7670 3fffb7902480
GPR20: 3fffb395c528 00014b8f7878
GPR24: 3fffb7969a80 00014b8f7830 3fffb7a750d0 000a
GPR28: 3fffb7a750dc 007c 00014b8f9420 3fffb395c3c0
NIP [3fffb9b25d9c] 0x3fffb9b25d9c
LR [3fffb9b2811c] 0x3fffb9b2811c
--- interrupt: 500
handlers:
[] .nvme_irq
[] .nvme_irq
Disabling IRQ #63

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #15 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Oliver O'Halloran from comment #13)
> In the meanwhile, can you try the patch above? That seems to fix a bug which
> is causing MSIs to be unusable. I'm not 100% sure why that would matter,
> but it's possible the crashes are due to some other bug which doesn't appear
> when MSIs are in use.

I have now had time to test your patch on top of kernels 5.13-rc6 and 5.13.4. I can't test it on top of 5.14-rc2 because of bug #213803. Your patch seems to work fine: I no longer get the "irq 63: nobody cared" messages or the crashes!

However, when building stuff, the G5 now crashes sooner or later with:

[...]
Kernel panic - not syncing: corrupted stack end detected inside scheduler
Call Trace:
CPU: 1 PID: 2968 Comm: powerpc64-unkno Tainted: GW 5.13.0-rc6-PowerMacG5+ #2
[c000717178c0] [c05412d0] .dump_stack+0xe0/0x13c (unreliable)
[c00071717960] [c00681a0] .panic+0x168/0x430
[c00071717a10] [c0809ca0] .__schedule+0x80/0x840
[c00071717af0] [c00a0ea8] .do_task_dead+0x54/0x58
[c00071717b70] [c006e7b4] .do_exit+0xa14/0xa6c
[c00071717c60] [c006e89c] .do_group_exit+0x50/0xb0
[c00071717cf0] [c006e910] .__wake_up_parent+0x0/0x34
[c00071717d60] [c0021530] .system_call_exception+0x1b4/0x1ec
[c00071717e10] [c000b9c4] system_call_common+0xe4/0x214
--- interrupt: c00 at 0x3fffa8092aa8
NIP: 3fffa8092aa8 LR: 3fffa7ff2d04 CTR:
REGS: c00071717e80 TRAP: 0c00 Tainted: GW (5.13.0-rc6-PowerMacG5+)
MSR: 9200f032 CR: 22000482 XER:
IRQMASK: 0
GPR00: 00ea 3fffd04ef2a0 3fffa81b1300
GPR04:
GPR08:
GPR12: 3fffa8318c30 00012e5ff800 0001136b53b0
GPR16: 0001200cec38 3fffddea1c68 0001200ceb28 002f
GPR20: 3fffa81abff8 0001 3fffa81aaa58
GPR24: 0003 0001
GPR28: 3fffa8311c50 f000
NIP [3fffa8092aa8] 0x3fffa8092aa8
LR [3fffa7ff2d04] 0x3fffa7ff2d04
--- interrupt: c00
Rebooting in 120 seconds..

I don't know whether this is related. I'll throw in more debugging options, file this as a separate issue and link it here, just in case.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #14 from Erhard F. (erhar...@mailbox.org) ---
Thanks for the patch! I will try it as soon as I get to this G5 again.

I don't know whether write access is necessary to trigger the bug. This past weekend I saw it just from running 'emerge -pv distcc' on its Gentoo partition, which only shows the flags and the version of distcc that would be installed and does not build anything yet. The bug was still triggered. The filesystem was ext4, but I've also seen it on btrfs at other times.

For the time being I'm running kernel 5.10.x LTS, which works just fine.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #13 from Oliver O'Halloran (ooh...@gmail.com) ---
Hi,

I got a loaner G5 with an NVMe drive, but I haven't been able to replicate the crash you're seeing. However, I think that's probably because I'm only reading from the NVMe, since it's NTFS formatted and I didn't want to trash someone else's files. I'm waiting for a new NVMe drive to arrive so I can do some destructive testing, which should hopefully replicate the bug.

In the meanwhile, can you try the patch above? That seems to fix a bug which is causing MSIs to be unusable. I'm not 100% sure why that would matter, but it's possible the crashes are due to some other bug which doesn't appear when MSIs are in use.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #12 from Oliver O'Halloran (ooh...@gmail.com) ---
Created attachment 297755
  --> https://bugzilla.kernel.org/attachment.cgi?id=297755=edit
hackfix for MSI init

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #11 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297473
  --> https://bugzilla.kernel.org/attachment.cgi?id=297473=edit
dmesg (5.13-rc6 + DEBUG_VM_PGTABLE, PowerMac G5 11,2)

With DEBUG_VM_PGTABLE=y, slub_debug=P and page_poison=1, the trace contains some additional data:

[...]
irq 63: nobody cared (try booting with the "irqpoll" option)
Call Trace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 5.13.0-rc6-PowerMacG5+ #2
[cfff7ae0] [c054eafc] .dump_stack+0xe0/0x13c (unreliable)
[cfff7b80] [c00e1428] .__report_bad_irq+0x34/0xf0
[cfff7c20] [c00e1310] .note_interrupt+0x258/0x300
[cfff7ce0] [c00dd58c] .handle_irq_event_percpu+0x64/0x90
[cfff7d70] [c00dd5fc] .handle_irq_event+0x44/0x70
[cfff7e00] [c00e2a14] .handle_fasteoi_irq+0xac/0x158
[cfff7ea0] [c00dc648] .generic_handle_irq+0x38/0x58
[cfff7f10] [c0011688] .__do_irq+0x15c/0x238
[cfff7f90] [c001207c] .do_IRQ+0x180/0x188
[c12db810] [c0011f9c] .do_IRQ+0xa0/0x188
[c12db8b0] [c0007f94] hardware_interrupt_common_virt+0x1a4/0x1b0
--- interrupt: 500 at .power4_idle_nap+0x30/0x34
NIP: c002cc04 LR: c0016828 CTR: c0016768
REGS: c12db920 TRAP: 0500 Tainted: GW (5.13.0-rc6-PowerMacG5+)
MSR: 90009032 CR: 44082242 XER:
IRQMASK: 0
GPR00: c00167dc c12dbbc0 c12df700 0001
GPR04: 0002 90049032
GPR08: 0001 c11b3b80 0001 0016
GPR12: 44082242 c23a6000 0014aa88 ffb30100
GPR16: 01e7b8da 01e7bd5f 01e7b9f0 01e88d8d
GPR20: 01e7bd3d 01e7b98b 01e7bbb2 01e7b89c
GPR24: 0270f700 c1081008 c0a7c02d
GPR28: c12edb9c c11b3b80 90009032 c12ed985
NIP [c002cc04] .power4_idle_nap+0x30/0x34
LR [c0016828] .power4_idle+0xc0/0xe8
--- interrupt: 500
[c12dbbc0] [c00167dc] .power4_idle+0x74/0xe8 (unreliable)
handlers:
[c12dbc40] [c001665c] .arch_cpu_idle+0x80/0x18c
[c12dbcc0] [c081f058] .default_idle_call+0x7c/0xd0
[c12dbd30] [c00a7bcc] .do_idle+0x128/0x140
[c12dbdd0] [c00a7eb4] .cpu_startup_entry+0x28/0x2c
[c12dbe40] [c0010044] .rest_init+0x1b0/0x1bc
[c12dbec0] [c10047f4] .start_kernel+0x934/0x9b8
[c12dbf90] [c000b390] start_here_common+0x1c/0x8c
[<1553d54b>] .nvme_irq
[<1553d54b>] .nvme_irq
Disabling IRQ #63

https://bugzilla.kernel.org/show_bug.cgi?id=213079

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #296759|0                           |1
        is obsolete|                            |
 Attachment #296761|0                           |1
        is obsolete|                            |

--- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297439
  --> https://bugzilla.kernel.org/attachment.cgi?id=297439=edit
kernel .config (5.13-rc6, PowerMac G5 11,2)

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #9 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297437
  --> https://bugzilla.kernel.org/attachment.cgi?id=297437=edit
dmesg (5.13-rc6 w. patch fbbefb3 reverted + debug, PowerMac G5 11,2)

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #8 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297435
  --> https://bugzilla.kernel.org/attachment.cgi?id=297435=edit
dmesg (5.13-rc6 + debug, PowerMac G5 11,2)

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Oliver O'Halloran from comment #5)
> Could you add "debug" to the kernel command line and post the dmesg output
> for a boot with the patch applied and reverted?

OK, on top of 5.13-rc6 I reverted fbbefb3, which went fine except for the pci-ioda.c part, where I had to apply the old code manually.

Here are the vanilla debug dmesg and the debug dmesg with the patch reverted.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
This is already a custom-built kernel with lots of debugging options turned on (see the kernel .config attached to this bug). But of course I can add "debug" to the other kernel command-line parameters.

I'll report back the next time I get access to this G5, in about 2-3 weeks.

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #5 from Oliver O'Halloran (ooh...@gmail.com) ---
Hmm, it's pretty weird to see an NVMe drive using LSIs (level-sensitive interrupts) rather than MSIs. I'm not too sure what to make of that. I figure something screwy is going on with interrupt routing, but I don't have any G5 hardware to replicate this with.

Could you add "debug" to the kernel command line and post the dmesg output for a boot with the patch applied and reverted?

https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #4 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297191
  --> https://bugzilla.kernel.org/attachment.cgi?id=297191=edit
bisect.log

It turns out the problem was introduced between v5.11 and v5.12 by the following commit:

# git bisect good
fbbefb320214db14c3e740fce98e2c95c9d0669b is the first bad commit
commit fbbefb320214db14c3e740fce98e2c95c9d0669b
Author: Oliver O'Halloran
Date:   Tue Nov 3 15:35:07 2020 +1100

    powerpc/pci: Move PHB discovery for PCI_DN using platforms

    Make powernv, pseries, powermac and maple use ppc_mc.discover_phbs.
    These platforms need to be done together because they all depend on
    pci_dn's being created from the DT. The pci_dn contains a pointer to
    the relevant pci_controller so they need to be created after the
    pci_controller structures are available, but before PCI devices are
    scanned. Currently this ordering is provided by initcalls and the
    sequence is:

    1. PHBs are discovered (setup_arch) (early boot, pre-initcalls)
    2. pci_dn are created from the unflattended DT (core initcall)
    3. PHBs are scanned pcibios_init() (subsys initcall)

    The new ppc_md.discover_phbs() function is also a core_initcall so we
    can't guarantee ordering between the creation of pci_controllers and
    the creation of pci_dn's which require a pci_controller. We could use
    the postcore, or core_sync initcall levels, but it's cleaner to just
    move the pci_dn setup into the per-PHB inits which occur inside of
    .discover_phb() for these platforms. This brings the boot-time path in
    line with the PHB hotplug path that is used for pseries DLPAR
    operations too.

    Signed-off-by: Oliver O'Halloran
    [mpe: Squash powermac & maple in to avoid breakage those platforms,
    convert memblock allocs to use kmalloc to avoid warnings]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20201103043523.916109-2-ooh...@gmail.com