[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2022-07-29 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

Michael Ellerman (mich...@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |mich...@ellerman.id.au

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2022-07-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #19 from Erhard F. (erhar...@mailbox.org) ---
(Luckily) I am no longer able to reproduce this. Re-tested on 5.19-rc5.

Perhaps the problem was specific to this particular NVMe SSD. I swapped it for
another one and have not seen the issue since.

I'll keep an eye on it and will close here if it stays like that for the next
few stable kernels.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-08-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #18 from Erhard F. (erhar...@mailbox.org) ---
The 'hackfix for MSI init' patch also applies on top of v5.14-rc6.

But, as before, the G5 later runs into bug #213837.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-08-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #297439|0                           |1
        is obsolete|                            |

--- Comment #17 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 298373
  --> https://bugzilla.kernel.org/attachment.cgi?id=298373&action=edit
kernel .config (5.14-rc6, PowerMac G5 11,2)


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-08-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #297473|0                           |1
        is obsolete|                            |

--- Comment #16 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 298371
  --> https://bugzilla.kernel.org/attachment.cgi?id=298371&action=edit
dmesg (5.14-rc6, PowerMac G5 11,2)

As there is now a fix for bug #213803, I was able to build v5.14-rc6 and give
it a test run. It looks like the issue persists:

[...]
irq 63: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 10732 Comm: emerge Tainted: GW
5.14.0-rc6-PowerMacG5+ #2
Call Trace:
[cfff7af0] [c054de24] .dump_stack_lvl+0x98/0xe0 (unreliable)
[cfff7b80] [c00e1724] .__report_bad_irq+0x34/0xf0
[cfff7c20] [c00e160c] .note_interrupt+0x258/0x300
[cfff7ce0] [c00dd840] .handle_irq_event_percpu+0x5c/0x88
[cfff7d70] [c00dd8b0] .handle_irq_event+0x44/0x70
[cfff7e00] [c00e2d34] .handle_fasteoi_irq+0xac/0x158
[cfff7ea0] [c00dc8bc] .handle_irq_desc+0x34/0x54
[cfff7f10] [c0012058] .__do_irq+0x15c/0x238
[cfff7f90] [c0012978] .__do_IRQ+0xac/0xb4
[c0001e9cfcf0] [c0001e9cfd90] 0xc0001e9cfd90
[c0001e9cfd90] [c0012ac4] .do_IRQ+0x144/0x194
[c0001e9cfe10] [c0008050] hardware_interrupt_common_virt+0x210/0x220
--- interrupt: 500 at 0x3fffb9b25d9c
NIP:  3fffb9b25d9c LR: 3fffb9b2811c CTR: 3fffb9b25d9c
REGS: c0001e9cfe80 TRAP: 0500   Tainted: GW 
(5.14.0-rc6-PowerMacG5+)
MSR:  9000f032   CR: 22482822  XER:
2000
IRQMASK: 0 
GPR00: 3fffb9b28100 3d4e7550 3fffb9ef6200 3fffb7977790 
GPR04: 3fffb7977790 3fffb55e8b80  3fffb9eccac0 
GPR08: 3fffb9b25d9c  000f  
GPR12: 3fffb9b7eeb0 3fffb9fc8890 3d4e7658 3fffb395c548 
GPR16: 3d4e7670  3fffb7902480  
GPR20:  3fffb395c528 00014b8f7878  
GPR24: 3fffb7969a80 00014b8f7830 3fffb7a750d0 000a 
GPR28: 3fffb7a750dc 007c 00014b8f9420 3fffb395c3c0 
NIP [3fffb9b25d9c] 0x3fffb9b25d9c
LR [3fffb9b2811c] 0x3fffb9b2811c
--- interrupt: 500
handlers:
[] .nvme_irq
[] .nvme_irq
Disabling IRQ #63
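
For anyone comparing several logs for this pattern, here is a minimal sketch that pulls the "nobody cared" reports out of a saved log. It assumes the log has the shape quoted in this thread (no timestamp prefixes, handler lines such as "[] .nvme_irq"). The function name `summarize_bad_irqs` and the `SAMPLE` text are made up for illustration and are not part of any kernel tooling.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
# A handler line is a bracketed, possibly empty address followed by a symbol,
# e.g. "[<1553d54b>] .nvme_irq" or "[] .nvme_irq" as quoted in this thread.
HANDLER = re.compile(r"\[(?:<[0-9a-f]+>)?\]\s+\.?(\w+)$")


def summarize_bad_irqs(log_text):
    """Map each reported IRQ number to its handler names and disabled state."""
    reports = {}
    current = None  # report currently being filled in, if any
    for raw in log_text.splitlines():
        line = raw.strip()
        started = NOBODY_CARED.search(line)
        if started:
            irq = int(started.group(1))
            current = reports.setdefault(irq, {"handlers": [], "disabled": False})
            continue
        if current is None:
            continue
        handler = HANDLER.match(line)
        if handler:
            current["handlers"].append(handler.group(1))
        elif DISABLING.search(line):
            # Marks the report in progress as disabled and closes it.
            current["disabled"] = True
            current = None
    return reports


# Hypothetical sample assembled from lines quoted in this thread.
SAMPLE = """\
irq 63: nobody cared (try booting with the "irqpoll" option)
[cfff7b80] [c00e1724] .__report_bad_irq+0x34/0xf0
handlers:
[] .nvme_irq
[<1553d54b>] .nvme_irq
Disabling IRQ #63
"""

if __name__ == "__main__":
    print(summarize_bad_irqs(SAMPLE))
```

Stack frames are skipped because their first bracket holds a plain address instead of "[]" or "[<...>]". Logs with timestamp prefixes would need the patterns adjusted.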


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-07-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #15 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Oliver O'Halloran from comment #13)
> In the meantime, can you try the patch above? That seems to fix the bug
> that is causing MSIs to be unusable. I'm not 100% sure why that would
> matter, but it's possible the crashes are due to some other bug which
> doesn't appear when MSIs are in use.
I have now had time to test your patch on top of kernel 5.13-rc6 and 5.13.4. I
can't test it on top of 5.14-rc2 due to bug #213803.

Your patch seems to work fine and I no longer get these "irq 63: nobody cared"
messages and crashes! However, when building stuff the G5 now sooner or later
crashes with:

[...]
Kernel panic - not syncing: corrupted stack end detected inside scheduler
Call Trace:
CPU: 1 PID: 2968 Comm: powerpc64-unkno Tainted: GW
5.13.0-rc6-PowerMacG5+ #2
[c000717178c0] [c05412d0] .dump_stack+0xe0/0x13c (unreliable)
[c00071717960] [c00681a0] .panic+0x168/0x430
[c00071717a10] [c0809ca0] .__schedule+0x80/0x840
[c00071717af0] [c00a0ea8] .do_task_dead+0x54/0x58
[c00071717b70] [c006e7b4] .do_exit+0xa14/0xa6c
[c00071717c60] [c006e89c] .do_group_exit+0x50/0xb0
[c00071717cf0] [c006e910] .__wake_up_parent+0x0/0x34
[c00071717d60] [c0021530] .system_call_exception+0x1b4/0x1ec
[c00071717e10] [c000b9c4] system_call_common+0xe4/0x214
--- interrupt: c00 at 0x3fffa8092aa8
NIP:  3fffa8092aa8 LR: 3fffa7ff2d04 CTR: 
REGS: c00071717e80 TRAP: 0c00   Tainted: GW 
(5.13.0-rc6-PowerMacG5+)
MSR:  9200f032   CR: 22000482  XER:

IRQMASK: 0 
GPR00: 00ea 3fffd04ef2a0 3fffa81b1300  
GPR04:     
GPR08:     
GPR12:  3fffa8318c30 00012e5ff800 0001136b53b0 
GPR16: 0001200cec38 3fffddea1c68 0001200ceb28 002f 
GPR20:  3fffa81abff8 0001 3fffa81aaa58 
GPR24:   0003 0001 
GPR28:  3fffa8311c50 f000  
NIP [3fffa8092aa8] 0x3fffa8092aa8
LR [3fffa7ff2d04] 0x3fffa7ff2d04
--- interrupt: c00
Rebooting in 120 seconds..


I don't know whether this is related. I'll throw more debugging stuff in, file
this as a separate issue, and link it here just in case.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-07-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #14 from Erhard F. (erhar...@mailbox.org) ---
Thanks for the patch! I will try it as soon as I get to this G5 again.

I don't know whether write access is necessary to trigger the bug. This past
weekend I saw it just by running 'emerge -pv distcc' on its Gentoo partition,
which only shows the flags and version with which distcc would be installed
but does not build anything yet. Still, the bug was triggered.
Filesystem was ext4, but I've seen it on btrfs at other times. Running kernel
5.10.x LTS for the time being which works just fine.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-07-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #13 from Oliver O'Halloran (ooh...@gmail.com) ---
Hi,

I got a loaner G5 with an NVMe drive, but I haven't been able to replicate the
crash you're seeing. However, I think that's probably because I'm only reading
from the NVMe since it's NTFS formatted and I didn't want to trash someone
else's files. I'm waiting for a new NVMe drive to arrive so I can do some
destructive testing which should hopefully replicate the bug.

In the meantime, can you try the patch above? That seems to fix the bug that
is causing MSIs to be unusable. I'm not 100% sure why that would matter, but
it's possible the crashes are due to some other bug which doesn't appear when
MSIs are in use.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-07-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #12 from Oliver O'Halloran (ooh...@gmail.com) ---
Created attachment 297755
  --> https://bugzilla.kernel.org/attachment.cgi?id=297755&action=edit
hackfix for MSI init


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #11 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297473
  --> https://bugzilla.kernel.org/attachment.cgi?id=297473&action=edit
dmesg (5.13-rc6 + DEBUG_VM_PGTABLE, PowerMac G5 11,2)

The trace got some additional data with DEBUG_VM_PGTABLE=y, slub_debug=P and
page_poison=1:

[...]
irq 63: nobody cared (try booting with the "irqpoll" option)
Call Trace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW
5.13.0-rc6-PowerMacG5+ #2
[cfff7ae0] [c054eafc] .dump_stack+0xe0/0x13c (unreliable)
[cfff7b80] [c00e1428] .__report_bad_irq+0x34/0xf0
[cfff7c20] [c00e1310] .note_interrupt+0x258/0x300
[cfff7ce0] [c00dd58c] .handle_irq_event_percpu+0x64/0x90
[cfff7d70] [c00dd5fc] .handle_irq_event+0x44/0x70
[cfff7e00] [c00e2a14] .handle_fasteoi_irq+0xac/0x158
[cfff7ea0] [c00dc648] .generic_handle_irq+0x38/0x58
[cfff7f10] [c0011688] .__do_irq+0x15c/0x238
[cfff7f90] [c001207c] .do_IRQ+0x180/0x188
[c12db810] [c0011f9c] .do_IRQ+0xa0/0x188
[c12db8b0] [c0007f94] hardware_interrupt_common_virt+0x1a4/0x1b0
--- interrupt: 500 at .power4_idle_nap+0x30/0x34
NIP:  c002cc04 LR: c0016828 CTR: c0016768
REGS: c12db920 TRAP: 0500   Tainted: GW 
(5.13.0-rc6-PowerMacG5+)
MSR:  90009032   CR: 44082242  XER: 
IRQMASK: 0 
GPR00: c00167dc c12dbbc0 c12df700 0001 
GPR04:   0002 90049032 
GPR08: 0001 c11b3b80 0001 0016 
GPR12: 44082242 c23a6000 0014aa88 ffb30100 
GPR16: 01e7b8da 01e7bd5f 01e7b9f0 01e88d8d 
GPR20: 01e7bd3d 01e7b98b 01e7bbb2 01e7b89c 
GPR24: 0270f700 c1081008 c0a7c02d  
GPR28: c12edb9c c11b3b80 90009032 c12ed985 
NIP [c002cc04] .power4_idle_nap+0x30/0x34
LR [c0016828] .power4_idle+0xc0/0xe8
--- interrupt: 500
[c12dbbc0] [c00167dc] .power4_idle+0x74/0xe8 (unreliable)
handlers:
[c12dbc40] [c001665c] .arch_cpu_idle+0x80/0x18c
[c12dbcc0] [c081f058] .default_idle_call+0x7c/0xd0
[c12dbd30] [c00a7bcc] .do_idle+0x128/0x140
[c12dbdd0] [c00a7eb4] .cpu_startup_entry+0x28/0x2c
[c12dbe40] [c0010044] .rest_init+0x1b0/0x1bc
[c12dbec0] [c10047f4] .start_kernel+0x934/0x9b8
[c12dbf90] [c000b390] start_here_common+0x1c/0x8c
[<1553d54b>] .nvme_irq
[<1553d54b>] .nvme_irq
Disabling IRQ #63


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #296759|0                           |1
        is obsolete|                            |
 Attachment #296761|0                           |1
        is obsolete|                            |

--- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297439
  --> https://bugzilla.kernel.org/attachment.cgi?id=297439&action=edit
kernel .config (5.13-rc6, PowerMac G5 11,2)


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #9 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297437
  --> https://bugzilla.kernel.org/attachment.cgi?id=297437&action=edit
dmesg (5.13-rc6 w. patch fbbefb3 reverted + debug, PowerMac G5 11,2)


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #8 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297435
  --> https://bugzilla.kernel.org/attachment.cgi?id=297435&action=edit
dmesg (5.13-rc6 + debug, PowerMac G5 11,2)


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Oliver O'Halloran from comment #5)
> Could you add "debug" to the kernel command line and post the dmesg output
> for a boot with the patch applied and reverted?
OK, on top of 5.13-rc6 I reverted fbbefb3, which went fine except for the
"pci-ioda.c" part, where I needed to apply the old code manually.

Here's the vanilla debug dmesg and the debug dmesg with the patch reverted.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
This is already a custom built kernel with lots of debugging options turned on
(see bugzilla attached kernel .config). But of course I can add "debug" to the
other kernel command line parameters.

I'll report back when I get access to this G5 next time in about 2-3 weeks.


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #5 from Oliver O'Halloran (ooh...@gmail.com) ---
Hmm, it's pretty weird to see an NVMe drive using LSIs. Not too sure what to
make of that. I figure there's something screwy going on with interrupt
routing, but I don't have any g5 hardware to replicate this with.

Could you add "debug" to the kernel command line and post the dmesg output for
a boot with the patch applied and reverted?


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #4 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297191
  --> https://bugzilla.kernel.org/attachment.cgi?id=297191&action=edit
bisect.log

It turns out the problem was introduced between v5.11 and v5.12 by the
following commit:

 # git bisect good
fbbefb320214db14c3e740fce98e2c95c9d0669b is the first bad commit
commit fbbefb320214db14c3e740fce98e2c95c9d0669b
Author: Oliver O'Halloran 
Date:   Tue Nov 3 15:35:07 2020 +1100

powerpc/pci: Move PHB discovery for PCI_DN using platforms

Make powernv, pseries, powermac and maple use ppc_mc.discover_phbs.
These platforms need to be done together because they all depend on
pci_dn's being created from the DT. The pci_dn contains a pointer to
the relevant pci_controller so they need to be created after the
pci_controller structures are available, but before PCI devices are
scanned. Currently this ordering is provided by initcalls and the
sequence is:

  1. PHBs are discovered (setup_arch) (early boot, pre-initcalls)
  2. pci_dn are created from the unflattended DT (core initcall)
  3. PHBs are scanned pcibios_init() (subsys initcall)

The new ppc_md.discover_phbs() function is also a core_initcall so we
can't guarantee ordering between the creation of pci_controllers and
the creation of pci_dn's which require a pci_controller. We could use
the postcore, or core_sync initcall levels, but it's cleaner to just
move the pci_dn setup into the per-PHB inits which occur inside of
.discover_phb() for these platforms. This brings the boot-time path in
line with the PHB hotplug path that is used for pseries DLPAR
operations too.

Signed-off-by: Oliver O'Halloran 
[mpe: Squash powermac & maple in to avoid breakage those platforms,
  convert memblock allocs to use kmalloc to avoid warnings]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20201103043523.916109-2-ooh...@gmail.com
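
The ordering argument in the commit message can be illustrated with a toy model. This is only a sketch, not kernel code. It assumes that init levels run in a fixed order while the relative order of two calls inside one level is not something to rely on. The function names mirror the commit message, but the `boot` helper and the `state` dictionary are hypothetical.

```python
def boot(levels):
    """Run callables level by level; inside a level, list order is used."""
    state = {"controllers": [], "pci_dn": [], "scanned": []}
    for _level_name, calls in levels:
        for call in calls:
            call(state)
    return state


def discover_phbs(state):
    # Stands in for creating the pci_controller structures.
    state["controllers"].append("phb0")


def create_pci_dn(state):
    # A pci_dn points at its pci_controller, so one must already exist.
    if not state["controllers"]:
        raise RuntimeError("pci_dn needs a pci_controller")
    state["pci_dn"].extend(f"dn@{c}" for c in state["controllers"])


def discover_phbs_with_pci_dn(state):
    # The approach described in the commit: do the pci_dn setup inside the
    # per-PHB init so the order no longer depends on initcall placement.
    discover_phbs(state)
    create_pci_dn(state)


def scan_phbs(state):
    # Stands in for the later PHB scan at subsys level.
    state["scanned"] = list(state["pci_dn"])


if __name__ == "__main__":
    # Both steps at the same level, unlucky order: pci_dn creation fails.
    try:
        boot([("core", [create_pci_dn, discover_phbs]), ("subsys", [scan_phbs])])
    except RuntimeError as err:
        print("same-level ordering problem:", err)

    # Folded into one call: the order is fixed by construction.
    print(boot([("core", [discover_phbs_with_pci_dn]), ("subsys", [scan_phbs])]))
```

The model says nothing about why interrupts misbehave on the G5. It only shows why the commit moved the pci_dn setup into the PHB discovery path.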
