Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang
Yup Tony, that's it. Before we actually detected this condition and panicked, we used to just "hang", with the last of the buffered output showing that Linux was processing the dt_* tree (IIRC). Now at least the panic flushes the output and shows you why.
-JX

On Sep 14, 2006, at 3:11 AM, Tony Breeds wrote:
> On Tue, Sep 12, 2006 at 06:00:17PM +0200, Segher Boessenkool wrote:
> > > Sometimes when Xen is booted and we let Linux init the MPIC for
> > > "the second time" Xen could end up in a loop where the CPU is
> > > constantly being interrupted by the MPIC.
> > >
> > > Because of console buffering, the last message you see is some
> > > message from early kernel boot.
> > > Anyway.. we detect this now and you see a panic.
> >
> > There seems to be a problem with the U3/U4 MPIC, where edge-triggered
> > interrupts are delivered to more than one CPU. Every
> > CPU other than the one that ACKed it first, will get the spurious
> > vector (so functionally, the impact of this bug isn't that bad;
> > performance-wise it might be different).
> >
> > The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce
> > console output for every spurious interrupt, you'll get a nice
> > little storm. Is that what's happening?
> >
> > > Yes, I believe, it has something to do with temperature.
> >
> > Interesting observation, never thought of investigating that --
> > it's in line with my suspicion that something in the MPIC is
> > metastable though.
>
> I don't know if this is related or something else but I just got:
> ---
> *** : Setup Done
> [boot]0015 Setup Done
> Built 1 zonelists. Total pages: 49152
> Kernel command line: root=/dev/hda3 ro sysrq=1
> Sharing PIC with Xen
> -> maple_init_IRQ
> <6>mpic: Setting up MPIC "U3-MPIC" version 1.2 at f804, max 4 CPUs
> mpic: ISU size: 124, shift: 7, mask: 7f
> mpic: Initializing for 124 sources
> mpic: Setting up HT PICs workarounds for U3/U4
> mpic: - HT:01.0 [0xb8] vendor 1022 device 7450 has 4 irqs
> mpic: - HT:02.0 [0xb8] vendor 1022 device 7450 has 4 irqs
> mpic: - HT:03.0 [0xf0] vendor 1022 device 7460 has 24 irqs
> (XEN)
> (XEN)
> (XEN) Panic on CPU 0:
> (XEN) Too many (100) spurrious interrupts in a row
> (XEN) Known problem, please halt and let machine idle/cool then reboot
> (XEN)
> (XEN)
> (XEN) Reboot in five seconds...
> ---
> on my JS20, rebooting cleared the error.
>
> Yours Tony
>
> linux.conf.au http://linux.conf.au/ || http://lca2007.linux.org.au/
> Jan 15-20 2007 The Australian Linux Technical Conference!

___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel
Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang
On Tue, Sep 12, 2006 at 06:00:17PM +0200, Segher Boessenkool wrote:
> > Sometimes when Xen is booted and we let Linux init the MPIC for
> > "the second time" Xen could end up in a loop where the CPU is
> > constantly being interrupted by the MPIC.
> >
> > Because of console buffering, the last message you see is some
> > message from early kernel boot.
> > Anyway.. we detect this now and you see a panic.
>
> There seems to be a problem with the U3/U4 MPIC, where edge-triggered
> interrupts are delivered to more than one CPU. Every
> CPU other than the one that ACKed it first, will get the spurious
> vector (so functionally, the impact of this bug isn't that bad;
> performance-wise it might be different).
>
> The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce
> console output for every spurious interrupt, you'll get a nice
> little storm. Is that what's happening?
>
> > Yes, I believe, it has something to do with temperature.
>
> Interesting observation, never thought of investigating that --
> it's in line with my suspicion that something in the MPIC is
> metastable though.

I don't know if this is related or something else but I just got:
---
*** : Setup Done
[boot]0015 Setup Done
Built 1 zonelists. Total pages: 49152
Kernel command line: root=/dev/hda3 ro sysrq=1
Sharing PIC with Xen
-> maple_init_IRQ
<6>mpic: Setting up MPIC "U3-MPIC" version 1.2 at f804, max 4 CPUs
mpic: ISU size: 124, shift: 7, mask: 7f
mpic: Initializing for 124 sources
mpic: Setting up HT PICs workarounds for U3/U4
mpic: - HT:01.0 [0xb8] vendor 1022 device 7450 has 4 irqs
mpic: - HT:02.0 [0xb8] vendor 1022 device 7450 has 4 irqs
mpic: - HT:03.0 [0xf0] vendor 1022 device 7460 has 24 irqs
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Too many (100) spurrious interrupts in a row
(XEN) Known problem, please halt and let machine idle/cool then reboot
(XEN)
(XEN)
(XEN) Reboot in five seconds...
---
on my JS20, rebooting cleared the error.
Yours Tony

linux.conf.au http://linux.conf.au/ || http://lca2007.linux.org.au/
Jan 15-20 2007 The Australian Linux Technical Conference!
Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang
> Sometimes when Xen is booted and we let Linux init the MPIC for
> "the second time" Xen could end up in a loop where the CPU is
> constantly being interrupted by the MPIC.
>
> Because of console buffering, the last message you see is some
> message from early kernel boot.
> Anyway.. we detect this now and you see a panic.

There seems to be a problem with the U3/U4 MPIC, where edge-triggered
interrupts are delivered to more than one CPU. Every
CPU other than the one that ACKed it first, will get the spurious
vector (so functionally, the impact of this bug isn't that bad;
performance-wise it might be different).

The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce
console output for every spurious interrupt, you'll get a nice
little storm. Is that what's happening?

> Yes, I believe, it has something to do with temperature.

Interesting observation, never thought of investigating that --
it's in line with my suspicion that something in the MPIC is
metastable though.

Segher
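The behaviour Segher describes (one edge delivered to several CPUs, with only the first acknowledge returning the real vector) can be modelled in a few lines of C. This is only a toy model of the description in this thread, not of the actual MPIC register interface: `struct edge_irq`, `irq_ack()`, `deliver_edge()` and the value of `SPURIOUS_VEC` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever the controller returns as "spurious". */
#define SPURIOUS_VEC (-1)

/* Toy model of one edge-triggered interrupt source. */
struct edge_irq {
    int  vector;   /* vector returned to the CPU that acknowledges first */
    bool pending;  /* set when the edge arrives, cleared by the first ACK */
};

/* Acknowledge from one CPU: the first caller wins, later callers see spurious. */
int irq_ack(struct edge_irq *irq)
{
    assert(irq != NULL);
    if (!irq->pending)
        return SPURIOUS_VEC;
    irq->pending = false;
    return irq->vector;
}

/*
 * Model the suspected bug: a single edge interrupts all ncpus CPUs.
 * seen[cpu] records what each CPU read back; the return value is the
 * number of CPUs that got the spurious vector.
 */
int deliver_edge(struct edge_irq *irq, int ncpus, int *seen)
{
    int spurious = 0;

    assert(irq != NULL && seen != NULL);
    irq->pending = true;                 /* one edge arrives */
    for (int cpu = 0; cpu < ncpus; cpu++) {
        seen[cpu] = irq_ack(irq);
        if (seen[cpu] == SPURIOUS_VEC)
            spurious++;
    }
    return spurious;
}
```

In this model every edge costs `ncpus - 1` spurious acknowledges, which is why printing a console message per spurious interrupt on a UART that is itself an edge source would feed the storm.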
Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang
Sometimes when Xen is booted and we let Linux init the MPIC for "the second time" Xen could end up in a loop where the CPU is constantly being interrupted by the MPIC.

Because of console buffering, the last message you see is some message from early kernel boot.
Anyway.. we detect this now and you see a panic.

Yes, I believe, it has something to do with temperature.

Hopefully when we move to a Xen-only MPIC model this issue will be resolved.
-JX

On Sep 12, 2006, at 6:50 AM, Xen patchbot-xenppc-unstable wrote:
> # HG changeset patch
> # User Jimi Xenidis <[EMAIL PROTECTED]>
> # Node ID 5495e4525844250d5f359bd4d3bda8787e817711
> # Parent a79b3252bbe46a13d91586081e7f6be278b07126
> [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang
>
> When handing off the MPIC from Xen to Dom0, which is the current yet
> not permanent design, the MPIC can cause the processor to assert an
> external interrupt when none is available. Rather than simply hang in
> this condition we now panic so the user can see that there is indeed a
> problem and identify it as this one.
>
> This condition seems to be related to temperature and the probability
> of it occurring decreases if the machine is allowed to stay idle (not
> in the Xen panic loop) for a minute or two.
>
> Signed-off-by: Jimi Xenidis <[EMAIL PROTECTED]>
> ---
>  xen/arch/powerpc/external.c |9 +
>  1 files changed, 9 insertions(+)
>
> diff -r a79b3252bbe4 -r 5495e4525844 xen/arch/powerpc/external.c
> --- a/xen/arch/powerpc/external.c Fri Sep 08 12:37:27 2006 -0500
> +++ b/xen/arch/powerpc/external.c Tue Sep 12 06:47:22 2006 -0400
> @@ -75,6 +75,7 @@ void do_external(struct cpu_user_regs *r
>  void do_external(struct cpu_user_regs *regs)
>  {
>      int vec;
> +    static unsigned spur_count;
>  
>      BUG_ON(!(regs->msr & MSR_EE));
>      BUG_ON(mfmsr() & MSR_EE);
> @@ -87,6 +88,14 @@ void do_external(struct cpu_user_regs *r
>          do_IRQ(regs);
>  
>          BUG_ON(mfmsr() & MSR_EE);
> +        spur_count = 0;
> +    } else {
> +        ++spur_count;
> +        if (spur_count > 100)
> +            panic("Too many (%d) spurrious interrupts in a row\n"
> +                  " Known problem, please halt and let machine idle/cool "
> +                  " then reboot\n",
> +                  100);
>      }
>  }
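For readers who want to try the detection logic outside of Xen, here is a minimal standalone sketch of the same idea: count spurious interrupts in a row, reset the count whenever a real interrupt is handled, and report when the run exceeds a limit. It is not the code from the changeset. `struct spur_tracker`, `spur_tracker_init()` and `spur_tracker_observe()` are hypothetical names, and treating `-1` as the "no interrupt available" vector is an assumption, since the part of `do_external()` that tests the vector is not shown in the patch context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the vector lookup yields -1 when no interrupt is available. */
#define SPURIOUS_VEC (-1)

struct spur_tracker {
    unsigned count;  /* spurious interrupts seen in a row so far */
    unsigned limit;  /* run length above which the caller should give up */
};

void spur_tracker_init(struct spur_tracker *t, unsigned limit)
{
    assert(t != NULL);
    t->count = 0;
    t->limit = limit;
}

/*
 * Feed one acknowledged vector into the tracker.
 * Returns true once the run of spurious vectors is longer than the
 * limit, mirroring the patch's "spur_count > 100" test; the caller
 * decides what to do (the patch panics).
 */
bool spur_tracker_observe(struct spur_tracker *t, int vec)
{
    assert(t != NULL);
    if (vec != SPURIOUS_VEC) {
        t->count = 0;            /* a real interrupt ends the run */
        return false;
    }
    ++t->count;
    return t->count > t->limit;
}
```

With a limit of 100 the tracker first reports on the 101st spurious vector in a row, the same point at which the patch panics. Keeping the limit in the struct rather than as a literal is only a convenience for experimenting with other thresholds.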