Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang

2006-09-14 Thread Jimi Xenidis

Yup Tony, that's it.
Before we detected this condition and panicked, we used to just "hang",
with the last of the buffered output showing that Linux was processing
the dt_* tree (IIRC). Now at least the panic flushes the output and
shows you why.
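Here is a minimal sketch of why the hang looked misleading. It assumes a simple console that holds output in memory and only pushes it to the serial port on an explicit flush or when the buffer fills. `con_write`, `con_flush`, `con_panic` and the `visible` array (standing in for what actually reached the screen) are hypothetical names, not Xen's console API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical buffered console: text accumulates in con_buf and only
 * reaches the screen (modelled by `visible`) when con_flush() runs. */
#define CON_BUF_SIZE 1024

static char con_buf[CON_BUF_SIZE];
static size_t con_len;
static char visible[CON_BUF_SIZE * 4];   /* what the user has actually seen */
static size_t visible_len;

/* Push everything buffered so far out to the "screen". */
static void con_flush(void)
{
    assert(visible_len + con_len < sizeof(visible));
    memcpy(visible + visible_len, con_buf, con_len);
    visible_len += con_len;
    visible[visible_len] = '\0';
    con_len = 0;
}

/* Buffer a message; output only leaves the buffer when it fills up. */
static void con_write(const char *s)
{
    size_t n = strlen(s);

    assert(n <= CON_BUF_SIZE);
    if (con_len + n > CON_BUF_SIZE)
        con_flush();
    memcpy(con_buf + con_len, s, n);
    con_len += n;
}

/* Hypothetical panic path: record the reason, then force a flush so both
 * the stale buffered lines and the reason become visible. */
static void con_panic(const char *why)
{
    con_write(why);
    con_flush();
}
```

If the CPU instead loops forever taking interrupts, `con_flush()` is never reached and the screen still ends at the last line that happened to be flushed; the panic path flushes first, so the buffered lines and the reason both appear.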

-JX
On Sep 14, 2006, at 3:11 AM, Tony Breeds wrote:


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang

2006-09-14 Thread Tony Breeds
On Tue, Sep 12, 2006 at 06:00:17PM +0200, Segher Boessenkool wrote:
> >Sometimes when Xen is booted and we let Linux init the MPIC for  
> >"the second time" Xen could end up in a loop where the CPU is  
> >constantly being interrupted by the MPIC.
> >
> >Because of console buffering, the last message you see is some  
> >message from early kernel boot.
> >Anyway.. we detect this now and you see a panic.
> 
> There seems to be a problem with the U3/U4 MPIC, where edge-
> triggered interrupts are delivered to more than one CPU.  Every
> CPU other than the one that ACKed it first, will get the spurious
> vector (so functionally, the impact of this bug isn't that bad;
> performance-wise it might be different).
> 
> The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce
> console output for every spurious interrupt, you'll get a nice
> little storm.  Is that what's happening?
> 
> >Yes, I believe, it has something to do with temperature.
> 
> Interesting observation, never thought of investigating that --
> it's in line with my suspicion that something in the MPIC is
> metastable though.

I don't know if this is related or something else, but I just got:
---
***  : Setup Done
[boot]0015 Setup Done
Built 1 zonelists.  Total pages: 49152
Kernel command line: root=/dev/hda3 ro sysrq=1
Sharing PIC with Xen -> maple_init_IRQ
<6>mpic: Setting up MPIC "U3-MPIC" version 1.2 at f804, max 4 CPUs
mpic: ISU size: 124, shift: 7, mask: 7f
mpic: Initializing for 124 sources
mpic: Setting up HT PICs workarounds for U3/U4
mpic:   - HT:01.0 [0xb8] vendor 1022 device 7450 has 4 irqs
mpic:   - HT:02.0 [0xb8] vendor 1022 device 7450 has 4 irqs
mpic:   - HT:03.0 [0xf0] vendor 1022 device 7460 has 24 irqs
(XEN) 
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Too many (100) spurrious interrupts in a row
(XEN)   Known problem, please halt and let machine idle/cool   then reboot
(XEN) 
(XEN) 
(XEN) Reboot in five seconds...
---

On my JS20, rebooting cleared the error.

Yours Tony

   linux.conf.au   http://linux.conf.au/ || http://lca2007.linux.org.au/
   Jan 15-20 2007  The Australian Linux Technical Conference!




Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang

2006-09-12 Thread Segher Boessenkool
> Sometimes when Xen is booted and we let Linux init the MPIC for
> "the second time" Xen could end up in a loop where the CPU is
> constantly being interrupted by the MPIC.
>
> Because of console buffering, the last message you see is some
> message from early kernel boot.
>
> Anyway.. we detect this now and you see a panic.


There seems to be a problem with the U3/U4 MPIC, where edge-
triggered interrupts are delivered to more than one CPU.  Every
CPU other than the one that ACKed it first will get the spurious
vector (so functionally, the impact of this bug isn't that bad;
performance-wise it might be different).
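The delivery bug can be modelled in a few lines. This is a toy simulation under stated assumptions, not MPIC driver code: one latched edge is offered to every CPU in the delivery set, the first CPU to acknowledge claims the real vector, and every later acknowledge reads a spurious vector. `SPURIOUS_VEC`, `struct edge_src`, `mpic_ack` and `deliver_edge` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SPURIOUS_VEC 0xff   /* hypothetical spurious vector number */
#define MAX_CPUS     4

/* Toy model of one edge-triggered interrupt source. */
struct edge_src {
    int  vec;       /* vector reported to whoever claims the edge */
    bool pending;   /* latched edge, cleared by the first ACK */
};

/* Hypothetical acknowledge: the first reader claims the interrupt,
 * every later reader gets the spurious vector. */
static int mpic_ack(struct edge_src *src)
{
    if (src->pending) {
        src->pending = false;
        return src->vec;
    }
    return SPURIOUS_VEC;
}

/* Deliver one edge to ncpus CPUs (the suspected U3/U4 behaviour), store
 * what each CPU read in got[], and return how many saw SPURIOUS_VEC. */
static int deliver_edge(struct edge_src *src, int ncpus, int got[])
{
    int spurious = 0;

    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    src->pending = true;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        got[cpu] = mpic_ack(src);
        if (got[cpu] == SPURIOUS_VEC)
            spurious++;
    }
    return spurious;
}
```

Exactly one CPU dispatches the real handler, which is why the bug is functionally harmless; the other CPUs only pay for a wasted trip through the external-interrupt path.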

The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce
console output for every spurious interrupt, you'll get a nice
little storm.  Is that what's happening?
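The storm asked about here is a feedback loop. Below is a toy count of it, assuming each console write on the spurious path raises exactly one new UART edge and that the MPIC delivers every edge to all `ncpus` CPUs. `count_spurious` is a hypothetical helper, and `max_edges` is a cap standing in for "forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy feedback model (not real driver code): each UART edge goes to
 * ncpus CPUs; one services it and the rest take a spurious interrupt.
 * If the spurious path prints, each message is assumed to raise one
 * fresh UART edge. Returns the spurious interrupts seen before the
 * queue drains or max_edges edges have been processed. */
static long count_spurious(int ncpus, bool print_on_spurious, long max_edges)
{
    long pending_edges = 1;   /* one initial UART interrupt */
    long edges = 0;
    long spurious = 0;

    assert(ncpus >= 1 && max_edges >= 0);
    while (pending_edges > 0 && edges < max_edges) {
        pending_edges--;
        edges++;
        spurious += ncpus - 1;               /* all but the first ACKer */
        if (print_on_spurious)
            pending_edges += ncpus - 1;      /* one new edge per message */
    }
    return spurious;
}
```

With output on the spurious path, the count is bounded only by the cap (`count_spurious(2, true, 1000)` yields 1000); without it, one edge on two CPUs costs a single spurious interrupt and the loop ends.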


> Yes, I believe, it has something to do with temperature.


Interesting observation, never thought of investigating that --
it's in line with my suspicion that something in the MPIC is
metastable though.


Segher




Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang

2006-09-12 Thread Jimi Xenidis
Sometimes when Xen is booted and we let Linux init the MPIC for "the  
second time" Xen could end up in a loop where the CPU is constantly  
being interrupted by the MPIC.


Because of console buffering, the last message you see is some  
message from early kernel boot.

Anyway.. we detect this now and you see a panic.

Yes, I believe, it has something to do with temperature.
Hopefully when we move to a Xen-only MPIC model this issue will be
resolved.


-JX
On Sep 12, 2006, at 6:50 AM, Xen patchbot-xenppc-unstable wrote:


# HG changeset patch
# User Jimi Xenidis <[EMAIL PROTECTED]>
# Node ID 5495e4525844250d5f359bd4d3bda8787e817711
# Parent  a79b3252bbe46a13d91586081e7f6be278b07126
[POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang


When handing off the MPIC from Xen to Dom0, which is the current (but
not permanent) design, the MPIC can cause the processor to assert an
external interrupt when none is available.  Rather than simply hang in
this condition, we now panic so the user can see that there is indeed a
problem and identify it as this one.

This condition seems to be related to temperature, and the probability
of it occurring decreases if the machine is allowed to stay idle (not
in the Xen panic loop) for a minute or two.

Signed-off-by: Jimi Xenidis <[EMAIL PROTECTED]>
---
 xen/arch/powerpc/external.c |    9 +++++++++
 1 files changed, 9 insertions(+)

diff -r a79b3252bbe4 -r 5495e4525844 xen/arch/powerpc/external.c
--- a/xen/arch/powerpc/external.c   Fri Sep 08 12:37:27 2006 -0500
+++ b/xen/arch/powerpc/external.c   Tue Sep 12 06:47:22 2006 -0400
@@ -75,6 +75,7 @@ void do_external(struct cpu_user_regs *r
 void do_external(struct cpu_user_regs *regs)
 {
     int vec;
+    static unsigned spur_count;
 
     BUG_ON(!(regs->msr & MSR_EE));
     BUG_ON(mfmsr() & MSR_EE);
@@ -87,6 +88,14 @@ void do_external(struct cpu_user_regs *r
         do_IRQ(regs);
 
         BUG_ON(mfmsr() & MSR_EE);
+        spur_count = 0;
+    } else {
+        ++spur_count;
+        if (spur_count > 100)
+            panic("Too many (%d) spurrious interrupts in a row\n"
+                  "  Known problem, please halt and let machine idle/cool "
+                  "  then reboot\n",
+                  100);
     }
 }
 
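The detection logic in the patch is a consecutive-event counter: a real vector resets it, each spurious interrupt increments it, and crossing the threshold triggers `panic()`. The sketch below isolates that logic in portable C so it can be exercised outside Xen. It is an illustration under assumptions, not the Xen code: `SPUR_LIMIT`, `NO_VECTOR`, `handle_vector` and the `-1` "nothing pending" convention are hypothetical (the condition that selects the `else` branch is outside the hunks shown), and the function returns `true` where Xen would call `panic()`. Like the patch, it keeps one counter shared by all callers and trips on the 101st consecutive spurious interrupt, because the test is `> 100`.

```c
#include <assert.h>
#include <stdbool.h>

#define SPUR_LIMIT 100     /* same threshold as the patch */
#define NO_VECTOR  (-1)    /* hypothetical "no interrupt pending" result */

/* Consecutive spurious interrupts seen so far. One counter, as in the
 * patch; no locking, so concurrent callers could race on it. */
static unsigned spur_count;

/* Returns true when the caller should give up (Xen calls panic() here). */
static bool handle_vector(int vec)
{
    assert(vec >= NO_VECTOR);
    if (vec != NO_VECTOR) {
        /* a real interrupt would be dispatched here (do_IRQ in Xen) */
        spur_count = 0;
        return false;
    }
    ++spur_count;
    return spur_count > SPUR_LIMIT;
}
```

Resetting on every real interrupt means only an unbroken run of spurious interrupts trips the check, so the occasional spurious vector from multi-CPU delivery never accumulates toward the limit.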
