Re: Serial related oops

2007-03-01 Thread Jose Goncalves
Russell King wrote: > On Thu, Mar 01, 2007 at 01:33:28PM +, Jose Goncalves wrote: > >> I've also done your suggestion and I've inserted "msleep(10);" just >> before the "And clear the interrupt registers again for luck." and my >> application is now running without problems for more than

Re: Serial related oops

2007-03-01 Thread Russell King
On Thu, Mar 01, 2007 at 01:33:28PM +, Jose Goncalves wrote: > I've also done your suggestion and I've inserted "msleep(10);" just > before the "And clear the interrupt registers again for luck." and my > application is now running without problems for more than 24H! So, > inserting a delay in

Re: Serial related oops

2007-03-01 Thread Jose Goncalves
Hi again Russell, I'm back after some more testing. Here goes my report. I've switched to another SBC and the kernel still Oopses, so it is not a one-off fault in the hardware. I've also run memtest86+ on this board for the longest time my application takes to reach an Oops (24 h) and it not

Re: Serial related oops

2007-02-23 Thread Michael K. Edwards
Russell, thanks again for offering to look at this; the more oopses and soft lockups I see on this board, the more I think you're right and we have an IRQ handling race. Here's the struct irqchip setup: /* mask irq, refer section 2.6 under chip 8618 document */ static void

Re: Serial related oops

2007-02-22 Thread Paul Fulghum
On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote: What I find real hard to understand is why a hardware fault happens always in the same software instruction! I would expect a hardware fault to hit randomly... I've experienced just such a hardware fault. The Infineon DSCC4

Re: Serial related oops

2007-02-22 Thread jose . goncalves
Quoting Russell King <[EMAIL PROTECTED]>: On Thu, Feb 22, 2007 at 03:07:18PM +, Jose Goncalves wrote: Russell King wrote: > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things >> to us, at least on

Re: Serial related oops

2007-02-22 Thread jose . goncalves
Quoting Russell King <[EMAIL PROTECTED]>: On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote: It could be a silly question (bear with me as I'm not familiar with such low-level programming), but couldn't it be possible for an interrupt to hit in the middle of the serial_in() calls

Re: Serial related oops

2007-02-22 Thread Russell King
On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote: > It could be a silly question (bear with me as I'm not familiar with > such low-level programming), but couldn't it be possible for an interrupt > to hit in the middle of the serial_in() calls and mess with %ebx? I'm no expert on

Re: Serial related oops

2007-02-22 Thread Russell King
On Thu, Feb 22, 2007 at 03:07:18PM +, Jose Goncalves wrote: > Russell King wrote: > > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > > > >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things > >> to us, at least on an ARM target ... > >> > > >

Re: Serial related oops

2007-02-22 Thread Jose Goncalves
Russell King wrote: > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things >> to us, at least on an ARM target ... >> > > That's ruled out. Please think about it for a moment - serial_in() > managed

Re: Serial related oops

2007-02-22 Thread Jose Goncalves
Russell King wrote: > On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote: > >> <1>[18840.304048] Unable to handle kernel NULL pointer dereference at >> virtual address 0012 >> <1>[18840.313046] printing eip: >> <4>[18840.321687] c01bfa7a >> <1>[18840.321714] *pde = >>

Re: Serial related oops

2007-02-22 Thread Russell King
On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things > to us, at least on an ARM target ... That's ruled out. Please think about it for a moment - serial_in() managed to work correctly most of the time, and

Re: Serial related oops

2007-02-22 Thread Russell King
On Wed, Feb 21, 2007 at 09:57:50PM -0800, H. Peter Anvin wrote: > Russell King wrote: > > > > >Plainly, %ebx changed across the call to serial_in() at c01c0f7b. > >First thing to notice is this violates the C code - "up" can not > >change. > > > >Now let's look at serial_in: > > > >c01bfa70:

Re: Serial related oops

2007-02-21 Thread Frederik Deweerdt
On Wed, Feb 21, 2007 at 09:57:50PM -0800, H. Peter Anvin wrote: > Russell King wrote: > > >Plainly, %ebx changed across the call to serial_in() at c01c0f7b. > >First thing to notice is this violates the C code - "up" can not > >change. > >Now let's look at serial_in: > >c01bfa70: 55

Re: Serial related oops

2007-02-21 Thread H. Peter Anvin
Russell King wrote: Plainly, %ebx changed across the call to serial_in() at c01c0f7b. First thing to notice is this violates the C code - "up" can not change. Now let's look at serial_in: c01bfa70: 55 push %ebp c01bfa71: 89 e5 mov

Re: Serial related oops

2007-02-21 Thread Michael K. Edwards
Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things to us, at least on an ARM target ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Serial related oops

2007-02-21 Thread Russell King
On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote: > <1>[18840.304048] Unable to handle kernel NULL pointer dereference at virtual > address 0012 > <1>[18840.313046] printing eip: > <4>[18840.321687] c01bfa7a > <1>[18840.321714] *pde = > <0>[18840.331287] Oops:

Re: Serial related oops

2007-02-21 Thread Frederik Deweerdt
On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote: > New developments. > I have upgraded to 2.6.16.41, applied a patch sent by Frederik that > removed the change made in http://lkml.org/lkml/2005/6/23/266 and > activated some more kernel debugging, i.e., CONFIG_KALLSYMS_ALL, >

Re: Serial related oops

2007-02-21 Thread Jose Goncalves
Jose Goncalves wrote: > New developments. > I have upgraded to 2.6.16.41, applied a patch sent by Frederik that > removed the change made in http://lkml.org/lkml/2005/6/23/266 and > activated some more kernel debugging, i.e., CONFIG_KALLSYMS_ALL, > CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP,

Re: Serial related oops

2007-02-21 Thread Jose Goncalves
New developments. I have upgraded to 2.6.16.41, applied a patch sent by Frederik that removed the change made in http://lkml.org/lkml/2005/6/23/266 and activated some more kernel debugging, i.e., CONFIG_KALLSYMS_ALL, CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_DEBUG_SLAB,

Re: Serial related oops

2007-02-19 Thread Robert Hancock
Michael K. Edwards wrote: Of course not. But dealing with a stuck IRQ line by locking up isn't very practical either. IRQ sharing is stupid yet universal, and it And we don't, that's why we have that "nobody cared" logic that disables the interrupt line if no driver services the interrupt.
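The "nobody cared" logic Robert mentions can be illustrated with a small userspace model of the per-line accounting. This is a sketch, not kernel code: the struct and function below are hypothetical, and the sampling window of 100000 interrupts with a limit of 99900 unhandled ones is how the 2.6-era kernel/irq/spurious.c is recalled here, so treat both numbers as assumptions. The idea is that a shared line is only shut off when almost every interrupt in a window went unclaimed by every handler on it, so occasional unhandled interrupts are tolerated.

```c
#include <stdbool.h>

/* Assumed thresholds, after the 2.6-era note_interrupt() logic. */
#define IRQ_WINDOW      100000u /* interrupts per sampling window */
#define UNHANDLED_LIMIT  99900u /* more unhandled than this disables the line */

/* Hypothetical per-line bookkeeping, loosely modeled on struct irq_desc. */
struct irq_line {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many no handler claimed */
    bool disabled;               /* set once the line has been shut off */
};

/*
 * Account for one interrupt. 'handled' is true if any handler sharing the
 * line claimed it. Returns true only on the call that disables the line.
 */
static bool note_interrupt(struct irq_line *line, bool handled)
{
    if (line->disabled)
        return false;            /* a disabled line delivers nothing */
    if (!handled)
        line->irqs_unhandled++;
    if (++line->irq_count < IRQ_WINDOW)
        return false;            /* window not finished yet */

    /* End of window: judge it, then start a fresh one. */
    line->irq_count = 0;
    if (line->irqs_unhandled > UNHANDLED_LIMIT)
        line->disabled = true;   /* "nobody cared": mask the line */
    line->irqs_unhandled = 0;
    return line->disabled;
}
```

Under these assumed thresholds, a line on which at least 100 of every 100000 interrupts are claimed stays enabled, while a line nobody services is disabled on the last interrupt of its first window.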

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards
On 2/19/07, Robert Hancock <[EMAIL PROTECTED]> wrote: How do you propose to do this? Drivers can get loaded and unloaded at any time. If you have a device generating spurious interrupts on a shared IRQ line, there's no way you can use any device on that line until that interrupt is shut off.

Re: Serial related oops

2007-02-19 Thread Robert Hancock
Michael K. Edwards wrote: Still open, though it's a pity you're more interested in my flawed understanding than in the possibility that the kernel could be systematically made more robust against hardware bugs and coding errors by the simple expedient of putting all the ISRs in before turning on

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: This can't happen because when __do_irq unmasks the interrupt source, the CPU mask is set, thereby preventing any further interrupt exceptions being taken. This is done precisely to prevent this situation happening. If you are seeing

Re: Serial related oops

2007-02-19 Thread Russell King
On Mon, Feb 19, 2007 at 04:04:26PM -0800, Michael K. Edwards wrote: > On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: > >The second interrupt comes in, and when you go to disable that > >source, you inadvertently re-enable the UART interrupt, despite it > >still being serviced. > > Incorrect.

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: I think something else is going on here. I think you're getting an interrupt for the UART, and another interrupt is also pending. Correct. An interrupt for the other UART on the same IRQ. When the UART interrupt is handled, it is masked

Re: Serial related oops

2007-02-19 Thread Russell King
On Mon, Feb 19, 2007 at 02:16:41PM -0800, Michael K. Edwards wrote: > Right. But as soon as you turn the source back on, in the postamble > of the interrupt dispatch handler, it fires again. At least on ARM, > that gives you recursive hits to __irq_svc and a couple of nested > calls within it.

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: > setup_irq() is where things go wrong, at least for us, at least on > 2.6.16.x. Interrupts are not disabled at the point in request_irq() > when the interrupt controller is poked to enable the IRQ source. If > you're lucky, and you're on an

Re: Serial related oops

2007-02-19 Thread Russell King
On Mon, Feb 19, 2007 at 01:24:17PM -0800, Michael K. Edwards wrote: > On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: > >On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote: > >> What we've seen on our embedded ARM is that enabling an interrupt that > >> is shared between

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote: > What we've seen on our embedded ARM is that enabling an interrupt that > is shared between multiple UARTs, at a stage when you have not set up > all the data structures

Re: Serial related oops

2007-02-19 Thread Russell King
On Mon, Feb 19, 2007 at 05:54:52PM +, Jose Goncalves wrote: > Russell King wrote: > Result is attached. Right... in depth analysis follows. [15423.650518] [] uart_startup+0x63/0xf4 equates to 0xc01ba49a, which is indeed the instruction after the call to port->ops->startup. The important
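Russell's step from the backtrace line to an address relies on the symbol+offset/size format: the frame address lies offset bytes past the start of a size-byte function. A small userspace helper makes the arithmetic explicit; the struct and function names are hypothetical and exist only for illustration. With the numbers above, 0xc01ba49a minus 0x63 puts the start of uart_startup at 0xc01ba437, and 0x63 < 0xf4 confirms the address falls inside the function.

```c
#include <stdio.h>
#include <string.h>

/* A parsed "name+0xOFFSET/0xSIZE" backtrace entry. */
struct sym_offset {
    char name[64];
    unsigned long offset; /* bytes from the start of the symbol */
    unsigned long size;   /* length of the symbol in bytes */
};

/* Returns 1 if 's' has the form name+0xOFF/0xSIZE, 0 otherwise. */
static int parse_sym_offset(const char *s, struct sym_offset *out)
{
    memset(out, 0, sizeof *out); /* leave a clean struct on failure */
    /* %63[^+] reads the name up to '+'; %lx accepts an optional 0x prefix. */
    return sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset,
                  &out->size) == 3;
}

/* Start address of the symbol, given the address the entry describes. */
static unsigned long sym_start(unsigned long addr, const struct sym_offset *so)
{
    return addr - so->offset;
}

/* True if the offset really falls inside the symbol. */
static int offset_in_symbol(const struct sym_offset *so)
{
    return so->offset < so->size;
}
```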

Re: Serial related oops

2007-02-19 Thread Russell King
On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote: > What we've seen on our embedded ARM is that enabling an interrupt that > is shared between multiple UARTs, at a stage when you have not set up > all the data structures touched by the ISR and softirq, can have > horrible

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards
What we've seen on our embedded ARM is that enabling an interrupt that is shared between multiple UARTs, at a stage when you have not set up all the data structures touched by the ISR and softirq, can have horrible consequences, including soft lockups and fandangos on core. You will be vulnerable
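The hazard Michael describes is an ordering problem: once a shared line is unmasked, any handler registered on it can run immediately, so everything that handler touches must already be valid. The single-threaded userspace model below is only an illustration; every name in it is hypothetical, it does not reproduce 8250 or genirq code, and "an interrupt arrives the moment the line is unmasked" is simulated by calling the dispatcher directly.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPORTS 2

/* Hypothetical per-port state that the interrupt handler relies on. */
struct port_state {
    bool ready;       /* ISR-visible data structures are initialised */
    bool pending;     /* this UART is currently asserting the shared line */
    unsigned int rx;  /* interrupts this port's handler has serviced */
};

/* Hypothetical shared interrupt line with one handler slot per port. */
struct shared_line {
    bool enabled;                     /* unmasked at the interrupt controller */
    struct port_state *ports[NPORTS]; /* NULL until that port registers */
};

/*
 * Deliver one interrupt to every registered port whose UART is asserting
 * the line. Returns how many handlers would have run against uninitialised
 * state, the situation that could oops or lock up on real hardware.
 */
static int dispatch(struct shared_line *line)
{
    int bad = 0;

    if (!line->enabled)
        return 0;
    for (size_t i = 0; i < NPORTS; i++) {
        struct port_state *p = line->ports[i];

        if (p == NULL || !p->pending)
            continue;
        if (!p->ready) {
            bad++;            /* handler would use half-built structures */
            continue;
        }
        p->rx++;
        p->pending = false;   /* servicing the UART deasserts it */
    }
    return bad;
}

/* Risky order: register and unmask, then initialise. */
static int startup_unmask_first(struct shared_line *line, size_t idx,
                                struct port_state *p)
{
    line->ports[idx] = p;
    line->enabled = true;
    int bad = dispatch(line); /* a pending interrupt is taken at once */
    p->ready = true;
    return bad;
}

/* Safer order: initialise everything the handler touches, then unmask. */
static int startup_init_first(struct shared_line *line, size_t idx,
                              struct port_state *p)
{
    p->ready = true;
    line->ports[idx] = p;
    line->enabled = true;
    return dispatch(line);
}
```

With a port whose UART is already asserting the line, startup_unmask_first() reports one handler run against unready state, while startup_init_first() services the same interrupt cleanly.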

Re: Serial related oops

2007-02-19 Thread Jose Goncalves
Russell King wrote: > On Mon, Feb 19, 2007 at 04:29:39PM +, Jose Goncalves wrote: > >> Russell King wrote: >> >>> On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: >>> >>> (trimmed tie-fei.zang from the CC, added by mistake) On Mon, Feb 19, 2007 at

Re: Serial related oops

2007-02-19 Thread Russell King
On Mon, Feb 19, 2007 at 04:29:39PM +, Jose Goncalves wrote: > Russell King wrote: > > On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: > > > >> (trimmed tie-fei.zang from the CC, added by mistake) > >> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: > >>

Re: Serial related oops

2007-02-19 Thread Jose Goncalves
Russell King wrote: > On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: > >> (trimmed tie-fei.zang from the CC, added by mistake) >> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: >> Neither did I, but introducing printk's through the function, we

Re: Serial related oops

2007-02-19 Thread Russell King
On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: > (trimmed tie-fei.zang from the CC, added by mistake) > On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: > > > Neither did I, but introducing printk's through the function, we narrowed > > > the problem to this part

Re: Serial related oops

2007-02-19 Thread Frederik Deweerdt
(trimmed tie-fei.zang from the CC, added by mistake) On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: > > Neither did I, but introducing printk's through the function, we narrowed > > the problem to this part of the code. And removing it makes the problem > > go away. We inserted 37

Re: Serial related oops

2007-02-19 Thread Russell King
On Tue, Feb 20, 2007 at 02:24:42PM +, Frederik Deweerdt wrote: > On Mon, Feb 19, 2007 at 01:45:39PM +, Russell King wrote: > > On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote: > > > (Sorry for the resend, I forgot to cc the list) > > > Hi Russell, > > > > > > It seems

Re: Serial related oops

2007-02-19 Thread Frederik Deweerdt
On Mon, Feb 19, 2007 at 01:45:39PM +, Russell King wrote: > On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote: > > (Sorry for the resend, I forgot to cc the list) > > Hi Russell, > > > > It seems that the following change in drivers/serial/8250.c > > > > + > > + /* > > +

Re: Serial related oops

2007-02-19 Thread Russell King
On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote: > (Sorry for the resend, I forgot to cc the list) > Hi Russell, > > It seems that the following change in drivers/serial/8250.c > > + > + /* > + * Do a quick test to see if we receive an > + * interrupt when we

Serial related oops

2007-02-19 Thread Frederik Deweerdt
(Sorry for the resend, I forgot to cc the list) Hi Russell, It seems that the following change in drivers/serial/8250.c + + /* +* Do a quick test to see if we receive an +* interrupt when we enable the TX irq. +*/ + serial_outp(up, UART_IER, UART_IER_THRI); +
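The quoted hunk probes whether the UART actually raises a transmit interrupt when UART_IER_THRI is written. Only the first lines of the change are shown here, so the decision below (an idle transmitter combined with "no interrupt pending" means the TX interrupt cannot be relied on) is an assumption about how such a probe is evaluated, not the patch itself. The register bit values are the standard 16550 ones; the fake_uart struct and helper names are hypothetical, chosen so the logic can run in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16550 register bits, as defined in include/linux/serial_reg.h. */
#define UART_IER_THRI   0x02 /* IER: transmitter-holding-register-empty IRQ */
#define UART_IIR_NO_INT 0x01 /* IIR: no interrupt pending */
#define UART_IIR_THRI   0x02 /* IIR: THR-empty interrupt pending */
#define UART_LSR_THRE   0x20 /* LSR: transmit holding register empty */
#define UART_LSR_TEMT   0x40 /* LSR: transmitter completely empty */

/* Hypothetical stand-in for the UART registers the probe touches. */
struct fake_uart {
    uint8_t ier;
    uint8_t lsr;
    bool thre_irq_works; /* does this chip raise THRE when enabled? */
};

/* What reading IIR would report for the current register state. */
static uint8_t fake_read_iir(const struct fake_uart *u)
{
    if ((u->ier & UART_IER_THRI) && (u->lsr & UART_LSR_THRE) &&
        u->thre_irq_works)
        return UART_IIR_THRI;
    return UART_IIR_NO_INT;
}

/*
 * Assumed probe: enable only the TX interrupt, sample LSR and IIR, then
 * disable it again. An idle transmitter with no pending interrupt suggests
 * the TX-empty interrupt is unreliable on this chip.
 */
static bool tx_irq_looks_broken(struct fake_uart *u)
{
    u->ier = UART_IER_THRI;
    uint8_t lsr = u->lsr;
    uint8_t iir = fake_read_iir(u);
    u->ier = 0;

    return (lsr & UART_LSR_TEMT) && (iir & UART_IIR_NO_INT);
}
```

A chip that raises THRE when enabled passes; one that leaves IIR at UART_IIR_NO_INT despite an idle transmitter is flagged, and in both cases IER is left at 0.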
