[Xenomai-core] [BUG] Interrupt problem on powerpc
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):

[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

Any ideas of where to look?

Regards
Anders Blomdell

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote:
> Anders Blomdell wrote:
>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):
>>
>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 42.923029] [] 0x0
>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>
> I think some probably important information is missing above this back-trace.

You are so right!

> What does the kernel state before these lines?

[ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[ 42.511681] Call trace:
[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, Functional Exception) that is used for both message passing (should be RT) and the UART (Linux, i.e. non-RT). My current IRQ handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the interrupts turned off.

What I believe should be done is:

1. When a UART interrupt is received, disable further non-RT interrupts on this IRQ line and pend the interrupt to Linux.
2. Handle RT interrupts on this IRQ line.
3. When Linux has finished the pended interrupt, reenable non-RT interrupts.

but I have neither been able to achieve this, nor to verify that it is the right thing to do...

Regards
Anders Blomdell
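The three steps proposed above can be sketched as a small user-space model. This is only an illustration of the intended ordering, not driver code: it does not use the RTDM or Adeos APIs, and every name in it (`struct shared_line`, `rt_handler`, `linux_handler`, the `SRC_*` bits) is hypothetical. It assumes the chip offers a per-source enable bit, so that the UART source can be masked at the device while the shared line stays open for the RT source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one shared IRQ line fed by two sources. */
enum { SRC_RT_MSG = 1u << 0, SRC_NRT_UART = 1u << 1 };

struct shared_line {
    unsigned pending;   /* sources currently requesting service */
    unsigned enabled;   /* sources allowed to assert the line */
    bool linux_pended;  /* an interrupt has been handed to the Linux domain */
    int rt_handled;     /* number of RT events serviced so far */
};

/* The line is asserted when any enabled source is pending. */
static bool line_asserted(const struct shared_line *l)
{
    return (l->pending & l->enabled) != 0;
}

/* RT-domain handler: steps 1 and 2 of the proposal. */
static void rt_handler(struct shared_line *l)
{
    assert(l != NULL);
    unsigned active = l->pending & l->enabled;

    if (active & SRC_NRT_UART) {
        l->enabled &= ~SRC_NRT_UART; /* step 1: mask the non-RT source... */
        l->linux_pended = true;      /* ...and pend the interrupt to Linux */
    }
    if (active & SRC_RT_MSG) {
        l->pending &= ~SRC_RT_MSG;   /* step 2: service the RT source now */
        l->rt_handled++;
    }
}

/* Linux-domain handler: step 3, runs whenever Linux gets the CPU. */
static void linux_handler(struct shared_line *l)
{
    assert(l != NULL);
    if (!l->linux_pended)
        return;
    l->pending &= ~SRC_NRT_UART;     /* the UART driver drains the device */
    l->linux_pended = false;
    l->enabled |= SRC_NRT_UART;      /* step 3: reenable the non-RT source */
}
```

In this model the line is no longer asserted once `rt_handler()` returns, even though the UART request is still outstanding, so further RT interrupts can be taken before `linux_handler()` eventually runs. Where step 3 should be hooked in a real kernel is exactly the open question in this thread.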
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> Jan Kiszka wrote:
>> Anders Blomdell wrote:
>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):
>>>
>>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [ 42.923029] [] 0x0
>>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>>
>> I think some probably important information is missing above this back-trace.
>
> You are so right!
>
>> What does the kernel state before these lines?
>
> [ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
> [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
> [ 42.511681] Call trace:
> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 42.923029] [] 0x0
> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 43.411145] [c0006524] default_idle+0x10/0x60
>
> It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, Functional Exception) that is used for both message passing (should be RT) and the UART (Linux, i.e. non-RT). My current IRQ handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the interrupts turned off.
>
> What I believe should be done is:
>
> 1. When a UART interrupt is received, disable further non-RT interrupts on this IRQ line and pend the interrupt to Linux.
> 2. Handle RT interrupts on this IRQ line.
> 3. When Linux has finished the pended interrupt, reenable non-RT interrupts.
>
> but I have neither been able to achieve this, nor to verify that it is the right thing to do...

Your approach is basically what I proposed some years back on rtai-dev for handling unresolvable shared RT/NRT IRQs. I once successfully tested such a setup with two network cards, one RT, the other Linux.

So when you are really doomed and cannot change the IRQ line of your RT device, this is a kind of emergency workaround. It is neither nice nor generic (you have to write the stub for disabling the NRT IRQ source), but it should work.

Anyway, I do not understand what made your spinlock recurse. This shared IRQ scenario should only cause indeterminism for the RT driver (by blocking the line until the Linux handler can release it), but it must not trigger this bug.

Jan
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote:
> Anders Blomdell wrote:
>> Jan Kiszka wrote:
>>> Anders Blomdell wrote:
>>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):
>>>>
>>>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>>>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>>>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>>>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>>>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>>> [ 42.923029] [] 0x0
>>>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>>>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>>>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>>>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>>>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>>>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>>>
>>> I think some probably important information is missing above this back-trace.
>>
>> You are so right!
>>
>>> What does the kernel state before these lines?
>>
>> [ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
>> [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
>> [ 42.511681] Call trace:
>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 42.923029] [] 0x0
>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>>
>> It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, Functional Exception) that is used for both message passing (should be RT) and the UART (Linux, i.e. non-RT). My current IRQ handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the interrupts turned off.
>>
>> What I believe should be done is:
>>
>> 1. When a UART interrupt is received, disable further non-RT interrupts on this IRQ line and pend the interrupt to Linux.
>> 2. Handle RT interrupts on this IRQ line.
>> 3. When Linux has finished the pended interrupt, reenable non-RT interrupts.
>>
>> but I have neither been able to achieve this, nor to verify that it is the right thing to do...
>
> Your approach is basically what I proposed some years back on rtai-dev for handling unresolvable shared RT/NRT IRQs. I once successfully tested such a setup with two network cards, one RT, the other Linux.
>
> So when you are really doomed and cannot change the IRQ line of your RT device, this is a kind of emergency workaround. It is neither nice nor generic (you have to write the stub for disabling the NRT IRQ source), but it should work.

I'm doomed, the interrupts live in the same chip... The problem is that I have not found any good place to reenable the non-RT interrupts.

> Anyway, I do not understand what made your spinlock recurse. This shared IRQ scenario should only cause indeterminism for the RT driver (by blocking the line until the Linux handler can release it), but it must not trigger this bug.

OK, it seems like I have two problems then. I'll try to hunt it down.

/Anders
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> Philippe Gerum wrote:
>> Anders Blomdell wrote:
>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):
>>>
>>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>>
>> Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock
>
> more likely arch/ppc/kernel/*.c :-)

Gah... looks like I'm still confused by ia64 issues I'm chasing right now. (Why on earth do we need so many bits on our CPUs that only serve the purpose of raising so many problems?)

>> for any given IRQ without using the Adeos *_hw() spinlock variant that masks the interrupt at hw level. So we seem to have:
>>
>> spin_lock_irqsave(desc->lock)
>>   hw IRQ
>>     __ipipe_grab_irq
>>       __ipipe_handle_irq
>>         __ipipe_ack_irq
>>           spin_lock...(desc->lock)
>>             deadlock.
>>
>> The point is that spinlock_irqsave only _virtually_ masks the interrupts, by preventing their associated Linux handler from being called. Despite this, Adeos still actually acquires and acknowledges the incoming hw events before logging them, even if their associated action happens to be postponed until spinlock_irq_restore() is called.
>>
>> To solve this, all spinlocks potentially touched by the ipipe's primary IRQ handler and/or the code it calls indirectly _must_ be operated using the _hw() call variant all over the kernel, so that no hw IRQ can be taken while those spinlocks are held by Linux. Usually, only the spinlock(s) protecting the interrupt descriptors or the PIC hardware are concerned.
>
> So you will expect an addition to the ipipe patch then?

Yep. We first need to find out who's grabbing the shared spinlock using the vanilla Linux primitives.

> /Anders

--
Philippe.
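The difference between virtual and hardware masking described in this message can be made concrete with a toy user-space model. It is a sketch under stated assumptions, not Adeos code: every identifier in it (`struct model`, `hw_irq`, `lock_virtual`, `lock_hw`, and so on) is made up for illustration, a single CPU is assumed, and the "lock" is a plain flag so that the recursion can be detected instead of actually deadlocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; all names are hypothetical. */
struct model {
    bool lock_held;      /* stands in for desc->lock */
    bool virt_stalled;   /* Linux domain stalled: handlers are deferred only */
    bool hw_masked;      /* the CPU really has interrupts disabled */
    bool hw_irq_waiting; /* IRQ latched by hardware while masked */
    bool irq_logged;     /* IRQ acknowledged and logged by the pipeline */
    bool recursion;      /* where the kernel would report spinlock recursion */
};

/* Pipeline entry for a hardware IRQ: acknowledging it needs the lock. */
static void hw_irq(struct model *m)
{
    if (m->hw_masked) {          /* CPU cannot take the IRQ yet */
        m->hw_irq_waiting = true;
        return;
    }
    if (m->lock_held) {          /* this CPU already owns the lock */
        m->recursion = true;
        return;
    }
    m->lock_held = true;         /* acknowledge under the lock */
    m->irq_logged = true;
    m->lock_held = false;
}

/* Vanilla-style locking: interrupts are masked only virtually. */
static void lock_virtual(struct model *m)
{
    m->virt_stalled = true;
    m->lock_held = true;
}

static void unlock_virtual(struct model *m)
{
    assert(m->lock_held);
    m->lock_held = false;
    m->virt_stalled = false;
}

/* _hw-style locking: interrupts are masked at the CPU. */
static void lock_hw(struct model *m)
{
    m->hw_masked = true;
    m->lock_held = true;
}

static void unlock_hw(struct model *m)
{
    assert(m->lock_held);
    m->lock_held = false;
    m->hw_masked = false;
    if (m->hw_irq_waiting) {     /* the latched IRQ is delivered now */
        m->hw_irq_waiting = false;
        hw_irq(m);
    }
}
```

Calling `hw_irq()` between `lock_virtual()` and `unlock_virtual()` sets `recursion`, which mirrors the call chain quoted above. Doing the same between `lock_hw()` and `unlock_hw()` only latches the interrupt, and it is acknowledged after the lock has been released.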
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):
>
> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 42.923029] [] 0x0
> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 43.411145] [c0006524] default_idle+0x10/0x60

I think some probably important information is missing above this back-trace. What does the kernel state before these lines?

Jan