Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]: On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]: I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't. There's something else going on which causes the bus errors which we haven't found. I think it gets triggered if you enable PREEMPT. That's something which we can try to prove... build running now with CONFIG_PREEMPT=y Looks like you now have the omap_l3_noc error appear for sdp4430 in your logs after enabling PREEMPT. I guess that means case closed for this one? Regards, Tony -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]: On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]: I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't. There's something else going on which causes the bus errors which we haven't found. I think it gets triggered if you enable PREEMPT. That's something which we can try to prove... build running now with CONFIG_PREEMPT=y Looks like you now have the omap_l3_noc error appear for sdp4430 in your logs after enabling PREEMPT. I guess that means case closed for this one? I would still like to understand /why/ enabling preempt causes the error. Changing the preempt configuration really should not change what happens on the bus. (Think about it.) It's an indication that there is some other error present. Unfortunately, the OMAP hardware appears to make it impossible to determine what the access that caused the error was, so it looks like it's pretty much undebuggable. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]: On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]: On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]: I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't. There's something else going on which causes the bus errors which we haven't found. I think it gets triggered if you enable PREEMPT. That's something which we can try to prove... build running now with CONFIG_PREEMPT=y Looks like you now have the omap_l3_noc error appear for sdp4430 in your logs after enabling PREEMPT. I guess that means case closed for this one? I would still like to understand /why/ enabling preempt causes the error. Changing the preempt configuration really should not change what happens on the bus. (Think about it.) It's an indication that there is some other error present. We have a wrong irq number caused by $subject. And the wrong irq gets triggered before the dma hardware is configured during dma init. And then we get the invalid access error from omap_l3_noc. Unfortunately, the OMAP hardware appears to make it impossible to determine what the access that caused the error was, so it looks like it's pretty much undebuggable. Yeah would be nice to have more info from omap_l3_noc. Regards, Tony -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]: On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]: On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote: * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]: I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't. There's something else going on which causes the bus errors which we haven't found. I think it gets triggered if you enable PREEMPT. That's something which we can try to prove... build running now with CONFIG_PREEMPT=y Looks like you now have the omap_l3_noc error appear for sdp4430 in your logs after enabling PREEMPT. I guess that means case closed for this one? I would still like to understand /why/ enabling preempt causes the error. Changing the preempt configuration really should not change what happens on the bus. (Think about it.) It's an indication that there is some other error present. We have a wrong irq number caused by $subject. And the wrong irq gets triggered before the dma hardware is configured during dma init. And then we get the invalid access error from omap_l3_noc. Unfortunately, the OMAP hardware appears to make it impossible to determine what the access that caused the error was, so it looks like it's pretty much undebuggable. Yeah would be nice to have more info from omap_l3_noc. you can probably get more info by decoding all L4 instance errors. It's just not implemented anywhere. In this case, we should decode l4cfg which is who generated the error. -- balbi signature.asc Description: Digital signature
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote: * Marc Zyngier marc.zyng...@arm.com [150115 06:46]: On Thu, Jan 15 2015 at 2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote: On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote: Probably there is a workable strategy, but my knowledge about OMAP is close to *nothing*... I have a feeling this might bite other platforms too and we just have not noticed it yet.. I'm looking through the entire tree now, scanning for machines that have GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code. Most platforms using GIC are completely converted to DT and have no hardcoded legacy IRQs. I have checked that cns3xxx and realview are both fine by inspection. The only one I'm not sure about is shmobile, which looks like it might suffer from the same problem. Simon/Magnus, could you verify this with a multiplatform kernel on any SoC that has GIC and uses devices that have interrupts defined in setup-*.c or board-*.c? Arnd -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
> > I would still like to understand /why/ enabling preempt causes the error. Changing the preempt configuration really should not change what happens on the bus. (Think about it.) It's an indication that there is some other error present.
>
> We have a wrong irq number caused by $subject. And the wrong irq gets triggered before the dma hardware is configured during dma init. And then we get the invalid access error from omap_l3_noc.

... which should happen whether or not preempt is enabled, which is really my point.

We know that the wrong IRQ gets requested by the driver - and that wrong IRQ is requested whether or not we have preempt enabled. Yet we get the warning whether or not preempt is enabled.

The DMA handler is not registered as a threaded handler, so it's not depending on a context switch to execute omap2_dma_irq_handler().

Another reason why I don't agree with your explanation is that by the time setup_irq() is called, we have already poked at the DMA hardware several times - omap_clear_dma() and omap2_disable_irq_lch() will have been called for each DMA channel - and both will write to the hardware.

What's more is that the only things left after setup_irq() has been called is to possibly reserve the first two DMA channels and print the DMA message (via show_dma_caps). So I see nothing after setup_irq() which would finish any unfinished hardware initialisation.

The final reason I don't agree is that I've put a printk() in omap2_dma_irq_handler(), and this does not trigger.

So, I think this has nothing to do with the DMA hardware /at all/, but more to do with the GPIO code, and it suggests that the GPIO code publishes IRQs before it is safe for those IRQs to be used.

Maybe it has to do with omap_gpio_irq_handler() being called... added printk(), nope, that's not called either. So it's not an IRQ which gets triggered at all.

What is called are (in order):

	omap_gpio_unmask_irq()
	omap_set_gpio_irqenable()
	omap_enable_gpio_irqbank()

and this reveals where the problem is, especially when you then add instrumentation into the runtime PM functions - and this reveals that when a GPIO IRQ is requested, these functions are called while the GPIO is runtime suspended.

_That_ is where the *real* problem lies - requesting a GPIO interrupt results in the kernel touching possibly runtime-suspended hardware.

The reason it happens with preempt is that preempt introduces scheduling points during the kernel boot which would not otherwise be there (with preempt disabled, you have to hit an explicit context switch due to contention on some lock or a wait in order for some other thread to run.)

So, the GPIO driver really needs fixing - and I'd suggest fixing it first, before fixing the DMA problem, because the DMA problem allows us to see the GPIO problem.
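The failure mode described in this message can be pictured with a small toy model. This is only a sketch in Python, not kernel code and not the fix that was later posted: `GpioBank`, `runtime_idle()`, `boot()` and the `take_pm_reference` flag are hypothetical names, and the "extra scheduling point" is reduced to a single call that lets the idle work run before the IRQ is requested.

```python
class BusError(Exception):
    """Stands in for the omap_l3_noc error report."""


class GpioBank:
    """Hypothetical GPIO bank with a runtime PM usage count."""

    def __init__(self):
        self.users = 0        # runtime PM usage count
        self.powered = True   # assume the bank is powered right after probe
        self.irqenable = 0

    def runtime_get(self):
        self.users += 1
        self.powered = True

    def runtime_idle(self):
        # Work that only runs once some other thread gets scheduled.
        if self.users == 0:
            self.powered = False

    def write_irqenable(self, mask):
        if not self.powered:
            raise BusError("register write to runtime-suspended bank")
        self.irqenable |= mask


def boot(preempt, take_pm_reference):
    bank = GpioBank()
    if preempt:
        # An extra scheduling point lets the suspend happen before the request.
        bank.runtime_idle()
    try:
        if take_pm_reference:
            bank.runtime_get()
        bank.write_irqenable(1 << 18)  # requesting the IRQ unmasks GPIO line 18
    except BusError:
        return "bus error"
    return "ok"


print(boot(preempt=False, take_pm_reference=False))  # ok
print(boot(preempt=True, take_pm_reference=False))   # bus error
print(boot(preempt=True, take_pm_reference=True))    # ok
```

In this model the missing PM reference is present in both configurations; preemption only changes whether the bank has already been suspended by the time the register write happens.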
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On 16/01/15 16:56, Arnd Bergmann wrote: On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote: * Marc Zyngier marc.zyng...@arm.com [150115 06:46]: On Thu, Jan 15 2015 at 2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote: On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote: Probably there is a workable strategy, but my knowledge about OMAP is close to *nothing*... I have a feeling this might bite other platforms too and we just have not noticed it yet.. I'm looking through the entire tree now, scanning for machines that have GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code. Most platforms using GIC are completely converted to DT and have no hardcoded legacy IRQs. I have checked that cns3xxx and realview are both fine by inspection. The only one I'm not sure about is shmobile, which looks like it might suffer from the same problem. Simon/Magnus, could you verify this with a multiplatform kernel on any SoC that has GIC and uses devices that have interrupts defined in setup-*.c or board-*.c? There are 3 patches floating around for shmobile, converting their non-DT support to directly initializing the GIC instead of relying on irqchip_init(). That's assuming their DT implementation doesn't use any of these device declarations. If they do, we could use a hack similar to the one I implemented for OMAP, populating the virtual IRQ in the resource at boot time, just after the irqchip initialization. Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
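The boot-time hack mentioned at the end of this message (populating the virtual IRQ in the resource just after irqchip initialization) might look roughly like the following. This is a Python sketch of the idea only; the resource dictionaries, `fixup_irq_resources()`, the address range and the 44 -> 20 mapping are all made up for illustration (45 -> 22 is taken from the /proc/interrupts output quoted in this thread).

```python
def fixup_irq_resources(resources, to_virq):
    """Replace the hardware IRQ number in each IRQ resource with its Linux IRQ number."""
    for res in resources:
        if res["type"] == "irq":
            virq = to_virq(res["start"])
            res["start"] = res["end"] = virq
    return resources


# Hypothetical hwirq -> Linux IRQ mapping, known only after the irqchip is up.
mapping = {44: 20, 45: 22}

# Hypothetical legacy platform data: one memory window and one hardcoded IRQ.
dma_resources = [
    {"type": "mem", "start": 0x1000, "end": 0x1FFF},
    {"type": "irq", "start": 44, "end": 44},
]

fixup_irq_resources(dma_resources, mapping.__getitem__)
print(dma_resources[1])  # {'type': 'irq', 'start': 20, 'end': 20}
```

The point of running this after the interrupt controller has been initialized is that the mapping does not exist any earlier, so the static tables cannot carry the right number at build time.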
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
> On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> > * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
> > > I would still like to understand /why/ enabling preempt causes the error. Changing the preempt configuration really should not change what happens on the bus. (Think about it.) It's an indication that there is some other error present.
> >
> > We have a wrong irq number caused by $subject. And the wrong irq gets triggered before the dma hardware is configured during dma init. And then we get the invalid access error from omap_l3_noc.
>
> ... which should happen whether or not preempt is enabled, which is really my point.
>
> We know that the wrong IRQ gets requested by the driver - and that wrong IRQ is requested whether or not we have preempt enabled. Yet we get the warning whether or not preempt is enabled.
>
> The DMA handler is not registered as a threaded handler, so it's not depending on a context switch to execute omap2_dma_irq_handler().
>
> Another reason why I don't agree with your explanation is that by the time setup_irq() is called, we have already poked at the DMA hardware several times - omap_clear_dma() and omap2_disable_irq_lch() will have been called for each DMA channel - and both will write to the hardware.
>
> What's more is that the only things left after setup_irq() has been called is to possibly reserve the first two DMA channels and print the DMA message (via show_dma_caps). So I see nothing after setup_irq() which would finish any unfinished hardware initialisation.
>
> The final reason I don't agree is that I've put a printk() in omap2_dma_irq_handler(), and this does not trigger.

Oh, yes that blows my theory completely then.

> So, I think this has nothing to do with the DMA hardware /at all/, but more to do with the GPIO code, and it suggests that the GPIO code publishes IRQs before it is safe for those IRQs to be used.
>
> Maybe it has to do with omap_gpio_irq_handler() being called... added printk(), nope, that's not called either. So it's not an IRQ which gets triggered at all.
>
> What is called are (in order):
>
> 	omap_gpio_unmask_irq()
> 	omap_set_gpio_irqenable()
> 	omap_enable_gpio_irqbank()
>
> and this reveals where the problem is, especially when you then add instrumentation into the runtime PM functions - and this reveals that when a GPIO IRQ is requested, these functions are called while the GPIO is runtime suspended.
>
> _That_ is where the *real* problem lies - requesting a GPIO interrupt results in the kernel touching possibly runtime-suspended hardware.
>
> The reason it happens with preempt is that preempt introduces scheduling points during the kernel boot which would not otherwise be there (with preempt disabled, you have to hit an explicit context switch due to contention on some lock or a wait in order for some other thread to run.)

OK makes sense.

> So, the GPIO driver really needs fixing - and I'd suggest fixing it first, before fixing the DMA problem, because the DMA problem allows us to see the GPIO problem.

Yes we need to fix that.

Regards,

Tony
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Tony Lindgren t...@atomide.com [150116 09:36]:
> * Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
> > On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> > > * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
> > > > I would still like to understand /why/ enabling preempt causes the error. Changing the preempt configuration really should not change what happens on the bus. (Think about it.) It's an indication that there is some other error present.
> > >
> > > We have a wrong irq number caused by $subject. And the wrong irq gets triggered before the dma hardware is configured during dma init. And then we get the invalid access error from omap_l3_noc.
> >
> > ... which should happen whether or not preempt is enabled, which is really my point.
> >
> > We know that the wrong IRQ gets requested by the driver - and that wrong IRQ is requested whether or not we have preempt enabled. Yet we get the warning whether or not preempt is enabled.
> >
> > The DMA handler is not registered as a threaded handler, so it's not depending on a context switch to execute omap2_dma_irq_handler().
> >
> > Another reason why I don't agree with your explanation is that by the time setup_irq() is called, we have already poked at the DMA hardware several times - omap_clear_dma() and omap2_disable_irq_lch() will have been called for each DMA channel - and both will write to the hardware.
> >
> > What's more is that the only things left after setup_irq() has been called is to possibly reserve the first two DMA channels and print the DMA message (via show_dma_caps). So I see nothing after setup_irq() which would finish any unfinished hardware initialisation.
> >
> > The final reason I don't agree is that I've put a printk() in omap2_dma_irq_handler(), and this does not trigger.
>
> Oh, yes that blows my theory completely then.
>
> > So, I think this has nothing to do with the DMA hardware /at all/, but more to do with the GPIO code, and it suggests that the GPIO code publishes IRQs before it is safe for those IRQs to be used.
> >
> > Maybe it has to do with omap_gpio_irq_handler() being called... added printk(), nope, that's not called either. So it's not an IRQ which gets triggered at all.
> >
> > What is called are (in order):
> >
> > 	omap_gpio_unmask_irq()
> > 	omap_set_gpio_irqenable()
> > 	omap_enable_gpio_irqbank()
> >
> > and this reveals where the problem is, especially when you then add instrumentation into the runtime PM functions - and this reveals that when a GPIO IRQ is requested, these functions are called while the GPIO is runtime suspended.
> >
> > _That_ is where the *real* problem lies - requesting a GPIO interrupt results in the kernel touching possibly runtime-suspended hardware.
> >
> > The reason it happens with preempt is that preempt introduces scheduling points during the kernel boot which would not otherwise be there (with preempt disabled, you have to hit an explicit context switch due to contention on some lock or a wait in order for some other thread to run.)
>
> OK makes sense.
>
> > So, the GPIO driver really needs fixing - and I'd suggest fixing it first, before fixing the DMA problem, because the DMA problem allows us to see the GPIO problem.
>
> Yes we need to fix that.

Posted a minimal fix for that one as a separate thread:

[PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()

Regards,

Tony
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Fri, Jan 16, 2015 at 02:52:44PM -0800, Tony Lindgren wrote:
> * Tony Lindgren t...@atomide.com [150116 09:36]:
> > * Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
> > > So, the GPIO driver really needs fixing - and I'd suggest fixing it first, before fixing the DMA problem, because the DMA problem allows us to see the GPIO problem.
> >
> > Yes we need to fix that.
>
> Posted a minimal fix for that one as a separate thread:
>
> [PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()

Thanks, I'll throw that onto the build tree for tonight's build.
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Russell King - ARM Linux li...@arm.linux.org.uk [150116 15:00]:
> On Fri, Jan 16, 2015 at 02:52:44PM -0800, Tony Lindgren wrote:
> > * Tony Lindgren t...@atomide.com [150116 09:36]:
> > > * Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
> > > > So, the GPIO driver really needs fixing - and I'd suggest fixing it first, before fixing the DMA problem, because the DMA problem allows us to see the GPIO problem.
> > >
> > > Yes we need to fix that.
> >
> > Posted a minimal fix for that one as a separate thread:
> >
> > [PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()
>
> Thanks, I'll throw that onto the build tree for tonight's build.

Great thanks!

Tony
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Fri, Jan 16, 2015 at 05:23:05PM +, Marc Zyngier wrote: On 16/01/15 16:56, Arnd Bergmann wrote: On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote: * Marc Zyngier marc.zyng...@arm.com [150115 06:46]: On Thu, Jan 15 2015 at 2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote: On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote: Probably there is a workable strategy, but my knowledge about OMAP is close to *nothing*... I have a feeling this might bite other platforms too and we just have not noticed it yet.. I'm looking through the entire tree now, scanning for machines that have GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code. Most platforms using GIC are completely converted to DT and have no hardcoded legacy IRQs. I have checked that cns3xxx and realview are both fine by inspection. The only one I'm not sure about is shmobile, which looks like it might suffer from the same problem. Simon/Magnus, could you verify this with a multiplatform kernel on any SoC that has GIC and uses devices that have interrupts defined in setup-*.c or board-*.c? There are 3 patches floating around for shmobile, converting their non-DT support to directly initializing the GIC instead of relying on irqchip_init(). There is also a fourth patch pending to fix one last SoC, the r8a73a4. My understanding is that should be the end of the problems that we have been seeing in this area. That's assuming their DT implementation doesn't use any of these device declarations. I believe that assumption is correct. shmobile does not use any devices that have interrupts defined in setup-*.c or board-*.c when booting using multiplatform. If they do, we could use a hack similar to the one I implemented for OMAP, populating the virtual IRQ in the resource at boot time, just after the irqchip initialization. Thanks, M. -- Jazz is not dead. It just smells funny... 
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
> > I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't.
> >
> > There's something else going on which causes the bus errors which we haven't found.
>
> I think it gets triggered if you enable PREEMPT.

That's something which we can try to prove... build running now with CONFIG_PREEMPT=y
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]: On Wed, Jan 14, 2015 at 02:14:08PM -0800, Tony Lindgren wrote: Hi all, Looks like the legacy IRQ numbers are now all wrong at least for omap4 since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq domain.). Instead of this: # cat /proc/interrupts CPU0 CPU1 29: 1124981 GIC 29 twd 39: 0 0 GIC 39 TWL6030-PIH 41: 0 0 GIC 41 l3-dbg-irq 42: 0 0 GIC 42 l3-app-irq 44: 0 0 GIC 44 DMA 45: 7854 0 GIC 45 omap-dma-engine 52: 0 0 GIC 52 gpmc ... We now have: # cat /proc/interrupts CPU0 CPU1 16:343 0 GIC 69 gp_timer 17: 1160 1017 GIC 29 twd 18: 0 0 GIC 41 l3-dbg-irq 19: 1 0 GIC 42 l3-app-irq 22: 7850 0 GIC 45 omap-dma-engine 44: 0 0 4a31.gpio 18 DMA 61: 2730 0 48055000.gpio 2 eth0 223: 0 0 GIC 52 gpmc ... So the DMA interrupt using the legacy mapping with something like irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately at least omaps still have a bunch of the legacy interrupts still around. And that naturally produces all kinds of strange errors like: WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340() 4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access ... [c05f21e4] (__irq_svc) from [c05f1974] (_raw_spin_unlock_irqrestore+0x34/0x44) [c05f1974] (_raw_spin_unlock_irqrestore) from [c00914a8] (__setup_irq+0x244/0x530) [c00914a8] (__setup_irq) from [c00917d4] (setup_irq+0x40/0x8c) [c00917d4] (setup_irq) from [c0039c8c] (omap_system_dma_probe+0x1d4/0x2b4) [c0039c8c] (omap_system_dma_probe) from [c03b2200] (platform_drv_probe+0x44/0xa4) ... I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't. There's something else going on which causes the bus errors which we haven't found. 
I think it gets triggered if you enable PREEMPT. Regards, Tony
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Thu, Jan 15 2015 at 2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote: On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote: Of course, this is in no way a proper fix, but I suppose the OMAP DT is still missing a few bits... I must be missing something here, but all the interrupts are listed correctly in the DT, so what is the omap_hwmod_irq_info actually achieving on omap4 and omap5? Would it work if we just remove the incorrect copy of the resource and use the one that comes from DT? By the look of it, omap_hwmod_irq_info serves multiple purposes: - low level configuration (pads, probably more stuff) - interrupt description for some drivers, using resources. It should be fairly easy to do the latter, but the former looks more tricky (it would push the pad configuration down to the drivers, which is avoided at the moment). Probably there is a workable strategy, but my knowledge about OMAP is close to *nothing*... M. -- Jazz is not dead. It just smells funny. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
* Marc Zyngier marc.zyng...@arm.com [150115 06:46]: On Thu, Jan 15 2015 at 2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote: On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote: Of course, this is in no way a proper fix, but I suppose the OMAP DT is still missing a few bits... I must be missing something here, but all the interrupts are listed correctly in the DT, so what is the omap_hwmod_irq_info actually achieving on omap4 and omap5? Would it work if we just remove the incorrect copy of the resource and use the one that comes from DT? By the look of it, omap_hwmod_irq_info serves multiple purposes: - low level configuration (pads, probably more stuff) The muxing is only done for omap3 in legacy booting mode. - interrupt description for some drivers, using resources. That's still used to create legacy platform_device entries on omap4 for legacy DMA, DSS, PRM. The twl6040 entries are already unused and I have a patch queued to remove them. It should be fairly easy to do the latter, but the former looks more tricky (it would push the pad configuration down to the drivers, which is avoided at the moment). The pad configuration is already done with pinctrl-single. Probably there is a workable strategy, but my knowledge about OMAP is close to *nothing*... I have a feeling this might bite other platforms too and we just have not noticed it yet.. Regards, Tony -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Thursday 15 January 2015 14:43:35 Marc Zyngier wrote:
> On Thu, Jan 15 2015 at 2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote:
> > On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> > > Of course, this is in no way a proper fix, but I suppose the OMAP DT is still missing a few bits...
> >
> > I must be missing something here, but all the interrupts are listed correctly in the DT, so what is the omap_hwmod_irq_info actually achieving on omap4 and omap5? Would it work if we just remove the incorrect copy of the resource and use the one that comes from DT?
>
> By the look of it, omap_hwmod_irq_info serves multiple purposes:
> - low level configuration (pads, probably more stuff)
> - interrupt description for some drivers, using resources.
>
> It should be fairly easy to do the latter, but the former looks more tricky (it would push the pad configuration down to the drivers, which is avoided at the moment).
>
> Probably there is a workable strategy, but my knowledge about OMAP is close to *nothing*...

I don't know much about OMAP either so maybe it's better to let someone comment who understands this better. ;-)

A number of commits have in the past removed omap_hwmod_irq_info entries here, the last one was 09182ab11b49 (ARM: OMAP4: hwmod data: Remove irq entries from mcspi, mmc hwmods). From what I can tell, we could drop the omap44xx_dss_dispc_irqs, omap44xx_dss_dsi1_irqs, omap44xx_dss_dsi2_irqs, and omap44xx_dss_hdmi_irqs in the same way, after this data got added to DT as part of cfe86fcf2d0079f03 (ARM: omap4.dtsi: add omapdss information).

Unfortunately, the omap-dma driver once again gets in the way, since there are still users of this that are not converted to use dmaengine, and unlike the proper driver (drivers/dma/omap-dma.c), the legacy driver (arch/arm/mach-omap2/dma.c) does not get probed from DT and instead gets the wrong irq numbers from hwmod now.

Arnd
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Wed, Jan 14, 2015 at 02:14:08PM -0800, Tony Lindgren wrote: Hi all, Looks like the legacy IRQ numbers are now all wrong at least for omap4 since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq domain.). Instead of this: # cat /proc/interrupts CPU0 CPU1 29: 1124981 GIC 29 twd 39: 0 0 GIC 39 TWL6030-PIH 41: 0 0 GIC 41 l3-dbg-irq 42: 0 0 GIC 42 l3-app-irq 44: 0 0 GIC 44 DMA 45: 7854 0 GIC 45 omap-dma-engine 52: 0 0 GIC 52 gpmc ... We now have: # cat /proc/interrupts CPU0 CPU1 16:343 0 GIC 69 gp_timer 17: 1160 1017 GIC 29 twd 18: 0 0 GIC 41 l3-dbg-irq 19: 1 0 GIC 42 l3-app-irq 22: 7850 0 GIC 45 omap-dma-engine 44: 0 0 4a31.gpio 18 DMA 61: 2730 0 48055000.gpio 2 eth0 223: 0 0 GIC 52 gpmc ... So the DMA interrupt using the legacy mapping with something like irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately at least omaps still have a bunch of the legacy interrupts still around. And that naturally produces all kinds of strange errors like: WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340() 4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access ... [c05f21e4] (__irq_svc) from [c05f1974] (_raw_spin_unlock_irqrestore+0x34/0x44) [c05f1974] (_raw_spin_unlock_irqrestore) from [c00914a8] (__setup_irq+0x244/0x530) [c00914a8] (__setup_irq) from [c00917d4] (setup_irq+0x40/0x8c) [c00917d4] (setup_irq) from [c0039c8c] (omap_system_dma_probe+0x1d4/0x2b4) [c0039c8c] (omap_system_dma_probe) from [c03b2200] (platform_drv_probe+0x44/0xa4) ... I don't think we've proven a link there. While you're right that it causes the wrong interrupt to be claimed, I have two kernels here, both claim the same interrupt, one which is multi-platform and issues that strange warning, and one which targets only OMAP4 which doesn't. There's something else going on which causes the bus errors which we haven't found. 
-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Wed, Jan 14 2015 at 10:14:08 pm GMT, Tony Lindgren <t...@atomide.com> wrote:
> Hi all,
>
> Looks like the legacy IRQ numbers are now all wrong at least for omap4
> since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq
> domain.). Instead of this:
>
> # cat /proc/interrupts
>       CPU0 CPU1
>  29: 1124981 GIC 29 twd
>  39: 0 0 GIC 39 TWL6030-PIH
>  41: 0 0 GIC 41 l3-dbg-irq
>  42: 0 0 GIC 42 l3-app-irq
>  44: 0 0 GIC 44 DMA
>  45: 7854 0 GIC 45 omap-dma-engine
>  52: 0 0 GIC 52 gpmc
> ...
>
> We now have:
>
> # cat /proc/interrupts
>       CPU0 CPU1
>  16: 343 0 GIC 69 gp_timer
>  17: 1160 1017 GIC 29 twd
>  18: 0 0 GIC 41 l3-dbg-irq
>  19: 1 0 GIC 42 l3-app-irq
>  22: 7850 0 GIC 45 omap-dma-engine
>  44: 0 0 4a31.gpio 18 DMA
>  61: 2730 0 48055000.gpio 2 eth0
> 223: 0 0 GIC 52 gpmc
> ...
>
> So the DMA interrupt using the legacy mapping with something like
> irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
> at least omaps still have a bunch of the legacy interrupts still
> around.

Holy crap. How much of this do we have hanging around?

> And that naturally produces all kinds of strange errors like:
>
> WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
> 4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access
> ...
> [<c05f21e4>] (__irq_svc) from [<c05f1974>] (_raw_spin_unlock_irqrestore+0x34/0x44)
> [<c05f1974>] (_raw_spin_unlock_irqrestore) from [<c00914a8>] (__setup_irq+0x244/0x530)
> [<c00914a8>] (__setup_irq) from [<c00917d4>] (setup_irq+0x40/0x8c)
> [<c00917d4>] (setup_irq) from [<c0039c8c>] (omap_system_dma_probe+0x1d4/0x2b4)
> [<c0039c8c>] (omap_system_dma_probe) from [<c03b2200>] (platform_drv_probe+0x44/0xa4)
> ...
>
> Looks like the logic changed from:
>
> if (of_property_read_u32(node, "arm,routable-irqs", &nr_routable_irqs))
>
> to just
>
> if (node)
>
> Which now causes irq_domain_add_linear() to be called instead of
> irq_domain_add_legacy(), which causes the breakage.
>
> Anybody got a sane fix in mind for the -rc series, or should we just
> revert it for now?

Reverting it is going to kill other platforms, and I'd rather have a
workaround, short of fixing it for good (which seems ambitious at
-rc4).

How about something along these lines:

diff --git a/arch/arm/mach-omap2/common.h b/arch/arm/mach-omap2/common.h
index 377eea8..b664494 100644
--- a/arch/arm/mach-omap2/common.h
+++ b/arch/arm/mach-omap2/common.h
@@ -211,6 +211,7 @@ extern struct device *omap2_get_iva_device(void);
 extern struct device *omap2_get_l3_device(void);
 extern struct device *omap4_get_dsp_device(void);
 
+unsigned int omap4_xlate_irq(unsigned int hwirq);
 void omap_gic_of_init(void);
 
 #ifdef CONFIG_CACHE_L2X0
diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index b7cb44a..cc30e49 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -256,6 +256,38 @@ static int __init omap4_sar_ram_init(void)
 }
 omap_early_initcall(omap4_sar_ram_init);
 
+static struct of_device_id gic_match[] = {
+	{ .compatible = "arm,cortex-a9-gic", },
+	{ .compatible = "arm,cortex-a15-gic", },
+	{ },
+};
+
+static struct device_node *gic_node;
+
+unsigned int omap4_xlate_irq(unsigned int hwirq)
+{
+	struct of_phandle_args irq_data;
+	unsigned int irq;
+
+	if (!gic_node)
+		gic_node = of_find_matching_node(NULL, gic_match);
+
+	if (WARN_ON(!gic_node))
+		return hwirq;
+
+	irq_data.np = gic_node;
+	irq_data.args_count = 3;
+	irq_data.args[0] = 0;
+	irq_data.args[1] = hwirq - OMAP44XX_IRQ_GIC_START;
+	irq_data.args[2] = IRQ_TYPE_LEVEL_HIGH;
+
+	irq = irq_create_of_mapping(&irq_data);
+	if (WARN_ON(!irq))
+		irq = hwirq;
+
+	return irq;
+}
+
 void __init omap_gic_of_init(void)
 {
 	struct device_node *np;
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index cbb908d..9025fff 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -3534,9 +3534,15 @@ int omap_hwmod_fill_resources(struct omap_hwmod *oh, struct resource *res)
 
 	mpu_irqs_cnt = _count_mpu_irqs(oh);
 	for (i = 0; i < mpu_irqs_cnt; i++) {
+		unsigned int irq;
+
+		if (oh->xlate_irq)
+			irq = oh->xlate_irq((oh->mpu_irqs + i)->irq);
+		else
+			irq = (oh->mpu_irqs + i)->irq;
 		(res + r)->name = (oh->mpu_irqs + i)->name;
-		(res + r)->start = (oh->mpu_irqs +
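
The last hunk of the patch is cut off in this archive, but the visible part shows the pattern: an optional per-hwmod xlate_irq hook is consulted when present, and the raw legacy number is used otherwise. What follows is only a user-space sketch of that fallback pattern, not the kernel code. The toy_* types and toy_xlate() are hypothetical stand-ins, and storing the translated number in both start and end is an assumption, since the archived patch ends before that assignment.

```c
#include <stddef.h>

/* Hypothetical stand-ins for omap_hwmod_irq_info, omap_hwmod and resource. */
struct toy_irq_info {
	const char *name;
	unsigned int irq;	/* legacy number, e.g. 12 + GIC start */
};

struct toy_hwmod {
	const struct toy_irq_info *mpu_irqs;
	size_t mpu_irqs_cnt;
	unsigned int (*xlate_irq)(unsigned int hwirq);	/* may be NULL */
};

struct toy_resource {
	const char *name;
	unsigned int start;
	unsigned int end;
};

/* Fill one resource per IRQ, translating only when a hook is installed. */
static size_t toy_fill_irq_resources(const struct toy_hwmod *oh,
				     struct toy_resource *res)
{
	size_t i;

	for (i = 0; i < oh->mpu_irqs_cnt; i++) {
		unsigned int irq = oh->mpu_irqs[i].irq;

		if (oh->xlate_irq)
			irq = oh->xlate_irq(irq);

		res[i].name = oh->mpu_irqs[i].name;
		/* Assumption: the truncated hunk stores the translated value. */
		res[i].start = irq;
		res[i].end = irq;
	}
	return i;
}

/* Hypothetical translator: any function from legacy number to Linux IRQ. */
static unsigned int toy_xlate(unsigned int hwirq)
{
	return hwirq + 100;
}
```

With xlate_irq left NULL the resource keeps the legacy number, which matches the pre-patch behaviour; installing a hook only changes the numbers for hwmods that opt in, which is presumably why the workaround can stay confined to omap4/omap5.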
Re: Regression with legacy IRQ numbers caused by 9a1091ef0017
On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> Of course, this is in no way a proper fix, but I suppose the OMAP DT
> is still missing a few bits...

I must be missing something here, but all the interrupts are listed
correctly in the DT, so what is the omap_hwmod_irq_info actually
achieving on omap4 and omap5? Would it work if we just remove the
incorrect copy of the resource and use the one that comes from DT?

	Arnd
Regression with legacy IRQ numbers caused by 9a1091ef0017
Hi all,

Looks like the legacy IRQ numbers are now all wrong at least for omap4
since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq
domain.). Instead of this:

# cat /proc/interrupts
      CPU0 CPU1
 29: 1124981 GIC 29 twd
 39: 0 0 GIC 39 TWL6030-PIH
 41: 0 0 GIC 41 l3-dbg-irq
 42: 0 0 GIC 42 l3-app-irq
 44: 0 0 GIC 44 DMA
 45: 7854 0 GIC 45 omap-dma-engine
 52: 0 0 GIC 52 gpmc
...

We now have:

# cat /proc/interrupts
      CPU0 CPU1
 16: 343 0 GIC 69 gp_timer
 17: 1160 1017 GIC 29 twd
 18: 0 0 GIC 41 l3-dbg-irq
 19: 1 0 GIC 42 l3-app-irq
 22: 7850 0 GIC 45 omap-dma-engine
 44: 0 0 4a31.gpio 18 DMA
 61: 2730 0 48055000.gpio 2 eth0
223: 0 0 GIC 52 gpmc
...

So the DMA interrupt using the legacy mapping with something like
irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
at least omaps still have a bunch of the legacy interrupts still
around.

And that naturally produces all kinds of strange errors like:

WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access
...
[<c05f21e4>] (__irq_svc) from [<c05f1974>] (_raw_spin_unlock_irqrestore+0x34/0x44)
[<c05f1974>] (_raw_spin_unlock_irqrestore) from [<c00914a8>] (__setup_irq+0x244/0x530)
[<c00914a8>] (__setup_irq) from [<c00917d4>] (setup_irq+0x40/0x8c)
[<c00917d4>] (setup_irq) from [<c0039c8c>] (omap_system_dma_probe+0x1d4/0x2b4)
[<c0039c8c>] (omap_system_dma_probe) from [<c03b2200>] (platform_drv_probe+0x44/0xa4)
...

Looks like the logic changed from:

if (of_property_read_u32(node, "arm,routable-irqs", &nr_routable_irqs))

to just

if (node)

Which now causes irq_domain_add_linear() to be called instead of
irq_domain_add_legacy(), which causes the breakage.

Anybody got a sane fix in mind for the -rc series, or should we just
revert it for now?
Regards,

Tony
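
To make the report above concrete, here is a small user-space toy model (not kernel code) of why switching from a legacy to a linear IRQ domain invalidates hard-coded numbers such as 12 + OMAP44XX_IRQ_GIC_START. Everything in it is an assumption for illustration: GIC_START = 32 is inferred from that expression showing up as "GIC 44" in the first listing, FIRST_DYNAMIC_VIRQ = 16 mirrors the first number in the second listing, and legacy_map()/linear_map() are hypothetical helpers that only mimic the visible behaviour of irq_domain_add_legacy() and irq_domain_add_linear().

```c
#include <assert.h>
#include <string.h>

#define GIC_START          32	/* inferred: 12 + OMAP44XX_IRQ_GIC_START is "GIC 44" */
#define MAX_HWIRQ          256	/* hypothetical size of the toy domain */
#define FIRST_DYNAMIC_VIRQ 16	/* assumption: first number in the new listing */

/* Legacy-style mapping: Linux IRQ numbers are fixed up front (1:1 here). */
static unsigned int legacy_map(unsigned int hwirq)
{
	return hwirq;
}

/* Linear-style mapping: a Linux IRQ number is handed out on first request. */
struct linear_domain {
	unsigned int virq_of[MAX_HWIRQ];	/* 0 means "not mapped yet" */
	unsigned int next_virq;
};

static void linear_init(struct linear_domain *d)
{
	memset(d->virq_of, 0, sizeof(d->virq_of));
	d->next_virq = FIRST_DYNAMIC_VIRQ;
}

static unsigned int linear_map(struct linear_domain *d, unsigned int hwirq)
{
	assert(hwirq < MAX_HWIRQ);
	if (!d->virq_of[hwirq])
		d->virq_of[hwirq] = d->next_virq++;	/* allocation order decides the number */
	return d->virq_of[hwirq];
}
```

In this model, mapping hardware interrupts 69, 29 and 44 in that order yields Linux numbers 16, 17 and 18, while legacy_map(12 + GIC_START) stays at 44. Code that still requests Linux IRQ 44 directly therefore gets whatever was allocated that number, which in the second listing is a GPIO line.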