Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Tony Lindgren
* Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]:
 On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
  * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
   I don't think we've proven a link there.  While you're right that it
   causes the wrong interrupt to be claimed, I have two kernels here,
   both claim the same interrupt, one which is multi-platform and issues
   that strange warning, and one which targets only OMAP4 which doesn't.
   
   There's something else going on which causes the bus errors which we
   haven't found.
  
  I think it gets triggered if you enable PREEMPT.
 
 That's something which we can try to prove... build running now with
 CONFIG_PREEMPT=y

Looks like you now have the omap_l3_noc error appear for sdp4430 in
your logs after enabling PREEMPT.

I guess that means case closed for this one?

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Russell King - ARM Linux
On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote:
 * Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]:
  On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
   * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
I don't think we've proven a link there.  While you're right that it
causes the wrong interrupt to be claimed, I have two kernels here,
both claim the same interrupt, one which is multi-platform and issues
that strange warning, and one which targets only OMAP4 which doesn't.

There's something else going on which causes the bus errors which we
haven't found.
   
   I think it gets triggered if you enable PREEMPT.
  
  That's something which we can try to prove... build running now with
  CONFIG_PREEMPT=y
 
 Looks like you now have the omap_l3_noc error appear for sdp4430 in
 your logs after enabling PREEMPT.
 
 I guess that means case closed for this one?

I would still like to understand /why/ enabling preempt causes the error.
Changing the preempt configuration really should not change what happens
on the bus.  (Think about it.)  It's an indication that there is some
other error present.

Unfortunately, the OMAP hardware appears to make it impossible to
determine what the access that caused the error was, so it looks like
it's pretty much undebuggable.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Tony Lindgren
* Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
 On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote:
  * Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]:
   On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
* Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
 I don't think we've proven a link there.  While you're right that it
 causes the wrong interrupt to be claimed, I have two kernels here,
 both claim the same interrupt, one which is multi-platform and issues
 that strange warning, and one which targets only OMAP4 which doesn't.
 
 There's something else going on which causes the bus errors which we
 haven't found.

I think it gets triggered if you enable PREEMPT.
   
   That's something which we can try to prove... build running now with
   CONFIG_PREEMPT=y
  
  Looks like you now have the omap_l3_noc error appear for sdp4430 in
  your logs after enabling PREEMPT.
  
  I guess that means case closed for this one?
 
 I would still like to understand /why/ enabling preempt causes the error.
 Changing the preempt configuration really should not change what happens
 on the bus.  (Think about it.)  It's an indication that there is some
 other error present.

We have a wrong irq number caused by $subject. And the wrong irq
gets triggered before the dma hardware is configured during dma
init. And then we get the invalid access error from omap_l3_noc.

 Unfortunately, the OMAP hardware appears to make it impossible to
 determine what the access that caused the error was, so it looks like
 it's pretty much undebuggable.

Yeah would be nice to have more info from omap_l3_noc.

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Felipe Balbi
On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
 * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
  On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote:
   * Russell King - ARM Linux li...@arm.linux.org.uk [150115 09:22]:
On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
 * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
  I don't think we've proven a link there.  While you're right that it
  causes the wrong interrupt to be claimed, I have two kernels here,
  both claim the same interrupt, one which is multi-platform and 
  issues
  that strange warning, and one which targets only OMAP4 which 
  doesn't.
  
  There's something else going on which causes the bus errors which we
  haven't found.
 
 I think it gets triggered if you enable PREEMPT.

That's something which we can try to prove... build running now with
CONFIG_PREEMPT=y
   
   Looks like you now have the omap_l3_noc error appear for sdp4430 in
   your logs after enabling PREEMPT.
   
   I guess that means case closed for this one?
  
  I would still like to understand /why/ enabling preempt causes the error.
  Changing the preempt configuration really should not change what happens
  on the bus.  (Think about it.)  It's an indication that there is some
  other error present.
 
 We have a wrong irq number caused by $subject. And the wrong irq
 gets triggered before the dma hardware is configured during dma
 init. And then we get the invalid access error from omap_l3_noc.
 
  Unfortunately, the OMAP hardware appears to make it impossible to
  determine what the access that caused the error was, so it looks like
  it's pretty much undebuggable.
 
 Yeah would be nice to have more info from omap_l3_noc.

you can probably get more info by decoding all L4 instance errors. It's
just not implemented anywhere. In this case, we should decode l4cfg
which is who generated the error.

-- 
balbi


signature.asc
Description: Digital signature


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Arnd Bergmann
On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote:
 * Marc Zyngier marc.zyng...@arm.com [150115 06:46]:
  On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote:
   On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
  Probably there is a workable strategy, but my knowledge about OMAP is
  close to *nothing*...
 
 I have a feeling this might bite other platforms too and we just have
 not noticed it yet..

I'm looking through the entire tree now, scanning for machines that have
GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code.
Most platforms using GIC are completely converted to DT and have no
hardcoded legacy IRQs.

I have checked that cns3xxx and realview are both fine by inspection.

The only one I'm not sure about is shmobile, which looks like it might
suffer from the same problem. Simon/Magnus, could you verify this
with a multiplatform kernel on any SoC that has GIC and uses devices
that have interrupts defined in setup-*.c or board-*.c?

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Russell King - ARM Linux
On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
 * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
  I would still like to understand /why/ enabling preempt causes the error.
  Changing the preempt configuration really should not change what happens
  on the bus.  (Think about it.)  It's an indication that there is some
  other error present.
 
 We have a wrong irq number caused by $subject. And the wrong irq
 gets triggered before the dma hardware is configured during dma
 init. And then we get the invalid access error from omap_l3_noc.

... which should happen whether or not preempt is enabled, which is
really my point.

We know tha the wrong IRQ gets requested by the driver - and that wrong
IRQ is requested whether or not we have preempt enabled.  Yet we get
the warning whether or not preempt is enabled.

The DMA handler is not registered as a threaded handler, so it's not
depending on a context switch to execute omap2_dma_irq_handler().

Another reason why I don't agree with your explanation is that by the
time setup_irq() is called, we have already poked at the DMA hardware
several times - omap_clear_dma() and omap2_disable_irq_lch() will have
been called for each DMA channel - and both will write to the hardware.

What's more is that the only things left after setup_irq() has been
called is to possibly reserve the first two DMA channels and print
the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
which would finish any unfinished hardware initialisation.

The final reason I don't agree is that I've put a printk() in
omap2_dma_irq_handler(), and this does not trigger.

So, I think this has nothing to do with the DMA hardware /at all/,
but more to do with the GPIO code, and it suggests that the GPIO code
publishes IRQs before it is safe for those IRQs to be used.

Maybe it has to do with omap_gpio_irq_handler() being called... added
printk(), nope, that's not called either.  So it's not an IRQ which
gets triggered at all.

What is called are (in order):

omap_gpio_unmask_irq()
omap_set_gpio_irqenable()
omap_enable_gpio_irqbank()

and this reveals where the problem is, especially when you then add
instrumentation into the runtime PM functions - and this reveals that
when a GPIO IRQ is requested, these functions are called while the
GPIO is runtime suspended.

_That_ is where the *real* problem lies - requesting a GPIO interrupt
results in the kernel touching possibly runtime-suspended hardware.

The reason it happens with preempt is that preempt introduces scheduling
points during the kernel boot which would not otherwise be there (with
preempt disabled, you have to hit an explicit context switch due to
contention on some lock or a wait in order for some other thread to run.)

So, the GPIO driver really needs fixing - and I'd suggest fixing it
first, before fixing the DMA problem, because the DMA problem allows
us to see the GPIO problem.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Marc Zyngier
On 16/01/15 16:56, Arnd Bergmann wrote:
 On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote:
 * Marc Zyngier marc.zyng...@arm.com [150115 06:46]:
 On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote:
 On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
 Probably there is a workable strategy, but my knowledge about OMAP is
 close to *nothing*...

 I have a feeling this might bite other platforms too and we just have
 not noticed it yet..
 
 I'm looking through the entire tree now, scanning for machines that have
 GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code.
 Most platforms using GIC are completely converted to DT and have no
 hardcoded legacy IRQs.
 
 I have checked that cns3xxx and realview are both fine by inspection.
 
 The only one I'm not sure about is shmobile, which looks like it might
 suffer from the same problem. Simon/Magnus, could you verify this
 with a multiplatform kernel on any SoC that has GIC and uses devices
 that have interrupts defined in setup-*.c or board-*.c?

There are 3 patches floating around for shmobile, converting their
non-DT support to directly initializing the GIC instead of relying on
irqchip_init(). That's assuming their DT implementation doesn't use any
of these device declarations.

If they do, we could use a hack similar to the one I implemented for
OMAP, populating the virtual IRQ in the resource at boot time, just
after the irqchip initialization.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Tony Lindgren
* Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
 On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
  * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
   I would still like to understand /why/ enabling preempt causes the error.
   Changing the preempt configuration really should not change what happens
   on the bus.  (Think about it.)  It's an indication that there is some
   other error present.
  
  We have a wrong irq number caused by $subject. And the wrong irq
  gets triggered before the dma hardware is configured during dma
  init. And then we get the invalid access error from omap_l3_noc.
 
 ... which should happen whether or not preempt is enabled, which is
 really my point.
 
 We know tha the wrong IRQ gets requested by the driver - and that wrong
 IRQ is requested whether or not we have preempt enabled.  Yet we get
 the warning whether or not preempt is enabled.
 
 The DMA handler is not registered as a threaded handler, so it's not
 depending on a context switch to execute omap2_dma_irq_handler().
 
 Another reason why I don't agree with your explanation is that by the
 time setup_irq() is called, we have already poked at the DMA hardware
 several times - omap_clear_dma() and omap2_disable_irq_lch() will have
 been called for each DMA channel - and both will write to the hardware.
 
 What's more is that the only things left after setup_irq() has been
 called is to possibly reserve the first two DMA channels and print
 the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
 which would finish any unfinished hardware initialisation.
 
 The final reason I don't agree is that I've put a printk() in
 omap2_dma_irq_handler(), and this does not trigger.

Oh, yes that blows my theory completely then.
 
 So, I think this has nothing to do with the DMA hardware /at all/,
 but more to do with the GPIO code, and it suggests that the GPIO code
 publishes IRQs before it is safe for those IRQs to be used.
 
 Maybe it has to do with omap_gpio_irq_handler() being called... added
 printk(), nope, that's not called either.  So it's not an IRQ which
 gets triggered at all.
 
 What is called are (in order):
 
 omap_gpio_unmask_irq()
 omap_set_gpio_irqenable()
 omap_enable_gpio_irqbank()
 
 and this reveals where the problem is, especially when you then add
 instrumentation into the runtime PM functions - and this reveals that
 when a GPIO IRQ is requested, these functions are called while the
 GPIO is runtime suspended.
 
 _That_ is where the *real* problem lies - requesting a GPIO interrupt
 results in the kernel touching possibly runtime-suspended hardware.
 
 The reason it happens with preempt is that preempt introduces scheduling
 points during the kernel boot which would not otherwise be there (with
 preempt disabled, you have to hit an explicit context switch due to
 contention on some lock or a wait in order for some other thread to run.)

OK makes sense.
 
 So, the GPIO driver really needs fixing - and I'd suggest fixing it
 first, before fixing the DMA problem, because the DMA problem allows
 us to see the GPIO problem.

Yes we need to fix that.

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Tony Lindgren
* Tony Lindgren t...@atomide.com [150116 09:36]:
 * Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
  On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
   * Russell King - ARM Linux li...@arm.linux.org.uk [150116 08:33]:
I would still like to understand /why/ enabling preempt causes the 
error.
Changing the preempt configuration really should not change what happens
on the bus.  (Think about it.)  It's an indication that there is some
other error present.
   
   We have a wrong irq number caused by $subject. And the wrong irq
   gets triggered before the dma hardware is configured during dma
   init. And then we get the invalid access error from omap_l3_noc.
  
  ... which should happen whether or not preempt is enabled, which is
  really my point.
  
  We know tha the wrong IRQ gets requested by the driver - and that wrong
  IRQ is requested whether or not we have preempt enabled.  Yet we get
  the warning whether or not preempt is enabled.
  
  The DMA handler is not registered as a threaded handler, so it's not
  depending on a context switch to execute omap2_dma_irq_handler().
  
  Another reason why I don't agree with your explanation is that by the
  time setup_irq() is called, we have already poked at the DMA hardware
  several times - omap_clear_dma() and omap2_disable_irq_lch() will have
  been called for each DMA channel - and both will write to the hardware.
  
  What's more is that the only things left after setup_irq() has been
  called is to possibly reserve the first two DMA channels and print
  the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
  which would finish any unfinished hardware initialisation.
  
  The final reason I don't agree is that I've put a printk() in
  omap2_dma_irq_handler(), and this does not trigger.
 
 Oh, yes that blows my theory completely then.
  
  So, I think this has nothing to do with the DMA hardware /at all/,
  but more to do with the GPIO code, and it suggests that the GPIO code
  publishes IRQs before it is safe for those IRQs to be used.
  
  Maybe it has to do with omap_gpio_irq_handler() being called... added
  printk(), nope, that's not called either.  So it's not an IRQ which
  gets triggered at all.
  
  What is called are (in order):
  
  omap_gpio_unmask_irq()
  omap_set_gpio_irqenable()
  omap_enable_gpio_irqbank()
  
  and this reveals where the problem is, especially when you then add
  instrumentation into the runtime PM functions - and this reveals that
  when a GPIO IRQ is requested, these functions are called while the
  GPIO is runtime suspended.
  
  _That_ is where the *real* problem lies - requesting a GPIO interrupt
  results in the kernel touching possibly runtime-suspended hardware.
  
  The reason it happens with preempt is that preempt introduces scheduling
  points during the kernel boot which would not otherwise be there (with
  preempt disabled, you have to hit an explicit context switch due to
  contention on some lock or a wait in order for some other thread to run.)
 
 OK makes sense.
  
  So, the GPIO driver really needs fixing - and I'd suggest fixing it
  first, before fixing the DMA problem, because the DMA problem allows
  us to see the GPIO problem.
 
 Yes we need to fix that.

Posted a minimal fix for that one as a separate thread:

[PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Russell King - ARM Linux
On Fri, Jan 16, 2015 at 02:52:44PM -0800, Tony Lindgren wrote:
 * Tony Lindgren t...@atomide.com [150116 09:36]:
  * Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
   So, the GPIO driver really needs fixing - and I'd suggest fixing it
   first, before fixing the DMA problem, because the DMA problem allows
   us to see the GPIO problem.
  
  Yes we need to fix that.
 
 Posted a minimal fix for that one as a separate thread:
 
 [PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()

Thanks, I'll throw that onto the build tree for tonights build.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Tony Lindgren
* Russell King - ARM Linux li...@arm.linux.org.uk [150116 15:00]:
 On Fri, Jan 16, 2015 at 02:52:44PM -0800, Tony Lindgren wrote:
  * Tony Lindgren t...@atomide.com [150116 09:36]:
   * Russell King - ARM Linux li...@arm.linux.org.uk [150116 09:25]:
So, the GPIO driver really needs fixing - and I'd suggest fixing it
first, before fixing the DMA problem, because the DMA problem allows
us to see the GPIO problem.
   
   Yes we need to fix that.
  
  Posted a minimal fix for that one as a separate thread:
  
  [PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()
 
 Thanks, I'll throw that onto the build tree for tonights build.

Great thanks!

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-16 Thread Simon Horman
On Fri, Jan 16, 2015 at 05:23:05PM +, Marc Zyngier wrote:
 On 16/01/15 16:56, Arnd Bergmann wrote:
  On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote:
  * Marc Zyngier marc.zyng...@arm.com [150115 06:46]:
  On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann a...@arndb.de 
  wrote:
  On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
  Probably there is a workable strategy, but my knowledge about OMAP is
  close to *nothing*...
 
  I have a feeling this might bite other platforms too and we just have
  not noticed it yet..
  
  I'm looking through the entire tree now, scanning for machines that have
  GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code.
  Most platforms using GIC are completely converted to DT and have no
  hardcoded legacy IRQs.
  
  I have checked that cns3xxx and realview are both fine by inspection.
  
  The only one I'm not sure about is shmobile, which looks like it might
  suffer from the same problem. Simon/Magnus, could you verify this
  with a multiplatform kernel on any SoC that has GIC and uses devices
  that have interrupts defined in setup-*.c or board-*.c?
 
 There are 3 patches floating around for shmobile, converting their
 non-DT support to directly initializing the GIC instead of relying on
 irqchip_init().

There is also a fourth patch pending to fix one last SoC, the r8a73a4.
My understanding is that should be the end of the problems that
we have been seeing in this area.

 That's assuming their DT implementation doesn't use any
 of these device declarations.

I believe that assumption is correct. shmobile does not use any
devices that have interrupts defined in setup-*.c or board-*.c when
booting using multiplatform.

 If they do, we could use a hack similar to the one I implemented for
 OMAP, populating the virtual IRQ in the resource at boot time, just
 after the irqchip initialization.
 
 Thanks,
 
   M.
 -- 
 Jazz is not dead. It just smells funny...
 
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Russell King - ARM Linux
On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
 * Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
  I don't think we've proven a link there.  While you're right that it
  causes the wrong interrupt to be claimed, I have two kernels here,
  both claim the same interrupt, one which is multi-platform and issues
  that strange warning, and one which targets only OMAP4 which doesn't.
  
  There's something else going on which causes the bus errors which we
  haven't found.
 
 I think it gets triggered if you enable PREEMPT.

That's something which we can try to prove... build running now with
CONFIG_PREEMPT=y

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Tony Lindgren
* Russell King - ARM Linux li...@arm.linux.org.uk [150115 02:53]:
 On Wed, Jan 14, 2015 at 02:14:08PM -0800, Tony Lindgren wrote:
  Hi all,
  
  Looks like the legacy IRQ numbers are now all wrong at least for omap4
  since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq domain.).
  
  Instead of this:
  
  # cat /proc/interrupts 
  CPU0   CPU1   
   29:   1124981   GIC  29  twd
   39:  0  0   GIC  39  TWL6030-PIH
   41:  0  0   GIC  41  l3-dbg-irq
   42:  0  0   GIC  42  l3-app-irq
   44:  0  0   GIC  44  DMA
   45:   7854  0   GIC  45  omap-dma-engine
   52:  0  0   GIC  52  gpmc
  ...
  
  
  We now have:
  
  # cat /proc/interrupts 
  CPU0   CPU1   
   16:343  0   GIC  69  gp_timer
   17:   1160   1017   GIC  29  twd
   18:  0  0   GIC  41  l3-dbg-irq
   19:  1  0   GIC  42  l3-app-irq
   22:   7850  0   GIC  45  omap-dma-engine
   44:  0  0  4a31.gpio  18  DMA
   61:   2730  0  48055000.gpio   2  eth0
  223:  0  0   GIC  52  gpmc
  ...
  
  So the DMA interrupt using the legacy mapping with something like
  irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
  at least omaps still have a bunch of the legacy interrupts still
  around.
  
  And that naturally produces all kinds of strange errors like:
  
  WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 
  l3_interrupt_handler+0x214/0x340()
  4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access 
  in Supervisor mode during Functional access
  ...
  [c05f21e4] (__irq_svc) from [c05f1974] 
  (_raw_spin_unlock_irqrestore+0x34/0x44)
  [c05f1974] (_raw_spin_unlock_irqrestore) from [c00914a8] 
  (__setup_irq+0x244/0x530)
  [c00914a8] (__setup_irq) from [c00917d4] (setup_irq+0x40/0x8c)
  [c00917d4] (setup_irq) from [c0039c8c] 
  (omap_system_dma_probe+0x1d4/0x2b4)
  [c0039c8c] (omap_system_dma_probe) from [c03b2200] 
  (platform_drv_probe+0x44/0xa4)
  ...
 
 I don't think we've proven a link there.  While you're right that it
 causes the wrong interrupt to be claimed, I have two kernels here,
 both claim the same interrupt, one which is multi-platform and issues
 that strange warning, and one which targets only OMAP4 which doesn't.
 
 There's something else going on which causes the bus errors which we
 haven't found.

I think it gets triggered if you enable PREEMPT.

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Marc Zyngier
On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote:
 On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
 Of course, this is in no way a proper fix, but I suppose the OMAP DT is
 still missing a few bits...

 I must be missing something here, but all the interrupts are listed
 correctly in the DT, so what is the omap_hwmod_irq_info actually
 achieving on omap4 and omap5?

 Would it work if we just remove the incorrect copy of the resource
 and use the one that comes from DT?

By the look of it, omap_hwmod_irq_info serves multiple purposes:
- low level configuration (pads, probably more stuff)
- interrupt description for some drivers, using resources.

It should be fairly easy to do the latter, but the former looks more
tricky (it would push the pad configuration down to the drivers, which
is avoided at the moment).

Probably there is a workable strategy, but my knowledge about OMAP is
close to *nothing*...

M.
-- 
Jazz is not dead. It just smells funny.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Tony Lindgren
* Marc Zyngier marc.zyng...@arm.com [150115 06:46]:
 On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote:
  On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
  Of course, this is in no way a proper fix, but I suppose the OMAP DT is
  still missing a few bits...
 
  I must be missing something here, but all the interrupts are listed
  correctly in the DT, so what is the omap_hwmod_irq_info actually
  achieving on omap4 and omap5?
 
  Would it work if we just remove the incorrect copy of the resource
  and use the one that comes from DT?
 
 By the look of it, omap_hwmod_irq_info serves multiple purposes:
 - low level configuration (pads, probably more stuff)

The muxing is only done for omap3 in legacy booting mode.

 - interrupt description for some drivers, using resources.

That's still used to create legacy platform_device entries on omap4
for legacy DMA, DSS, PRM. The twl6040 entries are already unused
and I have a patch queued to remove them.
 
 It should be fairly easy to do the latter, but the former looks more
 tricky (it would push the pad configuration down to the drivers, which
 is avoided at the moment).

The pad configuration is already done with pinctrl-single.
 
 Probably there is a workable strategy, but my knowledge about OMAP is
 close to *nothing*...

I have a feeling this might bite other platforms too and we just have
not noticed it yet..

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Arnd Bergmann
On Thursday 15 January 2015 14:43:35 Marc Zyngier wrote:
 On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann a...@arndb.de wrote:
  On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
  Of course, this is in no way a proper fix, but I suppose the OMAP DT is
  still missing a few bits...
 
  I must be missing something here, but all the interrupts are listed
  correctly in the DT, so what is the omap_hwmod_irq_info actually
  achieving on omap4 and omap5?
 
  Would it work if we just remove the incorrect copy of the resource
  and use the one that comes from DT?
 
 By the look of it, omap_hwmod_irq_info serves multiple purposes:
 - low level configuration (pads, probably more stuff)
 - interrupt description for some drivers, using resources.
 
 It should be fairly easy to do the latter, but the former looks more
 tricky (it would push the pad configuration down to the drivers, which
 is avoided at the moment).
 
 Probably there is a workable strategy, but my knowledge about OMAP is
 close to *nothing*...

I don't know much about OMAP either so maybe it's better to let someone
comment who understands this better. ;-)

A number of commits have in the past removed omap_hwmod_irq_info entries
here, the last one was  09182ab11b49 (ARM: OMAP4: hwmod data: Remove
irq entries from mcspi, mmc hwmods).

From what I can tell, we could drop the omap44xx_dss_dispc_irqs,
omap44xx_dss_dsi1_irqs, omap44xx_dss_dsi2_irqs, and omap44xx_dss_hdmi_irqs
in the samem way, after this data got added to DT as part of
cfe86fcf2d0079f03 (ARM: omap4.dtsi: add omapdss information).

Unfortunately, the omap-dma driver once again gets in the way, since
there are still users of this that are not converted to use dmaengine,
and unlike the proper driver (drivers/dma/omap-dma.c), the legacy
driver (arch/arm/mach-omap2/dma.c) does not get probed from DT and
instead gets the wrong irq numbers from hwmod now.

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Russell King - ARM Linux
On Wed, Jan 14, 2015 at 02:14:08PM -0800, Tony Lindgren wrote:
 Hi all,
 
 Looks like the legacy IRQ numbers are now all wrong at least for omap4
 since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq domain.).
 
 Instead of this:
 
 # cat /proc/interrupts 
 CPU0   CPU1   
  29:   1124981   GIC  29  twd
  39:  0  0   GIC  39  TWL6030-PIH
  41:  0  0   GIC  41  l3-dbg-irq
  42:  0  0   GIC  42  l3-app-irq
  44:  0  0   GIC  44  DMA
  45:   7854  0   GIC  45  omap-dma-engine
  52:  0  0   GIC  52  gpmc
 ...
 
 
 We now have:
 
 # cat /proc/interrupts 
 CPU0   CPU1   
  16:343  0   GIC  69  gp_timer
  17:   1160   1017   GIC  29  twd
  18:  0  0   GIC  41  l3-dbg-irq
  19:  1  0   GIC  42  l3-app-irq
  22:   7850  0   GIC  45  omap-dma-engine
  44:  0  0  4a31.gpio  18  DMA
  61:   2730  0  48055000.gpio   2  eth0
 223:  0  0   GIC  52  gpmc
 ...
 
 So the DMA interrupt using the legacy mapping with something like
 irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
 at least omaps still have a bunch of the legacy interrupts still
 around.
 
 And that naturally produces all kinds of strange errors like:
 
 WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 
 l3_interrupt_handler+0x214/0x340()
 4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in 
 Supervisor mode during Functional access
 ...
 [c05f21e4] (__irq_svc) from [c05f1974] 
 (_raw_spin_unlock_irqrestore+0x34/0x44)
 [c05f1974] (_raw_spin_unlock_irqrestore) from [c00914a8] 
 (__setup_irq+0x244/0x530)
 [c00914a8] (__setup_irq) from [c00917d4] (setup_irq+0x40/0x8c)
 [c00917d4] (setup_irq) from [c0039c8c] (omap_system_dma_probe+0x1d4/0x2b4)
 [c0039c8c] (omap_system_dma_probe) from [c03b2200] 
 (platform_drv_probe+0x44/0xa4)
 ...

I don't think we've proven a link there.  While you're right that it
causes the wrong interrupt to be claimed, I have two kernels here,
both claim the same interrupt, one which is multi-platform and issues
that strange warning, and one which targets only OMAP4 which doesn't.

There's something else going on which causes the bus errors which we
haven't found.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Marc Zyngier
On Wed, Jan 14 2015 at 10:14:08 pm GMT, Tony Lindgren t...@atomide.com wrote:
 Hi all,

 Looks like the legacy IRQ numbers are now all wrong at least for omap4
 since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq domain.).

 Instead of this:

 # cat /proc/interrupts 
 CPU0   CPU1   
  29:   1124981   GIC  29  twd
  39:  0  0   GIC  39  TWL6030-PIH
  41:  0  0   GIC  41  l3-dbg-irq
  42:  0  0   GIC  42  l3-app-irq
  44:  0  0   GIC  44  DMA
  45:   7854  0   GIC  45  omap-dma-engine
  52:  0  0   GIC  52  gpmc
 ...


 We now have:

 # cat /proc/interrupts 
 CPU0   CPU1   
  16:343  0   GIC  69  gp_timer
  17:   1160   1017   GIC  29  twd
  18:  0  0   GIC  41  l3-dbg-irq
  19:  1  0   GIC  42  l3-app-irq
  22:   7850  0   GIC  45  omap-dma-engine
  44:  0  0  4a31.gpio  18  DMA
  61:   2730  0  48055000.gpio   2  eth0
 223:  0  0   GIC  52  gpmc
 ...

 So the DMA interrupt using the legacy mapping with something like
 irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
 at least omaps still have a bunch of the legacy interrupts still
 around.

Holy crap. How much of this do we have hanging around?

 And that naturally produces all kinds of strange errors like:

 WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 
 l3_interrupt_handler+0x214/0x340()
 4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in 
 Supervisor mode during Functional access
 ...
 [c05f21e4] (__irq_svc) from [c05f1974] 
 (_raw_spin_unlock_irqrestore+0x34/0x44)
 [c05f1974] (_raw_spin_unlock_irqrestore) from [c00914a8] 
 (__setup_irq+0x244/0x530)
 [c00914a8] (__setup_irq) from [c00917d4] (setup_irq+0x40/0x8c)
 [c00917d4] (setup_irq) from [c0039c8c] (omap_system_dma_probe+0x1d4/0x2b4)
 [c0039c8c] (omap_system_dma_probe) from [c03b2200] 
 (platform_drv_probe+0x44/0xa4)
 ...

 Looks like the logic changed from:

 if (of_property_read_u32(node, arm,routable-irqs, nr_routable_irqs))

 to just

 if (node)

 Which now causes irq_domain_add_linear() to be called instead of
 irq_domain_add_legacy(), which causes the breakage.

 Anybody got a sane fix in mind for the -rc series, or should we just
 revert it for now?

Reverting it is going to kill other platforms, and I'd rather have a
workaround, short of fixing it for good (which seems ambitious at -rc4).

How about something along these lines:

diff --git a/arch/arm/mach-omap2/common.h b/arch/arm/mach-omap2/common.h
index 377eea8..b664494 100644
--- a/arch/arm/mach-omap2/common.h
+++ b/arch/arm/mach-omap2/common.h
@@ -211,6 +211,7 @@ extern struct device *omap2_get_iva_device(void);
 extern struct device *omap2_get_l3_device(void);
 extern struct device *omap4_get_dsp_device(void);
 
+unsigned int omap4_xlate_irq(unsigned int hwirq);
 void omap_gic_of_init(void);
 
 #ifdef CONFIG_CACHE_L2X0
diff --git a/arch/arm/mach-omap2/omap4-common.c 
b/arch/arm/mach-omap2/omap4-common.c
index b7cb44a..cc30e49 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -256,6 +256,38 @@ static int __init omap4_sar_ram_init(void)
 }
 omap_early_initcall(omap4_sar_ram_init);
 
+static struct of_device_id gic_match[] = {
+   { .compatible = arm,cortex-a9-gic, },
+   { .compatible = arm,cortex-a15-gic, },
+   { },
+};
+
+static struct device_node *gic_node;
+
+unsigned int omap4_xlate_irq(unsigned int hwirq)
+{
+   struct of_phandle_args irq_data;
+   unsigned int irq;
+
+   if (!gic_node)
+   gic_node = of_find_matching_node(NULL, gic_match);
+
+   if (WARN_ON(!gic_node))
+   return hwirq;
+
+   irq_data.np = gic_node;
+   irq_data.args_count = 3;
+   irq_data.args[0] = 0;
+   irq_data.args[1] = hwirq - OMAP44XX_IRQ_GIC_START;
+   irq_data.args[2] = IRQ_TYPE_LEVEL_HIGH;
+
+   irq = irq_create_of_mapping(irq_data);
+   if (WARN_ON(!irq))
+   irq = hwirq;
+
+   return irq;
+}
+
 void __init omap_gic_of_init(void)
 {
struct device_node *np;
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index cbb908d..9025fff 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -3534,9 +3534,15 @@ int omap_hwmod_fill_resources(struct omap_hwmod *oh, 
struct resource *res)
 
mpu_irqs_cnt = _count_mpu_irqs(oh);
for (i = 0; i  mpu_irqs_cnt; i++) {
+   unsigned int irq;
+
+   if (oh-xlate_irq)
+   irq = oh-xlate_irq((oh-mpu_irqs + i)-irq);
+   else
+   irq = (oh-mpu_irqs + i)-irq;
(res + r)-name = (oh-mpu_irqs + i)-name;
-   (res + r)-start = (oh-mpu_irqs + 

Re: Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-15 Thread Arnd Bergmann
On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
 Of course, this is in no way a proper fix, but I suppose the OMAP DT is
 still missing a few bits...

I must be missing something here, but all the interrupts are listed
correctly in the DT, so what is the omap_hwmod_irq_info actually
achieving on omap4 and omap5?

Would it work if we just remove the incorrect copy of the resource
and use the one that comes from DT?

Arnd
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Regression with legacy IRQ numbers caused by 9a1091ef0017

2015-01-14 Thread Tony Lindgren
Hi all,

Looks like the legacy IRQ numbers are now all wrong at least for omap4
since commit 9a1091ef0017 (irqchip: gic: Support hierarchy irq domain.).

Instead of this:

# cat /proc/interrupts 
CPU0   CPU1   
 29:   1124981   GIC  29  twd
 39:  0  0   GIC  39  TWL6030-PIH
 41:  0  0   GIC  41  l3-dbg-irq
 42:  0  0   GIC  42  l3-app-irq
 44:  0  0   GIC  44  DMA
 45:   7854  0   GIC  45  omap-dma-engine
 52:  0  0   GIC  52  gpmc
...


We now have:

# cat /proc/interrupts 
CPU0   CPU1   
 16:343  0   GIC  69  gp_timer
 17:   1160   1017   GIC  29  twd
 18:  0  0   GIC  41  l3-dbg-irq
 19:  1  0   GIC  42  l3-app-irq
 22:   7850  0   GIC  45  omap-dma-engine
 44:  0  0  4a31.gpio  18  DMA
 61:   2730  0  48055000.gpio   2  eth0
223:  0  0   GIC  52  gpmc
...

So the DMA interrupt using the legacy mapping with something like
irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
at least omaps still have a bunch of the legacy interrupts still
around.

And that naturally produces all kinds of strange errors like:

WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 
l3_interrupt_handler+0x214/0x340()
4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in 
Supervisor mode during Functional access
...
[c05f21e4] (__irq_svc) from [c05f1974] 
(_raw_spin_unlock_irqrestore+0x34/0x44)
[c05f1974] (_raw_spin_unlock_irqrestore) from [c00914a8] 
(__setup_irq+0x244/0x530)
[c00914a8] (__setup_irq) from [c00917d4] (setup_irq+0x40/0x8c)
[c00917d4] (setup_irq) from [c0039c8c] (omap_system_dma_probe+0x1d4/0x2b4)
[c0039c8c] (omap_system_dma_probe) from [c03b2200] 
(platform_drv_probe+0x44/0xa4)
...

Looks like the logic changed from:

if (of_property_read_u32(node, arm,routable-irqs, nr_routable_irqs))

to just

if (node)

Which now causes irq_domain_add_linear() to be called instead of
irq_domain_add_legacy(), which causes the breakage.

Anybody got a sane fix in mind for the -rc series, or should we just
revert it for now?

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html