Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Russell King - ARM Linux
On Mon, Jun 29, 2009 at 04:31:18PM +0200, Jean Pihet wrote:
 I am trying to get the latest IRQ registers from a timer or a work queue
 but I am running into problems:
 - get_irq_regs() returns NULL in some cases,

It will always return NULL outside of IRQ context - and only returns valid
pointers when used inside IRQ context.

It's one of these things that nests itself - when you have several IRQs
being processed on one CPU, there are several register contexts saved,
and get_irq_regs() returns the most recent one.
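
The nesting can be pictured with a small userspace model. This is only a sketch, not kernel code: the one-field struct pt_regs and the handle_irq() helper are hypothetical, and a single static pointer stands in for the kernel's per-CPU pointer. It assumes each IRQ entry installs its saved registers via set_irq_regs() and restores the previous pointer on exit.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct pt_regs (hypothetical layout). */
struct pt_regs {
    unsigned long pc;
};

/* Stand-in for the per-CPU pointer; NULL whenever no IRQ is being handled. */
static struct pt_regs *irq_regs_ptr = NULL;

struct pt_regs *get_irq_regs(void)
{
    return irq_regs_ptr;
}

/* Install a new saved-register pointer and return the previous one. */
struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old_regs = irq_regs_ptr;

    irq_regs_ptr = new_regs;
    return old_regs;
}

/*
 * Model of one IRQ entry/exit. 'nested' (may be NULL) is a second
 * interrupt arriving while this one is being handled. Returns the pc
 * seen through get_irq_regs() once the nested interrupt has returned.
 */
unsigned long handle_irq(struct pt_regs *regs, struct pt_regs *nested)
{
    struct pt_regs *old_regs = set_irq_regs(regs);
    unsigned long seen;

    if (nested != NULL)
        (void)handle_irq(nested, NULL);

    /* The nested handler restored our pointer on its way out. */
    assert(get_irq_regs() == regs);
    seen = get_irq_regs()->pc;

    /* Restore on exit: back to NULL at the outermost level. */
    set_irq_regs(old_regs);
    return seen;
}
```

Once the outermost handle_irq() returns, get_irq_regs() yields NULL again, which is the situation a timer or work queue handler finds itself in.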

 The use case is that the performance unit (PMNC) of the Cortex A8 has some 
 serious bug, in short the performance counters overflow IRQ is to be avoided.

I don't follow.  None of the PMNC support code in the mainline kernel
uses get_irq_regs() outside of IRQ context.

 Some questions:
 - is there a way to get the last 'real' IRQ registers from a timer or work 
 queue handler?

No.  Outside of IRQ events, the saved IRQ context does not exist.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
 On Mon, Jun 29, 2009 at 04:31:18PM +0200, Jean Pihet wrote:
  I am trying to get the latest IRQ registers from a timer or a work queue
  but I am running into problems:
  - get_irq_regs() returns NULL in some cases,

 It will always return NULL outside of IRQ context - and only returns valid
 pointers when used inside IRQ context.
Ok got it.

 It's one of these things that nests itself - when you have several IRQs
 being processed on one CPU, there are several register contexts saved,
 and get_irq_regs() returns the most recent one.

  The use case is that the performance unit (PMNC) of the Cortex A8 has
  some serious bug, in short the performance counters overflow IRQ is to be
  avoided.

 I don't follow.  None of the PMNC support code in the mainline kernel
 uses get_irq_regs() outside of IRQ context.
That is correct. The Cortex A8 needs some special treatment.
The erratum says that if a counter overflows at the same time as a 
coprocessor access is performed, the perf unit gets reset and/or locks up. In 
short, counter overflow must be avoided, and so must the PMNC IRQ.

  Some questions:
  - is there a way to get the last 'real' IRQ registers from a timer or
  work queue handler?

 No.  Outside of IRQ events, the saved IRQ context does not exist.
Ok. I wonder how to implement it correctly from here.
The ultimate goal is to feed the registers to oprofile for statistics 
gathering (mostly the PC). I do not see much benefit from oprofile without 
the PC statistics.

Thanks,
Jean


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Russell King - ARM Linux
On Mon, Jun 29, 2009 at 05:35:37PM +0200, Jean Pihet wrote:
 On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
  It's one of these things that nests itself - when you have several IRQs
  being processed on one CPU, there are several register contexts saved,
  and get_irq_regs() returns the most recent one.
 
   The use case is that the performance unit (PMNC) of the Cortex A8 has
   some serious bug, in short the performance counters overflow IRQ is to be
   avoided.
 
  I don't follow.  None of the PMNC support code in the mainline kernel
  uses get_irq_regs() outside of IRQ context.

 That is correct. The Cortex A8 needs some special treatment.
 The errata says that if the counters are overflowing at the same time as a 
 coprocessor access is performed, the perf unit gets reset and/or locks up. In 
 short the counters overflow is to be avoided and so the PMNC IRQ.

Are you talking about 628216?


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
On Monday 29 June 2009 18:07:44 Russell King - ARM Linux wrote:
 On Mon, Jun 29, 2009 at 05:35:37PM +0200, Jean Pihet wrote:
  On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
   It's one of these things that nests itself - when you have several IRQs
   being processed on one CPU, there are several register contexts saved,
   and get_irq_regs() returns the most recent one.
  
    The use case is that the performance unit (PMNC) of the Cortex A8 has
    some serious bug, in short the performance counters overflow IRQ is
    to be avoided.
  
   I don't follow.  None of the PMNC support code in the mainline kernel
   uses get_irq_regs() outside of IRQ context.
 
  That is correct. The Cortex A8 needs some special treatment.
  The errata says that if the counters are overflowing at the same time as
  a coprocessor access is performed, the perf unit gets reset and/or locks
  up. In short the counters overflow is to be avoided and so the PMNC IRQ.

 Are you talking about 628216?
Yes, that is the one. Sorry for not mentioning it sooner.

Jean



Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Siarhei Siamashka
On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
 Hi,

 I am trying to get the latest IRQ registers from a timer or a work queue
 but I am running into problems:
 - get_irq_regs() returns NULL in some cases, so it is unusable and even
 causes a crash when trying to get the register values from the returned ptr
 - I never get user space registers, only kernel

 The use case is that the performance unit (PMNC) of the Cortex A8 has some
 serious bug, in short the performance counters overflow IRQ is to be
 avoided. The solution I am implementing is to read and reset the counters
 from a work queue that is triggered by a timer.

Regarding the oprofile-related part: I wonder how you can get oprofile
working properly (providing non-bogus results) without performance
counter overflow IRQ generation?

Are you trying to implement (in a clean way) something similar to
http://marc.info/?l=oprofile-list&m=123688347009580&w=2

Or is it going to be a different workaround?

-- 
Best regards,
Siarhei Siamashka


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
Hi Siarhei Siamashka,
On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
 On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
  Hi,
 
  I am trying to get the latest IRQ registers from a timer or a work queue
  but I am running into problems:
  - get_irq_regs() returns NULL in some cases, so it is unusable and even
  causes a crash when trying to get the register values from the returned
  ptr
  - I never get user space registers, only kernel
 
  The use case is that the performance unit (PMNC) of the Cortex A8 has
  some serious bug, in short the performance counters overflow IRQ is to be
  avoided. The solution I am implementing is to read and reset the counters
  from a work queue that is triggered by a timer.

 Regarding this oprofile related part. I wonder how you can get oprofile
 working properly (providing non-bogus results) without performance
 counters overflow IRQ generation?

 Are you trying to implement (in a clean way) something similar to
 http://marc.info/?l=oprofile-list&m=123688347009580&w=2

 Or is it going to be a different workaround?
I am trying a different approach, starting from the erratum description. 
The idea is to prevent the counters from overflowing, since an overflow could 
cause a PMNC unit reset or lock-up (or both).

Here are the implementation details:
- use a timer to read and reset the counters, then fire a work queue
- in the work queue, the counter values are converted to oprofile samples
- proper locking is used to avoid races between the various tasks

I am nearly done with it but I am now running into problems with PM 
(suspend/resume) and get_irq_regs().

What do you think?
How far are you on your side? Did you stress test the solution? Is the PMNC 
recovery always successful?

Regards,
Jean


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Russell King - ARM Linux
On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
 On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
  I am trying to get the latest IRQ registers from a timer or a work queue
  but I am running into problems:
  - get_irq_regs() returns NULL in some cases, so it is unusable and even
  causes a crash when trying to get the register values from the returned ptr
  - I never get user space registers, only kernel
 
  The use case is that the performance unit (PMNC) of the Cortex A8 has some
  serious bug, in short the performance counters overflow IRQ is to be
  avoided. The solution I am implementing is to read and reset the counters
  from a work queue that is triggered by a timer.
 
 Regarding this oprofile related part. I wonder how you can get oprofile
 working properly (providing non-bogus results) without performance
 counters overflow IRQ generation? 

I don't think you can - triggering capture on overflow is precisely how
oprofile works.

The erratum talks about polling for overflow.  By doing this, you are in
a well defined part of the kernel, which is obviously going to be shown
as a hot path for every counter, thus making oprofile useless for kernel
work.

Deferring the interrupt to a workqueue doesn't resolve the problem either.
The problem has nothing to do with what happens after the interrupt
occurs - it's about interrupts themselves being lost.

I think just accepting that this erratum breaks oprofile is the only
realistic solution. ;(


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Russell King - ARM Linux
On Mon, Jun 29, 2009 at 06:58:41PM +0200, Jean Pihet wrote:
 I am trying to get a different approach, starting from the errata
 description.  The idea is to avoid the counters from overflowing,
 which could cause a PMNC unit reset or lock-up (or both).

But this can't work.

Oprofile essentially works as follows:

  You set the number (N) of events you wish to occur between each sample.
  When N events have occurred, you record the stacktrace and reset the
  counter so it fires after another N events.

Now, you could start the counters at zero every time, and then poll them
via a timer.  When the counter value is larger than N, you could log a
stacktrace and zero the counter.

However, this suffers one very serious problem - if you're wanting to
measure something at an interval which occurs faster than your timer,
you're going to get misleading results.

You could set the timer to fire at a high rate, but then that's going
to upset things like cache miss, cache hit, etc measurements.
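
The difference between sampling on overflow and sampling from a polling timer can be put into numbers with a small model. This is a sketch under stated assumptions, not oprofile code: both function names are hypothetical, events are assumed to arrive at a constant rate between polls, and the poll logs a sample once the counter has reached N.

```c
#include <stdint.h>

/* Samples an overflow interrupt would log: one for every n events. */
uint64_t samples_by_overflow(uint64_t total_events, uint64_t n)
{
    return n ? total_events / n : 0;
}

/*
 * Samples a polling timer would log. At each poll, if the counter has
 * reached n, one sample is logged and the counter is zeroed, so at most
 * one sample can be produced per poll however many events occurred.
 */
uint64_t samples_by_polling(uint64_t polls, uint64_t events_per_poll,
                            uint64_t n)
{
    uint64_t counter = 0;
    uint64_t samples = 0;

    for (uint64_t i = 0; i < polls; i++) {
        counter += events_per_poll;
        if (counter >= n) {
            samples++;
            counter = 0;
        }
    }
    return samples;
}
```

With 100 polls, 10000 events between polls and N = 1000, the overflow model logs 1000 samples while the polling model logs only 100, each taken at the moment the timer fires rather than where the events happened. When events are slower than the timer (100 events per poll), both models log 10 samples.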

 Here are the implementation details:
 - use a timer to read and reset the counters, then fire a work queue
 - in the work queue the counters values are converted to oprofile samples
 - the proper locking is used to avoid some races between the various tasks

This sounds overcomplicated.  I see no reason for a workqueue to be
involved anywhere near the oprofile sample code.

 I am nearly done with it but I am now running into problems with PM 
 (suspend/resume) and get_irq_regs().

You really really really can't use get_irq_regs() outside of IRQ context.
The stored registers just do not exist anymore - they've been overwritten
by whatever exception or system call you're currently in.

You can't create a copy of them - copies will be overwritten on the very
next (nested) interrupt.  You don't know which interrupt is the first
interrupt to occur.

I really think that the only option here is to just accept that oprofile
is crucified by this erratum.


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
On Monday 29 June 2009 19:37:57 Russell King - ARM Linux wrote:
 On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
  On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
   I am trying to get the latest IRQ registers from a timer or a work
   queue but I am running into problems:
   - get_irq_regs() returns NULL in some cases, so it is unusable and even
   causes a crash when trying to get the register values from the returned
   ptr
   - I never get user space registers, only kernel
  
   The use case is that the performance unit (PMNC) of the Cortex A8 has
   some serious bug, in short the performance counters overflow IRQ is to
   be avoided. The solution I am implementing is to read and reset the
   counters from a work queue that is triggered by a timer.
 
  Regarding this oprofile related part. I wonder how you can get oprofile
  working properly (providing non-bogus results) without performance
  counters overflow IRQ generation?

 I don't think you can - triggering capture on overflow is precisely how
 oprofile works.

 The erratum talks about polling for overflow.  By doing this, you are in
 a well defined part of the kernel, which is obviously going to be shown
 as a hot path for every counter, thus making oprofile useless for kernel
 work.
I think it is possible, well, if you set aside the get_irq_regs() problem.
The idea is to read and reset the counters before they overflow, instead of 
loading them with a small negative value and waiting for the overflow to 
happen.
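
As a sketch of the "small negative value" technique, assuming a 32-bit up-counter that raises its IRQ when it wraps (both helper names are hypothetical, not taken from the oprofile sources):

```c
#include <stdint.h>

/*
 * Value to load so that a 32-bit up-counter wraps after n events
 * (n in 1..2^32-1). This is the two's complement representation of -n.
 */
uint32_t pmu_preload(uint32_t n)
{
    return (uint32_t)(0u - n);
}

/* Number of events left before a counter holding 'value' wraps. */
uint64_t events_until_overflow(uint32_t value)
{
    return (UINT64_C(1) << 32) - value;
}
```

Starting from zero instead leaves the full 2^32 events before the wrap, which is the headroom the read-and-reset approach relies on.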

 Deferring the interrupt to a workqueue doesn't resolve the problem either.
 The problem has nothing to do with what happens after the interrupt
 occurs - it's about interrupts themselves being lost.
The erratum is about a lost event and/or a lock-up of the PMNC unit at the time 
of overflow.

 I think just accepting that this erratum breaks oprofile is the only
 realistic solution. ;(
Completely agree. However, it would be nice to have a workaround, as inelegant 
as it may be ;(


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
On Monday 29 June 2009 19:46:33 Russell King - ARM Linux wrote:
 On Mon, Jun 29, 2009 at 06:58:41PM +0200, Jean Pihet wrote:
  I am trying to get a different approach, starting from the errata
  description.  The idea is to avoid the counters from overflowing,
  which could cause a PMNC unit reset or lock-up (or both).

 But this can't work.

 Oprofile essentially works as follows:

   You set the number (N) of events you wish to occur between each sample.
   When N events have occurred, you record the stacktrace and reset the
   counter so it fires after another N events.

 Now, you could start the counters at zero every time, and then poll them
 via a timer.  When the counter value is larger than N, you could log a
 stacktrace and zero the counter.

 However, this suffers one very serious problem - if you're wanting to
 measure something at an interval which occurs faster than your timer,
 you're going to get misleading results.
The counters are 32 bits wide and the maximum counting frequency is 2 events 
per cycle (cf. the erratum). That means you get plenty of time before the 
counters overflow.
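
A rough worked example of that headroom, as a sketch: the helper name is hypothetical, and the 600 MHz clock used below is an assumed, illustrative figure rather than one taken from the erratum.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shortest time, in seconds, for a counter of 'bits' width starting at
 * zero to wrap, given the peak event rate per CPU cycle and the CPU
 * clock frequency in Hz.
 */
double min_seconds_to_overflow(unsigned bits, double events_per_cycle,
                               double cpu_hz)
{
    double span;

    assert(bits < 64);
    span = (double)(UINT64_C(1) << bits);
    return span / (events_per_cycle * cpu_hz);
}
```

For a 32-bit counter at 2 events per cycle and an assumed 600 MHz clock this gives 2^32 / (2 * 600e6), roughly 3.6 seconds, so a timer running even a few times per second would read the counter long before it can wrap.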

 You could set the timer to fire at a high rate, but then that's going
 to upset things like cache miss, cache hit, etc measurements.
Correct.
You need a tradeoff for the timer period.

  Here are the implementation details:
  - use a timer to read and reset the counters, then fire a work queue
  - in the work queue the counters values are converted to oprofile samples
  - the proper locking is used to avoid some races between the various
  tasks

 This sounds over complicated. 
It is ;p

 I see no reason for a workqueue to be 
 involved anywhere near the oprofile sample code.
Got it.

  I am nearly done with it but I am now running into problems with PM
  (suspend/resume) and get_irq_regs().

 You really really really can't use get_irq_regs() outside of IRQ context.
 The stored registers just do not exist anymore - they've been overwritten
 by whatever exception or system call you're currently in.

 You can't create a copy of them - copies will be overwritten on the very
 next (nested) interrupt.  You don't know which interrupt is the first
 interrupt to occur.
Doh!

 I really think that the only option here is to just accept that oprofile
 is crucified by this errata.
Amen!

Thanks,
Jean


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
On Monday 29 June 2009 19:54:23 Siarhei Siamashka wrote:
 On Monday 29 June 2009 19:58:41 ext Jean Pihet wrote:
  Hi Siarhei Siamashka,
 
  On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
   On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
    Hi,
   
    I am trying to get the latest IRQ registers from a timer or a work
    queue but I am running into problems:
    - get_irq_regs() returns NULL in some cases, so it is unusable and
    even causes a crash when trying to get the register values from the
    returned ptr
    - I never get user space registers, only kernel
   
    The use case is that the performance unit (PMNC) of the Cortex A8 has
    some serious bug, in short the performance counters overflow IRQ is
    to be avoided. The solution I am implementing is to read and reset
    the counters from a work queue that is triggered by a timer.
  
   Regarding this oprofile related part. I wonder how you can get oprofile
   working properly (providing non-bogus results) without performance
   counters overflow IRQ generation?
  
   Are you trying to implement (in a clean way) something similar to
   http://marc.info/?l=oprofile-list&m=123688347009580&w=2
  
   Or is it going to be a different workaround?
 
  I am trying to get a different approach, starting from the errata
  description. The idea is to avoid the counters from overflowing, which
  could cause a PMNC unit reset or lock-up (or both).
 
  Here are the implementation details:
  - use a timer to read and reset the counters, then fire a work queue
  - in the work queue the counters values are converted to oprofile samples
  - the proper locking is used to avoid some races between the various
  tasks
 
  I am nearly done with it but I am now running into problems with PM
  (suspend/resume) and get_irq_regs().
 
  What do you think?

 Russell was the first to reply :)

 But we also discussed this hybrid model some time ago, and there is a
 clear counterexample where it fails:
 http://www.nabble.com/Re%3A--PATCH-0-1--OMAP-gptimer-based-event-monitor-driver-for-oprofile-p21374285.html
All right, sorry I was not aware of that discussion. So the PMNC unit is 
broken beyond repair. BTW good description and test results!

  How far are you on your side? Did you stress test the solution? Is the
  PMNC recovery always successful?

 I ended up just using a timer with a high sample generation frequency. It
 works without hassle and is sufficient for the majority of cases.
Ok. It looks like it is the best we can do.

Thanks,
Jean


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Siarhei Siamashka
On Monday 29 June 2009 20:37:57 ext Russell King - ARM Linux wrote:
 On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
  On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
   I am trying to get the latest IRQ registers from a timer or a work
   queue but I am running into problems:
   - get_irq_regs() returns NULL in some cases, so it is unusable and even
   causes a crash when trying to get the register values from the returned
   ptr
   - I never get user space registers, only kernel
  
   The use case is that the performance unit (PMNC) of the Cortex A8 has
   some serious bug, in short the performance counters overflow IRQ is to
   be avoided. The solution I am implementing is to read and reset the
   counters from a work queue that is triggered by a timer.
 
  Regarding this oprofile related part. I wonder how you can get oprofile
  working properly (providing non-bogus results) without performance
  counters overflow IRQ generation?

 I don't think you can - triggering capture on overflow is precisely how
 oprofile works.

 The erratum talks about polling for overflow.  By doing this, you are in
 a well defined part of the kernel, which is obviously going to be shown
 as a hot path for every counter, thus making oprofile useless for kernel
 work.

 Deferring the interrupt to a workqueue doesn't resolve the problem either.
 The problem has nothing to do with what happens after the interrupt
 occurs - it's about interrupts themselves being lost.

 I think just accepting that this erratum breaks oprofile is the only
 realistic solution. ;(

I also thought the same initially. But the problem still looks like it
can be worked around, admittedly in quite a dirty way.

We just need to use not a periodic timer, but a kind of watchdog (this can be
implemented with an OMAP GPTIMER).

As long as PMU interrupts are coming fast, the watchdog is frequently reset and
never shows up anywhere. Everything works nicely.

Now if the PMU gets broken, the watchdog eventually triggers and recovers the
PMU state. As the PMU could get broken something like 10 times per second in the
worst case in my experiments, ~10 ms for the watchdog trigger period seemed to
be a reasonable empirical value. Under these conditions, the PMU will be in a
nonworking state for less than roughly 10% of the time in the worst
practical case. Not very nice, but not completely ugly either.
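
The 10% figure follows from simple arithmetic, sketched here with a hypothetical helper. It assumes each breakage is repaired at most one watchdog period after it happens, since the last PMU interrupt before the breakage restarted the watchdog.

```c
/*
 * Upper bound on the fraction of time the PMU is out of action:
 * breakages per second times the longest time each one can last.
 * Clamped to 1.0, because the PMU cannot be down more than always.
 */
double worst_case_downtime(double breaks_per_second,
                           double watchdog_period_s)
{
    double fraction = breaks_per_second * watchdog_period_s;

    return fraction > 1.0 ? 1.0 : fraction;
}
```

With 10 breakages per second and a 10 ms period the bound is 10 * 0.010 = 0.10, matching the estimate above; a shorter period lowers the bound at the cost of more watchdog interrupts.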

Another problematic condition is when the PMU is fine, but is not generating
events naturally (for example, we have configured it for cache misses, but are
burning CPU in a loop which is not accessing memory at all). In this case the
watchdog will be triggered periodically for no reason, generating noise
in the profiling statistics. This noise needs to be filtered out, and it seems
possible to do so. The trick is to have the watchdog interrupt reset the
watchdog counter to a lower value than the one the PMU IRQ handler normally
resets it to. This way, whenever a PMU interrupt is generated, we check whether
the watchdog counter is below the normal threshold. If it is lower, then we know
that the watchdog interrupt was triggered recently and this sample can be
ignored. The difference between the normal watchdog counter reset value and the
value which gets set on watchdog interrupts should provide sufficient time to
get out of the watchdog interrupt handler and its related code, so that it does
not show up in the statistics that much.
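
The filtering trick can be modelled in userspace as follows. This is a sketch under assumptions, not the proof of concept patch: it assumes an up-counting timer that fires when it reaches a limit, and every name and constant below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    WD_LIMIT        = 1000, /* watchdog fires when the counter reaches this */
    WD_RELOAD_PMU   = 200,  /* value written by the PMU IRQ handler         */
    WD_RELOAD_ALARM = 0     /* lower value written by the watchdog itself   */
};

struct watchdog {
    uint32_t counter;       /* counts up by one per tick        */
    uint32_t alarms;        /* number of times the watchdog fired */
};

void wd_init(struct watchdog *wd)
{
    wd->counter = WD_RELOAD_PMU;
    wd->alarms = 0;
}

/* Advance time by one tick; returns true if the watchdog fired. */
bool wd_tick(struct watchdog *wd)
{
    if (++wd->counter < WD_LIMIT)
        return false;

    wd->alarms++;
    /* PMU recovery would be done here. */
    wd->counter = WD_RELOAD_ALARM;
    return true;
}

/* Called on a PMU interrupt; returns true if the sample should be kept. */
bool wd_pmu_irq(struct watchdog *wd)
{
    /*
     * The counter can only be below the normal reload value if the
     * watchdog reloaded it last and fewer than WD_RELOAD_PMU ticks
     * have passed since, i.e. the watchdog fired recently.
     */
    bool keep = wd->counter >= WD_RELOAD_PMU;

    wd->counter = WD_RELOAD_PMU;
    return keep;
}
```

The gap between WD_RELOAD_ALARM and WD_RELOAD_PMU is the window during which samples are discarded after a watchdog interrupt; it would need to be tuned to cover the time spent leaving the watchdog handler.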

A working proof of concept patch was submitted here:
http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
Sorry for not posting it to one of the kernel mailing lists, but I thought
that the beagleboard mailing list was a good place to find users who may
want to try it and evaluate whether it has any practical value. Maybe it was
not a very wise decision.

Unfortunately I'm not a kernel hacker, and cleaning up the patch may take
too much time and effort given my current knowledge. I would
be happy if somebody else with more hands-on kernel experience could make a
clean and usable Cortex-A8 PMU workaround. I don't care whether I get some
part of the credit for it or not; the end result is more important :)

One of the obvious problems with the patch (other than race conditions) is
that it uses the OMAP-specific GPTIMER. Is there something more portable in
the kernel to provide similar functionality? Or are there any Cortex-A8 r1
cores other than OMAP3 in the wild?

-- 
Best regards,
Siarhei Siamashka


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Jean Pihet
On Monday 29 June 2009 20:38:59 Siarhei Siamashka wrote:
 On Monday 29 June 2009 20:37:57 ext Russell King - ARM Linux wrote:
  On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
   On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
    I am trying to get the latest IRQ registers from a timer or a work
    queue but I am running into problems:
    - get_irq_regs() returns NULL in some cases, so it is unusable and
    even causes a crash when trying to get the register values from the
    returned ptr
    - I never get user space registers, only kernel
   
    The use case is that the performance unit (PMNC) of the Cortex A8 has
    some serious bug, in short the performance counters overflow IRQ is
    to be avoided. The solution I am implementing is to read and reset
    the counters from a work queue that is triggered by a timer.
  
   Regarding this oprofile related part. I wonder how you can get oprofile
   working properly (providing non-bogus results) without performance
   counters overflow IRQ generation?
 
  I don't think you can - triggering capture on overflow is precisely how
  oprofile works.
 
  The erratum talks about polling for overflow.  By doing this, you are in
  a well defined part of the kernel, which is obviously going to be shown
  as a hot path for every counter, thus making oprofile useless for kernel
  work.
 
  Deferring the interrupt to a workqueue doesn't resolve the problem
  either. The problem has nothing to do with what happens after the
  interrupt occurs - it's about interrupts themselves being lost.
 
  I think just accepting that this erratum breaks oprofile is the only
  realistic solution. ;(

 I also thought about the same initially. But the problem still looks like
 it can be workarounded, admittedly in quite a dirty way.

 We just need to use not a periodic timer, but kind of a watchdog (this can
 be implemented with OMAP GPTIMER).

 As long as PMU interrupts are coming fast, watchdog is frequently reset and
 never shows up anywhere. Everything is working nice.

 Now if PMU gets broken, watchdog gets triggered eventually and recovers PMU
 state. As PMU could get broken something like 10 times per second in the
 worst case in my experiments, having ~10 ms for a watchdog trigger period
 seemed to be a  reasonable empirical value.  So in this conditions, PMU
 will be in a nonworking state approximately less than 10% of the time in
 the worst practical case. Not very nice, but not completely ugly either.
The accuracy is not very good.

 Another problematic condition is when PMU is fine, but is not generating
 events naturally (for example we have configured it for cache misses, but
 are burning cpu in a loop which is not accessing memory at all). In this
 case a watchdog will be triggered periodically for no reason, generating
 the noise in profiling statistics. This noise needs to be filtered out,
 and seems like it is possible to do it. The trick is to reset watchdog
 counter to a lower value than it is typically reset in PMU IRQ handler.
 This way, whenever PMU interrupt is generated, we check if watchdog counter
 is below the normal threshold. If it is lower, then we know that watchdog
 interrupt was triggered recently and this sample can be ignored. The
 difference between normal watchdog counter reset value and the value which
 gets set on watchdog interrupts should provide sufficient time to get out
 of the watchdog interrupt handler and its related code, so that it does not
 show up in statistics that much.

 A working proof of concept patch was submitted there:
 http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
 Sorry for not posting it to one of the kernel mailing lists, but I thought
 that beagleboard mailing list was a good place to find users who may
 want to try it and evaluate if it has any practical value. Maybe it was not
 a very wise decision.

 Unfortunately I'm not a kernel hacker and cleaning up the patch may take
 too much time and efforts, taking into account my current knowledge. I
 would be happy if somebody else with more hands-on kernel experience could
 make a clean and usable Cortex-A8 PMU workaround. I don't care about
 getting some part of credit for it or not, the end result is more important
 :)
I am happy to help.

 One of the obvious problems with the patch (other than race conditions) is
 that it is using OMAP-specific GPTIMER. Is there something more portable in
 the kernel to provide similar functionality? Or are there any Cortex-A8 r1
 cores other than OMAP3 in the wild?
You can use a 'struct timer_list' with setup_timer, mod_timer and 
del_timer_sync. Another API is the high resolution timers (HRT), but I do not 
think we need such a high precision timer here.

Jean


Re: get_irq_regs() from soft IRQ

2009-06-29 Thread Siarhei Siamashka
On Monday 29 June 2009 21:49:59 ext Jean Pihet wrote:
[...]
  We just need to use not a periodic timer, but kind of a watchdog (this
  can be implemented with OMAP GPTIMER).
 
  As long as PMU interrupts are coming fast, watchdog is frequently reset
  and never shows up anywhere. Everything is working nice.
 
  Now if PMU gets broken, watchdog gets triggered eventually and recovers
  PMU state. As PMU could get broken something like 10 times per second in
  the worst case in my experiments, having ~10 ms for a watchdog trigger
  period seemed to be a  reasonable empirical value.  So in this
  conditions, PMU will be in a nonworking state approximately less than 10%
  of the time in the worst practical case. Not very nice, but not
  completely ugly either.

 The accuracy is not very good.

Yes, but that is the worst case. In the normal case, when the PMU is not broken
or is very rarely broken, the statistics would be quite good. One of the reasons
I stopped working on this patch was also the fact that in some cases the
Cortex-A8 PMU even works reliably enough :) Adding some suspicious, weird extra
logic may not be welcomed by people who are quite satisfied even with the
current oprofile state on Cortex-A8 chips (number-crunching applications with a
relatively low number of syscalls, which hence rarely touch any coprocessor
registers, are mostly unaffected).

An adaptive watchdog trigger period may be better (try to predict when the
next PMU interrupt is normally going to happen and tune the watchdog timeout at
runtime), but it may also be more complex and may theoretically still misbehave
in some cases.

  Another problematic condition is when PMU is fine, but is not generating
  events naturally (for example we have configured it for cache misses, but
  are burning cpu in a loop which is not accessing memory at all). In this
  case a watchdog will be triggered periodically for no reason, generating
  the noise in profiling statistics. This noise needs to be filtered out,
  and seems like it is possible to do it. The trick is to reset watchdog
  counter to a lower value than it is typically reset in PMU IRQ handler.
  This way, whenever PMU interrupt is generated, we check if watchdog
  counter is below the normal threshold. If it is lower, then we know that
  watchdog interrupt was triggered recently and this sample can be ignored.
  The difference between normal watchdog counter reset value and the value
  which gets set on watchdog interrupts should provide sufficient time to
  get out of the watchdog interrupt handler and its related code, so that
  it does not show up in statistics that much.

And I forgot to mention here: very low frequency events (with a frequency lower
than that of the watchdog) may be quite problematic and still distort the
statistics because they will be filtered out. Tuning all the magic values may
turn out to be hell.

But at the very least, all the watchdog interrupts (both false alarms and real
cases of PMU breakage) can be counted and taken into account. These statistics
could be reported to the user somehow, so that (s)he could decide
whether the final profiling statistics can be trusted and for how much time the
PMU was actually broken.
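
One possible shape for such a report, as a sketch only: the struct and both helpers are hypothetical, and the second helper pessimistically assumes every watchdog interrupt was a real breakage that lasted a full watchdog period.

```c
#include <stdint.h>

struct pmu_wd_stats {
    uint64_t samples_kept;     /* PMU samples passed on to the profiler  */
    uint64_t samples_dropped;  /* PMU samples filtered after a watchdog  */
    uint64_t watchdog_alarms;  /* watchdog firings, false alarms or real */
};

/* Fraction of PMU samples that the filter discarded. */
double dropped_sample_fraction(const struct pmu_wd_stats *s)
{
    uint64_t total = s->samples_kept + s->samples_dropped;

    return total ? (double)s->samples_dropped / (double)total : 0.0;
}

/* Upper bound on the fraction of the run with a possibly dead PMU. */
double max_broken_fraction(const struct pmu_wd_stats *s,
                           double watchdog_period_s, double run_time_s)
{
    double fraction;

    if (run_time_s <= 0.0)
        return 0.0;

    fraction = (double)s->watchdog_alarms * watchdog_period_s / run_time_s;
    return fraction > 1.0 ? 1.0 : fraction;
}
```

For example, 50 alarms with a 10 ms period over a 10 second run bound the broken time at 5% of the run, which a user could weigh against the precision needed from the profile.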

  A working proof of concept patch was submitted there:
  http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
  Sorry for not posting it to one of the kernel mailing lists, but I
  thought that beagleboard mailing list was a good place to find users who
  may want to try it and evaluate if it has any practical value. Maybe it
  was not a very wise decision.
 
  Unfortunately I'm not a kernel hacker and cleaning up the patch may take
  too much time and efforts, taking into account my current knowledge. I
  would be happy if somebody else with more hands-on kernel experience
  could make a clean and usable Cortex-A8 PMU workaround. I don't care
  about getting some part of credit for it or not, the end result is more
  important
 
  :)

 I am ok to help

  One of the obvious problems with the patch (other than race conditions)
  is that it is using OMAP-specific GPTIMER. Is there something more
  portable in the kernel to provide similar functionality? Or are there any
  Cortex-A8 r1 cores other than OMAP3 in the wild?

 You can use a 'struct timer_list' and the setup_timer, mod_timer,
 del_timer_sync. Another API is the high resolution timers (HRT) but I do
 not think we need such a high precision timer here.

Thanks

-- 
Best regards,
Siarhei Siamashka