Re: [Xenomai-core] [PATCH, RFC] Instrumentation to detect delayed timers

2010-06-08 Thread Philippe Gerum
On Tue, 2010-06-08 at 18:22 +0200, Wolfgang Mauerer wrote:
> Philippe Gerum wrote:
> > On Tue, 2010-06-08 at 16:21 +0200, Wolfgang Mauerer wrote:
> >> Hi,
> >>
> >> the attached patch augments the timer expiration handler with some checks
> >> for excessive timer latencies. Depending on the configuration options, it
> >> freezes the ipipe tracer and creates a list of delayed timers.
> >>
> >> This code was useful for us in debugging some problems with faulty 
> >> hardware.
> >> Would such a thing be suitable for mainline?
> > 
> > No, I don't think so, albeit the debug feature it brings in makes sense.
> > 
> > - the code, as is, adds significant cruft to the timer code, which is
> > per se a problem. Debug is ok, but I really would like that code to have
> > a much lower impact source-wise.
> 
> so something along the lines of
> 
> xntimer_enqueue_aperiodic(...)
> {
> ...
>   large_latency_hook_whatever(args);
> ...
> }
> 
> where large_latency_hook_whatever() is #ifdef'd to nop if not
> compile-time configured would be acceptable I suppose?

Yes.

> 
> > - dependency on debugfs is wrong; we still have to take care of legacy
> > 2.4 kernels. I know, this is a real PITA, and I'm more than willing to
> > get rid of that "heritage" in 3.x, but for now, 2.x still requires to be
> > 2.4-compatible.
> 
> oh, yeah. procfs would be a replacement, but using debugfs via ipipe
> is much nicer since this is definitely debugging stuff.
> 
> > - but, beyond this, this is something that should go into the I-pipe
> > tracer instead, exporting a new dedicated interface to log a high
> > latency event. There, you could use whatever fits to implement it, since
> > there is by definition no consideration for compat with legacy kernels.
> > I'm thinking of having ipipe_trace_latency(unsigned long delta, const
> > char *source). That would be reusable in other, non-Xenomai contexts as
> > well.
> 
> Sounds reasonable.
> 
> Would it make sense to augment ipipe_trace_latency() with an additional
> type argument to distinguish different latency sources, or is this
> overkill?
> 

It would even be required to store a latency event as a tracepoint in
the log, I guess, since we don't have room for any source label (which
would be overkill to have in every tracepoint anyway), but we do have a
type field there.

> Best, Wolfgang
> 
> > 
> > Additionally, the latency spots could appear as events in the tracer log
> > as well, which may help debugging them a lot, since you would know about
> > the actual context.
> > 
> >> Regards, Wolfgang
> >>
> >> ---
> >> commit 9c2a3908ac4aff21022a577552002fc2622d3670
> >> Author: Wolfgang Mauerer 
> >> Date:   Mon Jun 7 15:34:39 2010 +0200
> >>
> >> This patch provides the possibility to check upon timer expiration
> >> how much a timer has been delayed. When the delay surpasses a 
> >> configurable
> >> threshold, the ipipe tracer is frozen. Additionally, the patch allows
> >> for the creation of a list of timers that have suffered from excessive
> >> delays.
> >> 
> >> The mechanism can be disabled at both, compile- and runtime. There is 
> >> no
> >> overhead in the former case.
> >> 
> >> Signed-off-by: Wolfgang Mauerer 
> >>
> >> diff --git a/ksrc/nucleus/Kconfig b/ksrc/nucleus/Kconfig
> >> index 211a4ad..ffed3f8 100644
> >> --- a/ksrc/nucleus/Kconfig
> >> +++ b/ksrc/nucleus/Kconfig
> >> @@ -293,6 +293,24 @@ config XENO_OPT_DEBUG_TIMERS
> >>This option activates debugging output for critical
> >>timer-related operations performed by the Xenomai core.
> >>  
> >> +config XENO_OPT_DEBUG_TIMER_LATENCY
> >> +  bool "Log excessive timer latencies"
> >> +  depends on XENO_OPT_DEBUG && (XENO_OPT_STATS || IPIPE_TRACE)
> >> +  help
> >> +
> >> +  This option activates recording of excessive timer latencies.
> >> +  In contrast to the userland latency checker which is statistical
> >> +  in nature, this test checks every expiring timer and is thus
> >> +  more precise. However, it adds some runtime overhead when
> >> +  activated.
> >> +
> >> +  When the ipipe tracer is enabled, the trace is frozen
> >> +  when a latency maximum above the threshold set in
> >> +  the debugfs file timer_lattest/lat_threshold_us is detected.
> >> +  When Xenomai statistics collection is enabled, a list of
> >> +  excessive latencies together with the names of the timers
> >> +  is kept in the debugfs file timer_lattest/exc_timer_list.
> >> +
> >>  config XENO_OPT_DEBUG_SYNCH_RELAX
> >>bool "Detect mutexes held in relaxed sections"
> >>depends on XENO_OPT_PERVASIVE && XENO_OPT_DEBUG
> >> diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c
> >> index d813c4f..01dad44 100644
> >> --- a/ksrc/nucleus/timer.c
> >> +++ b/ksrc/nucleus/timer.c
> >> @@ -36,11 +36,37 @@
> >>   *
> >>   *...@{*/
> >>  
> >> +#include 
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  #include 
> >>  
> >> +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY
> >> +#define NA

Re: [Xenomai-core] [PATCH, RFC] Instrumentation to detect delayed timers

2010-06-08 Thread Wolfgang Mauerer
Philippe Gerum wrote:
> On Tue, 2010-06-08 at 16:21 +0200, Wolfgang Mauerer wrote:
>> Hi,
>>
>> the attached patch augments the timer expiration handler with some checks
>> for excessive timer latencies. Depending on the configuration options, it
>> freezes the ipipe tracer and creates a list of delayed timers.
>>
>> This code was useful for us in debugging some problems with faulty hardware.
>> Would such a thing be suitable for mainline?
> 
> No, I don't think so, albeit the debug feature it brings in makes sense.
> 
> - the code, as is, adds significant cruft to the timer code, which is
> per se a problem. Debug is ok, but I really would like that code to have
> a much lower impact source-wise.

so something along the lines of

xntimer_enqueue_aperiodic(...)
{
...
large_latency_hook_whatever(args);
...
}

where large_latency_hook_whatever() is #ifdef'd to nop if not
compile-time configured would be acceptable I suppose?

> - dependency on debugfs is wrong; we still have to take care of legacy
> 2.4 kernels. I know, this is a real PITA, and I'm more than willing to
> get rid of that "heritage" in 3.x, but for now, 2.x still requires to be
> 2.4-compatible.

oh, yeah. procfs would be a replacement, but using debugfs via ipipe
is much nicer since this is definitely debugging stuff.

> - but, beyond this, this is something that should go into the I-pipe
> tracer instead, exporting a new dedicated interface to log a high
> latency event. There, you could use whatever fits to implement it, since
> there is by definition no consideration for compat with legacy kernels.
> I'm thinking of having ipipe_trace_latency(unsigned long delta, const
> char *source). That would be reusable in other, non-Xenomai contexts as
> well.

Sounds reasonable.

Would it make sense to augment ipipe_trace_latency() with an additional
type argument to distinguish different latency sources, or is this
overkill?

Best, Wolfgang

> 
> Additionally, the latency spots could appear as events in the tracer log
> as well, which may help debugging them a lot, since you would know about
> the actual context.
> 
>> Regards, Wolfgang
>>
>> ---
>> commit 9c2a3908ac4aff21022a577552002fc2622d3670
>> Author: Wolfgang Mauerer 
>> Date:   Mon Jun 7 15:34:39 2010 +0200
>>
>> This patch provides the possibility to check upon timer expiration
>> how much a timer has been delayed. When the delay surpasses a 
>> configurable
>> threshold, the ipipe tracer is frozen. Additionally, the patch allows
>> for the creation of a list of timers that have suffered from excessive
>> delays.
>> 
>> The mechanism can be disabled at both, compile- and runtime. There is no
>> overhead in the former case.
>> 
>> Signed-off-by: Wolfgang Mauerer 
>>
>> diff --git a/ksrc/nucleus/Kconfig b/ksrc/nucleus/Kconfig
>> index 211a4ad..ffed3f8 100644
>> --- a/ksrc/nucleus/Kconfig
>> +++ b/ksrc/nucleus/Kconfig
>> @@ -293,6 +293,24 @@ config XENO_OPT_DEBUG_TIMERS
>>  This option activates debugging output for critical
>>  timer-related operations performed by the Xenomai core.
>>  
>> +config XENO_OPT_DEBUG_TIMER_LATENCY
>> +bool "Log excessive timer latencies"
>> +depends on XENO_OPT_DEBUG && (XENO_OPT_STATS || IPIPE_TRACE)
>> +help
>> +
>> +This option activates recording of excessive timer latencies.
>> +In contrast to the userland latency checker which is statistical
>> +in nature, this test checks every expiring timer and is thus
>> +more precise. However, it adds some runtime overhead when
>> +activated.
>> +
>> +When the ipipe tracer is enabled, the trace is frozen
>> +when a latency maximum above the threshold set in
>> +the debugfs file timer_lattest/lat_threshold_us is detected.
>> +When Xenomai statistics collection is enabled, a list of
>> +excessive latencies together with the names of the timers
>> +is kept in the debugfs file timer_lattest/exc_timer_list.
>> +
>>  config XENO_OPT_DEBUG_SYNCH_RELAX
>>  bool "Detect mutexes held in relaxed sections"
>>  depends on XENO_OPT_PERVASIVE && XENO_OPT_DEBUG
>> diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c
>> index d813c4f..01dad44 100644
>> --- a/ksrc/nucleus/timer.c
>> +++ b/ksrc/nucleus/timer.c
>> @@ -36,11 +36,37 @@
>>   *
>>   *...@{*/
>>  
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  
>> +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY
>> +#define NANO 10ll
>> +#define MU   100ll
>> +#define NANO_PER_MU (NANO/MU)
>> +
>> +static u32 lat_thresh = 0; /* in \mu s, set to zero to disable */
>> +
>> +#ifdef CONFIG_XENO_OPT_STATS
>> +#include 
>> +
>> +#define NUM_REC_LATENCIES   500
>> +static int num_lat_exc_timers = -1;
>> +static bool lat_rec_overflow = false; /* Set when more than 
>> NUM_REC_LATENCIES
>> +   * excessive latencies have been
>> +   * observed */
>> +struct lat_rec {
>> 

Re: [Xenomai-core] [PATCH, RFC] Instrumentation to detect delayed timers

2010-06-08 Thread Philippe Gerum
On Tue, 2010-06-08 at 16:21 +0200, Wolfgang Mauerer wrote:
> Hi,
> 
> the attached patch augments the timer expiration handler with some checks
> for excessive timer latencies. Depending on the configuration options, it
> freezes the ipipe tracer and creates a list of delayed timers.
> 
> This code was useful for us in debugging some problems with faulty hardware.
> Would such a thing be suitable for mainline?

No, I don't think so, albeit the debug feature it brings in makes sense.

- the code, as is, adds significant cruft to the timer code, which is
per se a problem. Debug is ok, but I really would like that code to have
a much lower impact source-wise.

- dependency on debugfs is wrong; we still have to take care of legacy
2.4 kernels. I know, this is a real PITA, and I'm more than willing to
get rid of that "heritage" in 3.x, but for now, 2.x still requires to be
2.4-compatible.

- but, beyond this, this is something that should go into the I-pipe
tracer instead, exporting a new dedicated interface to log a high
latency event. There, you could use whatever fits to implement it, since
there is by definition no consideration for compat with legacy kernels.
I'm thinking of having ipipe_trace_latency(unsigned long delta, const
char *source). That would be reusable in other, non-Xenomai contexts as
well.

Additionally, the latency spots could appear as events in the tracer log
as well, which may help debugging them a lot, since you would know about
the actual context.

> 
> Regards, Wolfgang
> 
> ---
> commit 9c2a3908ac4aff21022a577552002fc2622d3670
> Author: Wolfgang Mauerer 
> Date:   Mon Jun 7 15:34:39 2010 +0200
> 
> This patch provides the possibility to check upon timer expiration
> how much a timer has been delayed. When the delay surpasses a configurable
> threshold, the ipipe tracer is frozen. Additionally, the patch allows
> for the creation of a list of timers that have suffered from excessive
> delays.
> 
> The mechanism can be disabled at both, compile- and runtime. There is no
> overhead in the former case.
> 
> Signed-off-by: Wolfgang Mauerer 
> 
> diff --git a/ksrc/nucleus/Kconfig b/ksrc/nucleus/Kconfig
> index 211a4ad..ffed3f8 100644
> --- a/ksrc/nucleus/Kconfig
> +++ b/ksrc/nucleus/Kconfig
> @@ -293,6 +293,24 @@ config XENO_OPT_DEBUG_TIMERS
>   This option activates debugging output for critical
>   timer-related operations performed by the Xenomai core.
>  
> +config XENO_OPT_DEBUG_TIMER_LATENCY
> + bool "Log excessive timer latencies"
> + depends on XENO_OPT_DEBUG && (XENO_OPT_STATS || IPIPE_TRACE)
> + help
> +
> + This option activates recording of excessive timer latencies.
> + In contrast to the userland latency checker which is statistical
> + in nature, this test checks every expiring timer and is thus
> + more precise. However, it adds some runtime overhead when
> + activated.
> +
> + When the ipipe tracer is enabled, the trace is frozen
> + when a latency maximum above the threshold set in
> + the debugfs file timer_lattest/lat_threshold_us is detected.
> + When Xenomai statistics collection is enabled, a list of
> + excessive latencies together with the names of the timers
> + is kept in the debugfs file timer_lattest/exc_timer_list.
> +
>  config XENO_OPT_DEBUG_SYNCH_RELAX
>   bool "Detect mutexes held in relaxed sections"
>   depends on XENO_OPT_PERVASIVE && XENO_OPT_DEBUG
> diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c
> index d813c4f..01dad44 100644
> --- a/ksrc/nucleus/timer.c
> +++ b/ksrc/nucleus/timer.c
> @@ -36,11 +36,37 @@
>   *
>   *...@{*/
>  
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
> +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY
> +#define NANO 10ll
> +#define MU   100ll
> +#define NANO_PER_MU (NANO/MU)
> +
> +static u32 lat_thresh = 0; /* in \mu s, set to zero to disable */
> +
> +#ifdef CONFIG_XENO_OPT_STATS
> +#include 
> +
> +#define NUM_REC_LATENCIES500
> +static int num_lat_exc_timers = -1;
> +static bool lat_rec_overflow = false; /* Set when more than NUM_REC_LATENCIES
> +* excessive latencies have been
> +* observed */
> +struct lat_rec {
> + char name[XNOBJECT_NAME_LEN];
> + xnsticks_t delta;
> +};
> +
> +static struct lat_rec delayed_timers[NUM_REC_LATENCIES];
> +#endif // CONFIG_XENO_OPT_STATS
> +#endif // XENO_OPT_DEBUG_TIMER_LATENCY
> +
>  static inline void xntimer_enqueue_aperiodic(xntimer_t *timer)
>  {
>   xntimerq_t *q = &timer->sched->timerqueue;
> @@ -368,6 +394,32 @@ void xntimer_tick_aperiodic(void)
>   if (delta > (xnsticks_t)nklatency)
>   break;
>  
> +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY
> + /*
> +  * Check for severly delayed timers, but omit the host tick
> +  * timer as it can be delayed intentionally, thus