Re: [Xenomai-core] [PATCH, RFC] Instrumentation to detect delayed timers
On Tue, 2010-06-08 at 18:22 +0200, Wolfgang Mauerer wrote: > Philippe Gerum wrote: > > On Tue, 2010-06-08 at 16:21 +0200, Wolfgang Mauerer wrote: > >> Hi, > >> > >> the attached patch augments the timer expiration handler with some checks > >> for excessive timer latencies. Depending on the configuration options, it > >> freezes the ipipe tracer and creates a list of delayed timers. > >> > >> This code was useful for us in debugging some problems with faulty > >> hardware. > >> Would such a thing be suitable for mainline? > > > > No, I don't think so, albeit the debug feature it brings in makes sense. > > > > - the code, as is, adds significant cruft to the timer code, which is > > per se a problem. Debug is ok, but I really would like that code to have > > a much lower impact source-wise. > > so something along the lines of > > xntimer_enqueue_aperiodic(...) > { > ... > large_latency_hook_whatever(args); > ... > } > > where large_latency_hook_whatever() is #ifdef'd to nop if not > compile-time configured would be acceptable I suppose? Yes. > > > - dependency on debugfs is wrong; we still have to take care of legacy > > 2.4 kernels. I know, this is a real PITA, and I'm more than willing to > > get rid of that "heritage" in 3.x, but for now, 2.x still requires to be > > 2.4-compatible. > > oh, yeah. procfs would be a replacement, but using debugfs via ipipe > is much nicer since this is definitely debugging stuff. > > > - but, beyond this, this is something that should go into the I-pipe > > tracer instead, exporting a new dedicated interface to log a high > > latency event. There, you could use whatever fits to implement it, since > > there is by definition no consideration for compat with legacy kernels. > > I'm thinking of having ipipe_trace_latency(unsigned long delta, const > > char *source). That would be reusable in other, non-Xenomai contexts as > > well. > > Sounds reasonable. > > Would it make sense to augment ipipe_trace_latency() with an additional > type argument to distinguish different latency sources, or is this > overkill? > It would even be required to store a latency event as a tracepoint in the log, I guess, since we don't have room for any source label (which would be overkill to have in every tracepoint anyway), but we do have a type field there. > Best, Wolfgang > > > > > Additionally, the latency spots could appear as events in the tracer log > > as well, which may help debugging them a lot, since you would know about > > the actual context. > > > >> Regards, Wolfgang > >> > >> --- > >> commit 9c2a3908ac4aff21022a577552002fc2622d3670 > >> Author: Wolfgang Mauerer > >> Date: Mon Jun 7 15:34:39 2010 +0200 > >> > >> This patch provides the possibility to check upon timer expiration > >> how much a timer has been delayed. When the delay surpasses a > >> configurable > >> threshold, the ipipe tracer is frozen. Additionally, the patch allows > >> for the creation of a list of timers that have suffered from excessive > >> delays. > >> > >> The mechanism can be disabled at both, compile- and runtime. There is > >> no > >> overhead in the former case. > >> > >> Signed-off-by: Wolfgang Mauerer > >> > >> diff --git a/ksrc/nucleus/Kconfig b/ksrc/nucleus/Kconfig > >> index 211a4ad..ffed3f8 100644 > >> --- a/ksrc/nucleus/Kconfig > >> +++ b/ksrc/nucleus/Kconfig > >> @@ -293,6 +293,24 @@ config XENO_OPT_DEBUG_TIMERS > >>This option activates debugging output for critical > >>timer-related operations performed by the Xenomai core. > >> > >> +config XENO_OPT_DEBUG_TIMER_LATENCY > >> + bool "Log excessive timer latencies" > >> + depends on XENO_OPT_DEBUG && (XENO_OPT_STATS || IPIPE_TRACE) > >> + help > >> + > >> + This option activates recording of excessive timer latencies. > >> + In contrast to the userland latency checker which is statistical > >> + in nature, this test checks every expiring timer and is thus > >> + more precise. However, it adds some runtime overhead when > >> + activated. > >> + > >> + When the ipipe tracer is enabled, the trace is frozen > >> + when a latency maximum above the threshold set in > >> + the debugfs file timer_lattest/lat_threshold_us is detected. > >> + When Xenomai statistics collection is enabled, a list of > >> + excessive latencies together with the names of the timers > >> + is kept in the debugfs file timer_lattest/exc_timer_list. > >> + > >> config XENO_OPT_DEBUG_SYNCH_RELAX > >>bool "Detect mutexes held in relaxed sections" > >>depends on XENO_OPT_PERVASIVE && XENO_OPT_DEBUG > >> diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c > >> index d813c4f..01dad44 100644 > >> --- a/ksrc/nucleus/timer.c > >> +++ b/ksrc/nucleus/timer.c > >> @@ -36,11 +36,37 @@ > >> * > >> *...@{*/ > >> > >> +#include > >> #include > >> #include > >> #include > >> +#include > >> #include > >> > >> +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY > >> +#define NA
Re: [Xenomai-core] [PATCH, RFC] Instrumentation to detect delayed timers
Philippe Gerum wrote: > On Tue, 2010-06-08 at 16:21 +0200, Wolfgang Mauerer wrote: >> Hi, >> >> the attached patch augments the timer expiration handler with some checks >> for excessive timer latencies. Depending on the configuration options, it >> freezes the ipipe tracer and creates a list of delayed timers. >> >> This code was useful for us in debugging some problems with faulty hardware. >> Would such a thing be suitable for mainline? > > No, I don't think so, albeit the debug feature it brings in makes sense. > > - the code, as is, adds significant cruft to the timer code, which is > per se a problem. Debug is ok, but I really would like that code to have > a much lower impact source-wise. so something along the lines of xntimer_enqueue_aperiodic(...) { ... large_latency_hook_whatever(args); ... } where large_latency_hook_whatever() is #ifdef'd to nop if not compile-time configured would be acceptable I suppose? > - dependency on debugfs is wrong; we still have to take care of legacy > 2.4 kernels. I know, this is a real PITA, and I'm more than willing to > get rid of that "heritage" in 3.x, but for now, 2.x still requires to be > 2.4-compatible. oh, yeah. procfs would be a replacement, but using debugfs via ipipe is much nicer since this is definitely debugging stuff. > - but, beyond this, this is something that should go into the I-pipe > tracer instead, exporting a new dedicated interface to log a high > latency event. There, you could use whatever fits to implement it, since > there is by definition no consideration for compat with legacy kernels. > I'm thinking of having ipipe_trace_latency(unsigned long delta, const > char *source). That would be reusable in other, non-Xenomai contexts as > well. Sounds reasonable. Would it make sense to augment ipipe_trace_latency() with an additional type argument to distinguish different latency sources, or is this overkill? Best, Wolfgang > > Additionally, the latency spots could appear as events in the tracer log > as well, which may help debugging them a lot, since you would know about > the actual context. > >> Regards, Wolfgang >> >> --- >> commit 9c2a3908ac4aff21022a577552002fc2622d3670 >> Author: Wolfgang Mauerer >> Date: Mon Jun 7 15:34:39 2010 +0200 >> >> This patch provides the possibility to check upon timer expiration >> how much a timer has been delayed. When the delay surpasses a >> configurable >> threshold, the ipipe tracer is frozen. Additionally, the patch allows >> for the creation of a list of timers that have suffered from excessive >> delays. >> >> The mechanism can be disabled at both, compile- and runtime. There is no >> overhead in the former case. >> >> Signed-off-by: Wolfgang Mauerer >> >> diff --git a/ksrc/nucleus/Kconfig b/ksrc/nucleus/Kconfig >> index 211a4ad..ffed3f8 100644 >> --- a/ksrc/nucleus/Kconfig >> +++ b/ksrc/nucleus/Kconfig >> @@ -293,6 +293,24 @@ config XENO_OPT_DEBUG_TIMERS >> This option activates debugging output for critical >> timer-related operations performed by the Xenomai core. >> >> +config XENO_OPT_DEBUG_TIMER_LATENCY >> +bool "Log excessive timer latencies" >> +depends on XENO_OPT_DEBUG && (XENO_OPT_STATS || IPIPE_TRACE) >> +help >> + >> +This option activates recording of excessive timer latencies. >> +In contrast to the userland latency checker which is statistical >> +in nature, this test checks every expiring timer and is thus >> +more precise. However, it adds some runtime overhead when >> +activated. >> + >> +When the ipipe tracer is enabled, the trace is frozen >> +when a latency maximum above the threshold set in >> +the debugfs file timer_lattest/lat_threshold_us is detected. >> +When Xenomai statistics collection is enabled, a list of >> +excessive latencies together with the names of the timers >> +is kept in the debugfs file timer_lattest/exc_timer_list. >> + >> config XENO_OPT_DEBUG_SYNCH_RELAX >> bool "Detect mutexes held in relaxed sections" >> depends on XENO_OPT_PERVASIVE && XENO_OPT_DEBUG >> diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c >> index d813c4f..01dad44 100644 >> --- a/ksrc/nucleus/timer.c >> +++ b/ksrc/nucleus/timer.c >> @@ -36,11 +36,37 @@ >> * >> *...@{*/ >> >> +#include >> #include >> #include >> #include >> +#include >> #include >> >> +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY >> +#define NANO 10ll >> +#define MU 100ll >> +#define NANO_PER_MU (NANO/MU) >> + >> +static u32 lat_thresh = 0; /* in \mu s, set to zero to disable */ >> + >> +#ifdef CONFIG_XENO_OPT_STATS >> +#include >> + >> +#define NUM_REC_LATENCIES 500 >> +static int num_lat_exc_timers = -1; >> +static bool lat_rec_overflow = false; /* Set when more than >> NUM_REC_LATENCIES >> + * excessive latencies have been >> + * observed */ >> +struct lat_rec { >>
Re: [Xenomai-core] [PATCH, RFC] Instrumentation to detect delayed timers
On Tue, 2010-06-08 at 16:21 +0200, Wolfgang Mauerer wrote: > Hi, > > the attached patch augments the timer expiration handler with some checks > for excessive timer latencies. Depending on the configuration options, it > freezes the ipipe tracer and creates a list of delayed timers. > > This code was useful for us in debugging some problems with faulty hardware. > Would such a thing be suitable for mainline? No, I don't think so, albeit the debug feature it brings in makes sense. - the code, as is, adds significant cruft to the timer code, which is per se a problem. Debug is ok, but I really would like that code to have a much lower impact source-wise. - dependency on debugfs is wrong; we still have to take care of legacy 2.4 kernels. I know, this is a real PITA, and I'm more than willing to get rid of that "heritage" in 3.x, but for now, 2.x still requires to be 2.4-compatible. - but, beyond this, this is something that should go into the I-pipe tracer instead, exporting a new dedicated interface to log a high latency event. There, you could use whatever fits to implement it, since there is by definition no consideration for compat with legacy kernels. I'm thinking of having ipipe_trace_latency(unsigned long delta, const char *source). That would be reusable in other, non-Xenomai contexts as well. Additionally, the latency spots could appear as events in the tracer log as well, which may help debugging them a lot, since you would know about the actual context. > > Regards, Wolfgang > > --- > commit 9c2a3908ac4aff21022a577552002fc2622d3670 > Author: Wolfgang Mauerer > Date: Mon Jun 7 15:34:39 2010 +0200 > > This patch provides the possibility to check upon timer expiration > how much a timer has been delayed. When the delay surpasses a configurable > threshold, the ipipe tracer is frozen. Additionally, the patch allows > for the creation of a list of timers that have suffered from excessive > delays. > > The mechanism can be disabled at both, compile- and runtime. There is no > overhead in the former case. > > Signed-off-by: Wolfgang Mauerer > > diff --git a/ksrc/nucleus/Kconfig b/ksrc/nucleus/Kconfig > index 211a4ad..ffed3f8 100644 > --- a/ksrc/nucleus/Kconfig > +++ b/ksrc/nucleus/Kconfig > @@ -293,6 +293,24 @@ config XENO_OPT_DEBUG_TIMERS > This option activates debugging output for critical > timer-related operations performed by the Xenomai core. > > +config XENO_OPT_DEBUG_TIMER_LATENCY > + bool "Log excessive timer latencies" > + depends on XENO_OPT_DEBUG && (XENO_OPT_STATS || IPIPE_TRACE) > + help > + > + This option activates recording of excessive timer latencies. > + In contrast to the userland latency checker which is statistical > + in nature, this test checks every expiring timer and is thus > + more precise. However, it adds some runtime overhead when > + activated. > + > + When the ipipe tracer is enabled, the trace is frozen > + when a latency maximum above the threshold set in > + the debugfs file timer_lattest/lat_threshold_us is detected. > + When Xenomai statistics collection is enabled, a list of > + excessive latencies together with the names of the timers > + is kept in the debugfs file timer_lattest/exc_timer_list. > + > config XENO_OPT_DEBUG_SYNCH_RELAX > bool "Detect mutexes held in relaxed sections" > depends on XENO_OPT_PERVASIVE && XENO_OPT_DEBUG > diff --git a/ksrc/nucleus/timer.c b/ksrc/nucleus/timer.c > index d813c4f..01dad44 100644 > --- a/ksrc/nucleus/timer.c > +++ b/ksrc/nucleus/timer.c > @@ -36,11 +36,37 @@ > * > *...@{*/ > > +#include > #include > #include > #include > +#include > #include > > +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY > +#define NANO 10ll > +#define MU 100ll > +#define NANO_PER_MU (NANO/MU) > + > +static u32 lat_thresh = 0; /* in \mu s, set to zero to disable */ > + > +#ifdef CONFIG_XENO_OPT_STATS > +#include > + > +#define NUM_REC_LATENCIES500 > +static int num_lat_exc_timers = -1; > +static bool lat_rec_overflow = false; /* Set when more than NUM_REC_LATENCIES > +* excessive latencies have been > +* observed */ > +struct lat_rec { > + char name[XNOBJECT_NAME_LEN]; > + xnsticks_t delta; > +}; > + > +static struct lat_rec delayed_timers[NUM_REC_LATENCIES]; > +#endif // CONFIG_XENO_OPT_STATS > +#endif // XENO_OPT_DEBUG_TIMER_LATENCY > + > static inline void xntimer_enqueue_aperiodic(xntimer_t *timer) > { > xntimerq_t *q = &timer->sched->timerqueue; > @@ -368,6 +394,32 @@ void xntimer_tick_aperiodic(void) > if (delta > (xnsticks_t)nklatency) > break; > > +#ifdef CONFIG_XENO_OPT_DEBUG_TIMER_LATENCY > + /* > + * Check for severly delayed timers, but omit the host tick > + * timer as it can be delayed intentionally, thus