On Wed, 2010-06-09 at 20:11 +0200, Tschaeche IT-Services wrote:
> On Wed, Jun 09, 2010 at 12:41:23PM +0200, Philippe Gerum wrote:
> > We definitely need user feedback on this. Typically, does arming the
> > nucleus watchdog with that patch support in properly recover from your
> > favorite "get me out of here" situation? TIA,
> > 
> > You can pull this stuff from
> > git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> manually build a kernel (timeout 1s) with your patches.
> user space linked to 2.5.3 libraries without any patches.
> Looks fine: the amok task is switched to secondary domain
> (we caught the SIGXCPU) running the loop in secondary domain.
> then, on a SIGTRAP the task leaves the loop.
> also, if SIGTRAP arrives before SIGXCPU it looks good,
> apart from the latency of 1s.
> did not check the ucontext within the exception handler, yet.
> would like to set up a reproducible kernel build first...
> we will go into deeper testing in 2 weeks.
> maybe we need a finer granularity than 1s for the watchdog timeout.
> is there a chance?

The watchdog is not meant to be used for implementing application-level
health monitors, which is what you seem to be looking for. The
watchdog is really about pulling the brake while debugging, as a means
not to brick your board when things start to hit the crapper, without
knowing anything about the error source. For that purpose, the current 1s
granularity is just fine. It makes the nucleus watchdog as tactful as a
lumberjack, which is what we want in those circumstances: we want it to
point the finger at the problem we did not know about yet and keep the
board afloat; it is neither meant to monitor a specific code we know in
advance that might misbehave, nor provide any kind of smart contingency

I would rather think that you may need something like an RTDM driver
actually implementing smarter health monitoring features that you could
use along with your app. That driver would expose a normalized socket
interface for observing how things go app-wise, by collecting data about
the current health status. It would have to tap into the mayday routines
for recovering from runaway situations it may detect via its own,
fine-grained watchdog service for instance.
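
To illustrate the fine-grained watchdog idea only, here is a minimal sketch
of the detection logic in plain C. It assumes the supervised task bumps a
heartbeat counter on every loop iteration and that the monitor samples it
from a periodic timer. All names (hb_monitor, hb_init, hb_check) are
hypothetical; no RTDM or nucleus call is shown, and the actual recovery
through the mayday routines is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task health state; not part of any Xenomai API. */
struct hb_monitor {
	uint32_t last_seen;   /* heartbeat value seen at the previous check */
	unsigned int stalled; /* consecutive checks without progress */
	unsigned int limit;   /* stalled checks tolerated before flagging */
};

static void hb_init(struct hb_monitor *m, unsigned int limit)
{
	assert(limit > 0);
	m->last_seen = 0;
	m->stalled = 0;
	m->limit = limit;
}

/*
 * Meant to be called at each monitor period with the task's current
 * heartbeat counter. Returns true once the counter has not moved for
 * 'limit' consecutive checks, i.e. the task looks stuck or runaway.
 */
static bool hb_check(struct hb_monitor *m, uint32_t heartbeat)
{
	if (heartbeat != m->last_seen) {
		/* The task made progress: remember it and start over. */
		m->last_seen = heartbeat;
		m->stalled = 0;
		return false;
	}

	return ++m->stalled >= m->limit;
}
```

With a 1 ms monitor period and a limit of 5, a task that stops bumping its
heartbeat would be flagged after about 5 ms, instead of waiting for the 1s
nucleus watchdog.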

ATM, you can still hack the nucleus watchdog threshold by changing the
periodic setup for its timer in xnpod_enable_timesource(). This said,
increasing the frequency too much would also induce much more overhead,
so YMMV.
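
The trade-off can be put in numbers with a back-of-the-envelope sketch. It
assumes the watchdog trips after a fixed count of consecutive timer ticks
during which a real-time task kept the CPU; the helper names below are
hypothetical and nothing here is taken from the nucleus code.

```c
#include <assert.h>

/* Timer shots per second for a watchdog timer firing every period_ms. */
static unsigned int wd_ticks_per_sec(unsigned int period_ms)
{
	assert(period_ms > 0 && period_ms <= 1000);
	return 1000 / period_ms;
}

/*
 * Upper bound on the detection latency, in milliseconds, when the
 * watchdog trips after 'count' consecutive ticks (assumed model).
 */
static unsigned int wd_latency_ms(unsigned int period_ms, unsigned int count)
{
	return period_ms * count;
}
```

For instance, a 10 ms period with a count of 4 would detect a runaway task
within about 40 ms, at the price of 100 timer shots per second instead of 1.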

> will your patches be merged in an official 2.5.x version?


> thanks for your great support,
>       Olli


Xenomai-core mailing list