[Xenomai-core] Re: [RFC] tame the watchdog

2006-08-02 Thread Jan Kiszka
Philippe Gerum wrote:
> On Tue, 2006-08-01 at 16:45 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>>> Still, reinitializing X while the latency test runs causes
>>>> the latter to hang, albeit LOC is still flowing properly and the box
>>>> keeps going normally.
>>> This one was due to the nucleus watchdog which triggered right after the
>>> graphic mode was fully initialized, due to the huge amount of
>>> unpreemptible time spent doing this; this caused the sampling task to be
>>> detected as a runaway thread. So the behaviour is ok, albeit a bit
>>> frightening at first.
>>>
>> That reminds me of the unfortunate characteristics of the 2.6 oom-killer:
>> unless you set your time-critical app's oom_adj to -17, you are never
>> really safe from being killed accidentally in low-memory scenarios.
>>
>> What about introducing some mechanism to protect audited tasks against
>> the watchdog? A simple thread flag settable via existing APIs, ignored
>> if there is no watchdog compiled in?
> 
> There is a fundamental difference between the OOM killer and the Xenomai
> watchdog: the latter is merely a debugging tool to prevent the box from
> hanging, and you can disable it completely.
> 
> The situations reported by the watchdog are pathological ones, which
> involve more than 4 seconds of continuous real-time activity while the
> Linux kernel is being totally starved of CPU, and in such a case, you
> really want someone to pull the brake, regardless of the consequences on
> the application (which is basically toast anyway). IOW, if such a
> weird situation eventually ends up being considered as "normal" under
> certain circumstances, the best approach is simply to disable the
> watchdog entirely.
> 
> Limiting the runtime quantum allotted to threads through a dedicated
> scheduling policy would be a better way to deal with CPU overconsumption
> "intelligently", i.e. on a per-thread basis. 

For sure, e.g. round-robin scheduling that includes the root thread, and
that also works in aperiodic timer mode.

> OTOH, the current watchdog
> implementation is aiming at being terminally dumb for the sake of debug
> efficiency.
> 

Yes, it's simple and it's a debugging mechanism. Nevertheless, I think
it can be improved without too much effort or costs. I would love to
demonstrate this, but for now I'm afraid this has to remain a (now
filed) idea.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] Re: [RFC] tame the watchdog (was: Beginner's question / testsuite / latency)

2006-08-02 Thread Philippe Gerum
On Tue, 2006-08-01 at 16:45 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> >> Still, reinitializing X while the latency test runs causes
> >> the latter to hang, albeit LOC is still flowing properly and the box
> >> keeps going normally.
> > 
> > This one was due to the nucleus watchdog which triggered right after the
> > graphic mode was fully initialized, due to the huge amount of
> > unpreemptible time spent doing this; this caused the sampling task to be
> > detected as a runaway thread. So the behaviour is ok, albeit a bit
> > frightening at first.
> > 
> 
> That reminds me of the unfortunate characteristics of the 2.6 oom-killer:
> unless you set your time-critical app's oom_adj to -17, you are never
> really safe from being killed accidentally in low-memory scenarios.
> 
> What about introducing some mechanism to protect audited tasks against
> the watchdog? A simple thread flag settable via existing APIs, ignored
> if there is no watchdog compiled in?

There is a fundamental difference between the OOM killer and the Xenomai
watchdog: the latter is merely a debugging tool to prevent the box from
hanging, and you can disable it completely.

The situations reported by the watchdog are pathological ones, which
involve more than 4 seconds of continuous real-time activity while the
Linux kernel is being totally starved of CPU, and in such a case, you
really want someone to pull the brake, regardless of the consequences on
the application (which is basically toast anyway). IOW, if such a
weird situation eventually ends up being considered as "normal" under
certain circumstances, the best approach is simply to disable the
watchdog entirely.
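
As a rough illustration of the "terminally dumb" behaviour described
above, the check can be pictured as a counter of consecutive ticks
during which Linux never got the CPU. This is only a sketch under stated
assumptions, not the nucleus code: the names wd_state, wd_init, wd_tick
and WD_LIMIT_TICKS are hypothetical, and a one-second tick is assumed so
that four starved ticks approximate the 4-second threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the Xenomai nucleus code. */
#define WD_LIMIT_TICKS 4 /* assumed: 1 s per tick, so roughly 4 s of starvation */

struct wd_state {
    unsigned int starved_ticks; /* consecutive ticks with Linux starved */
};

static void wd_init(struct wd_state *wd)
{
    assert(wd != NULL);
    wd->starved_ticks = 0;
}

/*
 * Called once per watchdog tick.
 * linux_ran: true if the Linux kernel got the CPU since the previous tick.
 * Returns true when the running real-time thread should be treated as a
 * runaway, whatever the application was doing.
 */
static bool wd_tick(struct wd_state *wd, bool linux_ran)
{
    assert(wd != NULL);
    if (linux_ran) {
        wd->starved_ticks = 0; /* Linux is alive: start counting afresh */
        return false;
    }
    wd->starved_ticks++;
    return wd->starved_ticks >= WD_LIMIT_TICKS;
}
```

The check deliberately knows nothing about which thread is running or
why, which is what keeps it cheap and predictable as a debugging aid.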

Limiting the runtime quantum allotted to threads through a dedicated
scheduling policy would be a better way to deal with CPU overconsumption
"intelligently", i.e. on a per-thread basis. OTOH, the current watchdog
implementation is aiming at being terminally dumb for the sake of debug
efficiency.
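
To contrast with the global watchdog, per-thread limiting of a runtime
quantum could be sketched as simple budget accounting. Nothing below
exists in Xenomai as written: struct budget, budget_init, budget_charge
and budget_replenish are hypothetical names, and the microsecond unit is
an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread runtime budget; illustrative only. */
struct budget {
    long quantum_us; /* runtime allotted per replenishment period */
    long left_us;    /* runtime still available in the current period */
};

static void budget_init(struct budget *b, long quantum_us)
{
    assert(b != NULL && quantum_us > 0);
    b->quantum_us = quantum_us;
    b->left_us = quantum_us;
}

/*
 * Charge ran_us microseconds of CPU time to the thread.
 * Returns true when the quantum is exhausted, i.e. the scheduler should
 * stop running this thread until its budget is replenished.
 */
static bool budget_charge(struct budget *b, long ran_us)
{
    assert(b != NULL && ran_us >= 0);
    if (ran_us >= b->left_us) {
        b->left_us = 0;
        return true;
    }
    b->left_us -= ran_us;
    return false;
}

/* Called at the start of each period to restore the full quantum. */
static void budget_replenish(struct budget *b)
{
    assert(b != NULL);
    b->left_us = b->quantum_us;
}
```

Unlike the watchdog, this kind of accounting acts on one thread at a
time, so a well-behaved thread is not affected by another thread
overrunning its share.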

-- 
Philippe.


