On Fri, 2010-08-20 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> Philippe Gerum wrote:
> >>>> I've toyed a bit to find a generic approach for the nucleus to regain
> >>>> complete control over a userland application running in a syscall-less
> >>>> loop.
> >>>>
> >>>> The original issue was about recovering gracefully from a runaway
> >>>> situation detected by the nucleus watchdog, where a thread would spin in
> >>>> primary mode without issuing any syscall, but this would also apply for
> >>>> real-time signals pending for such a thread. Currently, Xenomai rt
> >>>> signals cannot preempt syscall-less code running in primary mode either.
> >>>>
> >>>> The major difference between the previous approaches we discussed about
> >>>> and this one, is the fact that we now force the runaway thread to run a
> >>>> piece of valid code that calls into the nucleus. We do not force the
> >>>> thread to run faulty code or at a faulty address anymore. Therefore, we
> >>>> can reuse this feature to improve the rt signal management, without
> >>>> having to forge yet-another signal stack frame for this.
> >>>>
> >>>> The code introduced only fixes the watchdog related issue, but also does
> >>>> some groundwork for enhancing the rt signal support later. The
> >>>> implementation details can be found here:
> >>>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> >>>>
> >>>> The current mayday support is only available for powerpc and x86 for
> >>>> now, more will come in the next days. To have it enabled, you have to
> >>>> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> >>>> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> >>>> new interface available from those latest patches.
> >>>>
> >>>> The current implementation does not break the 2.5.x ABI on purpose, so
> >>>> we could merge it into the stable branch.
> >>>>
> >>>> We definitely need user feedback on this. Typically, does arming the
> >>>> nucleus watchdog with that patch support in, properly recovers from your
> >>>> favorite "get me out of here" situation? TIA,
> >>>>
> >>>> You can pull this stuff from
> >>>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> >>>>
> >>> I've retested the feature as it's now in master, and it has one
> >>> remaining problem: If you run the cpu hog under gdb control and try to
> >>> break out of the while(1) loop, this doesn't work before the watchdog
> >>> expired - of course. But if you send the break before the expiry (or hit
> >>> a breakpoint), something goes wrong. The Xenomai task continues to spin,
> >>> and there is no chance to kill its process (only gdb).
> >>>
> >>> # cat /proc/xenomai/sched
> >>> CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
> >>> 0    0     idle   -1   -        master    RR    ROOT/0
> >
> > Eeek, we really need to have a look at this funky STAT output.
>
> I've a patch for this queued as well. Was only a cosmetic thing.
>
> >
> >>> 1    0     idle   -1   -        master    R     ROOT/1
> >>> 0    6120  rt     99   -        master    Tt    cpu-hog
> >>> # cat /proc/xenomai/stat
> >>> CPU  PID   MSW  CSW    PF  STAT      %CPU   NAME
> >>> 0    0     0    0      0   00500088  0.0    ROOT/0
> >>> 1    0     0    0      0   00500080  99.7   ROOT/1
> >>> 0    6120  0    1      0   00342180  100.0  cpu-hog
> >>> 0    0     0    21005  0   00000000  0.0    IRQ3340: [timer]
> >>> 1    0     0    35887  0   00000000  0.3    IRQ3340: [timer]
> >>>
> >> Fixable by this tiny change:
> >>
> >> diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
> >> index 5242d9f..04a344e 100644
> >> --- a/ksrc/nucleus/sched.c
> >> +++ b/ksrc/nucleus/sched.c
> >> @@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
> >>                       xnthread_name(&sched->rootcb));
> >>
> >>  #ifdef CONFIG_XENO_OPT_WATCHDOG
> >> -        xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
> >> +        xntimer_init_noblock(&sched->wdtimer, &nktbase,
> >> +                             xnsched_watchdog_handler);
> >>          xntimer_set_name(&sched->wdtimer, "[watchdog]");
> >>          xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
> >>          xntimer_set_sched(&sched->wdtimer, sched);
> >>
> >>
> >> I.e. the watchdog timer should not be stopped by any ongoing debug
> >> session of a Xenomai app. Will queue this for upstream.
> >
> > Yes, that makes a lot of sense now. The watchdog would not fire if the
> > task was single-stepped anyway, since the latter would have been moved
> > to secondary mode first.
>
> Yep.
>
> >
> > Did you see this bug happening in a uniprocessor context as well?
>
> No, as it is impossible on a uniprocessor to interact with gdb if a cpu
> hog - the only existing CPU is simply not available. :)
I was rather thinking of your hit-a-breakpoint-or-^C-early scenario... I
thought you did see this on UP as well, and scratched my head to
understand how this would have been possible. Ok, so let's merge this.

>
> Jan
>

--
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
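
For readers following the thread, here is a minimal sketch of the behaviour the one-line fix relies on: a timer set up as "noblock" keeps running while a debug session holds the application's other timers back, so the watchdog can still expire. This is a toy model in plain C, not Xenomai code. Every name in it (toy_timer, toy_tick, the debug_frozen flag) is hypothetical, and it assumes nothing about how the nucleus actually implements xntimer_init_noblock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in Xenomai. */
struct toy_timer {
    const char *name;
    unsigned long expiry;   /* tick at which the timer becomes due */
    bool noblock;           /* true: not held back by a debug session */
    bool fired;
};

/*
 * Advance the toy clock to 'now'. While 'debug_frozen' is set, ordinary
 * timers are held back and only timers marked noblock may fire.
 * Returns the number of timers that fired during this call.
 */
static int toy_tick(struct toy_timer *timers, size_t n,
                    unsigned long now, bool debug_frozen)
{
    int fired = 0;

    assert(timers != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct toy_timer *t = &timers[i];

        if (t->fired || now < t->expiry)
            continue;       /* already handled, or not due yet */
        if (debug_frozen && !t->noblock)
            continue;       /* held back by the debug session */
        t->fired = true;
        fired++;
    }
    return fired;
}
```

With one ordinary timer and one noblock timer due at the same tick, only the noblock one fires while debug_frozen is true; the ordinary one fires on the first tick after the freeze is lifted. That mirrors the reasoning in the thread: the watchdog has to be the kind of timer a gdb session does not stop.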