On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> >> Philippe Gerum wrote:
> >>> I've toyed a bit to find a generic approach for the nucleus to regain
> >>> complete control over a userland application running in a syscall-less
> >>> loop.
> >>>
> >>> The original issue was about recovering gracefully from a runaway
> >>> situation detected by the nucleus watchdog, where a thread would spin
> >>> in primary mode without issuing any syscall, but this would also apply
> >>> to real-time signals pending for such a thread. Currently, Xenomai rt
> >>> signals cannot preempt syscall-less code running in primary mode
> >>> either.
> >>>
> >>> The major difference between the previous approaches we discussed and
> >>> this one is that we now force the runaway thread to run a piece of
> >>> valid code that calls into the nucleus. We no longer force the thread
> >>> to run faulty code or at a faulty address. Therefore, we can reuse
> >>> this feature to improve the rt signal management, without having to
> >>> forge yet another signal stack frame for this.
> >>>
> >>> The code introduced not only fixes the watchdog-related issue, but
> >>> also lays some groundwork for enhancing the rt signal support later.
> >>> The implementation details can be found here:
> >>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> >>>
> >>> The mayday support is only available for powerpc and x86 for now;
> >>> more architectures will come in the next days. To have it enabled, you
> >>> have to upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00
> >>> for x86, 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature
> >>> relies on a new interface available from those latest patches.
> >>>
> >>> The current implementation does not break the 2.5.x ABI on purpose, so
> >>> we could merge it into the stable branch.
> >>>
> >>> We definitely need user feedback on this. Typically, with that patch
> >>> support in, does arming the nucleus watchdog properly recover from
> >>> your favorite "get me out of here" situation? TIA,
> >>>
> >>> You can pull this stuff from
> >>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> >>>
> >> I've retested the feature as it's now in master, and it has one
> >> remaining problem: if you run the cpu hog under gdb control and try to
> >> break out of the while(1) loop, this doesn't work before the watchdog
> >> has expired - of course. But if you send the break before the expiry
> >> (or hit a breakpoint), something goes wrong: the Xenomai task continues
> >> to spin, and there is no chance to kill its process (only gdb).
> >
> > I can't reproduce this easily here; it happened only once on a
> > lite52xx, and then disappeared; no way to reproduce it on a dual-core
> > Atom in 64-bit mode, or on an x86_32 single-core platform either. But I
> > still saw it once on a powerpc target, so this looks like a generic,
> > time-dependent issue.
> >
> > Do you have the same behavior on a single core config,
>
> You cannot reproduce it on a single core, as the CPU hog will occupy
> that core and gdb cannot be operated.
What I want is the lockup to happen; I'll start working from this point
using other means.

> > and/or without
> > WARNSW enabled?
>
> Just tried and disabled WARNSW in the test below: no difference.
>

Ok.

> > Also, could you post your hog test code? Maybe there is a difference
> > with the way I'm testing.
>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <native/task.h>
>
> void sighandler(int sig, siginfo_t *si, void *context)
> {
> 	printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
> 	exit(1);
> }
>
> void loop(void *arg)
> {
> 	RT_TASK_INFO info;
>
> 	while (1)
> 		if (!arg)
> 			rt_task_inquire(NULL, &info);
> }
>
> int main(int argc, const char *argv[])
> {
> 	struct sigaction sa;
> 	RT_TASK task;
>
> 	sigemptyset(&sa.sa_mask);
> 	sa.sa_sigaction = sighandler;
> 	sa.sa_flags = SA_SIGINFO;
> 	sigaction(SIGDEBUG, &sa, NULL);
>
> 	mlockall(MCL_CURRENT|MCL_FUTURE);
> 	rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
> 		      (void *)(long)((argc > 1) &&
> 				     strcmp(argv[1], "--lethal") == 0));
> 	rt_task_join(&task);
>
> 	return 0;
> }

Ok, will rebase on this code. Thanks.

> >
> >> # cat /proc/xenomai/sched
> >> CPU  PID  CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
> >>   0  0    idle   -1   -        master    RR    ROOT/0
> >
> > Eeek. This symbolic stat mode label looks weird.
>
> Hmm, haven't noticed this yet. I'm running a kind of all-yes config,
> namely:
>
> ...
> CONFIG_XENOMAI=y
> CONFIG_XENO_GENERIC_STACKPOOL=y
> CONFIG_XENO_FASTSYNCH=y
> CONFIG_XENO_OPT_NUCLEUS=y
> CONFIG_XENO_OPT_PERVASIVE=y
> CONFIG_XENO_OPT_PRIOCPL=y
> CONFIG_XENO_OPT_PIPELINE_HEAD=y
> CONFIG_XENO_OPT_SCHED_CLASSES=y
> CONFIG_XENO_OPT_SCHED_TP=y
> CONFIG_XENO_OPT_SCHED_TP_NRPART=4
> CONFIG_XENO_OPT_SCHED_SPORADIC=y
> CONFIG_XENO_OPT_SCHED_SPORADIC_MAXREPL=8
> CONFIG_XENO_OPT_PIPE=y
> CONFIG_XENO_OPT_MAP=y
> CONFIG_XENO_OPT_PIPE_NRDEV=32
> CONFIG_XENO_OPT_REGISTRY_NRSLOTS=512
> CONFIG_XENO_OPT_SYS_HEAPSZ=256
> CONFIG_XENO_OPT_SYS_STACKPOOLSZ=128
> CONFIG_XENO_OPT_SEM_HEAPSZ=12
> CONFIG_XENO_OPT_GLOBAL_SEM_HEAPSZ=12
> CONFIG_XENO_OPT_STATS=y
> CONFIG_XENO_OPT_DEBUG=y
> # CONFIG_XENO_OPT_DEBUG_NUCLEUS is not set
> # CONFIG_XENO_OPT_DEBUG_XNLOCK is not set
> # CONFIG_XENO_OPT_DEBUG_QUEUES is not set
> # CONFIG_XENO_OPT_DEBUG_REGISTRY is not set
> # CONFIG_XENO_OPT_DEBUG_TIMERS is not set
> CONFIG_XENO_OPT_DEBUG_SYNCH_RELAX=y
> CONFIG_XENO_OPT_WATCHDOG=y
> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
> CONFIG_XENO_OPT_SHIRQ=y
> CONFIG_XENO_OPT_SELECT=y
>
> #
> # Timing
> #
> CONFIG_XENO_OPT_TIMING_PERIODIC=y
> CONFIG_XENO_OPT_TIMING_VIRTICK=1000
> CONFIG_XENO_OPT_TIMING_SCHEDLAT=0
>
> #
> # Scalability
> #
> CONFIG_XENO_OPT_SCALABLE_SCHED=y
> # CONFIG_XENO_OPT_TIMER_LIST is not set
> CONFIG_XENO_OPT_TIMER_HEAP=y
> # CONFIG_XENO_OPT_TIMER_WHEEL is not set
> CONFIG_XENO_OPT_TIMER_HEAP_CAPACITY=256
> ...
>
> Maybe this has some influence as well. The 'RR' correlates with
> starting the hog, with or without gdb.
>

It looks like the status mask is misinterpreted; it could be some
harmless position-to-label mismatch (this happened before, when the
state labels were not properly reordered after a change in the status
bits), or something worse. I'll work from your config as well. Thanks.
Again.

> Jan
>

-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core