Re: [Xenomai-core] [PATCH] Mayday support
On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Philippe Gerum wrote:
> > > I've toyed a bit to find a generic approach for the nucleus to regain
> > > complete control over a userland application running in a syscall-less
> > > loop. The original issue was about recovering gracefully from a runaway
> > > situation detected by the nucleus watchdog, where a thread would spin in
> > > primary mode without issuing any syscall, but this would also apply to
> > > real-time signals pending for such a thread: currently, Xenomai RT
> > > signals cannot preempt syscall-less code running in primary mode either.
> > >
> > > The major difference between the previous approaches we discussed and
> > > this one is that we now force the runaway thread to run a piece of valid
> > > code that calls into the nucleus; we no longer force the thread to run
> > > faulty code or at a faulty address. Therefore, we can reuse this feature
> > > to improve the RT signal management without having to forge yet another
> > > signal stack frame for it. The code introduced only fixes the
> > > watchdog-related issue, but it also does some groundwork for enhancing
> > > the RT signal support later. The implementation details can be found
> > > here:
> > > http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> > >
> > > The current mayday support is only available for powerpc and x86 for
> > > now; more will come in the next days. To have it enabled, you have to
> > > upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> > > or 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc; the feature relies on
> > > a new interface available from those latest patches.
> > >
> > > The current implementation does not break the 2.5.x ABI on purpose, so
> > > we could merge it into the stable branch. We definitely need user
> > > feedback on this: typically, with that patch support in, does arming the
> > > nucleus watchdog properly recover from your favorite "get me out of
> > > here" situation?
> > >
> > > TIA,
> > >
> > > You can pull this stuff from git://git.xenomai.org/xenomai-rpm.git,
> > > queue/mayday branch.
>
> I've retested the feature as it's now in master, and it has one remaining
> problem: if you run the CPU hog under gdb control and try to break out of
> the while(1) loop, this doesn't work before the watchdog has expired - of
> course. But if you send the break before the expiry (or hit a
> breakpoint), something goes wrong: the Xenomai task continues to spin,
> and there is no chance to kill its process (only gdb).
>
> # cat /proc/xenomai/sched
> CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
>   0  0     idle    -1  -        master    RR    ROOT/0

Eeek, we really need to have a look at this funky STAT output.

>   1  0     idle    -1  -        master    R     ROOT/1
>   0  6120  rt      99  -        master    Tt    cpu-hog
>
> # cat /proc/xenomai/stat
> CPU  PID   MSW  CSW    PF  STAT      %CPU   NAME
>   0  0     0    0      0   00500088    0.0  ROOT/0
>   1  0     0    0      0   00500080   99.7  ROOT/1
>   0  6120  0    1      0   00342180  100.0  cpu-hog
>   0  0     0    21005  0               0.0  IRQ3340: [timer]
>   1  0     0    35887  0               0.3  IRQ3340: [timer]
>
> Fixable by this tiny change:
>
> diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
> index 5242d9f..04a344e 100644
> --- a/ksrc/nucleus/sched.c
> +++ b/ksrc/nucleus/sched.c
> @@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
>  		 xnthread_name(&sched->rootcb));
>  
>  #ifdef CONFIG_XENO_OPT_WATCHDOG
> -	xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
> +	xntimer_init_noblock(&sched->wdtimer, &nktbase,
> +			     xnsched_watchdog_handler);
>  	xntimer_set_name(&sched->wdtimer, "[watchdog]");
>  	xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
>  	xntimer_set_sched(&sched->wdtimer, sched);
>
> I.e. the watchdog timer should not be stopped by any ongoing debug
> session of a Xenomai app. Will queue this for upstream.

Yes, that makes a lot of sense now. The watchdog would not fire if the
task was single-stepped anyway, since the latter would have been moved to
secondary mode first.

Did you see this bug happening in a uniprocessor context as well?

> Jan

--
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH] Mayday support
Philippe Gerum wrote:
> On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote:
> > Jan Kiszka wrote:
> > > Philippe Gerum wrote:
> > > > [...]
> >
> > I've retested the feature as it's now in master, and it has one
> > remaining problem: if you run the CPU hog under gdb control and try to
> > break out of the while(1) loop, this doesn't work before the watchdog
> > has expired - of course. But if you send the break before the expiry
> > (or hit a breakpoint), something goes wrong: the Xenomai task
> > continues to spin, and there is no chance to kill its process (only
> > gdb).
> >
> > # cat /proc/xenomai/sched
> > CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
> >   0  0     idle    -1  -        master    RR    ROOT/0
>
> Eeek, we really need to have a look at this funky STAT output.

I have a patch for this queued as well - it was only a cosmetic thing.

> >   1  0     idle    -1  -        master    R     ROOT/1
> >   0  6120  rt      99  -        master    Tt    cpu-hog
> >
> > [...]
> >
> > I.e. the watchdog timer should not be stopped by any ongoing debug
> > session of a Xenomai app. Will queue this for upstream.
>
> Yes, that makes a lot of sense now. The watchdog would not fire if the
> task was single-stepped anyway, since the latter would have been moved
> to secondary mode first.

Yep.

> Did you see this bug happening in a uniprocessor context as well?

No, because on a uniprocessor it is impossible to interact with gdb while
a CPU hog is running - the only existing CPU is simply not available. :)

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Xenomai-core] [PATCH] Mayday support
On Fri, 2010-08-20 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > [...]
> > Did you see this bug happening in a uniprocessor context as well?
>
> No, because on a uniprocessor it is impossible to interact with gdb
> while a CPU hog is running - the only existing CPU is simply not
> available. :)

I was rather thinking of your hit-a-breakpoint-or-^C-early scenario... I
thought you did see this on UP as well, and scratched my head trying to
understand how that would have been possible. OK, so let's merge this.

> Jan

--
Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> > > Philippe Gerum wrote:
> > > > [...]
> > >
> > > I've retested the feature as it's now in master, and it has one
> > > remaining problem: [...] the Xenomai task continues to spin, and
> > > there is no chance to kill its process (only gdb).
> >
> > I can't reproduce this easily here; it happened only once on a
> > lite52xx, and then disappeared; no way to reproduce this even once on
> > a dual-core Atom in 64-bit mode, or on an x86_32 single-core platform
> > either. But I still saw it once on a powerpc target, so this looks
> > like a generic, time-dependent issue. Do you have the same behavior
> > on a single-core config,
>
> You cannot reproduce it on a single core, as the CPU hog will occupy
> that core and gdb cannot be operated.
>
> > and/or without WARNSW enabled?
>
> Just tried and disabled WARNSW in the test below: no difference.
>
> > Also, could you post your hog test code? Maybe there is a difference
> > with the way I'm testing.
>
> [...]

I can't reproduce this issue, leaving the watchdog threshold at the
default value (4s).

> CONFIG_XENO_OPT_WATCHDOG=y
> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60

60s seems way too long to have a chance of recovering from a runaway loop
to a reasonably sane state. Do you still see the issue with shorter
timeouts?

> CONFIG_XENO_OPT_SHIRQ=y
> CONFIG_XENO_OPT_SELECT=y
>
> #
> # Timing
> #
> CONFIG_XENO_OPT_TIMING_PERIODIC=y
> CONFIG_XENO_OPT_TIMING_VIRTICK=1000
> CONFIG_XENO_OPT_TIMING_SCHEDLAT=0
>
> #
> # Scalability
> #
> CONFIG_XENO_OPT_SCALABLE_SCHED=y
> # CONFIG_XENO_OPT_TIMER_LIST is not set
> CONFIG_XENO_OPT_TIMER_HEAP=y
> # CONFIG_XENO_OPT_TIMER_WHEEL is not set
> CONFIG_XENO_OPT_TIMER_HEAP_CAPACITY=256
> ...
>
> Maybe this has some influence as well. The 'RR' correlates with
> starting the hog, with or without gdb.
>
> Jan

--
Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
Philippe Gerum wrote:
> On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
> > [...]
> > CONFIG_XENO_OPT_WATCHDOG=y
> > CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
>
> 60s seems way too long to have a chance of recovering from a runaway
> loop to a reasonably sane state.

That's required for debugging the kernel.

> Do you still see the issue with shorter timeouts?

Yes, I usually lower the timeout before triggering the issue. OK, I will
try to find some time to look closer at this.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Xenomai-core] [PATCH] Mayday support
On Tue, 2010-07-06 at 17:54 +0200, Jan Kiszka wrote:
> > > CONFIG_XENO_OPT_WATCHDOG=y
> > > CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
> >
> > 60s seems way too long to have a chance of recovering from a runaway
> > loop to a reasonably sane state.
>
> That's required for debugging the kernel.

I don't understand this requirement. Any insight?

> > Do you still see the issue with shorter timeouts?
>
> Yes, I usually lower the timeout before triggering the issue. OK, I
> will try to find some time to look closer at this.
>
> Jan

--
Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
Philippe Gerum wrote:
> On Tue, 2010-07-06 at 17:54 +0200, Jan Kiszka wrote:
> > > 60s seems way too long to have a chance of recovering from a
> > > runaway loop to a reasonably sane state.
> >
> > That's required for debugging the kernel.
>
> I don't understand this requirement. Any insight?

While you step through a Xenomai task context, timers continue to tick.
So the period spent in that context gets huge, and soon the task will be
shot by the watchdog. Likely a limitation of kvm (interrupts should be
blockable in single-step mode). I haven't looked at all the details yet,
just picked the lazy workaround. Of course, we don't use this value on
real HW.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Xenomai-core] [PATCH] Mayday support
Philippe Gerum wrote:
> On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> > Philippe Gerum wrote:
> > > [...]
> >
> > I've retested the feature as it's now in master, and it has one
> > remaining problem: [...] the Xenomai task continues to spin, and
> > there is no chance to kill its process (only gdb).
>
> I can't reproduce this easily here; it happened only once on a lite52xx,
> and then disappeared; no way to reproduce this even once on a dual-core
> Atom in 64-bit mode, or on an x86_32 single-core platform either. But I
> still saw it once on a powerpc target, so this looks like a generic,
> time-dependent issue. Do you have the same behavior on a single-core
> config,

You cannot reproduce it on a single core, as the CPU hog will occupy that
core and gdb cannot be operated.

> and/or without WARNSW enabled?

Just tried and disabled WARNSW in the test below: no difference.

> Also, could you post your hog test code? Maybe there is a difference
> with the way I'm testing.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#include <native/task.h>

void sighandler(int sig, siginfo_t *si, void *context)
{
	printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
	exit(1);
}

void loop(void *arg)
{
	RT_TASK_INFO info;

	while (1)
		if (!arg)
			rt_task_inquire(NULL, &info);
}

int main(int argc, const char *argv[])
{
	struct sigaction sa;
	RT_TASK task;

	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = sighandler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGDEBUG, &sa, NULL);

	mlockall(MCL_CURRENT|MCL_FUTURE);

	rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
		      (void *)(long)(argc > 1 && strcmp(argv[1], "--lethal") == 0));
	rt_task_join(&task);
	return 0;
}

> > # cat /proc/xenomai/sched
> > CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
> >   0  0     idle    -1  -        master    RR    ROOT/0
>
> Eeek. This symbolic stat mode label looks weird.

Hmm, haven't noticed this yet. I'm running a kind of all-yes config,
namely:

...
CONFIG_XENOMAI=y
CONFIG_XENO_GENERIC_STACKPOOL=y
CONFIG_XENO_FASTSYNCH=y
CONFIG_XENO_OPT_NUCLEUS=y
CONFIG_XENO_OPT_PERVASIVE=y
CONFIG_XENO_OPT_PRIOCPL=y
CONFIG_XENO_OPT_PIPELINE_HEAD=y
CONFIG_XENO_OPT_SCHED_CLASSES=y
CONFIG_XENO_OPT_SCHED_TP=y
CONFIG_XENO_OPT_SCHED_TP_NRPART=4
CONFIG_XENO_OPT_SCHED_SPORADIC=y
CONFIG_XENO_OPT_SCHED_SPORADIC_MAXREPL=8
CONFIG_XENO_OPT_PIPE=y
CONFIG_XENO_OPT_MAP=y
CONFIG_XENO_OPT_PIPE_NRDEV=32
CONFIG_XENO_OPT_REGISTRY_NRSLOTS=512
CONFIG_XENO_OPT_SYS_HEAPSZ=256
CONFIG_XENO_OPT_SYS_STACKPOOLSZ=128
CONFIG_XENO_OPT_SEM_HEAPSZ=12
CONFIG_XENO_OPT_GLOBAL_SEM_HEAPSZ=12
CONFIG_XENO_OPT_STATS=y
CONFIG_XENO_OPT_DEBUG=y
# CONFIG_XENO_OPT_DEBUG_NUCLEUS is not set
# CONFIG_XENO_OPT_DEBUG_XNLOCK is not
Re: [Xenomai-core] [PATCH] Mayday support
On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > [...]
>
> I've retested the feature as it's now in master, and it has one
> remaining problem: if you run the CPU hog under gdb control and try to
> break out of the while(1) loop, this doesn't work before the watchdog
> has expired - of course. But if you send the break before the expiry
> (or hit a breakpoint), something goes wrong: the Xenomai task continues
> to spin, and there is no chance to kill its process (only gdb).

I can't reproduce this easily here; it happened only once on a lite52xx,
and then disappeared; no way to reproduce this even once on a dual-core
Atom in 64-bit mode, or on an x86_32 single-core platform either. But I
still saw it once on a powerpc target, so this looks like a generic,
time-dependent issue. Do you have the same behavior on a single-core
config, and/or without WARNSW enabled? Also, could you post your hog test
code? Maybe there is a difference with the way I'm testing.

> # cat /proc/xenomai/sched
> CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
>   0  0     idle    -1  -        master    RR    ROOT/0

Eeek. This symbolic stat mode label looks weird.

>   1  0     idle    -1  -        master    R     ROOT/1
>   0  6120  rt      99  -        master    Tt    cpu-hog
>
> # cat /proc/xenomai/stat
> CPU  PID   MSW  CSW    PF  STAT      %CPU   NAME
>   0  0     0    0      0   00500088    0.0  ROOT/0
>   1  0     0    0      0   00500080   99.7  ROOT/1
>   0  6120  0    1      0   00342180  100.0  cpu-hog
>   0  0     0    21005  0               0.0  IRQ3340: [timer]
>   1  0     0    35887  0               0.3  IRQ3340: [timer]
>
> Jan

--
Philippe.
Re: [Xenomai-core] [PATCH] Mayday support
Philippe Gerum wrote:
> I've toyed a bit to find a generic approach for the nucleus to regain
> complete control over a userland application running in a syscall-less
> loop. [...]
>
> You can pull this stuff from git://git.xenomai.org/xenomai-rpm.git,
> queue/mayday branch.

I've retested the feature as it's now in master, and it has one remaining
problem: if you run the CPU hog under gdb control and try to break out of
the while(1) loop, this doesn't work before the watchdog has expired - of
course. But if you send the break before the expiry (or hit a
breakpoint), something goes wrong: the Xenomai task continues to spin,
and there is no chance to kill its process (only gdb).

# cat /proc/xenomai/sched
CPU  PID   CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
  0  0     idle    -1  -        master    RR    ROOT/0
  1  0     idle    -1  -        master    R     ROOT/1
  0  6120  rt      99  -        master    Tt    cpu-hog

# cat /proc/xenomai/stat
CPU  PID   MSW  CSW    PF  STAT      %CPU   NAME
  0  0     0    0      0   00500088    0.0  ROOT/0
  1  0     0    0      0   00500080   99.7  ROOT/1
  0  6120  0    1      0   00342180  100.0  cpu-hog
  0  0     0    21005  0               0.0  IRQ3340: [timer]
  1  0     0    35887  0               0.3  IRQ3340: [timer]

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux