Re: [Xenomai-core] [PATCH] Mayday support

2010-08-20 Thread Philippe Gerum
On Fri, 2010-08-20 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> Philippe Gerum wrote:
>  I've toyed a bit to find a generic approach for the nucleus to regain
>  complete control over a userland application running in a syscall-less
>  loop.
> 
>  The original issue was about recovering gracefully from a runaway
>  situation detected by the nucleus watchdog, where a thread would spin in
>  primary mode without issuing any syscall, but this would also apply for
>  real-time signals pending for such a thread. Currently, Xenomai rt
>  signals cannot preempt syscall-less code running in primary mode either.
> 
>  The major difference between the previous approaches we discussed and
>  this one is that we now force the runaway thread to run a piece of
>  valid code that calls into the nucleus. We do not force the thread to
>  run faulty code or at a faulty address anymore. Therefore, we can
>  reuse this feature to improve the rt signal management, without
>  having to forge yet another signal stack frame for this.
> 
>  The code introduced not only fixes the watchdog-related issue, but also
>  does some groundwork for enhancing the rt signal support later. The
>  implementation details can be found here:
>  http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> 
>  The current mayday support is only available for powerpc and x86 for
>  now, more will come in the next days. To have it enabled, you have to
>  upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
>  2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
>  new interface available from those latest patches.
> 
>  The current implementation does not break the 2.5.x ABI on purpose, so
>  we could merge it into the stable branch.
> 
>  We definitely need user feedback on this. Typically, with that patch
>  support in, does arming the nucleus watchdog properly recover from your
>  favorite "get me out of here" situation? TIA,
> 
>  You can pull this stuff from
>  git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> 
> >>> I've retested the feature as it's now in master, and it has one
> >>> remaining problem: If you run the cpu hog under gdb control and try to
> >>> break out of the while(1) loop, this doesn't work before the watchdog
> >>> expired - of course. But if you send the break before the expiry (or hit
> >>> a breakpoint), something goes wrong. The Xenomai task continues to spin,
> >>> and there is no chance to kill its process (only gdb).
> >>>
> >>> # cat /proc/xenomai/sched
> >>> CPU  PIDCLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
> >>>   0  0  idle-1  - master RR ROOT/0
> > 
> > Eeek, we really need to have a look at this funky STAT output.
> 
> I've a patch for this queued as well. Was only a cosmetic thing.
> 
> > 
> >>>   1  0  idle-1  - master R  ROOT/1
> >>>   0  6120   rt  99  - master Tt cpu-hog
> >>> # cat /proc/xenomai/stat
> >>> CPU  PIDMSWCSWPFSTAT   %CPU  NAME
> >>>   0  0  0  0  0 005000880.0  ROOT/0
> >>>   1  0  0  0  0 00500080   99.7  ROOT/1
> >>>   0  6120   0  1  0 00342180  100.0  cpu-hog
> >>>   0  0  0  21005  0 0.0  IRQ3340: [timer]
> >>>   1  0  0  35887  0 0.3  IRQ3340: [timer]
> >>>
> >> Fixable by this tiny change:
> >>
> >> diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
> >> index 5242d9f..04a344e 100644
> >> --- a/ksrc/nucleus/sched.c
> >> +++ b/ksrc/nucleus/sched.c
> >> @@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
> >> xnthread_name(&sched->rootcb));
> >>  
> >>  #ifdef CONFIG_XENO_OPT_WATCHDOG
> >> -  xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
> >> +  xntimer_init_noblock(&sched->wdtimer, &nktbase,
> >> +   xnsched_watchdog_handler);
> >>xntimer_set_name(&sched->wdtimer, "[watchdog]");
> >>xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
> >>xntimer_set_sched(&sched->wdtimer, sched);
> >>
> >>
> >> I.e. the watchdog timer should not be stopped by any ongoing debug
> >> session of a Xenomai app. Will queue this for upstream.
> > 
> > Yes, that makes a lot of sense now. The watchdog would not fire if the
> > task was single-stepped anyway, since the latter would have been moved
> > to secondary mode first.
> 
> Yep.
> 
> > 
> > Did you see this bug happening in a uniprocessor context as well?
> 
> No, as it is impossible on a uniprocessor to interact with gdb if a cpu
> hog is running - the only existing CPU is simply not available. :)

Re: [Xenomai-core] [PATCH] Mayday support

2010-08-20 Thread Jan Kiszka
Philippe Gerum wrote:
> On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
 I've toyed a bit to find a generic approach for the nucleus to regain
 complete control over a userland application running in a syscall-less
 loop.

 The original issue was about recovering gracefully from a runaway
 situation detected by the nucleus watchdog, where a thread would spin in
 primary mode without issuing any syscall, but this would also apply for
 real-time signals pending for such a thread. Currently, Xenomai rt
 signals cannot preempt syscall-less code running in primary mode either.

 The major difference between the previous approaches we discussed and
 this one is that we now force the runaway thread to run a piece of
 valid code that calls into the nucleus. We do not force the thread to
 run faulty code or at a faulty address anymore. Therefore, we can
 reuse this feature to improve the rt signal management, without
 having to forge yet another signal stack frame for this.

 The code introduced not only fixes the watchdog-related issue, but also
 does some groundwork for enhancing the rt signal support later. The
 implementation details can be found here:
 http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c

 The current mayday support is only available for powerpc and x86 for
 now, more will come in the next days. To have it enabled, you have to
 upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
 new interface available from those latest patches.

 The current implementation does not break the 2.5.x ABI on purpose, so
 we could merge it into the stable branch.

 We definitely need user feedback on this. Typically, with that patch
 support in, does arming the nucleus watchdog properly recover from your
 favorite "get me out of here" situation? TIA,

 You can pull this stuff from
 git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.

>>> I've retested the feature as it's now in master, and it has one
>>> remaining problem: If you run the cpu hog under gdb control and try to
>>> break out of the while(1) loop, this doesn't work before the watchdog
>>> expired - of course. But if you send the break before the expiry (or hit
>>> a breakpoint), something goes wrong. The Xenomai task continues to spin,
>>> and there is no chance to kill its process (only gdb).
>>>
>>> # cat /proc/xenomai/sched
>>> CPU  PIDCLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
>>>   0  0  idle-1  - master RR ROOT/0
> 
> Eeek, we really need to have a look at this funky STAT output.

I've a patch for this queued as well. Was only a cosmetic thing.

> 
>>>   1  0  idle-1  - master R  ROOT/1
>>>   0  6120   rt  99  - master Tt cpu-hog
>>> # cat /proc/xenomai/stat
>>> CPU  PIDMSWCSWPFSTAT   %CPU  NAME
>>>   0  0  0  0  0 005000880.0  ROOT/0
>>>   1  0  0  0  0 00500080   99.7  ROOT/1
>>>   0  6120   0  1  0 00342180  100.0  cpu-hog
>>>   0  0  0  21005  0 0.0  IRQ3340: [timer]
>>>   1  0  0  35887  0 0.3  IRQ3340: [timer]
>>>
>> Fixable by this tiny change:
>>
>> diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
>> index 5242d9f..04a344e 100644
>> --- a/ksrc/nucleus/sched.c
>> +++ b/ksrc/nucleus/sched.c
>> @@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
>>   xnthread_name(&sched->rootcb));
>>  
>>  #ifdef CONFIG_XENO_OPT_WATCHDOG
>> -xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
>> +xntimer_init_noblock(&sched->wdtimer, &nktbase,
>> + xnsched_watchdog_handler);
>>  xntimer_set_name(&sched->wdtimer, "[watchdog]");
>>  xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
>>  xntimer_set_sched(&sched->wdtimer, sched);
>>
>>
>> I.e. the watchdog timer should not be stopped by any ongoing debug
>> session of a Xenomai app. Will queue this for upstream.
> 
> Yes, that makes a lot of sense now. The watchdog would not fire if the
> task was single-stepped anyway, since the latter would have been moved
> to secondary mode first.

Yep.

> 
> Did you see this bug happening in a uniprocessor context as well?

No, as it is impossible on a uniprocessor to interact with gdb if a cpu
hog is running - the only existing CPU is simply not available. :)

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org

Re: [Xenomai-core] [PATCH] Mayday support

2010-08-20 Thread Philippe Gerum
On Fri, 2010-08-20 at 14:32 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Philippe Gerum wrote:
> >> I've toyed a bit to find a generic approach for the nucleus to regain
> >> complete control over a userland application running in a syscall-less
> >> loop.
> >>
> >> The original issue was about recovering gracefully from a runaway
> >> situation detected by the nucleus watchdog, where a thread would spin in
> >> primary mode without issuing any syscall, but this would also apply for
> >> real-time signals pending for such a thread. Currently, Xenomai rt
> >> signals cannot preempt syscall-less code running in primary mode either.
> >>
> >> The major difference between the previous approaches we discussed and
> >> this one is that we now force the runaway thread to run a piece of
> >> valid code that calls into the nucleus. We do not force the thread to
> >> run faulty code or at a faulty address anymore. Therefore, we can
> >> reuse this feature to improve the rt signal management, without
> >> having to forge yet another signal stack frame for this.
> >>
> >> The code introduced not only fixes the watchdog-related issue, but also
> >> does some groundwork for enhancing the rt signal support later. The
> >> implementation details can be found here:
> >> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> >>
> >> The current mayday support is only available for powerpc and x86 for
> >> now, more will come in the next days. To have it enabled, you have to
> >> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> >> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> >> new interface available from those latest patches.
> >>
> >> The current implementation does not break the 2.5.x ABI on purpose, so
> >> we could merge it into the stable branch.
> >>
> >> We definitely need user feedback on this. Typically, with that patch
> >> support in, does arming the nucleus watchdog properly recover from your
> >> favorite "get me out of here" situation? TIA,
> >>
> >> You can pull this stuff from
> >> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> >>
> > 
> > I've retested the feature as it's now in master, and it has one
> > remaining problem: If you run the cpu hog under gdb control and try to
> > break out of the while(1) loop, this doesn't work before the watchdog
> > expired - of course. But if you send the break before the expiry (or hit
> > a breakpoint), something goes wrong. The Xenomai task continues to spin,
> > and there is no chance to kill its process (only gdb).
> > 
> > # cat /proc/xenomai/sched
> > CPU  PIDCLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
> >   0  0  idle-1  - master RR ROOT/0

Eeek, we really need to have a look at this funky STAT output.

> >   1  0  idle-1  - master R  ROOT/1
> >   0  6120   rt  99  - master Tt cpu-hog
> > # cat /proc/xenomai/stat
> > CPU  PIDMSWCSWPFSTAT   %CPU  NAME
> >   0  0  0  0  0 005000880.0  ROOT/0
> >   1  0  0  0  0 00500080   99.7  ROOT/1
> >   0  6120   0  1  0 00342180  100.0  cpu-hog
> >   0  0  0  21005  0 0.0  IRQ3340: [timer]
> >   1  0  0  35887  0 0.3  IRQ3340: [timer]
> > 
> 
> Fixable by this tiny change:
> 
> diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
> index 5242d9f..04a344e 100644
> --- a/ksrc/nucleus/sched.c
> +++ b/ksrc/nucleus/sched.c
> @@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
>xnthread_name(&sched->rootcb));
>  
>  #ifdef CONFIG_XENO_OPT_WATCHDOG
> - xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
> + xntimer_init_noblock(&sched->wdtimer, &nktbase,
> +  xnsched_watchdog_handler);
>   xntimer_set_name(&sched->wdtimer, "[watchdog]");
>   xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
>   xntimer_set_sched(&sched->wdtimer, sched);
> 
> 
> I.e. the watchdog timer should not be stopped by any ongoing debug
> session of a Xenomai app. Will queue this for upstream.

Yes, that makes a lot of sense now. The watchdog would not fire if the
task was single-stepped anyway, since the latter would have been moved
to secondary mode first.

Did you see this bug happening in a uniprocessor context as well?

> 
> Jan
> 

-- 
Philippe.





Re: [Xenomai-core] [PATCH] Mayday support

2010-08-20 Thread Jan Kiszka
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> I've toyed a bit to find a generic approach for the nucleus to regain
>> complete control over a userland application running in a syscall-less
>> loop.
>>
>> The original issue was about recovering gracefully from a runaway
>> situation detected by the nucleus watchdog, where a thread would spin in
>> primary mode without issuing any syscall, but this would also apply for
>> real-time signals pending for such a thread. Currently, Xenomai rt
>> signals cannot preempt syscall-less code running in primary mode either.
>>
>> The major difference between the previous approaches we discussed and
>> this one is that we now force the runaway thread to run a piece of
>> valid code that calls into the nucleus. We do not force the thread to
>> run faulty code or at a faulty address anymore. Therefore, we can
>> reuse this feature to improve the rt signal management, without
>> having to forge yet another signal stack frame for this.
>>
>> The code introduced not only fixes the watchdog-related issue, but also
>> does some groundwork for enhancing the rt signal support later. The
>> implementation details can be found here:
>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
>>
>> The current mayday support is only available for powerpc and x86 for
>> now, more will come in the next days. To have it enabled, you have to
>> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
>> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
>> new interface available from those latest patches.
>>
>> The current implementation does not break the 2.5.x ABI on purpose, so
>> we could merge it into the stable branch.
>>
>> We definitely need user feedback on this. Typically, with that patch
>> support in, does arming the nucleus watchdog properly recover from your
>> favorite "get me out of here" situation? TIA,
>>
>> You can pull this stuff from
>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
>>
> 
> I've retested the feature as it's now in master, and it has one
> remaining problem: If you run the cpu hog under gdb control and try to
> break out of the while(1) loop, this doesn't work before the watchdog
> expired - of course. But if you send the break before the expiry (or hit
> a breakpoint), something goes wrong. The Xenomai task continues to spin,
> and there is no chance to kill its process (only gdb).
> 
> # cat /proc/xenomai/sched
> CPU  PIDCLASS  PRI  TIMEOUT   TIMEBASE   STAT   NAME
>   0  0  idle-1  - master RR ROOT/0
>   1  0  idle-1  - master R  ROOT/1
>   0  6120   rt  99  - master Tt cpu-hog
> # cat /proc/xenomai/stat
> CPU  PIDMSWCSWPFSTAT   %CPU  NAME
>   0  0  0  0  0 005000880.0  ROOT/0
>   1  0  0  0  0 00500080   99.7  ROOT/1
>   0  6120   0  1  0 00342180  100.0  cpu-hog
>   0  0  0  21005  0 0.0  IRQ3340: [timer]
>   1  0  0  35887  0 0.3  IRQ3340: [timer]
> 

Fixable by this tiny change:

diff --git a/ksrc/nucleus/sched.c b/ksrc/nucleus/sched.c
index 5242d9f..04a344e 100644
--- a/ksrc/nucleus/sched.c
+++ b/ksrc/nucleus/sched.c
@@ -175,7 +175,8 @@ void xnsched_init(struct xnsched *sched, int cpu)
 xnthread_name(&sched->rootcb));
 
 #ifdef CONFIG_XENO_OPT_WATCHDOG
-   xntimer_init(&sched->wdtimer, &nktbase, xnsched_watchdog_handler);
+   xntimer_init_noblock(&sched->wdtimer, &nktbase,
+xnsched_watchdog_handler);
xntimer_set_name(&sched->wdtimer, "[watchdog]");
xntimer_set_priority(&sched->wdtimer, XNTIMER_LOPRIO);
xntimer_set_sched(&sched->wdtimer, sched);


I.e. the watchdog timer should not be stopped by any ongoing debug
session of a Xenomai app. Will queue this for upstream.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Mayday support

2010-07-06 Thread Jan Kiszka
Philippe Gerum wrote:
> On Tue, 2010-07-06 at 17:54 +0200, Jan Kiszka wrote:
 CONFIG_XENO_OPT_WATCHDOG=y
 CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
>>> 60s seems way too long to have a chance of recovering from a runaway
>>> loop to a reasonably sane state.
>> That's required for debugging the kernel.
>>
> 
> I don't understand this requirement. Any insight?

While you step through a Xenomai task context, timers continue to tick.
So the period spent in that context gets huge, and soon the task will be
shot by the watchdog. Likely a limitation of kvm (interrupts should be
blockable in singlestep mode). Haven't looked at all details yet, just
picked the lazy workaround.

Of course, we don't use this value on real HW.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

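The two setups discussed above differ only in the watchdog threshold. Assuming the stock Kconfig options quoted in the thread, the debug (kvm + gdb) and real-hardware configurations would look like this; the 4s value is the default threshold mentioned later in the thread:

```
# Debug kernel under kvm: single-stepping inflates the time spent in
# primary mode, so give the watchdog a long leash
CONFIG_XENO_OPT_WATCHDOG=y
CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60

# Real hardware: the default 4s threshold
CONFIG_XENO_OPT_WATCHDOG=y
CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=4
```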


Re: [Xenomai-core] [PATCH] Mayday support

2010-07-06 Thread Philippe Gerum
On Tue, 2010-07-06 at 17:54 +0200, Jan Kiszka wrote:
> >> CONFIG_XENO_OPT_WATCHDOG=y
> >> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
> > 
> > 60s seems way too long to have a chance of recovering from a runaway
> > loop to a reasonably sane state.
> 
> That's required for debugging the kernel.
> 

I don't understand this requirement. Any insight?

> > Do you still see the issue with shorter
> > timeouts?
> 
> Yes, I usually lower the timeout before triggering the issue.
> 
> OK, I will try to find some time to look closer at this.
> 
> Jan
> 

-- 
Philippe.





Re: [Xenomai-core] [PATCH] Mayday support

2010-07-06 Thread Jan Kiszka
Philippe Gerum wrote:
> On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
> I've toyed a bit to find a generic approach for the nucleus to regain
> complete control over a userland application running in a syscall-less
> loop.
>
> The original issue was about recovering gracefully from a runaway
> situation detected by the nucleus watchdog, where a thread would spin in
> primary mode without issuing any syscall, but this would also apply for
> real-time signals pending for such a thread. Currently, Xenomai rt
> signals cannot preempt syscall-less code running in primary mode either.
>
> The major difference between the previous approaches we discussed and
> this one is that we now force the runaway thread to run a piece of
> valid code that calls into the nucleus. We do not force the thread to
> run faulty code or at a faulty address anymore. Therefore, we can
> reuse this feature to improve the rt signal management, without
> having to forge yet another signal stack frame for this.
>
> The code introduced not only fixes the watchdog-related issue, but also
> does some groundwork for enhancing the rt signal support later. The
> implementation details can be found here:
> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
>
> The current mayday support is only available for powerpc and x86 for
> now, more will come in the next days. To have it enabled, you have to
> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> new interface available from those latest patches.
>
> The current implementation does not break the 2.5.x ABI on purpose, so
> we could merge it into the stable branch.
>
> We definitely need user feedback on this. Typically, with that patch
> support in, does arming the nucleus watchdog properly recover from your
> favorite "get me out of here" situation? TIA,
>
> You can pull this stuff from
> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
>
 I've retested the feature as it's now in master, and it has one
 remaining problem: If you run the cpu hog under gdb control and try to
 break out of the while(1) loop, this doesn't work before the watchdog
 expired - of course. But if you send the break before the expiry (or hit
 a breakpoint), something goes wrong. The Xenomai task continues to spin,
 and there is no chance to kill its process (only gdb).
>>> I can't reproduce this easily here; it happened only once on a lite52xx,
>>> and then disappeared; no way to reproduce this once on a dual core atom
>>> in 64bit mode, or on a x86_32 single core platform either. But I still
>>> saw it once on a powerpc target, so this looks like a generic
>>> time-dependent issue.
>>>
>>> Do you have the same behavior on a single core config,
>> You cannot reproduce it on a single core as the CPU hog will occupy that
>> core and gdb cannot be operated.
>>
>>> and/or without
>>> WARNSW enabled?
>> Just tried and disabled WARNSW in the test below: no difference.
>>
>>> Also, could you post your hog test code? maybe there is a difference
>>> with the way I'm testing.
>> #include <sys/mman.h>
>> #include <signal.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <native/task.h>
>>
>> void sighandler(int sig, siginfo_t *si, void *context)
>> {
>>  printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
>>  exit(1);
>> }
>>
>> void loop(void *arg)
>> {
>>  RT_TASK_INFO info;
>>
>>  while (1)
>>  if (!arg)
>>  rt_task_inquire(NULL, &info);
>> }
>>
>> int main(int argc, const char *argv[])
>> {
>>  struct sigaction sa;
>>  RT_TASK task;
>>
>>  sigemptyset(&sa.sa_mask);
>>  sa.sa_sigaction = sighandler;
>>  sa.sa_flags = SA_SIGINFO;
>>  sigaction(SIGDEBUG, &sa, NULL);
>>
>>  mlockall(MCL_CURRENT|MCL_FUTURE);
>>  rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
>>  (void *)(long)((argc > 1) && strcmp(argv[1], "--lethal") == 0));
>>  rt_task_join(&task);
>>
>>  return 0;
>> }
> 
> I can't reproduce this issue, leaving the watchdog threshold to the
> default value (4s).
> 
>> CONFIG_XENO_OPT_WATCHDOG=y
>> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
> 
> 60s seems way too long to have a chance of recovering from a runaway
> loop to a reasonably sane state.

That's required for debugging the kernel.

> Do you still see the issue with shorter
> timeouts?

Yes, I usually lower the timeout before triggering the issue.

OK, I will try to find some time to look closer at this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Xenomai-core] [PATCH] Mayday support

2010-07-06 Thread Philippe Gerum
On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> >> Philippe Gerum wrote:
> >>> I've toyed a bit to find a generic approach for the nucleus to regain
> >>> complete control over a userland application running in a syscall-less
> >>> loop.
> >>>
> >>> The original issue was about recovering gracefully from a runaway
> >>> situation detected by the nucleus watchdog, where a thread would spin in
> >>> primary mode without issuing any syscall, but this would also apply for
> >>> real-time signals pending for such a thread. Currently, Xenomai rt
> >>> signals cannot preempt syscall-less code running in primary mode either.
> >>>
> >>> The major difference between the previous approaches we discussed and
> >>> this one is that we now force the runaway thread to run a piece of
> >>> valid code that calls into the nucleus. We do not force the thread to
> >>> run faulty code or at a faulty address anymore. Therefore, we can
> >>> reuse this feature to improve the rt signal management, without
> >>> having to forge yet another signal stack frame for this.
> >>>
> >>> The code introduced not only fixes the watchdog-related issue, but also
> >>> does some groundwork for enhancing the rt signal support later. The
> >>> implementation details can be found here:
> >>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> >>>
> >>> The current mayday support is only available for powerpc and x86 for
> >>> now, more will come in the next days. To have it enabled, you have to
> >>> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> >>> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> >>> new interface available from those latest patches.
> >>>
> >>> The current implementation does not break the 2.5.x ABI on purpose, so
> >>> we could merge it into the stable branch.
> >>>
> >>> We definitely need user feedback on this. Typically, with that patch
> >>> support in, does arming the nucleus watchdog properly recover from your
> >>> favorite "get me out of here" situation? TIA,
> >>>
> >>> You can pull this stuff from
> >>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> >>>
> >> I've retested the feature as it's now in master, and it has one
> >> remaining problem: If you run the cpu hog under gdb control and try to
> >> break out of the while(1) loop, this doesn't work before the watchdog
> >> expired - of course. But if you send the break before the expiry (or hit
> >> a breakpoint), something goes wrong. The Xenomai task continues to spin,
> >> and there is no chance to kill its process (only gdb).
> > 
> > I can't reproduce this easily here; it happened only once on a lite52xx,
> > and then disappeared; no way to reproduce this once on a dual core atom
> > in 64bit mode, or on a x86_32 single core platform either. But I still
> > saw it once on a powerpc target, so this looks like a generic
> > time-dependent issue.
> > 
> > Do you have the same behavior on a single core config,
> 
> You cannot reproduce it on a single core as the CPU hog will occupy that
> core and gdb cannot be operated.
> 
> > and/or without
> > WARNSW enabled?
> 
> Just tried and disabled WARNSW in the test below: no difference.
> 
> > 
> > Also, could you post your hog test code? maybe there is a difference
> > with the way I'm testing.
> 
> #include <sys/mman.h>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <native/task.h>
> 
> void sighandler(int sig, siginfo_t *si, void *context)
> {
>   printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
>   exit(1);
> }
> 
> void loop(void *arg)
> {
>   RT_TASK_INFO info;
> 
>   while (1)
>   if (!arg)
>   rt_task_inquire(NULL, &info);
> }
> 
> int main(int argc, const char *argv[])
> {
>   struct sigaction sa;
>   RT_TASK task;
> 
>   sigemptyset(&sa.sa_mask);
>   sa.sa_sigaction = sighandler;
>   sa.sa_flags = SA_SIGINFO;
>   sigaction(SIGDEBUG, &sa, NULL);
> 
>   mlockall(MCL_CURRENT|MCL_FUTURE);
>   rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
>   (void *)(long)((argc > 1) && strcmp(argv[1], "--lethal") == 0));
>   rt_task_join(&task);
> 
>   return 0;
> }

I can't reproduce this issue, leaving the watchdog threshold to the
default value (4s).

> CONFIG_XENO_OPT_WATCHDOG=y
> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60

60s seems way too long to have a chance of recovering from a runaway
loop to a reasonably sane state. Do you still see the issue with shorter
timeouts?


> CONFIG_XENO_OPT_SHIRQ=y
> CONFIG_XENO_OPT_SELECT=y
> 
> #
> # Timing
> #
> CONFIG_XENO_OPT_TIMING_PERIODIC=y
> CONFIG_XENO_OPT_TIMING_VIRTICK=1000
> CONFIG_XENO_OPT_TIMING_SCHEDLAT=0
> 
> #
> # Scalability
> #
> CONFIG_XENO_OPT_SCALABLE_SCHED=y
> # CONFIG_XENO_OPT_TIMER_LIST is not set
> CONFIG_XENO_OPT_TIMER_HEAP=y
> # CONFIG_XENO_OPT_TIMER_WHEEL is not set
> CO

Re: [Xenomai-core] [PATCH] Mayday support

2010-06-28 Thread Philippe Gerum
On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> >> Philippe Gerum wrote:
> >>> I've toyed a bit to find a generic approach for the nucleus to regain
> >>> complete control over a userland application running in a syscall-less
> >>> loop.
> >>>
> >>> The original issue was about recovering gracefully from a runaway
> >>> situation detected by the nucleus watchdog, where a thread would spin in
> >>> primary mode without issuing any syscall, but this would also apply for
> >>> real-time signals pending for such a thread. Currently, Xenomai rt
> >>> signals cannot preempt syscall-less code running in primary mode either.
> >>>
> >>> The major difference between the previous approaches we discussed and
> >>> this one is that we now force the runaway thread to run a piece of
> >>> valid code that calls into the nucleus. We do not force the thread to
> >>> run faulty code or at a faulty address anymore. Therefore, we can
> >>> reuse this feature to improve the rt signal management, without
> >>> having to forge yet another signal stack frame for this.
> >>>
> >>> The code introduced not only fixes the watchdog-related issue, but also
> >>> does some groundwork for enhancing the rt signal support later. The
> >>> implementation details can be found here:
> >>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> >>>
> >>> The current mayday support is only available for powerpc and x86 for
> >>> now, more will come in the next days. To have it enabled, you have to
> >>> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> >>> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> >>> new interface available from those latest patches.
> >>>
> >>> The current implementation does not break the 2.5.x ABI on purpose, so
> >>> we could merge it into the stable branch.
> >>>
> >>> We definitely need user feedback on this. Typically, with that patch
> >>> support in, does arming the nucleus watchdog properly recover from your
> >>> favorite "get me out of here" situation? TIA,
> >>>
> >>> You can pull this stuff from
> >>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> >>>
> >> I've retested the feature as it's now in master, and it has one
> >> remaining problem: If you run the cpu hog under gdb control and try to
> >> break out of the while(1) loop, this doesn't work before the watchdog
> >> expired - of course. But if you send the break before the expiry (or hit
> >> a breakpoint), something goes wrong. The Xenomai task continues to spin,
> >> and there is no chance to kill its process (only gdb).
> > 
> > I can't reproduce this easily here; it happened only once on a lite52xx,
> > and then disappeared; no way to reproduce this once on a dual core atom
> > in 64bit mode, or on a x86_32 single core platform either. But I still
> > saw it once on a powerpc target, so this looks like a generic
> > time-dependent issue.
> > 
> > Do you have the same behavior on a single core config,
> 
> You cannot reproduce it on a single core as the CPU hog will occupy that
> core and gdb cannot be operated.

What I want is the lockup to happen; I'll start working from this point
using other means.

> 
> > and/or without
> > WARNSW enabled?
> 
> Just tried and disabled WARNSW in the test below: no difference.
> 

Ok.

> > 
> > Also, could you post your hog test code? maybe there is a difference
> > with the way I'm testing.
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <signal.h>
> #include <sys/mman.h>
> #include <native/task.h>
> 
> void sighandler(int sig, siginfo_t *si, void *context)
> {
> 	printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
> 	exit(1);
> }
> 
> void loop(void *arg)
> {
> 	RT_TASK_INFO info;
> 
> 	while (1)
> 		if (!arg)
> 			rt_task_inquire(NULL, &info);
> }
> 
> int main(int argc, const char *argv[])
> {
> 	struct sigaction sa;
> 	RT_TASK task;
> 
> 	sigemptyset(&sa.sa_mask);
> 	sa.sa_sigaction = sighandler;
> 	sa.sa_flags = SA_SIGINFO;
> 	sigaction(SIGDEBUG, &sa, NULL);
> 
> 	mlockall(MCL_CURRENT|MCL_FUTURE);
> 	rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
> 		      (void *)(long)((argc > 1) && strcmp(argv[1], "--lethal") == 0));
> 	rt_task_join(&task);
> 
> 	return 0;
> }

Ok, will rebase on this code. Thanks.

> 
> > 
> >> # cat /proc/xenomai/sched
> >> CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT       NAME
> >>   0  0      idle    -1  -         master     RR         ROOT/0
> > 
> > Eeek. This symbolic stat mode label looks weird.
> 
> Hmm, haven't noticed this yet. I'm running a kind of all-yes config,
> namely:
> 
> ...
> CONFIG_XENOMAI=y
> CONFIG_XENO_GENERIC_STACKPOOL=y
> CONFIG_XENO_FASTSYNCH=y
> CONFIG_XENO_OPT_NUCLEUS=y
> CONFIG_XENO_OPT_PERVASIVE=y
> CONFIG_XENO_OPT_PRIOCPL=y
> CONFIG_XENO_OPT_PIPELINE_HEAD=y
> CONFIG_

Re: [Xenomai-core] [PATCH] Mayday support

2010-06-28 Thread Jan Kiszka
Philippe Gerum wrote:
> On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> I've toyed a bit to find a generic approach for the nucleus to regain
>>> complete control over a userland application running in a syscall-less
>>> loop.
>>>
>>> The original issue was about recovering gracefully from a runaway
>>> situation detected by the nucleus watchdog, where a thread would spin in
>>> primary mode without issuing any syscall, but this would also apply for
>>> real-time signals pending for such a thread. Currently, Xenomai rt
>>> signals cannot preempt syscall-less code running in primary mode either.
>>>
>>> The major difference between the previous approaches we discussed and
>>> this one is that we now force the runaway thread to run a
>>> piece of valid code that calls into the nucleus. We do not force the
>>> thread to run faulty code or at a faulty address anymore. Therefore, we
>>> can reuse this feature to improve the rt signal management, without
>>> having to forge yet-another signal stack frame for this.
>>>
>>> The code introduced only fixes the watchdog-related issue, but it also does
>>> some groundwork for enhancing the rt signal support later. The
>>> implementation details can be found here:
>>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
>>>
>>> The current mayday support is only available for powerpc and x86 for
>>> now, more will come in the next days. To have it enabled, you have to
>>> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
>>> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
>>> new interface available from those latest patches.
>>>
>>> The current implementation does not break the 2.5.x ABI on purpose, so
>>> we could merge it into the stable branch.
>>>
>>> We definitely need user feedback on this. Typically, with that patch
>>> support in, does arming the nucleus watchdog properly recover from your
>>> favorite "get me out of here" situation? TIA,
>>>
>>> You can pull this stuff from
>>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
>>>
>> I've retested the feature as it's now in master, and it has one
>> remaining problem: If you run the cpu hog under gdb control and try to
>> break out of the while(1) loop, this doesn't work before the watchdog
>> expired - of course. But if you send the break before the expiry (or hit
>> a breakpoint), something goes wrong. The Xenomai task continues to spin,
>> and there is no chance to kill its process (only gdb).
> 
> I can't reproduce this easily here; it happened only once on a lite52xx,
> and then disappeared; no way to reproduce this once on a dual core atom
> in 64bit mode, or on a x86_32 single core platform either. But I still
> saw it once on a powerpc target, so this looks like a generic
> time-dependent issue.
> 
> Do you have the same behavior on a single core config,

You cannot reproduce it on a single core as the CPU hog will occupy that
core and gdb cannot be operated.

> and/or without
> WARNSW enabled?

Just tried and disabled WARNSW in the test below: no difference.

> 
> Also, could you post your hog test code? maybe there is a difference
> with the way I'm testing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <sys/mman.h>
#include <native/task.h>

void sighandler(int sig, siginfo_t *si, void *context)
{
	printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
	exit(1);
}

void loop(void *arg)
{
	RT_TASK_INFO info;

	while (1)
		if (!arg)
			rt_task_inquire(NULL, &info);
}

int main(int argc, const char *argv[])
{
	struct sigaction sa;
	RT_TASK task;

	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = sighandler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGDEBUG, &sa, NULL);

	mlockall(MCL_CURRENT|MCL_FUTURE);
	rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
		      (void *)(long)((argc > 1) && strcmp(argv[1], "--lethal") == 0));
	rt_task_join(&task);

	return 0;
}

> 
>> # cat /proc/xenomai/sched
>> CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT       NAME
>>   0  0      idle    -1  -         master     RR         ROOT/0
> 
> Eeek. This symbolic stat mode label looks weird.

Hmm, haven't noticed this yet. I'm running a kind of all-yes config,
namely:

...
CONFIG_XENOMAI=y
CONFIG_XENO_GENERIC_STACKPOOL=y
CONFIG_XENO_FASTSYNCH=y
CONFIG_XENO_OPT_NUCLEUS=y
CONFIG_XENO_OPT_PERVASIVE=y
CONFIG_XENO_OPT_PRIOCPL=y
CONFIG_XENO_OPT_PIPELINE_HEAD=y
CONFIG_XENO_OPT_SCHED_CLASSES=y
CONFIG_XENO_OPT_SCHED_TP=y
CONFIG_XENO_OPT_SCHED_TP_NRPART=4
CONFIG_XENO_OPT_SCHED_SPORADIC=y
CONFIG_XENO_OPT_SCHED_SPORADIC_MAXREPL=8
CONFIG_XENO_OPT_PIPE=y
CONFIG_XENO_OPT_MAP=y
CONFIG_XENO_OPT_PIPE_NRDEV=32
CONFIG_XENO_OPT_REGISTRY_NRSLOTS=512
CONFIG_XENO_OPT_SYS_HEAPSZ=256
CONFIG_XENO_OPT_SYS_STACKPOOLSZ=128
CONFIG_XENO_OPT_SEM_HEAPSZ=12
CONFIG_XENO_OPT_GLOBAL_SEM_HEAPSZ=12
CO

Re: [Xenomai-core] [PATCH] Mayday support

2010-06-27 Thread Philippe Gerum
On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > I've toyed a bit to find a generic approach for the nucleus to regain
> > complete control over a userland application running in a syscall-less
> > loop.
> > 
> > The original issue was about recovering gracefully from a runaway
> > situation detected by the nucleus watchdog, where a thread would spin in
> > primary mode without issuing any syscall, but this would also apply for
> > real-time signals pending for such a thread. Currently, Xenomai rt
> > signals cannot preempt syscall-less code running in primary mode either.
> > 
> > The major difference between the previous approaches we discussed and
> > this one is that we now force the runaway thread to run a
> > piece of valid code that calls into the nucleus. We do not force the
> > thread to run faulty code or at a faulty address anymore. Therefore, we
> > can reuse this feature to improve the rt signal management, without
> > having to forge yet-another signal stack frame for this.
> > 
> > The code introduced only fixes the watchdog-related issue, but it also does
> > some groundwork for enhancing the rt signal support later. The
> > implementation details can be found here:
> > http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> > 
> > The current mayday support is only available for powerpc and x86 for
> > now, more will come in the next days. To have it enabled, you have to
> > upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> > 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> > new interface available from those latest patches.
> > 
> > The current implementation does not break the 2.5.x ABI on purpose, so
> > we could merge it into the stable branch.
> > 
> > We definitely need user feedback on this. Typically, with that patch
> > support in, does arming the nucleus watchdog properly recover from your
> > favorite "get me out of here" situation? TIA,
> > 
> > You can pull this stuff from
> > git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> > 
> 
> I've retested the feature as it's now in master, and it has one
> remaining problem: If you run the cpu hog under gdb control and try to
> break out of the while(1) loop, this doesn't work before the watchdog
> expired - of course. But if you send the break before the expiry (or hit
> a breakpoint), something goes wrong. The Xenomai task continues to spin,
> and there is no chance to kill its process (only gdb).

I can't reproduce this easily here; it happened only once on a lite52xx,
and then disappeared; no way to reproduce this once on a dual core atom
in 64bit mode, or on a x86_32 single core platform either. But I still
saw it once on a powerpc target, so this looks like a generic
time-dependent issue.

Do you have the same behavior on a single core config, and/or without
WARNSW enabled?

Also, could you post your hog test code? maybe there is a difference
with the way I'm testing.

> 
> # cat /proc/xenomai/sched
> CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT       NAME
>   0  0      idle    -1  -         master     RR         ROOT/0

Eeek. This symbolic stat mode label looks weird.

>   1  0      idle    -1  -         master     R          ROOT/1
>   0  6120   rt      99  -         master     Tt         cpu-hog
> # cat /proc/xenomai/stat
> CPU  PID    MSW  CSW    PF  STAT      %CPU  NAME
>   0  0      0    0      0   00500088    0.0  ROOT/0
>   1  0      0    0      0   00500080   99.7  ROOT/1
>   0  6120   0    1      0   00342180  100.0  cpu-hog
>   0  0      0    21005  0               0.0  IRQ3340: [timer]
>   1  0      0    35887  0               0.3  IRQ3340: [timer]
> 
> Jan
> 


-- 
Philippe.



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PATCH] Mayday support

2010-06-24 Thread Jan Kiszka
Philippe Gerum wrote:
> I've toyed a bit to find a generic approach for the nucleus to regain
> complete control over a userland application running in a syscall-less
> loop.
> 
> The original issue was about recovering gracefully from a runaway
> situation detected by the nucleus watchdog, where a thread would spin in
> primary mode without issuing any syscall, but this would also apply for
> real-time signals pending for such a thread. Currently, Xenomai rt
> signals cannot preempt syscall-less code running in primary mode either.
> 
> The major difference between the previous approaches we discussed and
> this one is that we now force the runaway thread to run a
> piece of valid code that calls into the nucleus. We do not force the
> thread to run faulty code or at a faulty address anymore. Therefore, we
> can reuse this feature to improve the rt signal management, without
> having to forge yet-another signal stack frame for this.
> 
> The code introduced only fixes the watchdog-related issue, but it also does
> some groundwork for enhancing the rt signal support later. The
> implementation details can be found here:
> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
> 
> The current mayday support is only available for powerpc and x86 for
> now, more will come in the next days. To have it enabled, you have to
> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
> new interface available from those latest patches.
> 
> The current implementation does not break the 2.5.x ABI on purpose, so
> we could merge it into the stable branch.
> 
> We definitely need user feedback on this. Typically, with that patch
> support in, does arming the nucleus watchdog properly recover from your
> favorite "get me out of here" situation? TIA,
> 
> You can pull this stuff from
> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> 

I've retested the feature as it's now in master, and it has one
remaining problem: If you run the cpu hog under gdb control and try to
break out of the while(1) loop, this doesn't work before the watchdog
expired - of course. But if you send the break before the expiry (or hit
a breakpoint), something goes wrong. The Xenomai task continues to spin,
and there is no chance to kill its process (only gdb).

# cat /proc/xenomai/sched
CPU  PID    CLASS  PRI  TIMEOUT   TIMEBASE   STAT       NAME
  0  0      idle    -1  -         master     RR         ROOT/0
  1  0      idle    -1  -         master     R          ROOT/1
  0  6120   rt      99  -         master     Tt         cpu-hog
# cat /proc/xenomai/stat
CPU  PID    MSW  CSW    PF  STAT      %CPU  NAME
  0  0      0    0      0   00500088    0.0  ROOT/0
  1  0      0    0      0   00500080   99.7  ROOT/1
  0  6120   0    1      0   00342180  100.0  cpu-hog
  0  0      0    21005  0               0.0  IRQ3340: [timer]
  1  0      0    35887  0               0.3  IRQ3340: [timer]

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Mayday support (was: Re: [RFC] Break out of endless user space loops)

2010-06-24 Thread Philippe Gerum
On Thu, 2010-06-24 at 11:22 +0200, Tschaeche IT-Services wrote:
> On Sat, Jun 19, 2010 at 01:11:17AM +0200, Philippe Gerum wrote:
> > On Wed, 2010-06-09 at 20:11 +0200, Tschaeche IT-Services wrote:
> > > On Wed, Jun 09, 2010 at 12:41:23PM +0200, Philippe Gerum wrote:
> > > > We definitely need user feedback on this. Typically, with that patch
> > > > support in, does arming the nucleus watchdog properly recover from your
> > > > favorite "get me out of here" situation? TIA,
> > > > 
> > > > You can pull this stuff from
> > > > git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> > > 
> > > manually build a kernel (timeout 1s) with your patches.
> > > user space linked to 2.5.3 libraries without any patches.
> > > Looks fine: the amok task is switched to secondary domain
> > > (we caught the SIGXCPU) running the loop in secondary domain.
> > > then, on a SIGTRAP the task leaves the loop.
> > > 
> > > also, if SIGTRAP arrives before SIGXCPU it looks good,
> > > apart from the latency of 1s.
> > > 
> > > did not check the ucontext within the exception handler, yet.
> > > would like to setup a reproducible kernel build first...
> > > we will go into deeper testing in 2 weeks.
> > > 
> > > maybe we need a finer granularity than 1s for the watchdog timeout.
> > > is there a chance?
> > 
> > The watchdog is not meant to be used for implementing application-level
> > health monitors, which is what you seem to be looking for. The
> > watchdog is really about pulling the brake while debugging, as a means
> > not to brick your board when things start to hit the crapper, without
> > knowing anything from the error source. For that purpose, the current 1s
> > granularity is just fine. It makes the nucleus watchdog as tactful as a
> > lumberjack, which is what we want in those circumstances: we want it to
> > point the finger at the problem we did not know about yet and keep the
> > board afloat; it is neither meant to monitor a specific code we know in
> > advance that might misbehave, nor provide any kind of smart contingency
> > plan.
> > 
> > I would rather think that you may need something like a RTDM driver
> > actually implementing smarter health monitoring features that you could
> > use along with your app. That driver would expose a normalized socket
> > interface for observing how things go app-wise, by collecting data about
> > the current health status. It would have to tap into the mayday routines
> > for recovering from runaway situations it may detect via its own,
> > fine-grained watchdog service for instance.
> 
> Perfect, that's exactly what we want (and already have implemented).
> How can i tap into the MayDay routines from my driver?
> Is there a rt_mayday(RT_TASK)?

You will need this patch (totally untested, but it has a good chance to
work given the implementation of the mayday support underneath):
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=2205a8f2a7aa8fdc7b7d7f5a96f8064a771382ec

Should be used like this:

void foo(RT_TASK *task)
{
	xnshadow_call_mayday(&task->thread_base, SIGDEBUG_WATCHDOG);
}

We are obviously bypassing all the layers happily, this should be used
only in contexts where 'thread' is guaranteed ok, but this should work
until 2.6 provides a better support that won't expose the innards this
way.

NOTE: that method above is of course absolutely discouraged. Make sure
it is not disclosed out of the Internet.

HTH,

> 
> Cheers,
> 
>   Olli
> 


-- 
Philippe.





Re: [Xenomai-core] [PATCH] Mayday support

2010-06-24 Thread Jan Kiszka
Tschaeche IT-Services wrote:
> On Sat, Jun 19, 2010 at 01:11:17AM +0200, Philippe Gerum wrote:
>> On Wed, 2010-06-09 at 20:11 +0200, Tschaeche IT-Services wrote:
>>> On Wed, Jun 09, 2010 at 12:41:23PM +0200, Philippe Gerum wrote:
 We definitely need user feedback on this. Typically, with that patch
 support in, does arming the nucleus watchdog properly recover from your
 favorite "get me out of here" situation? TIA,

 You can pull this stuff from
 git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
>>> manually build a kernel (timeout 1s) with your patches.
>>> user space linked to 2.5.3 libraries without any patches.
>>> Looks fine: the amok task is switched to secondary domain
>>> (we caught the SIGXCPU) running the loop in secondary domain.
>>> then, on a SIGTRAP the task leaves the loop.
>>>
>>> also, if SIGTRAP arrives before SIGXCPU it looks good,
>>> apart from the latency of 1s.
>>>
>>> did not check the ucontext within the exception handler, yet.
>>> would like to setup a reproducible kernel build first...
>>> we will go into deeper testing in 2 weeks.
>>>
>>> maybe we need a finer granularity than 1s for the watchdog timeout.
>>> is there a chance?
>> The watchdog is not meant to be used for implementing application-level
>> health monitors, which is what you seem to be looking for. The
>> watchdog is really about pulling the brake while debugging, as a means
>> not to brick your board when things start to hit the crapper, without
>> knowing anything from the error source. For that purpose, the current 1s
>> granularity is just fine. It makes the nucleus watchdog as tactful as a
>> lumberjack, which is what we want in those circumstances: we want it to
>> point the finger at the problem we did not know about yet and keep the
>> board afloat; it is neither meant to monitor a specific code we know in
>> advance that might misbehave, nor provide any kind of smart contingency
>> plan.
>>
>> I would rather think that you may need something like a RTDM driver
>> actually implementing smarter health monitoring features that you could
>> use along with your app. That driver would expose a normalized socket
>> interface for observing how things go app-wise, by collecting data about
>> the current health status. It would have to tap into the mayday routines
>> for recovering from runaway situations it may detect via its own,
>> fine-grained watchdog service for instance.
> 
> Perfect, that's exactly what we want (and already have implemented).
> How can i tap into the MayDay routines from my driver?
> Is there a rt_mayday(RT_TASK)?

I think you will simply have to call the nucleus services directly,
which indicates that there is something wrong with it conceptually.

An RTDM driver is just another workaround. A better solution will once
come with RT-signals: A user space(!) high-prio watchdog thread will be
able to send a signal to the spinning thread, and the signal handler can
then report the error and/or kick the thread out of primary mode.

Alternatively, the nucleus could export a user space interface to send
SIGDEBUG from an RT thread to some other thread. That would allow to
push the watchdog policy into user space, freeing the kernel (or some
workaround driver) from any customization burdens.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Mayday support (was: Re: [RFC] Break out of endless user space loops)

2010-06-18 Thread Philippe Gerum
On Wed, 2010-06-09 at 20:11 +0200, Tschaeche IT-Services wrote:
> On Wed, Jun 09, 2010 at 12:41:23PM +0200, Philippe Gerum wrote:
> > We definitely need user feedback on this. Typically, with that patch
> > support in, does arming the nucleus watchdog properly recover from your
> > favorite "get me out of here" situation? TIA,
> > 
> > You can pull this stuff from
> > git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
> 
> manually build a kernel (timeout 1s) with your patches.
> user space linked to 2.5.3 libraries without any patches.
> Looks fine: the amok task is switched to secondary domain
> (we caught the SIGXCPU) running the loop in secondary domain.
> then, on a SIGTRAP the task leaves the loop.
> 
> also, if SIGTRAP arrives before SIGXCPU it looks good,
> apart from the latency of 1s.
> 
> did not check the ucontext within the exception handler, yet.
> would like to setup a reproducible kernel build first...
> we will go into deeper testing in 2 weeks.
> 
> maybe we need a finer granularity than 1s for the watchdog timeout.
> is there a chance?

The watchdog is not meant to be used for implementing application-level
health monitors, which is what you seem to be looking for. The
watchdog is really about pulling the brake while debugging, as a means
not to brick your board when things start to hit the crapper, without
knowing anything from the error source. For that purpose, the current 1s
granularity is just fine. It makes the nucleus watchdog as tactful as a
lumberjack, which is what we want in those circumstances: we want it to
point the finger at the problem we did not know about yet and keep the
board afloat; it is neither meant to monitor a specific code we know in
advance that might misbehave, nor provide any kind of smart contingency
plan.

I would rather think that you may need something like a RTDM driver
actually implementing smarter health monitoring features that you could
use along with your app. That driver would expose a normalized socket
interface for observing how things go app-wise, by collecting data about
the current health status. It would have to tap into the mayday routines
for recovering from runaway situations it may detect via its own,
fine-grained watchdog service for instance.

ATM, you can still hack the nucleus watchdog threshold by changing the
periodic setup for its timer in xnpod_enable_timesource(). This said,
increasing the frequency too much would also induce much more overhead,
so YMMV.

> 
> will your patches be merged in an official 2.5.x version?
> 

2.5.4.

> thanks for your great support,
> 
>   Olli


-- 
Philippe.


