Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> the watchdog is currently broken in trunk ("zombie [...] would not
>>> die..."). In fact, it should also be broken in older versions, but only
>>> recent thread termination rework made this visible.
>>>
>>> When a Xenomai CPU hog is caught by the watchdog, xnpod_delete_thread is
>>> invoked, causing the current thread to be set in zombie state and
>>> scheduled out. But as its Linux mate still exists, hell breaks loose once
>>> Linux tries to get rid of it (the Xenomai zombie is scheduled in again).
>>> In short: calling xnpod_delete_thread(<self>) for a shadow thread does
>>> not work, and probably never worked cleanly.
>> Nak, it is a regression introduced by the scheduler changes in 2.5.x. We
>> should detect _any_ shadow thread that schedules out in primary mode and
>> then regains control in secondary mode, like we do in the 2.4.x series,
>> not only _relaxing_ shadow threads. It is perfectly valid for the Linux
>> task to be orphaned from the deletion of its shadow TCB until Xenomai
>> notices the issue and reaps it; the problem was that this regression
>> prevented the nucleus from getting the memo.
>>
>> The following patch should fix the issue:
>>
>>   Index: include/asm-generic/system.h
>> ===================================================================
>> --- include/asm-generic/system.h     (revision 4676)
>> +++ include/asm-generic/system.h     (working copy)
>> @@ -311,6 +311,11 @@
>>      return !!s;
>>   }
>>
>> +static inline int xnarch_root_domain_p(void)
>> +{
>> +    return rthal_current_domain == rthal_root_domain;
>> +}
>> +
>>   #ifdef CONFIG_SMP
>>
>>   #define xnlock_get(lock)           __xnlock_get(lock  XNLOCK_DBG_CONTEXT)
>> Index: ksrc/nucleus/pod.c
>> ===================================================================
>> --- ksrc/nucleus/pod.c       (revision 4676)
>> +++ ksrc/nucleus/pod.c       (working copy)
>> @@ -2137,7 +2137,7 @@
>>   void __xnpod_schedule(struct xnsched *sched)
>>   {
>>      struct xnthread *prev, *next, *curr = sched->curr;
>> -    int zombie, switched = 0, need_resched, relaxing;
>> +    int zombie, switched = 0, need_resched, shadow;
>>      spl_t s;
>>
>>      if (xnarch_escalate())
>> @@ -2174,9 +2174,9 @@
>>                 next, xnthread_name(next));
>>
>>   #ifdef CONFIG_XENO_OPT_PERVASIVE
>> -    relaxing = xnthread_test_state(prev, XNRELAX);
>> +    shadow = xnthread_test_state(prev, XNSHADOW);
>>   #else
>> -    (void)relaxing;
>> +    (void)shadow;
>>   #endif /* CONFIG_XENO_OPT_PERVASIVE */
>>
>>      if (xnthread_test_state(next, XNROOT)) {
>> @@ -2204,12 +2204,18 @@
>>
>>   #ifdef CONFIG_XENO_OPT_PERVASIVE
>>      /*
>> -     * Test whether we are relaxing a thread. In such a case, we
>> -     * are here the epilogue of Linux' schedule, and should skip
>> -     * xnpod_schedule epilogue.
>> +     * Test whether we transitioned from primary mode to secondary
>> +     * over a shadow thread. This may happen in two cases:
>> +     *
>> +     * 1) the shadow thread just relaxed.
>> +     * 2) the shadow TCB has just been deleted, in which case
>> +     * we have to reap the mated Linux side as well.
>> +     *
>> +     * In both cases, we are running over the epilogue of Linux's
>> +     * schedule, and should skip our epilogue code.
>>       */
>> -    if (relaxing)
>> -            goto relax_epilogue;
>> +    if (shadow && xnarch_root_domain_p())
>> +            goto shadow_epilogue;
>>   #endif /* CONFIG_XENO_OPT_PERVASIVE */
>>
>>      switched = 1;
>> @@ -2252,7 +2258,7 @@
>>      return;
>>
>>   #ifdef CONFIG_XENO_OPT_PERVASIVE
>> -      relax_epilogue:
>> +      shadow_epilogue:
>>      {
>>              spl_t ignored;
> 
> Finally makes sense and works (but your posting was corrupted). Great.
> 
>>> There are basically two approaches to fix it: The first one is to find a
>>> different way to kill (or only suspend?)
>> Suspending the hog won't work, particularly when GDB is involved,
>> because a pending non-lethal Linux signal may cause the suspended
>> shadow to resume immediately to process that signal, defeating the
>> purpose of the watchdog and leading to an infinite loop. This is why we
>> moved from suspension to deletion upon watchdog trigger in 2.3 (2.2
>> used to suspend only).
> 
> Yes, that became clear to me in the meantime, too.
> 
>>> the current shadow thread when
>>> the watchdog strikes. The second one brought me to another issue: raise
>>> SIGKILL for the current thread and make sure that it can be processed by
>>> Linux (e.g. via xnpod_suspend_thread(<cpu-hog>)). Unfortunately, there is
>>> no way to force a shadow thread into secondary mode to handle pending
>>> Linux signals unless that thread issues a syscall once in a while. And
>>> that raises the question whether we shouldn't improve this as well while
>>> we are at it.
>>>
>>> Granted, non-broken Xenomai user space threads always issue frequent
>>> syscalls, otherwise the system would starve (and the watchdog would come
>>> around). On the other hand, delaying signals till syscall prologues is
>>> different from plain Linux behaviour...
>>>
>>> Comments, ideas?
>>>
>> We probably need a two-stage approach: first record that the thread was
>> bumped out and suspend it from the watchdog handler to give Linux a
>> chance to run again, then finish the work, killing it for good, the
>> next time the root thread is scheduled in on the same CPU.
> 
> That confuses me again: The watchdog issue is solved now, no? We are
> only left with the scenario of breaking out of a user space loop of some
> Xenomai thread via a Linux signal (which implies SMP - otherwise there
> is no chance to raise the signal...).
>

If you first suspend the hog, then send it a lethal signal, you solve both
issues: first, Linux is allowed to run eventually; then, your task won't be
able to resume running the faulty code, but only to process SIGKILL, which
can be made pending early enough because the nucleus decides when Linux
resumes.

> Meanwhile I played with some light-weight approach to relax a thread
> that received a signal (according to do_sigwake_event). Worked, but only
> once due to a limitation (if not bug) of I-pipe x86: in __ipipe_run_isr,
> it does not handle the case that a non-root handler may alter the
> current domain, causing corruptions to the IPIPE_SYNC_FLAG states of the
> involved domains.

It is not a bug; this is intended. An ISR must neither change the current
domain nor migrate to another CPU; allowing either would open Pandora's box.

> Will try to fix this and post my signaling proposal so
> that this work is not lost.
> 
> Jan
> 


-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
