Jan Kiszka wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >>> Philippe Gerum wrote: >>>> Jan Kiszka wrote: >>>>> Hi, >>>>> >>>>> the watchdog is currently broken in trunk ("zombie [...] would not >>>>> die..."). In fact, it should also be broken in older versions, but only >>>>> recent thread termination rework made this visible. >>>>> >>>>> When a Xenomai CPU hog is caught by the watchdog, >>>>> xnpod_delete_thread is >>>>> invoked, causing the current thread to be set in zombie state and >>>>> scheduled out. But as its Linux mate still exist, hell breaks loose >>>>> once >>>>> Linux tries to get rid of it (the Xenomai zombie is scheduled in >>>>> again). >>>>> In short: calling xnpod_delete_thread(<self>) for a shadow thread is >>>>> not >>>>> working, probably never worked cleanly. >>>> Nak, it is a regression introduced by the scheduler changes in 2.5.x. >>>> We should detect _any_ shadow thread that schedules out in primary >>>> mode then regains control in secondary mode like we do in the 2.4.x >>>> series, not only _relaxing_ shadow threads. It is perfectly valid to >>>> have the Linux task orphaned from the deletion of its shadow TCB >>>> until Xenomai notices the issue and reaps it; problem was that such >>>> regression prevented the nucleus to get the memo. >>>> >>>> The following patch should fix the issue: >>>> >>>> Index: include/asm-generic/system.h >>>> =================================================================== >>>> --- include/asm-generic/system.h (revision 4676) >>>> +++ include/asm-generic/system.h (working copy) >>>> @@ -311,6 +311,11 @@ >>>> return !!s; >>>> } >>>> >>>> +static inline int xnarch_root_domain_p(void) >>>> +{ >>>> + return rthal_current_domain == rthal_root_domain; >>>> +} >>>> + >>>> #ifdef CONFIG_SMP >>>> >>>> #define xnlock_get(lock) __xnlock_get(lock XNLOCK_DBG_CONTEXT) >>>> Index: ksrc/nucleus/pod.c >>>> =================================================================== >>>> --- ksrc/nucleus/pod.c (revision 4676) >>>> +++ ksrc/nucleus/pod.c (working copy) >>>> @@ -2137,7 +2137,7 @@ >>>> void __xnpod_schedule(struct xnsched *sched) >>>> { >>>> struct xnthread *prev, *next, *curr = sched->curr; >>>> - int zombie, switched = 0, need_resched, relaxing; >>>> + int zombie, switched = 0, need_resched, shadow; >>>> spl_t s; >>>> >>>> if (xnarch_escalate()) >>>> @@ -2174,9 +2174,9 @@ >>>> next, xnthread_name(next)); >>>> >>>> #ifdef CONFIG_XENO_OPT_PERVASIVE >>>> - relaxing = xnthread_test_state(prev, XNRELAX); >>>> + shadow = xnthread_test_state(prev, XNSHADOW); >>>> #else >>>> - (void)relaxing; >>>> + (void)shadow; >>>> #endif /* CONFIG_XENO_OPT_PERVASIVE */ >>>> >>>> if (xnthread_test_state(next, XNROOT)) { >>>> @@ -2204,12 +2204,18 @@ >>>> >>>> #ifdef CONFIG_XENO_OPT_PERVASIVE >>>> /* >>>> - * Test whether we are relaxing a thread. In such a case, we >>>> - * are here the epilogue of Linux' schedule, and should skip >>>> - * xnpod_schedule epilogue. >>>> + * Test whether we transitioned from primary mode to secondary >>>> + * over a shadow thread. This may happen in two cases: >>>> + * >>>> + * 1) the shadow thread just relaxed. >>>> + * 2) the shadow TCB has just been deleted, in which case >>>> + * we have to reap the mated Linux side as well. >>>> + * >>>> + * In both cases, we are running over the epilogue of Linux's >>>> + * schedule, and should skip our epilogue code. >>>> */ >>>> - if (relaxing) >>>> - goto relax_epilogue; >>>> + if (shadow && xnarch_root_domain_p()) >>>> + goto shadow_epilogue; >>>> #endif /* CONFIG_XENO_OPT_PERVASIVE */ >>>> >>>> switched = 1; >>>> @@ -2252,7 +2258,7 @@ >>>> return; >>>> >>>> #ifdef CONFIG_XENO_OPT_PERVASIVE >>>> - relax_epilogue: >>>> + shadow_epilogue: >>>> { >>>> spl_t ignored; >>> Finally makes sense and works (but your posting was corrupted). Great. >>> >>>>> There are basically two approaches to fix it: The first one is to >>>>> find a >>>>> different way to kill (or only suspend?) >>>> Suspending the hog won't work, particularly when GDB is involved, >>>> because a pending non-lethal Linux signal may cause the suspended >>>> shadow to resume immediately for processing the signal, therefore >>>> defeating the purpose of the watchdog, leading to an infinite loop. >>>> This is why we moved from suspension to deletion upon watchdog >>>> trigger in 2.3 (2.2 used to suspend only). >>> Yes, that became clear to me in the meantime, too. >>> >>>> the current shadow thread when >>>>> the watchdog strikes. The second one brought me to another issue: Raise >>>>> SIGKILL for the current thread and make sure that it can be >>>>> processed by >>>>> Linux (e.g. via xnpod_suspend_thread(<cpu-hog>). Unfortunately, >>>>> there is >>>>> no way to force a shadow thread into secondary mode to handle pending >>>>> Linux signals unless that thread issues a syscall once in a while. And >>>>> that raises the question if we shouldn't improve this as well while we >>>>> are on it. >>>>> >>>>> Granted, non-broken Xenomai user space threads always issue frequent >>>>> syscalls, otherwise the system would starve (and the watchdog would >>>>> come >>>>> around). On the other hand, delaying signals till syscall prologues is >>>>> different from plain Linux behaviour... >>>>> >>>>> Comments, ideas? >>>>> >>>> We probably need a two-stage approach: first record the thread was >>>> bumped out and suspend it from the watchdog handler to give Linux a >>>> chance to run again, then finish the work, killing it for good, next >>>> time the root thread is scheduled in on the same CPU. >>> That confuses me again: The watchdog issue is solved now, no? We are >>> only left with the scenario of breaking out of a user space loop of some >>> Xenomai thread via a Linux signal (which implies SMP - otherwise there >>> is no chance to raise the signal...). >>> >> If you first suspend the hog, then send it a lethal signal, you solve >> both issues: first Linux is allowed to run eventually, then your task >> won't be able to resume running the faulty code, but solely to process >> SIGKILL, which can be made pending early enough because the nucleus >> decides when Linux resumes. > > I'm not interested in SIGKILL here, rather in SIGSTOP to do debugging. > That is currently impossible. > >>> Meanwhile I played with some light-weight approach to relax a thread >>> that received a signal (according to do_sigwake_event). Worked, but only >>> once due to a limitation (if not bug) of I-pipe x86: in __ipipe_run_isr, >>> it does not handle the case that a non-root handler may alter the >>> current domain, causing corruptions to the IPIPE_SYNC_FLAG states of the >>> involved domains. >> It is not a bug, this is wanted. ISR must neither change the current >> domain nor migrate CPU; allowing this would open Pandora's box. > > OK, then please elaborate on this a bit more in the adeos-main thread > and explain why __ipipe_sync_stage currently reloads the domain. >
ipipe_cpudom_ptr() may be affected by CPU migration within the _root_ domain, which does not mean that non-root domains are allowed to migrate and/or change domains. > Jan > -- Philippe. _______________________________________________ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core