[Xenomai-core] [PATCH] fix regression around one-shot host tick

2007-07-19 Thread Jan Kiszka
Philippe,

this bug was introduced with recent clock_event modifications:

--- ksrc/nucleus/timer.c	(revision 2766)
+++ ksrc/nucleus/timer.c	(working copy)
@@ -245,7 +245,7 @@ void xntimer_tick_aperiodic(void)
 			   translates into precious microsecs on low-end hw. */
 			__setbits(sched->status, XNHTICK);
 			if (!testbits(timer->status, XNTIMER_PERIODIC))
-				goto out;
+				continue;
 		}
 
 		do {
@@ -254,7 +254,6 @@ void xntimer_tick_aperiodic(void)
 		xntimer_enqueue_aperiodic(timer);
 	}
 
-out:
 	__clrbits(sched->status, XNINTCK);
 
 	xntimer_next_local_shot(sched);


It doesn't look like a typo, so what was your original intention? The current 
code at least fails to handle outstanding timers that are enqueued right behind 
a one-shot host-tick timer.

Jan





Re: [Xenomai-core] [PATCH] fix regression around one-shot host tick

2007-07-19 Thread Philippe Gerum
On Thu, 2007-07-19 at 09:22 +0200, Jan Kiszka wrote:
 Philippe,
 
 this bug was introduced with recent clock_event modifications:
 
 --- ksrc/nucleus/timer.c	(revision 2766)
 +++ ksrc/nucleus/timer.c	(working copy)
 @@ -245,7 +245,7 @@ void xntimer_tick_aperiodic(void)
  			   translates into precious microsecs on low-end hw. */
  			__setbits(sched->status, XNHTICK);
  			if (!testbits(timer->status, XNTIMER_PERIODIC))
 -				goto out;
 +				continue;
  		}
  
  		do {
 @@ -254,7 +254,6 @@ void xntimer_tick_aperiodic(void)
  		xntimer_enqueue_aperiodic(timer);
  	}
  
 -out:
  	__clrbits(sched->status, XNINTCK);
  
  	xntimer_next_local_shot(sched);
 
 
 It doesn't look like typo, so what was your original intention?

The host timer is no longer a purely periodic beast. Since it may be
aperiodic, we have to break out of the interval update loop, otherwise
we'd remain stuck in it in the aperiodic case.
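To make the intended control flow concrete, here is a minimal user-space
model of that expiry loop (toy code for illustration only, not the nucleus
sources; all names are made up). A one-shot host tick should just relay the
tick and continue with the next due timer; jumping out of the loop instead
drops whatever is queued right behind it:

/* Toy model of the expiry loop in xntimer_tick_aperiodic(): a queue of
 * armed timers, sorted by date, is drained until the head is no longer
 * due.  A one-shot "host tick" timer must *continue* with the next due
 * timer instead of leaving the loop. */
#include <stdio.h>
#include <stdbool.h>

struct toy_timer {
	const char *name;
	unsigned long long date;	/* absolute expiry time */
	unsigned long long interval;	/* 0 means one-shot */
	bool is_host_tick;
};

static void toy_tick(struct toy_timer *q, int nr, unsigned long long now)
{
	for (int i = 0; i < nr; i++) {
		struct toy_timer *t = &q[i];

		if (t->date > now)
			break;		/* queue is sorted: nothing else is due */

		if (t->is_host_tick && t->interval == 0) {
			printf("relay one-shot host tick (%s)\n", t->name);
			continue;	/* NOT "goto out": keep draining the queue */
		}

		printf("fire %s\n", t->name);
		if (t->interval)
			t->date += t->interval;	/* re-arm periodic timers */
	}
	/* out: program the hardware for the next shot here */
}

int main(void)
{
	struct toy_timer q[] = {
		{ "htick", 100, 0, true },	/* one-shot host tick */
		{ "rt-task", 100, 50, false },	/* due at the same date! */
	};
	toy_tick(q, 2, 100);	/* with "goto out", rt-task would be skipped */
	return 0;
}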

 The current code at least fails to handle outstanding timers that are 
 enqueued right behind a one-shot host-tick timer.
 

True, I've been slightly, mmm, radical here. Will merge.

 Jan
 
-- 
Philippe.





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Philippe Gerum
On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  And when looking at the holders of rpilock, I think one issue could be
  that we hold that lock while calling into xnpod_renice_root [1], ie.
  doing a potential context switch. Was this checked to be save?
  
  xnpod_renice_root() does no reschedule immediately on purpose, we would
  never have been able to run any SMP config more than a couple of seconds
  otherwise. (See the NOSWITCH bit).
 
 OK, then it's not the cause.
 
  
  Furthermore, that code path reveals that we take nklock nested into
  rpilock [2]. I haven't found a spot for the other way around (and I hope
  there is none)
  
  xnshadow_start().
 
 Nope, that one is not holding nklock.

Indeed, but this only works because the callers that may hold this lock
do not activate shadow threads so far. This looks fragile... I'll add
a comment about this in the doc.

 But I found an offender...
 
  
  , but such nesting is already evil per se...
  
  Well, nesting spinlocks only falls into evilness when you get a circular
  graph, but since the rpilock is a rookie in the locking team, I'm going
  to check this.
 
 Take this one: gatekeeper_thread calls into rpi_pop with nklock
 acquired. So we have a classic ABAB locking bug. Bang!
 

Damnit.

The fix needs some thought and attention; we are running against the
deletion path here.
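
For the record, the ABAB pattern we are talking about boils down to the
sketch below (plain pthread mutexes standing in for nklock/rpilock, all
function names made up); every path must agree on a single acquisition
order, or drop the outer lock before taking the inner one:

/* Minimal illustration of the ABAB locking pattern: if one path takes
 * A then B while another takes B then A, each can end up holding one
 * lock and waiting forever on the other.  Keeping a single global order
 * on every path removes the deadlock. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* "nklock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* "rpilock" */

static void *path_one(void *arg)	/* e.g. the renice path */
{
	(void)arg;
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b);	/* order: A then B */
	/* ... touch RPI list and scheduler data ... */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *path_two(void *arg)	/* e.g. the gatekeeper path, fixed */
{
	(void)arg;
	/* The buggy variant took B first, then A, closing the cycle.
	 * Using the same A-then-B order here keeps the graph acyclic. */
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b);
	/* ... rpi_pop()-like work ... */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	pthread_create(&t1, NULL, path_one, NULL);
	pthread_create(&t2, NULL, path_two, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	puts("no deadlock with a consistent lock order");
	return 0;
}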

PS: Time to switch to -core.

 Jan
 
-- 
Philippe.





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Philippe Gerum
On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  And when looking at the holders of rpilock, I think one issue could be
  that we hold that lock while calling into xnpod_renice_root [1], ie.
  doing a potential context switch. Was this checked to be save?
  
  xnpod_renice_root() does no reschedule immediately on purpose, we would
  never have been able to run any SMP config more than a couple of seconds
  otherwise. (See the NOSWITCH bit).
 
 OK, then it's not the cause.
 
  
  Furthermore, that code path reveals that we take nklock nested into
  rpilock [2]. I haven't found a spot for the other way around (and I hope
  there is none)
  
  xnshadow_start().
 
 Nope, that one is not holding nklock. But I found an offender...

Gasp. xnshadow_renice() kills us too.

-- 
Philippe.





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Jan Kiszka
Philippe Gerum wrote:
 On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 And when looking at the holders of rpilock, I think one issue could be
 that we hold that lock while calling into xnpod_renice_root [1], ie.
 doing a potential context switch. Was this checked to be save?
 xnpod_renice_root() does no reschedule immediately on purpose, we would
 never have been able to run any SMP config more than a couple of seconds
 otherwise. (See the NOSWITCH bit).
 OK, then it's not the cause.

 Furthermore, that code path reveals that we take nklock nested into
 rpilock [2]. I haven't found a spot for the other way around (and I hope
 there is none)
 xnshadow_start().
 Nope, that one is not holding nklock. But I found an offender...
 
 Gasp. xnshadow_renice() kills us too.

Looks like we are approaching mainline qualities here - but they at least
have lockdep (and still face nasty races regularly).

As long as you can neither avoid nesting nor ensure that the inner lock only
protects really, really trivial code (list manipulation etc.), I would say
there is one lock too many... Did I mention that I consider nesting to be
evil? :-> Besides correctness, there is also an increasing worst-case
behaviour issue with each additional nesting level.

Jan





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Philippe Gerum
On Thu, 2007-07-19 at 17:35 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
  Philippe Gerum wrote:
  And when looking at the holders of rpilock, I think one issue could be
  that we hold that lock while calling into xnpod_renice_root [1], ie.
  doing a potential context switch. Was this checked to be save?
  xnpod_renice_root() does no reschedule immediately on purpose, we would
  never have been able to run any SMP config more than a couple of seconds
  otherwise. (See the NOSWITCH bit).
  OK, then it's not the cause.
 
  Furthermore, that code path reveals that we take nklock nested into
  rpilock [2]. I haven't found a spot for the other way around (and I hope
  there is none)
  xnshadow_start().
  Nope, that one is not holding nklock. But I found an offender...
  
  Gasp. xnshadow_renice() kills us too.
 
 Looks like we are approaching mainline qualities here - but they have
 at least lockdep (and still face nasty races regularly).
 

We only have a 2-level locking depth at most, which barely qualifies for
a comparison with the situation in mainline. Most often, the more
radical the solution, the less relevant it is: simple nesting over very
few levels is not bad; a bogus nesting sequence is.

 As long as you can't avoid nesting or the inner lock only protects
 really, really trivial code (list manipulation etc.), I would say there
 is one lock too much... Did I mention that I consider nesting to be
 evil? :- Besides correctness, there is also an increasing worst-case
 behaviour issue with each additional nesting level.
 

In this case, we do not want the RPI manipulation to affect the
worst case of all other threads by holding the nklock. This is
fundamentally a migration-related issue, and such a situation must
not impact all the other contexts relying on the nklock. Given this, you
need to protect the RPI list and prevent the scheduler data from being
altered at the same time; there is no cheap trick to avoid this.

We need to keep the rpilock, otherwise we would incur significant
latency penalties, especially when domain migrations are frequent, and
yes, we do need RPI, otherwise the sequence for emulated RTOS services
would be plain wrong (e.g. task creation).

Ok, the rpilock is local, the nesting level is bearable, so let's focus on
putting this thingy straight.

 Jan
 
-- 
Philippe.





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Jan Kiszka
Philippe Gerum wrote:
 On Thu, 2007-07-19 at 17:35 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 And when looking at the holders of rpilock, I think one issue could be
 that we hold that lock while calling into xnpod_renice_root [1], ie.
 doing a potential context switch. Was this checked to be save?
 xnpod_renice_root() does no reschedule immediately on purpose, we would
 never have been able to run any SMP config more than a couple of seconds
 otherwise. (See the NOSWITCH bit).
 OK, then it's not the cause.

 Furthermore, that code path reveals that we take nklock nested into
 rpilock [2]. I haven't found a spot for the other way around (and I hope
 there is none)
 xnshadow_start().
 Nope, that one is not holding nklock. But I found an offender...
 Gasp. xnshadow_renice() kills us too.
 Looks like we are approaching mainline qualities here - but they have
 at least lockdep (and still face nasty races regularly).

 
 We only have a 2-level locking depth at most, thare barely qualifies for
 being compared to the situation with mainline. Most often, the more
 radical the solution, the less relevant it is: simple nesting on very
 few levels is not bad, bugous nesting sequence is.
 
 As long as you can't avoid nesting or the inner lock only protects
 really, really trivial code (list manipulation etc.), I would say there
 is one lock too much... Did I mention that I consider nesting to be
 evil? :- Besides correctness, there is also an increasing worst-case
 behaviour issue with each additional nesting level.

 
 In this case, we do not want the RPI manipulation to affect the
 worst-case of all other threads by holding the nklock. This is
 fundamentally a migration-related issue, which is a situation that must
 not impact all other contexts relying on the nklock. Given this, you
 need to protect the RPI list and prevent the scheduler data to be
 altered at the same time, there is no cheap trick to avoid this.
 
 We need to keep the rpilock, otherwise we would have significantly large
 latency penalties, especially when domain migration are frequent, and
 yes, we do need RPI, otherwise the sequence for emulated RTOS services
 would be plain wrong (e.g. task creation).

If rpilock is known to protect potentially costly code, you _must not_
hold other locks while taking it. Otherwise, you do not win a dime by
using two locks, you rather make things worse (overhead of taking two locks
instead of just one). That all relates to the worst case, of course, the
one thing we are worried about most.

In that light, the nesting nklock->rpilock must go away, independently
of the ordering bug. The other way around might be a different thing,
though I'm not sure there is actually that much difference between the
locks in the worst case.

What is the actual _combined_ lock holding time in the longest
nklock/rpilock nesting path? Is that one really larger than any other
pre-existing nklock path? Only in that case does it make sense to think
about splitting, though you will still be left with precisely the same
(rather, a few cycles more) CPU-local latency. Is there really no chance
to split the lock paths?

 Ok, the rpilock is local, the nesting level is bearable, let's focus on
 putting this thingy straight.

The whole RPI thing, though required for some scenarios, remains ugly
and error-prone (including worst-case latency issues). I can only
underline my recommendation to switch off complexity in Xenomai when one
doesn't need it - which often includes RPI. Sorry, Philippe, but I think
we have to be honest with the users here. RPI remains problematic, at
least /wrt your beloved latency.

Jan





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Jan Kiszka
Philippe Gerum wrote:
 Ok, the rpilock is local, the nesting level is bearable, let's focus on
 putting this thingy straight.

Well, redesigning things may not necessarily improve the situation, but
reducing the amount of special RPI code might be worth a thought:

What is so special about RPI compared to standard prio inheritance? What
about [wild idea ahead!] modelling RPI as a virtual mutex that is
permanently held by the ROOT thread and which relaxed threads try to
acquire? They would never actually get it, but rather drop the request (and
thus the inheritance) once they are to be hardened again or Linux starts to
schedule around.

*If* that is possible, we would
 A) reuse existing code heavily,
 B) lack any argument for separate locking,
 C) make things far easier to understand and review.

Sounds too beautiful to work, I'm afraid...
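
Just to illustrate the shape of the idea (a toy model, not Xenomai code,
and every name below is invented): the ROOT thread permanently owns a
pseudo-resource, relaxed shadows register as waiters on it, and ROOT's
effective priority is simply the maximum over its own priority and the
waiters' priorities, exactly what plain priority inheritance would compute.
Hardening a shadow again just drops its claim:

/* Toy model of the "virtual mutex" idea. */
#include <stdio.h>

#define MAX_WAITERS 8

struct vmutex {
	int owner_base_prio;		/* ROOT's own priority */
	int waiter_prio[MAX_WAITERS];	/* relaxed shadows "blocked" here */
	int nr_waiters;
};

static int vmutex_owner_prio(const struct vmutex *m)
{
	int prio = m->owner_base_prio;
	for (int i = 0; i < m->nr_waiters; i++)
		if (m->waiter_prio[i] > prio)
			prio = m->waiter_prio[i];	/* inherit highest waiter */
	return prio;
}

static void vmutex_claim(struct vmutex *m, int prio)	/* rpi_push() analogue */
{
	if (m->nr_waiters < MAX_WAITERS)
		m->waiter_prio[m->nr_waiters++] = prio;
}

static void vmutex_drop(struct vmutex *m, int prio)	/* rpi_pop() analogue */
{
	for (int i = 0; i < m->nr_waiters; i++)
		if (m->waiter_prio[i] == prio) {
			m->waiter_prio[i] = m->waiter_prio[--m->nr_waiters];
			return;
		}
}

int main(void)
{
	struct vmutex root = { .owner_base_prio = 0 };

	vmutex_claim(&root, 42);	/* shadow of prio 42 relaxes */
	printf("root runs at %d\n", vmutex_owner_prio(&root));	/* 42 */
	vmutex_drop(&root, 42);		/* shadow hardens again */
	printf("root runs at %d\n", vmutex_owner_prio(&root));	/* 0 */
	return 0;
}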

Jan





Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Philippe Gerum
On Thu, 2007-07-19 at 19:18 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  On Thu, 2007-07-19 at 17:35 +0200, Jan Kiszka wrote:
  Philippe Gerum wrote:
  On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
  Philippe Gerum wrote:
  And when looking at the holders of rpilock, I think one issue could be
  that we hold that lock while calling into xnpod_renice_root [1], ie.
  doing a potential context switch. Was this checked to be save?
  xnpod_renice_root() does no reschedule immediately on purpose, we would
  never have been able to run any SMP config more than a couple of seconds
  otherwise. (See the NOSWITCH bit).
  OK, then it's not the cause.
 
  Furthermore, that code path reveals that we take nklock nested into
  rpilock [2]. I haven't found a spot for the other way around (and I 
  hope
  there is none)
  xnshadow_start().
  Nope, that one is not holding nklock. But I found an offender...
  Gasp. xnshadow_renice() kills us too.
  Looks like we are approaching mainline qualities here - but they have
  at least lockdep (and still face nasty races regularly).
 
  
  We only have a 2-level locking depth at most, thare barely qualifies for
  being compared to the situation with mainline. Most often, the more
  radical the solution, the less relevant it is: simple nesting on very
  few levels is not bad, bugous nesting sequence is.
  
  As long as you can't avoid nesting or the inner lock only protects
  really, really trivial code (list manipulation etc.), I would say there
  is one lock too much... Did I mention that I consider nesting to be
  evil? :- Besides correctness, there is also an increasing worst-case
  behaviour issue with each additional nesting level.
 
  
  In this case, we do not want the RPI manipulation to affect the
  worst-case of all other threads by holding the nklock. This is
  fundamentally a migration-related issue, which is a situation that must
  not impact all other contexts relying on the nklock. Given this, you
  need to protect the RPI list and prevent the scheduler data to be
  altered at the same time, there is no cheap trick to avoid this.
  
  We need to keep the rpilock, otherwise we would have significantly large
  latency penalties, especially when domain migration are frequent, and
  yes, we do need RPI, otherwise the sequence for emulated RTOS services
  would be plain wrong (e.g. task creation).
 
 If rpilock is known to protect potentially costly code, you _must not_
 hold other locks while taking it. Otherwise, you do not win a dime by
 using two locks, rather make things worse (overhead of taking two locks
 instead of just one).

I guess that by now you have already understood that holding such an outer
lock is what should not be done, and what should be fixed, right? So let's
focus on the real issue here: holding two locks is not the problem,
holding them in the wrong sequence is.

  That all relates to the worst case, of course, the
 one thing we are worried about most.
 
 In that light, the nesting nklock-rpilock must go away, independently
 of the ordering bug. The other way around might be a different thing,
 though I'm not sure if there is actually so much difference between the
 locks in the worst case.
 
 What is the actual _combined_ lock holding time in the longest
 nklock/rpilock nesting path?

It is short.

  Is that one really larger than any other
 pre-existing nklock path?

Yes. Look, could you please assume for one second that I did not choose this
implementation randomly? :o)

  Only in that case, it makes sense to think
 about splitting, though you will still be left with precisely the same
 (rather a few cycles more) CPU-local latency. Is there really no chance
 to split the lock paths?
 

The answer to your question lies in the dynamics of migrating tasks
between domains, and how this relates to the overall dynamics of the
system. Migration needs priority tracking, and priority tracking requires
almost the same amount of work as updating the scheduler data. Since
we can reduce the pressure on the nklock during migration, which is a
thread-local action additionally involving the root thread, it is _good_
to do so, even if this costs a few brain cycles more.

  Ok, the rpilock is local, the nesting level is bearable, let's focus on
  putting this thingy straight.
 
 The whole RPI thing, though required for some scenarios, remains ugly
 and error-prone (including worst-case latency issues).
  I can only
 underline my recommendation to switch off complexity in Xenomai when one
 doesn't need it - which often includes RPI.
  Sorry, Philippe, but I think
 we have to be honest to the users here. RPI remains problematic, at
 least /wrt your beloved latency.

The best way to be honest with users is to depict things as they are:

1) RPI is there because we currently rely on a co-kernel technology, and
we have to do our best to fix the consequences of having two
schedulers by at least coupling their priority schemes when applicable.
Otherwise, you just _cannot_ 

Re: [Xenomai-core] [Xenomai-help] Sporadic PC freeze after rt_task_start

2007-07-19 Thread Jan Kiszka
Philippe Gerum wrote:
 On Thu, 2007-07-19 at 19:18 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 On Thu, 2007-07-19 at 17:35 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 On Thu, 2007-07-19 at 14:40 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 And when looking at the holders of rpilock, I think one issue could be
 that we hold that lock while calling into xnpod_renice_root [1], ie.
 doing a potential context switch. Was this checked to be save?
 xnpod_renice_root() does no reschedule immediately on purpose, we would
 never have been able to run any SMP config more than a couple of seconds
 otherwise. (See the NOSWITCH bit).
 OK, then it's not the cause.

 Furthermore, that code path reveals that we take nklock nested into
 rpilock [2]. I haven't found a spot for the other way around (and I 
 hope
 there is none)
 xnshadow_start().
 Nope, that one is not holding nklock. But I found an offender...
 Gasp. xnshadow_renice() kills us too.
 Looks like we are approaching mainline qualities here - but they have
 at least lockdep (and still face nasty races regularly).

 We only have a 2-level locking depth at most, thare barely qualifies for
 being compared to the situation with mainline. Most often, the more
 radical the solution, the less relevant it is: simple nesting on very
 few levels is not bad, bugous nesting sequence is.

 As long as you can't avoid nesting or the inner lock only protects
 really, really trivial code (list manipulation etc.), I would say there
 is one lock too much... Did I mention that I consider nesting to be
 evil? :- Besides correctness, there is also an increasing worst-case
 behaviour issue with each additional nesting level.

 In this case, we do not want the RPI manipulation to affect the
 worst-case of all other threads by holding the nklock. This is
 fundamentally a migration-related issue, which is a situation that must
 not impact all other contexts relying on the nklock. Given this, you
 need to protect the RPI list and prevent the scheduler data to be
 altered at the same time, there is no cheap trick to avoid this.

 We need to keep the rpilock, otherwise we would have significantly large
 latency penalties, especially when domain migration are frequent, and
 yes, we do need RPI, otherwise the sequence for emulated RTOS services
 would be plain wrong (e.g. task creation).
 If rpilock is known to protect potentially costly code, you _must not_
 hold other locks while taking it. Otherwise, you do not win a dime by
 using two locks, rather make things worse (overhead of taking two locks
 instead of just one).
 
 I guess that by now you already understood that holding such outer lock
 is what should not be done, and what should be fixed, right? So let's
 focus on the real issue here: holding two locks is not the problem,
 holding them in the wrong sequence, is.

Holding two locks in the right order can still be wrong /wrt latency,
as I pointed out. If you can avoid holding both here, I would be much
happier immediately.
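
To be explicit about what I mean by splitting, consider the sketch below
(pthread mutexes as stand-ins for nklock/rpilock, names invented): the two
critical sections never nest, so their worst-case hold times do not add up
on any path, at the price of re-validating the snapshot if it may have
changed in between:

/* Split-lock sketch: snapshot under the big lock, then update the RPI
 * state under its own lock, without the first lock held. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t nklock_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rpilock_model = PTHREAD_MUTEX_INITIALIZER;

struct task_model {
	int prio;	/* protected by nklock_model */
	int rpi_prio;	/* protected by rpilock_model */
};

static void rpi_update_split(struct task_model *t)
{
	int prio;

	/* First critical section: snapshot what we need under the big lock. */
	pthread_mutex_lock(&nklock_model);
	prio = t->prio;
	pthread_mutex_unlock(&nklock_model);

	/* Second critical section: update RPI state under its own lock. */
	pthread_mutex_lock(&rpilock_model);
	t->rpi_prio = prio;
	pthread_mutex_unlock(&rpilock_model);
}

int main(void)
{
	struct task_model t = { .prio = 42, .rpi_prio = 0 };

	rpi_update_split(&t);
	printf("rpi_prio tracks prio: %d\n", t.rpi_prio);
	return 0;
}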

 
  That all relates to the worst case, of course, the
 one thing we are worried about most.

 In that light, the nesting nklock-rpilock must go away, independently
 of the ordering bug. The other way around might be a different thing,
 though I'm not sure if there is actually so much difference between the
 locks in the worst case.

 What is the actual _combined_ lock holding time in the longest
 nklock/rpilock nesting path?
 
 It is short.
 
  Is that one really larger than any other
 pre-existing nklock path?
 
 Yes. Look, could you please assume one second that I did not choose this
 implementation randomly? :o)

For sure not randomly, but I still don't understand the motivations
completely.

 
  Only in that case, it makes sense to think
 about splitting, though you will still be left with precisely the same
 (rather a few cycles more) CPU-local latency. Is there really no chance
 to split the lock paths?

 
 The answer to your question is into the dynamics of migrating tasks
 between domains, and how this relates to the overall dynamics of the
 system. Migration needs priority tracking, priority tracking requires
 almost the same amount of work than updating the scheduler data. Since
 we can reduce the pressure on the nklock during migration which is a
 thread-local action additionally involving the root thread, it is _good_
 to do so. Even if this costs a few brain cycles more.

So we are trading off average performance against worst-case spinning
time here?

 
 Ok, the rpilock is local, the nesting level is bearable, let's focus on
 putting this thingy straight.
 The whole RPI thing, though required for some scenarios, remains ugly
 and error-prone (including worst-case latency issues).
  I can only
 underline my recommendation to switch off complexity in Xenomai when one
 doesn't need it - which often includes RPI.
  Sorry, Philippe, but I think
 we have to be honest to the users here. RPI remains problematic, at
 least /wrt your beloved 

[Xenomai-core] RPI is good for you

2007-07-19 Thread Philippe Gerum

Mathias,

Could you try applying the attached patch against v2.3.2 and run your
box using the failing configuration? This patch is a _preliminary_
attempt at fixing two major issues; it is not complete, and may not even
be fully correct since it does not address all the pending issues yet.
Still, I would be interested to know whether I'm on the right path, and
whether it changes anything about your problem, without making your box jump
out of the window, that is.

TIA,

-- 
Philippe.

Index: ksrc/skins/psos+/task.c
===
--- ksrc/skins/psos+/task.c	(revision 2765)
+++ ksrc/skins/psos+/task.c	(working copy)
@@ -288,13 +288,6 @@
 		goto unlock_and_exit;
 	}
 
-#if defined(__KERNEL__) && defined(CONFIG_XENO_OPT_PERVASIVE)
-	if (xnthread_user_task(&task->threadbase) != NULL
-	    && !xnthread_test_state(&task->threadbase,XNDORMANT)
-	    && (!xnpod_primary_p() || task != psos_current_task()))
-		xnshadow_send_sig(&task->threadbase, SIGKILL, 1);
-#endif /* __KERNEL__ && CONFIG_XENO_OPT_PERVASIVE */
-
 	xnpod_delete_thread(&task->threadbase);
 
   unlock_and_exit:
Index: ksrc/skins/vxworks/taskLib.c
===
--- ksrc/skins/vxworks/taskLib.c	(revision 2765)
+++ ksrc/skins/vxworks/taskLib.c	(working copy)
@@ -285,13 +285,6 @@
 		goto error;
 	}
 
-#if defined(__KERNEL__) && defined(CONFIG_XENO_OPT_PERVASIVE)
-	if (xnthread_user_task(&task->threadbase) != NULL
-	    && !xnthread_test_state(&task->threadbase,XNDORMANT)
-	    && (!xnpod_primary_p() || task != wind_current_task()))
-		xnshadow_send_sig(&task->threadbase, SIGKILL, 1);
-#endif /* __KERNEL__ && CONFIG_XENO_OPT_PERVASIVE */
-
 	xnpod_delete_thread(&task->threadbase);
 	xnlock_put_irqrestore(&nklock, s);
 
Index: ksrc/skins/native/task.c
===
--- ksrc/skins/native/task.c	(revision 2765)
+++ ksrc/skins/native/task.c	(working copy)
@@ -581,29 +581,6 @@
 	if (err)
 		goto unlock_and_exit;
 
-#if defined(__KERNEL__) && defined(CONFIG_XENO_OPT_PERVASIVE)
-	/* rt_task_delete() might be called for cleaning up a just
-	   created shadow task which has not been successfully mapped,
-	   so make sure we have an associated Linux mate before trying
-	   to send it a signal. This will also prevent any action on
-	   kernel-based Xenomai threads for which the user TCB
-	   extension is always NULL.
-	   We don't send any signal to dormant threads because GDB
-	   (6.x) has some problems dealing with vanishing threads
-	   under some circumstances, likely when asynchronous
-	   cancellation is in effect. In most cases, this is a
-	   non-issue since pthread_cancel() is requested from the skin
-	   interface library in parallel on the target thread, but
-	   when calling rt_task_delete() from kernel space against a
-	   created but unstarted user-space task, the Linux thread
-	   mated to the Xenomai shadow might linger unexpectedly on
-	   the startup barrier. */
-	if (xnthread_user_task(&task->thread_base) != NULL
-	    && !xnthread_test_state(&task->thread_base,XNDORMANT)
-	    && (!xnpod_primary_p() || task != xeno_current_task()))
-		xnshadow_send_sig(&task->thread_base, SIGKILL, 1);
-#endif /* __KERNEL__ && CONFIG_XENO_OPT_PERVASIVE */
-
 	/* Does not return if task is current. */
 	xnpod_delete_thread(&task->thread_base);
 
Index: ksrc/nucleus/pod.c
===
--- ksrc/nucleus/pod.c	(revision 2765)
+++ ksrc/nucleus/pod.c	(working copy)
@@ -1245,10 +1245,35 @@
 	if (xnthread_test_state(thread, XNZOMBIE))
 		goto unlock_and_exit;	/* No double-deletion. */
 
+	sched = thread->sched;
+
+#if defined(__KERNEL__) && defined(CONFIG_XENO_OPT_PERVASIVE)
+	/* xnpod_delete_thread() might be called for cleaning up a
+	   just created shadow task which has not been successfully
+	   mapped, so make sure we have an associated Linux mate
+	   before trying to send it a signal. This will also prevent
+	   any action on kernel-based Xenomai threads for which the
+	   user TCB extension is always NULL.  We don't send any
+	   signal to dormant threads because GDB (6.x) has some
+	   problems dealing with vanishing threads under some
+	   circumstances, likely when asynchronous cancellation is in
+	   effect. In most cases, this is a non-issue since
+	   pthread_cancel() is requested from the skin interface
+	   library in parallel on the target thread, but when calling
+	   xnpod_delete_thread() from kernel space against a created
+	   but unstarted user-space task, the Linux thread mated to
+	   the Xenomai shadow might linger unexpectedly on the startup
+	   barrier. */
+	if (xnthread_user_task(thread) != NULL
+	    && !xnthread_test_state(thread, XNDORMANT)
+	    && (!xnpod_primary_p() || thread != sched->runthread)) {
+		xnshadow_send_sig(thread, SIGKILL, 1);
+		goto unlock_and_exit;
+	}
+#endif /* __KERNEL__ && CONFIG_XENO_OPT_PERVASIVE */
+
 	xnltt_log_event(xeno_ev_thrdelete, thread->name);
 
-