Re: [Xenomai-core] [BUG] module_put called over non-root domain

2007-05-24 Thread Philippe Gerum
On Thu, 2007-05-24 at 11:43 +0200, Jan Kiszka wrote:
  You don't want to run rpi_pop() after the bulk of xnpod_delete_thread()
 
 Not after, I meant before. But then...
 
  has run for the same thread, and you don't want a preemption to occur
  anytime between the thread deletion and the update of the RPI state that
  still refers to it. You might try moving the call to rpi_pop() to the
  prologue of both routines instead of waiting for the deletion hook to do
  it, but this would introduce a priority issue in the do_taskexit_event
  callout, since we might have lowered the root priority in rpi_pop() and
  be switched out in the middle of the deletion process. Unless we hold
  the nklock the whole time, that is. Back to square #1, I'm afraid.
 
 ...there might be other issues, OK.
 
 So we might need some reference counter for the xnthread object, so that
 the last user actually frees it, and some flags to track what jobs remain
 to be done for cleanup.
 
  latency hit is certainly not seen there; the real issues are left in the
  native and POSIX skin thread deletion routines (heap management and
  other housekeeping calls under nklock and so on).
  That's surely true. Besides RPI manipulation, there is really no
  problematic code left here latency-wise. Other hooks are trickier. But
  my point is that we first need some infrastructure to provide an
  alternative cleanup context before we can start moving things. My patch
  today is a hack...err...workaround from this perspective.
 
  I don't see it this way. This code is very specific, in the sense that
  it lies on the inter-domain boundary between Linux and Xenomai's
  primary mode for a critical operation like thread deletion; this is
  what makes things a bit confusing. What's below and above this layer
  has to be as seamless as possible; this code in particular can't be.
  Well, we could move the memory release of the shadow thread into Linux
  context, e.g. (release on idle, a bit like RCU - uuuh).
  
  This is basically what Gilles did in a recent patch to fix some treading
  on freed memory, by releasing the shadow TCB through
  xnheap_schedule_free(). I think we should have a look at the removal
  from registry operations now.
 
 Look, we now have xnheap_schedule_free,

Actually, we have had this for the last two years now, it just happened
to be underused.

  next we need a kind of
 xnregistry_schedule_free - that's my point, provide a generic platform
 for all these things. More may follow (for new kinds of skins, e.g.).
 

Yes, what we need is a common policy for object creation and deletion,
that properly uses the APC mechanism. The logic behind this is that
any object creation which involves secondary domain work to set it up
- such as threads, heaps, queues, pipes - will invariably require the
deletion process to follow the converse path, i.e. rely on Linux kernel
services.

So far, this has been achieved by setting the lostage bit in the syscall
exec mode, but this has two drawbacks: 1) kernel-based threads take no
advantage of this since they don't go through the syscall mechanism, 2)
when such a routine wants to switch to primary mode in order to finish
the housekeeping work, it has to perform this transition manually, by
calling the hardening service, which opens Pandora's box if badly
used. From the interface design POV, I'd really prefer that
xnshadow_relax/harden remain internal calls only used by the nucleus. I
guess we both agree on that. But, to sum up, let's drop the
micro-optimization approach to solve this issue, this just would not
scale.

PS: The registry is special, it has to defer some work to Linux by
design, so basically all registrable objects are concerned here.

 Jan
 
-- 
Philippe.



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [BUG] module_put called over non-root domain

2007-05-21 Thread Philippe Gerum
On Mon, 2007-05-21 at 16:43 +0200, Jan Kiszka wrote:
 Hi Philippe,
 
 Jan Kiszka wrote:
  Just ran into this with CONFIG_IPIPE_DEBUG_CONTEXT (maybe due to some
  bug of my own):
 
 Here is some code to trigger the issue reliably:
 

Ok, deletion over primary domain, that's why it triggers, since we end
up running the deletion hook over the Xenomai domain. We should post an
APC to get the module release done from the Linux space -
schedule_linux_call should be our friend there. Looking at the
module_put code, this should be ok, even over some virq context.

 #include <sys/mman.h>
 #include <native/task.h>
 
 void task_fnct(void *arg)
 {
     rt_task_delete(NULL);
 }
 
 int main(void)
 {
     RT_TASK task;
     mlockall(MCL_CURRENT|MCL_FUTURE);
     rt_task_spawn(&task, "task", 0, 10, 0, task_fnct, NULL);
 }
 
 
  
  [  102.616000] I-pipe: Detected illicit call from domain 'Xenomai'
  [  102.616000] into a service reserved for domain 'Linux' and below.
  [  102.616000]c741bdc8   c8860ef8 c741bdec c0105683 
  c032c200 c13fe22c
  [  102.616000]c0361f00 c741be08 c01519ed c032f5b8 c032c742 c03380b3 
  c8885100 c78beac0
  [  102.616000]c741be14 c0142ce9 c7a80b30 c741be3c c884d075 c885f150 
  c8860ef8 c741be3c
  [  102.616000] Call Trace:
  [  102.616000]  [c0104d9f] show_trace_log_lvl+0x1f/0x40
  [  102.616000]  [c0104e71] show_stack_log_lvl+0xb1/0xe0
  [  102.616000]  [c0105683] show_stack+0x33/0x40
  [  102.616000]  [c01519ed] ipipe_check_context+0xad/0xc0
  [  102.616000]  [c0142ce9] module_put+0x19/0x90
  [  102.616000]  [c884d075] xnshadow_unmap+0xb5/0x130 [xeno_nucleus]
  [  102.616000]  [c8871dc5] __shadow_delete_hook+0x25/0x30 [xeno_native]
  [  102.616000]  [c8842f78] xnpod_schedule+0xb58/0x12f0 [xeno_nucleus]
  [  102.616000]  [c8844bfb] xnpod_delete_thread+0x2cb/0x3d0 [xeno_nucleus]
  [  102.616000]  [c886f5bd] rt_task_delete+0x20d/0x220 [xeno_native]
  
  I would dare to say that module_put in xnshadow_unmap is not well placed
  as it can wake up a Linux process. The module ref-counter maintenance
  needs some postponing, I guess.
 
 Attached is a patch proposal. It solves the issue by postponing the
 module_put via a new schedule_linux_call. Note that this approach issues
 LO_WAKEUP_REQ where the old test (p->state != TASK_RUNNING) would not
 have done so. I don't see negative side effects yet, and I'm furthermore
 not sure if the old code was handling SMP scenarios safely (What if the
 thread to be unmapped was running on a different CPU than xnshadow_unmap?
 How to ensure test-atomicity then?).
 
 
 Well, this thing seems to work, but it doesn't leave me with a good
 feeling. IMHO, there are far too many cleanup cases with all the
 combinations of non-shadow termination, shadow self-termination,
 termination on task exit, various SMP scenarios, etc. Moreover, quite a
 lot of stuff is executed in atomic scheduling hooks (namely on cleanup).
 At least for shadow threads, I don't think all of this is necessary.
 Actually, deferring most of the uncritical cleanup work into Linux
 context would make things more regular, easier to review, and likely
 less disturbing for overall system latencies. I know that atomic
 scheduling hooks are required to implement certain skins, but I don't
 see why janitor work should be done there as well. And we also still
 have that tiny uncleanness on shadow threads termination around TCB
 release vs. hook execution.
 
 As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
 may (theoretically) bite large setups when e.g. too many threads get
 killed from the wrong context. At least some overflow detection is
 required here to warn the unfortunate user that he has reached hard-coded
 scalability limits.
 
 Jan
 
 
 ---
  ksrc/nucleus/shadow.c |   61 +++---
  1 file changed, 34 insertions(+), 27 deletions(-)
 
 Index: xenomai/ksrc/nucleus/shadow.c
 ===
 --- xenomai.orig/ksrc/nucleus/shadow.c
 +++ xenomai/ksrc/nucleus/shadow.c
 @@ -92,6 +92,7 @@ static struct __lostagerq {
  #define LO_RENICE_REQ 2
  #define LO_SIGGRP_REQ 3
  #define LO_SIGTHR_REQ 4
 +#define LO_UNMAP_REQ  5
   int type;
   struct task_struct *task;
   int arg;
 @@ -778,6 +779,28 @@ static inline void unlock_timers(void)
   clrbits(nktbase.status, XNTBLCK);
  }
  
 +static void xnshadow_dereference_skin(unsigned magic)
 +{
 + unsigned muxid;
 +
 + for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
 + if (muxtable[muxid].magic == magic) {
 + if (xnarch_atomic_dec_and_test(&muxtable[0].refcnt))
 + xnarch_atomic_dec(&muxtable[0].refcnt);
 + if (xnarch_atomic_dec_and_test(&muxtable[muxid].refcnt))
 +
 + /* We were the last thread, decrement the counter,
 +

Re: [Xenomai-core] [BUG] module_put called over non-root domain

2007-05-21 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi Philippe,

 Jan Kiszka wrote:

 Just ran into this with CONFIG_IPIPE_DEBUG_CONTEXT (maybe due to some
 bug of my own):

 Here is some code to trigger the issue reliably:

 #include <sys/mman.h>
 #include <native/task.h>

 void task_fnct(void *arg)
 {
     rt_task_delete(NULL);
 }

 int main(void)
 {
     RT_TASK task;
     mlockall(MCL_CURRENT|MCL_FUTURE);
     rt_task_spawn(&task, "task", 0, 10, 0, task_fnct, NULL);
 }



 [  102.616000] I-pipe: Detected illicit call from domain 'Xenomai'
 [  102.616000] into a service reserved for domain 'Linux' and below.
 [  102.616000]c741bdc8   c8860ef8 c741bdec c0105683 
 c032c200 c13fe22c
 [  102.616000]c0361f00 c741be08 c01519ed c032f5b8 c032c742 c03380b3 
 c8885100 c78beac0
 [  102.616000]c741be14 c0142ce9 c7a80b30 c741be3c c884d075 c885f150 
 c8860ef8 c741be3c
 [  102.616000] Call Trace:
 [  102.616000]  [c0104d9f] show_trace_log_lvl+0x1f/0x40
 [  102.616000]  [c0104e71] show_stack_log_lvl+0xb1/0xe0
 [  102.616000]  [c0105683] show_stack+0x33/0x40
 [  102.616000]  [c01519ed] ipipe_check_context+0xad/0xc0
 [  102.616000]  [c0142ce9] module_put+0x19/0x90
 [  102.616000]  [c884d075] xnshadow_unmap+0xb5/0x130 [xeno_nucleus]
 [  102.616000]  [c8871dc5] __shadow_delete_hook+0x25/0x30 [xeno_native]
 [  102.616000]  [c8842f78] xnpod_schedule+0xb58/0x12f0 [xeno_nucleus]
 [  102.616000]  [c8844bfb] xnpod_delete_thread+0x2cb/0x3d0 [xeno_nucleus]
 [  102.616000]  [c886f5bd] rt_task_delete+0x20d/0x220 [xeno_native]

 I would dare to say that module_put in xnshadow_unmap is not well placed
 as it can wake up a Linux process. The module ref-counter maintenance
 needs some postponing, I guess.

 Attached is a patch proposal. It solves the issue by postponing the
 module_put via a new schedule_linux_call. Note that this approach issues
 LO_WAKEUP_REQ where the old test (p->state != TASK_RUNNING) would not
 have done so. I don't see negative side effects yet, and I'm furthermore
 not sure if the old code was handling SMP scenarios safely (What if the
 thread to be unmapped was running on a different CPU than xnshadow_unmap?
 How to ensure test-atomicity then?).
 
 This one counts as mine! I am OK with the fix, but IMHO, the
 if (p->state != TASK_RUNNING) probably has a reason, so I would leave
 it in the new implementation.
 

But then I need some official declaration that this test cannot race
with anything in case the target task runs on another CPU. I tried to
direct my brain along this twisted path, but stopped before it started
hurting too much (others would call it laziness). :)

Jan





Re: [Xenomai-core] [BUG] module_put called over non-root domain

2007-05-21 Thread Jan Kiszka
Philippe Gerum wrote:
 On Mon, 2007-05-21 at 16:43 +0200, Jan Kiszka wrote:
 Hi Philippe,

 Jan Kiszka wrote:
 Just ran into this with CONFIG_IPIPE_DEBUG_CONTEXT (maybe due to some
 bug of my own):
 Here is some code to trigger the issue reliably:

 #include <sys/mman.h>
 #include <native/task.h>

 void task_fnct(void *arg)
 {
     rt_task_delete(NULL);
 }

 int main(void)
 {
     RT_TASK task;
     mlockall(MCL_CURRENT|MCL_FUTURE);
     rt_task_spawn(&task, "task", 0, 10, 0, task_fnct, NULL);
 }


 [  102.616000] I-pipe: Detected illicit call from domain 'Xenomai'
 [  102.616000] into a service reserved for domain 'Linux' and below.
 [  102.616000]c741bdc8   c8860ef8 c741bdec c0105683 
 c032c200 c13fe22c
 [  102.616000]c0361f00 c741be08 c01519ed c032f5b8 c032c742 c03380b3 
 c8885100 c78beac0
 [  102.616000]c741be14 c0142ce9 c7a80b30 c741be3c c884d075 c885f150 
 c8860ef8 c741be3c
 [  102.616000] Call Trace:
 [  102.616000]  [c0104d9f] show_trace_log_lvl+0x1f/0x40
 [  102.616000]  [c0104e71] show_stack_log_lvl+0xb1/0xe0
 [  102.616000]  [c0105683] show_stack+0x33/0x40
 [  102.616000]  [c01519ed] ipipe_check_context+0xad/0xc0
 [  102.616000]  [c0142ce9] module_put+0x19/0x90
 [  102.616000]  [c884d075] xnshadow_unmap+0xb5/0x130 [xeno_nucleus]
 [  102.616000]  [c8871dc5] __shadow_delete_hook+0x25/0x30 [xeno_native]
 [  102.616000]  [c8842f78] xnpod_schedule+0xb58/0x12f0 [xeno_nucleus]
 [  102.616000]  [c8844bfb] xnpod_delete_thread+0x2cb/0x3d0 [xeno_nucleus]
 [  102.616000]  [c886f5bd] rt_task_delete+0x20d/0x220 [xeno_native]

 I would dare to say that module_put in xnshadow_unmap is not well placed
 as it can wake up a Linux process. The module ref-counter maintenance
 needs some postponing, I guess.
 
 Mm, I should definitely read mails entirely.
 
 Attached is a patch proposal. It solves the issue by postponing the
 module_put via a new schedule_linux_call. Note that this approach issues
 LO_WAKEUP_REQ where the old test (p->state != TASK_RUNNING) would not
 have done so. I don't see negative side effects yet,
 
 This code has to work with an awful lot of runtime situations, including
 those raised by ptracing/GDB, and thread signaling in general, so test
 harnessing is key here.

Well, what about listing those scenarios and their requirements in a
comment if this is so critical? That may make /me feel better as I could
more easily go through them and check changes like those of today
against them.

 
  and I'm furthermore
 not sure if the old code was handling SMP scenarios safely (What if the
 thread to be unmapped was running on a different CPU than xnshadow_unmap?
 How to ensure test-atomicity then?).

 
 This wakeup call is there to release emerging shadows waiting on their
 start barrier that get killed before start. Other regular cases imply
 that either the current thread is exiting - in which case this is a no
 brainer, or it has been sent a group kill signal, in which case it has
 to wake up anyway.

I didn't _remove_ cases where the wakeup takes place, I _added_ some
more by removing the test.

 
 Well, this thing seems to work,
 
 It does, actually.
 
  but it doesn't leave me with a good
 feeling. IMHO, there are far too many cleanup cases with all the
 combinations of non-shadow termination, shadow self-termination,
 termination on task exit, various SMP scenarios, etc.
  Moreover, quite a
 lot of stuff is executed in atomic scheduling hooks (namely on cleanup).
 At least for shadow threads, I don't think all of this is necessary.
 Actually, deferring most of the uncritical cleanup work into Linux
 context would make things more regular, easier to review, and likely
 less disturbing for overall system latencies. I know that atomic
 scheduling hooks are required to implement certain skins, but I don't
 see why janitor work should be done there as well. And we also still
 have that tiny uncleanness on shadow threads termination around TCB
 release vs. hook execution.
 
 Talking about xnshadow_unmap, and as your patch illustrates, what you
 could postpone is basically the first part of the routine, which updates
 the reference counts. The rest _must_ run over the deletion hook in
 atomic context (already runs fine with all Linux debug switches on). The

rpi_pop()? No chance to call it before entering the scheduler and then
the hooks?

 latency hit is certainly not seen there; the real issues are left in the
 native and POSIX skin thread deletion routines (heap management and
 other housekeeping calls under nklock and so on).

That's surely true. Besides RPI manipulation, there is really no
problematic code left here latency-wise. Other hooks are trickier. But
my point is that we first need some infrastructure to provide an
alternative cleanup context before we can start moving things. My patch
today is a hack...err...workaround from this perspective.

 
 As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
 may (theoretically) bite large 

Re: [Xenomai-core] [BUG] module_put called over non-root domain

2007-05-21 Thread Philippe Gerum
On Mon, 2007-05-21 at 18:21 +0200, Jan Kiszka wrote:
  
  This code has to work with an awful lot of runtime situations, including
  those raised by ptracing/GDB, and thread signaling in general, so test
  harnessing is key here.
 
 Well, what about listing those scenarios and their requirements in a
 comment if this is so critical?

Eh, because I'm likely as lazy as you are? :o

  That may make /me feel better as I could
 more easily go through them and check changes like those of today
 against them.
 

Those situations have been encountered when debugging; there is no
limited set of golden rules to document, I'm afraid, this would have
been far too easy. The entire signal handling wrt ptracing is a terrible
mess - somebody must have endured a very bad karma to implement it the
way it is (long live utrace). The ChangeLog might be a good start for
comments related to fixes in shadow.c/unmap and various head-banging
sessions I went through regarding this (2.4/2.6).

  
   and I'm furthermore
  not sure if the old code was handling SMP scenarios safely (What if the
  thread to be unmapped was running on a different CPU than xnshadow_unmap?
  How to ensure test-atomicity then?).
 
  
  This wakeup call is there to release emerging shadows waiting on their
  start barrier that get killed before start. Other regular cases imply
  that either the current thread is exiting - in which case this is a no
  brainer, or it has been sent a group kill signal, in which case it has
  to wake up anyway.
 
 I didn't _remove_ cases where the wakeup takes place, I _added_ some
 more by removing the test.
 

In the light of what I have just written, you would only cause useless
wakeups by doing so.

  
  Well, this thing seems to work,
  
  It does, actually.
  
   but it doesn't leave me with a good
  feeling. IMHO, there are far too many cleanup cases with all the
  combinations of non-shadow termination, shadow self-termination,
  termination on task exit, various SMP scenarios, etc.
   Moreover, quite a
  lot of stuff is executed in atomic scheduling hooks (namely on cleanup).
  At least for shadow threads, I don't think all of this is necessary.
  Actually, deferring most of the uncritical cleanup work into Linux
  context would make things more regular, easier to review, and likely
  less disturbing for overall system latencies. I know that atomic
  scheduling hooks are required to implement certain skins, but I don't
  see why janitor work should be done there as well. And we also still
  have that tiny uncleanness on shadow threads termination around TCB
  release vs. hook execution.
  
  Talking about xnshadow_unmap, and as your patch illustrates, what you
  could postpone is basically the first part of the routine, which updates
  the reference counts. The rest _must_ run over the deletion hook in
  atomic context (already runs fine with all Linux debug switches on). The
 
 rpi_pop()? No chance to call it before entering the scheduler and then
 the hooks?

What do you have in mind precisely?

 
  latency hit is certainly not seen there; the real issues are left in the
  native and POSIX skin thread deletion routines (heap management and
  other housekeeping calls under nklock and so on).
 
 That's surely true. Besides RPI manipulation, there is really no
 problematic code left here latency-wise. Other hooks are trickier. But
 my point is that we first need some infrastructure to provide an
 alternative cleanup context before we can start moving things. My patch
 today is a hack...err...workaround from this perspective.
 

I don't see it this way. This code is very specific, in the sense that
it lies on the inter-domain boundary between Linux and Xenomai's primary
mode for a critical operation like thread deletion; this is what makes
things a bit confusing. What's below and above this layer has to be as
seamless as possible; this code in particular can't be.

  
  As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
  may (theoretically) bite large setups when e.g. too many threads get
  killed from the wrong context. At least some overflow detection is
  required here to warn the unfortunate user that he has reached hard-coded
  scalability limits.
  
  Yes.
 
 CONFIG_XENO_OPT_DEBUG[_NUCLEUS] or unconditionally (error return code)?
 

Making it depend on the debug switch looks good. People who are serious
about making production software do want to activate this switch once in
a while during development, really.

 Jan
 
-- 
Philippe.





Re: [Xenomai-core] [BUG] module_put called over non-root domain

2007-05-21 Thread Jan Kiszka
Philippe Gerum wrote:
 On Mon, 2007-05-21 at 18:21 +0200, Jan Kiszka wrote:
 This code has to work with an awful lot of runtime situations, including
 those raised by ptracing/GDB, and thread signaling in general, so test
 harnessing is key here.
 Well, what about listing those scenarios and their requirements in a
 comment if this is so critical?
 
  Eh, because I'm likely as lazy as you are? :o
 

Sigh, that's not a fair argument... ;-)

  That may make /me feel better as I could
 more easily go through them and check changes like those of today
 against them.

 
 Those situations have been encountered when debugging; there is no
 limited set of golden rules to document, I'm afraid, this would have
 been far too easy. The entire signal handling wrt ptracing is a terrible
 mess - somebody must have endured a very bad karma to implement it the
 way it is (long live utrace). The ChangeLog might be a good start for
 comments related to fixes in shadow.c/unmap and various head-banging
 sessions I went through regarding this (2.4/2.6).
 
  and I'm furthermore
 not sure if the old code was handling SMP scenarios safely (What if the
 thread to be unmapped was running on a different CPU than xnshadow_unmap?
 How to ensure test-atomicity then?).

 This wakeup call is there to release emerging shadows waiting on their
 start barrier that get killed before start. Other regular cases imply
 that either the current thread is exiting - in which case this is a no
 brainer, or it has been sent a group kill signal, in which case it has
 to wake up anyway.
 I didn't _remove_ cases where the wakeup takes place, I _added_ some
 more by removing the test.

 
 In the light of what I have just written, you would only cause useless
 wakeups by doing so.

My question was whether the test for useless wakeups was really race-safe
on SMP. Then we could introduce it again in the APC handler. Otherwise, it
might be safer to live with some degree of uselessness.

 
 Well, this thing seems to work,
 It does, actually.

  but it doesn't leave me with a good
 feeling. IMHO, there are far too many cleanup cases with all the
 combinations of non-shadow termination, shadow self-termination,
 termination on task exit, various SMP scenarios, etc.
  Moreover, quite a
 lot of stuff is executed in atomic scheduling hooks (namely on cleanup).
 At least for shadow threads, I don't think all of this is necessary.
 Actually, deferring most of the uncritical cleanup work into Linux
 context would make things more regular, easier to review, and likely
 less disturbing for overall system latencies. I know that atomic
 scheduling hooks are required to implement certain skins, but I don't
 see why janitor work should be done there as well. And we also still
 have that tiny uncleanness on shadow threads termination around TCB
 release vs. hook execution.
 Talking about xnshadow_unmap, and as your patch illustrates, what you
 could postpone is basically the first part of the routine, which updates
 the reference counts. The rest _must_ run over the deletion hook in
 atomic context (already runs fine with all Linux debug switches on). The
 rpi_pop()? No chance to call it before entering the scheduler and then
 the hooks?
 
 What do you have in mind precisely?

Preemptibility whenever having all things in a single atomic region
doesn't buy us much. Even if rpi_pop per se might not be heavy, adding
it to an already complex path doesn't improve things.

So my spontaneous idea was, as you said the rest _must_ be atomic,
whether that piece couldn't actually be moved into the task deletion
service and the task_exit Linux hook, i.e. before taking the nklock
again and doing the final reschedule. Just an example for what might be
improvable - once we dig here again.

 
 latency hit is certainly not seen there; the real issues are left in the
 native and POSIX skin thread deletion routines (heap management and
 other housekeeping calls under nklock and so on).
 That's surely true. Besides RPI manipulation, there is really no
 problematic code left here latency-wise. Other hooks are trickier. But
 my point is that we first need some infrastructure to provide an
 alternative cleanup context before we can start moving things. My patch
 today is a hack...err...workaround from this perspective.

 
 I don't see it this way. This code is very specific, in the sense that
 it lies on the inter-domain boundary between Linux and Xenomai's primary
 mode for a critical operation like thread deletion; this is what makes
 things a bit confusing. What's below and above this layer has to be as
 seamless as possible; this code in particular can't be.

Well, we could move the memory release of the shadow thread into Linux
context, e.g. (release on idle, a bit like RCU - uuuh). Then the
reference counter can find a new home as well. Maybe some more things,
dunno yet.

 
 As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
 may (theoretically) bite large setups when e.g. too many threads get
 killed from the wrong