Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-23 Thread Jeroen Van den Keybus
Hello,


I'm currently not at a level to participate in your discussion. Although I'm willing to supply you with stress tests, I would nevertheless like to learn more about task migration as this debugging session proceeds. In order to do so, please confirm the following statements or indicate where I went wrong. I hope others may learn from this as well.


xn_shadow_harden(): This is called whenever a Xenomai thread performs a Linux (root domain) system call (notified by Adeos ?). The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux kernel wake_up_interruptible_sync() call. Is this thread actually run, or does it merely put the thread in some Linux to-do list (I assumed the first case) ? And how does it terminate: is only the system call migrated, or is the thread allowed to continue running (at a priority level equal to the Xenomai priority level) until it hits something of the Xenomai API (or, trivially, explicitly goes to RT using the API) ? In that case, I expect the nRT thread to terminate with a schedule() call in the Xeno OS API code which deactivates the task so that it won't ever run in Linux context anymore. A top-priority gatekeeper is in place as a software hook to catch Linux's attention right after that schedule(), which might otherwise schedule something else (and leave only interrupts for Xenomai to come back to life again). I have the impression that I cannot see this gatekeeper, nor the (n)RT threads, using the ps command ?


Is it correct to state that the current preemption issue is due to the gatekeeper being invoked too soon ? Could someone who knows more about the migration technology explain what exactly goes wrong ?

Thanks,


Jeroen.
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] Re: [PATCH] Shared irqs v.5

2006-01-23 Thread Dmitry Adamushko
On 23/01/06, Jan Kiszka [EMAIL PROTECTED] wrote:
 Dmitry Adamushko wrote:
  Hello Jan, as I promised earlier today, here is the patch.
 I finally had a look at your patch (not yet a try), and it looks really
 nice and light-weight.

I have another version here at hand. The only difference is that
xnintr_irq_handler() handles all interrupts and distinguishes the timer
interrupt via irq == XNARCH_TIMER_IRQ to handle it appropriately.
This way, the i-cache is, hopefully, used a bit more effectively. But
it doesn't make a big difference in other parts of the code, so you may
start testing with the one I posted earlier.
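
For illustration, a minimal sketch of that dispatch; the handler signature,
the cookie argument, the struct fields and the clock-handler stub below are
my assumptions, only the irq == XNARCH_TIMER_IRQ test reflects what is
described above:

    /* Sketch only - not the actual nucleus code; all types and helpers
     * here are stand-ins so the fragment is self-contained. */
    #define XNARCH_TIMER_IRQ 0              /* stub value for the sketch */

    typedef struct xnintr {
            int (*isr)(struct xnintr *intr);   /* attached ISR */
    } xnintr_t;

    static void xnintr_clock_handler(void)
    {
            /* nucleus timer tick handling (stub) */
    }

    static void xnintr_irq_handler(unsigned irq, void *cookie)
    {
            xnintr_t *intr = cookie;

            if (irq == XNARCH_TIMER_IRQ) {
                    /* Timer interrupt: handled by the nucleus clock code. */
                    xnintr_clock_handler();
                    return;
            }

            /* Any other line: run the ISR attached to this object. */
            intr->isr(intr);
    }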

 Now I only have two topics on my wish list:
 o Handling of edge-triggered IRQs (ISA-cards...). As I tried to explain,
   in this case we have to run the IRQ handler list as long as the full
   list completed without any handler reporting XN_ISR_ENABLE back. Then
   and only then we are safe to not stall the IRQ line. See e.g.
   serial8250_interrupt() in linux/drivers/serial/8250.c for a per-driver
   solution and [1] for some discussion on sharing IRQs (be warned, it's
   from the evil side ;) ).
Ok. e.g. we may introduce another flag to handle such a special case.
Something along the lines of XN_ISR_EDGETIRQ and maybe a separate
xnintr_etshirq_handler() (xnintr_attach() will set it up properly), so as
to avoid interfering with other code. No big overhead, I guess.
serial8250_interrupt() defines a maximum number of iterations, so we should do the same (?) to avoid brain-damaged cases.
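
To make the re-run-with-a-cap idea concrete, here is a rough sketch of what
such an xnintr_etshirq_handler() could look like; the list layout, the
XN_ISR_HANDLED return bit and the MAX_EDGEIRQ_PASSES cap are my assumptions,
not code from the patch:

    #define XN_ISR_HANDLED      0x1   /* assumed: ISR claimed the interrupt */
    #define MAX_EDGEIRQ_PASSES  256   /* arbitrary cap, as in serial8250 */

    typedef struct xnintr {
            int (*isr)(struct xnintr *intr);
            struct xnintr *next;          /* assumed: next handler on this line */
    } xnintr_t;

    /* Re-run the whole handler list until one full pass claims nothing,
     * so that an edge arriving while another handler ran is not lost.
     * Only then is it safe to re-enable the IRQ line. */
    static void xnintr_etshirq_handler(unsigned irq, xnintr_t *list)
    {
            int passes = 0, again;

            do {
                    xnintr_t *intr;

                    again = 0;
                    for (intr = list; intr != NULL; intr = intr->next)
                            if (intr->isr(intr) & XN_ISR_HANDLED)
                                    again = 1;
            } while (again && ++passes < MAX_EDGEIRQ_PASSES);

            /* Arch-specific end-of-interrupt / unstall of the line goes here. */
    }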
   On our systems we already have two of those use-cases: the xeno_16550A
   handling up to 4 devices on the same IRQ on an ISA card (I don't want
   to know what worst-case latency can be caused here...) and our
   SJA-1000 CAN driver for PC/104 ISA card with 2 controllers on the same
   interrupt line. So a common solution would reduce the code size and
   potential bug sources.

 o Store and display (/proc) the driver name(s) registered on an IRQ line
   somewhere (ipipe?). This is just a nice-to-have. I introduced the RTDM
   API with the required argument in place, would be great if we can use
   this some day.
Yes, the proper /proc extension should be available. Actually, the
native skin can't be extended to support shared interrupts only
by adding a new flag. The problem is the way
/proc/xenomai/registry/interrupts is implemented there (and I assume
any other skin follows the same approach). An rt_registry object is
created for each RT_INTR structure and, hence, for each xnintr_t.

I'd see the following scheme :

either

/proc/xenomai/interrupts lists all interrupt objects registered at the nucleus layer (xnintr_t should have a name field).

IRQN drivers

3 driver1
...
5 driver2, driver3

and the skin presents per-object information as

ll /proc/xenomai/registry/interrupts

driver1
driver2
driver3

each of those files contains the same information as now.

To achieve this, 

1) xnintr_t should be extended with a name field;

2) rt_intr_create() should take a name argument and no longer use auto-generation (as irqN); see the sketch after this proposal.

or

ll /proc/xenomai/registry/interrupts

3
5
Those are directories and e.g.

ll /proc/xenomai/registry/interrupts/5

driver2
driver3

Those are files and contain the same information as now.

This is harder to implement since the registry interface should be extended (for each skin).
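
A small, self-contained sketch of the first option above, just to show how the
per-line listing could be produced; the xnintr_t fields and the chaining of
handlers per IRQ line are my assumptions, only the name field and the output
format come from the proposal:

    #include <stdio.h>

    /* Stand-in for the nucleus type; only the proposed name field matters. */
    typedef struct xnintr {
            const char *name;        /* proposed: set from rt_intr_create()'s
                                        new name argument */
            struct xnintr *next;     /* assumed: next object sharing the line */
    } xnintr_t;

    /* Emit one /proc/xenomai/interrupts row, e.g. "5  driver2, driver3". */
    static void print_irq_line(int irq, const xnintr_t *head)
    {
            const xnintr_t *i;

            printf("%d  ", irq);
            for (i = head; i != NULL; i = i->next)
                    printf("%s%s", i->name, i->next ? ", " : "\n");
    }

    int main(void)
    {
            xnintr_t driver3 = { "driver3", NULL };
            xnintr_t driver2 = { "driver2", &driver3 };

            print_irq_line(5, &driver2);
            return 0;
    }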

 ...
 Jan

 PS: Still at home?

Yes. This week I'm going to Belgium to attend a few meetings with some
customers of my potential employer. So my next step for the near future
will be finally determined there :)
 How many degrees Centigrade? I guess our current -9°C here in Hannover
 must appear ridiculous, almost subtropically warm, to you. ;)

Hey, I'm not from Siberia :o) This is a kind of common misconception, I
guess, as the whole former USSR is associated with cold winters, bears,
eek.. KGB etc. :o)

from wikipedia.com (about Belarus) : 

The climate ranges from harsh winters (average January temperatures are in the range −8°C to −2°C) to cool and moist 
summers (average temperature 15°C to 20°C).

Actually, these last days it was very cold - even around -30°C. This
happens from time to time but very rarely (once every few years or so),
and it's not considered normal here. E.g. schools were closed for the
last few days when the temperature was below -25°C. Actually, the
weather has been getting crazy in recent years, and not only here :)
[1] http://www.microsoft.com/whdc/system/sysperf/apic.mspx
--
Best regards,
Dmitry Adamushko


Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-23 Thread Gilles Chanteperdrix
Jeroen Van den Keybus wrote:
  Hello,
  
  
  I'm currently not at a level to participate in your discussion. Although I'm
  willing to supply you with stresstests, I would nevertheless like to learn
  more from task migration as this debugging session proceeds. In order to do
  so, please confirm the following statements or indicate where I went wrong.
  I hope others may learn from this as well.
  
  xn_shadow_harden(): This is called whenever a Xenomai thread performs a
  Linux (root domain) system call (notified by Adeos ?). 

xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by the Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by the Xenomai scheduler). Migrations occur for some system
calls. More precisely, each Xenomai skin's system call table associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.
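
For illustration only, a sketch of what such per-syscall flags could look
like; the flag names and the table layout below are made up for the example,
they are not the actual nucleus definitions:

    /* Hypothetical per-syscall execution-mode flags (illustrative names). */
    #define EXEC_ANY        0x0   /* run in whatever mode the caller is in */
    #define EXEC_PRIMARY    0x1   /* harden (migrate to primary) before the call */
    #define EXEC_SECONDARY  0x2   /* relax (migrate to secondary) before the call */

    struct skin_syscall {
            int (*handler)(void *args);
            int flags;            /* EXEC_* value checked by the syscall demux:
                                     a relaxed caller hitting EXEC_PRIMARY is
                                     sent through xnshadow_harden() first. */
    };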

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called shadow thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
  (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
  wake_up_interruptible_sync() call. Is this thread actually run or does it
  merely put the thread in some Linux to-do list (I assumed the first case) ?

Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken-up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper) will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe marking the running thread as
suspended is not needed, since the gatekeeper may have a high priority,
and calling schedule() is enough. In any case, the woken-up thread does
not seem to be run immediately, so this looks rather like the second
case.

Since in xnshadow_harden the running thread marks itself as suspended
before running wake_up_interruptible_sync, the gatekeeper will run when
schedule() gets called, which, in turn, depends on the CONFIG_PREEMPT*
configuration. In the non-preempt case, the current thread will be
suspended and the gatekeeper will run when schedule() is explicitly
called in xnshadow_harden(). In the preempt case, schedule gets called
when the outermost spinlock is unlocked in wake_up_interruptible_sync().
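
A minimal sketch of that ordering, as I read the description above; the wait
queue name and the request queueing step are assumptions, only the sequence
set_current_state() / wake_up_interruptible_sync() / schedule() is what is
being discussed:

    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(gatekeeper_wait_queue); /* hypothetical name */

    static void xnshadow_harden_sketch(void)
    {
            /* 1. Queue ourselves as a hardening request for the per-CPU
             *    gatekeeper (details elided). */

            /* 2. Mark the Linux side of the thread as about to sleep. */
            set_current_state(TASK_INTERRUPTIBLE);

            /* 3. Wake the gatekeeper kernel thread. With CONFIG_PREEMPT,
             *    releasing the runqueue lock inside this call may already
             *    trigger a preemption and switch to the gatekeeper before
             *    step 4 runs - the race discussed in this thread. */
            wake_up_interruptible_sync(&gatekeeper_wait_queue);

            /* 4. Non-preempt case: the switch to the gatekeeper happens here. */
            schedule();
    }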

  And how does it terminate: is only the system call migrated or is the thread
  allowed to continue run (at a priority level equal to the Xenomai
  priority level) until it hits something of the Xenomai API (or trivially:
  explicitly go to RT using the API) ? 

I am not sure I follow you here. The usual case is that the thread will
remain in primary mode after the system call, but I think a system call
flag allows the other behaviour. So, if I understand the question
correctly, the answer is that it depends on the system call.

  In that case, I expect the nRT thread to terminate with a schedule()
  call in the Xeno OS API code which deactivates the task so that it
  won't ever run in Linux context anymore. A top priority gatekeeper is
  in place as a software hook to catch Linux's attention right after
  that schedule(), which might otherwise schedule something else (and
  leave only interrupts for Xenomai to come back to life again).

Here is the way I understand it. We have two threads, or rather two
views of the same thread, each with its own state. Switching from
secondary to primary mode, i.e. the xnshadow_harden and gatekeeper job,
means changing the two states at once. Since we cannot do that, we need
an intermediate state. Since the intermediate state cannot be the state
where the two threads are running (they share the same stack and
program counter), the intermediate state is a state where the two
threads are suspended, but then another context needs to run: the
gatekeeper.

   I have
  the impression that I cannot see this gatekeeper, nor the (n)RT
  threads using the ps command ?

The gatekeeper and Xenomai user-space threads are regular Linux
contexts; you can see them using the ps command.

  
  Is it correct to state that the current preemption issue is due to the
  gatekeeper being invoked too soon ? Could someone knowing more about the
  migration technology explain what exactly goes wrong ?

Jan seems to have found such an issue here. I am not sure I understood
what he wrote. But if the issue is due to CONFIG_PREEMPT, it explains
why I could not observe the bug, 

[Xenomai-core] Re: [PATCH] Shared irqs v.5

2006-01-23 Thread Jan Kiszka
Dmitry Adamushko wrote:
 On 23/01/06, Jan Kiszka [EMAIL PROTECTED] wrote:
 Dmitry Adamushko wrote:
 Hello Jan,

 as I promised earlier today, here is the patch.
 I finally had a look at your patch (not yet a try), and it looks really
 nice and light-weight.
 
 
 I have another version here at hand. The only difference is that
 xnintr_irq_handler() handles all interrupts and distinguishes the timer
 interrupt via irq == XNARCH_TIMER_IRQ to handle it appropriately. This
 way, the i-cache is, hopefully, used a bit more effectively. But it doesn't
 make a big difference in other parts of code so you may start testing with
 the one I posted earlier.
 

I see: hunting the microsecond. ;)

 
 Now I only have two topics on my wish list:
 o Handling of edge-triggered IRQs (ISA-cards...). As I tried to explain,
   in this case we have to run the IRQ handler list as long as the full
   list completed without any handler reporting XN_ISR_ENABLE back. Then
   and only then we are safe to not stall the IRQ line. See e.g.
   serial8250_interrupt() in linux/drivers/serial/8250.c for a per-driver
   solution and [1] for some discussion on sharing IRQs (be warned, it's
   from the evil side ;) ).
 
 
 Ok. e.g. we may introduce another flag to handle such a special case.
 Something along the lines of XN_ISR_EDGETIRQ and maybe a separate
 xnintr_etshirq_handler() (xnintr_attach() will set it up properly), so as
 to avoid interfering with other code. No big overhead, I guess.
 serial8250_interrupt() defines a maximum number of iterations, so we should
 do the same (?) to avoid brain-damaged cases.

Might be useful for post-mortem analysis - most real-time systems will
likely already have caused more or less severe damage after an IRQ
handler has looped 256 times or so. Anyway, if you are able to read some
information about this later on your console, it is better than not
knowing at all why your drilling machine just created an extra-deep hole.

 
   On our systems we already have two of those use-cases: the xeno_16550A
   handling up to 4 devices on the same IRQ on an ISA card (I don't want
   to know what worst-case latency can be caused here...) and our
   SJA-1000 CAN driver for PC/104 ISA card with 2 controllers on the same
   interrupt line. So a common solution would reduce the code size and
   potential bug sources.

 o Store and display (/proc) the driver name(s) registered on an IRQ line
   somewhere (ipipe?). This is just a nice-to-have. I introduced the RTDM
   API with the required argument in place, would be great if we can use
   this some day.
 
 
 Yes, the proper /proc extension should be available. Actually, the native
 skin can't be extended to support the shared interrupts only by adding a new
 flag. The problem is the way the /proc/xenomai/registry/interrupts is
 implemented there (and I assume any other skin follows the same way). The
 rt_registry object is created per each RT_INTR structure and, hence, per
 each xnintr_t.
 
 I'd see the following scheme :
 
 either
 
 /proc/xenomai/interrupts lists all interrupts objects registered on the
 nucleus layer (xnintr_t should have a name field).
 
 IRQN  drivers
 
 3  driver1
 ...
 5  driver2, driver3
 
 and the skin presents per-object information as
 
 ll /proc/xenomai/registry/interrupts
 
 driver1
 driver2
 driver3
 
 each of those files contains the same information as now.
 
 To achieve this,
 
 1) xnintr_t should be extended with a name field;
 
 2) rt_intr_create() should contain a name argument and not use
 auto-generation (as irqN) any more.
 
 or
 
 ll /proc/xenomai/registry/interrupts
 
 3
 5
 
 Those are directories and e.g.
 
 ll /proc/xenomai/registry/interrupts/5
 
 driver2
 driver3
 
 Those are files and contain the same information as now.
 
 This is harder to implement since the registry interface should be extended
 (for each skin).

Isn't the native skin the only one using this registry? Anyway, as the
preferred way of registering IRQ handlers should be via RTDM, and RTDM does
not use the registry, go for the simplest solution. /proc/xenomai/interrupts
is more important in my eyes.

 
 
 ...
 
 Jan


 PS: Still at home?
 
 
 Yes. This week I'm going to Belgium to attend a few meetings with some
 customers of my potential employer. So my next step for the near future
 will be finally determined there :)

Best wishes! Just avoid too much extra work, there is already enough to
do here. ;)

 
 
 How many degrees Centigrade? I guess our current -9°C
 here in Hannover must appear ridiculous, almost subtropically warm, to you.
 ;)
 
 
 Hey, I'm not from Siberia :o) This is a kind of common misconception, I guess, as
 the whole former USSR is associated with cold winters, bears, eek.. KGB etc.
 :o)

Former? Did I miss something? :)

 
 from wikipedia.com (about Belarus) :
 
 The climate http://en.wikipedia.org/wiki/Climate ranges from harsh
 winters http://en.wikipedia.org/wiki/Winter (average January temperatures
 are in the range −8 

Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-23 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 Jeroen Van den Keybus wrote:
   Hello,
   
   
   I'm currently not at a level to participate in your discussion. Although 
 I'm
   willing to supply you with stresstests, I would nevertheless like to learn
   more from task migration as this debugging session proceeds. In order to do
   so, please confirm the following statements or indicate where I went wrong.
   I hope others may learn from this as well.
   
   xn_shadow_harden(): This is called whenever a Xenomai thread performs a
   Linux (root domain) system call (notified by Adeos ?). 
 
 xnshadow_harden() is called whenever a thread running in secondary
 mode (that is, running as a regular Linux thread, handled by Linux
 scheduler) is switching to primary mode (where it will run as a Xenomai
 thread, handled by Xenomai scheduler). Migrations occur for some system
 calls. More precisely, Xenomai skin system calls tables associates a few
 flags with each system call, and some of these flags cause migration of
 the caller when it issues the system call.
 
 Each Xenomai user-space thread has two contexts, a regular Linux
 thread context, and a Xenomai thread called shadow thread. Both
 contexts share the same stack and program counter, so that at any time,
 at least one of the two contexts is seen as suspended by the scheduler
 which handles it.
 
 Before xnshadow_harden is called, the Linux thread is running, and its
 shadow is seen in suspended state with XNRELAX bit by Xenomai
 scheduler. After xnshadow_harden, the Linux context is seen suspended
 with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
 running by Xenomai scheduler.
 
 The migrating thread
   (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
   wake_up_interruptible_sync() call. Is this thread actually run or does it
   merely put the thread in some Linux to-do list (I assumed the first case) ?
 
 Here, I am not sure, but it seems that when calling
 wake_up_interruptible_sync the woken up task is put in the current CPU
 runqueue, and this task (i.e. the gatekeeper), will not run until the
 current thread (i.e. the thread running xnshadow_harden) marks itself as
 suspended and calls schedule(). Maybe, marking the running thread as

Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken-up task is higher.

BTW, an easy way to provoke the current trouble is to remove the _sync
from wake_up_interruptible_sync(). As I understand it, this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.
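
A sketch of that reproduction idea, reusing the hypothetical wait queue from
the earlier hardening sketch; only the choice between the _sync and non-_sync
wake-up reflects the suggestion above, the rest is an assumption of mine:

    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(gatekeeper_wait_queue);  /* hypothetical */

    static void harden_wakeup_sketch(int provoke_race)
    {
            set_current_state(TASK_INTERRUPTIBLE);

            if (provoke_race)
                    /* No _sync hint: the scheduler may switch to the woken-up
                     * gatekeeper right away, i.e. before schedule() below,
                     * which widens the race window under CONFIG_PREEMPT. */
                    wake_up_interruptible(&gatekeeper_wait_queue);
            else
                    /* _sync hints that the caller is about to sleep anyway,
                     * so Linux avoids the immediate reschedule. */
                    wake_up_interruptible_sync(&gatekeeper_wait_queue);

            schedule();
    }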

 suspended is not needed, since the gatekeeper may have a high priority,
 and calling schedule() is enough. In any case, the waken up thread does
 not seem to be run immediately, so this rather look like the second
 case.
 
 Since in xnshadow_harden, the running thread marks itself as suspended
 before running wake_up_interruptible_sync, the gatekeeper will run when
 schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
 configuration. In the non-preempt case, the current thread will be
 suspended and the gatekeeper will run when schedule() is explicitely
 called in xnshadow_harden(). In the preempt case, schedule gets called
 when the outermost spinlock is unlocked in wake_up_interruptible_sync().
 
   And how does it terminate: is only the system call migrated or is the 
 thread
   allowed to continue run (at a priority level equal to the Xenomai
   priority level) until it hits something of the Xenomai API (or trivially:
   explicitly go to RT using the API) ? 
 
 I am not sure I follow you here. The usual case is that the thread will
 remain in primary mode after the system call, but I think a system call
 flag allow the other behaviour. So, if I understand the question
 correctly, the answer is that it depends on the system call.
 
   In that case, I expect the nRT thread to terminate with a schedule()
   call in the Xeno OS API code which deactivates the task so that it
   won't ever run in Linux context anymore. A top priority gatekeeper is
   in place as a software hook to catch Linux's attention right after
   that schedule(), which might otherwise schedule something else (and
   leave only interrupts for Xenomai to come back to life again).
 
 Here is the way I understand it. We have two threads, or rather two
 views of the same thread, with each its state. Switching from
 secondary to primary mode, i.e. xnshadow_harden and gatekeeper job,
 means changing the two states at once. Since we can not do that, we need
 an intermediate state. Since the intermediate state can not be the state
 where the two threads are running (they share the same stack and
 program counter), the intermediate state is a state where the two
 threads are suspended, but another context needs running, it is the
 gatekeeper.
 
I have
   the impression that I cannot see this gatekeeper, nor the (n)RT
   threads using the ps command ?
 
 The gatekeeper and Xenomai user-space 

[Xenomai-core] Ipipe xenomai and LTTng

2006-01-23 Thread Alexis Berlemont
Hi,

The patch available at 

http://download.gna.org/adeos/patches/v2.6/i386/combo/adeos-ipipe-2.6.14-i386-1.1-00-lttng-0.5.0a.patch

is an Ipipe + LTTng patch for the 2.6.14.5-i386 kernel release.

In order to trace LTTng events in Xenomai, the patch 
xenomai-2.1-rc1-lttng-05.patch.bz2 is necessary; it replaces the 
former LTT code with calls to the LTTng tracing functions.

This Xenomai patch is an experimental, rough attempt; it does not contain any 
filtering facilities. All Xeno events are recorded.

I have used the genevent tool (available on the LTTng site) to generate all 
the C tracing functions.

If anyone is willing to test this stuff, please use the QUICKSTART file available 
at http://ltt.polymtl.ca/svn/ltt/branches/poly/QUICKSTART (just replace the 
LTTng patches with our Ipipe + LTTng patch).

This patch has been created and tested with the following patches and 
packages:
- adeos-ipipe-2.6.14-i386-1.1-00.patch
- patch-2.6.14-lttng-0.5.0a.tar.bz2
- lttng-modules-0.4.tar.gz
- LinuxTraceToolkitViewer-0.8.0-17122005.tar.gz
- genevent-0.3.tar.gz

The Ipipe patch comes from the Adeos download area, and the LTTng material is 
available at the following address: http://ltt.polymtl.ca/

This patch has only been tested on a UP machine.

Alexis.


xenomai-2.1-rc1-lttng-05.patch.bz2
Description: BZip2 compressed data

