Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jeroen Van den Keybus wrote:
>> Revision 466 contains the mutex-info fix, but that is post -rc2. Why not
>> switch to SVN head?
>
> Philippe asked to apply the patch against Xenomai 2.1-rc2. Can I safely
> patch it against the SVN tree? After that, what will 'svn up' do to the
> patched tree?

The CONFIG_PREEMPT fix is already contained in the latest SVN revision, no need to patch anymore.

When unsure whether a patch will apply cleanly, try "patch --dry-run" first. The (virtually) rejected hunks can then be used to assess whether the patch fits - without messing up the code base immediately.

> Remember I'm quite new to Linux. Actually, I spent half an hour finding out
> how that patch stuff (especially the -p option) works.

:) (it's no problem to ask even this kind of "stupid" question on the list or us directly - no one will bite you!)

Jan

signature.asc
Description: OpenPGP digital signature
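[Editorially, Jan's "patch --dry-run" tip and Jeroen's "-p option" question can be tried out on a throwaway tree; the file and patch names below are invented purely for the demonstration:]

```shell
# Throwaway demo (file and patch names are made up for the example).
set -e
mkdir -p demo/src
printf 'old line\n' > demo/src/file.c
cat > demo/fix.patch <<'EOF'
--- a/src/file.c
+++ b/src/file.c
@@ -1 +1 @@
-old line
+new line
EOF
cd demo
# -p1 strips the first path component ("a/", "b/") from the patch headers;
# this is the -p option Jeroen mentions.
# --dry-run only reports whether the hunks would apply; no file is touched.
patch -p1 --dry-run < fix.patch
# Apply for real only once the dry run shows no rejected hunks.
patch -p1 < fix.patch
grep 'new line' src/file.c
```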
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
> Revision 466 contains the mutex-info fix, but that is post -rc2. Why not
> switching to SVN head?

Philippe asked to apply the patch against Xenomai 2.1-rc2. Can I safely patch it against the SVN tree? After that, what will 'svn up' do to the patched tree?

Remember I'm quite new to Linux. Actually, I spent half an hour finding out how that patch stuff (especially the -p option) works.

Jeroen.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jeroen Van den Keybus wrote:
>>> I've installed both patches and the problem seems to have disappeared.
>>> I'll try it on another machine tomorrow, too. Meanwhile: thanks very
>>> much for the assistance!
>
> While testing more thoroughly, my triggers for zero mutex values after
> acquiring the lock are going off again. I was using the SVN xenomai
> development tree, but I've now switched to the (fixed) 2.1-rc2 in order to
> apply the patches. Is Jan's bugfix included in that one?

Revision 466 contains the mutex-info fix, but that is post -rc2. Why not switch to SVN head?

Jan

signature.asc
Description: OpenPGP digital signature
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
> I've installed both patches and the problem seems to have disappeared.
> I'll try it on another machine tomorrow, too. Meanwhile: thanks very
> much for the assistance!

While testing more thoroughly, my triggers for zero mutex values after acquiring the lock are going off again. I was using the SVN xenomai development tree, but I've now switched to the (fixed) 2.1-rc2 in order to apply the patches. Is Jan's bugfix included in that one?

Jeroen.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jeroen Van den Keybus wrote:
>> And now, Ladies and Gentlemen, with the patches attached.
>
> I've installed both patches and the problem seems to have disappeared.
> I'll try it on another machine tomorrow, too. Meanwhile: thanks very
> much for the assistance!
>
> Jeroen.

Actually, the effort you made to provide a streamlined testcase that triggered the bug did most of the job, so you are the one to thank here. The rest was only a matter of dealing with my own bugs, which is a sisyphean activity I'm rather familiar with.

-- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
> And now, Ladies and Gentlemen, with the patches attached.

I've installed both patches and the problem seems to have disappeared. I'll try it on another machine tomorrow, too. Meanwhile: thanks very much for the assistance!

Jeroen.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> At this chance: any comments on the panic-freeze extension for the
>> tracer? I need to rework the Xenomai patch, but the ipipe side should be
>> ready for merge.
>
> No issue with the ipipe side since it only touches the tracer support
> code. No issue either at first sight with the Xeno side, aside of the
> trace being frozen twice in do_schedule_event (once in this routine,
> twice in xnpod_fatal); but maybe it's wanted to freeze the situation
> before the stack is dumped; is it?

Yes, this is the reason for it. Actually, only the first freeze has any effect; later calls will be ignored.

Hmm, I thought I remembered some issue with the Xenomai-side patch when tracing was disabled, but I cannot reproduce that issue anymore (it was likely related to other hacks while tracking down the PREEMPT issue). So from my POV that patch is ready for merge as well.

Jan

signature.asc
Description: OpenPGP digital signature
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> [...]
>> Could anyone interested in this issue test the following couple of patches?
>>
>> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
>> 2.6.15
>> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
>>
>> Both patches are needed to fix the issue.
>
> Looks good. I tried Jeroen's test-case and I was not able to reproduce
> the crash anymore. I think it's time for a new ipipe-release. ;)

Looks like, indeed.

> At this chance: any comments on the panic-freeze extension for the
> tracer? I need to rework the Xenomai patch, but the ipipe side should be
> ready for merge.

No issue with the ipipe side since it only touches the tracer support code. No issue either at first sight with the Xeno side, aside of the trace being frozen twice in do_schedule_event (once in this routine, twice in xnpod_fatal); but maybe it's wanted to freeze the situation before the stack is dumped; is it?
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote:
> Philippe Gerum wrote:
>> [...]
>> The other way is to make sure that no in-kernel preemption of the
>> hardening task could occur after step 1) and until step 2) is
>> performed, given that we cannot currently call schedule() with
>> interrupts or preemption off. I'm on it.
>
> Could anyone interested in this issue test the following couple of patches?
>
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
> 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
>
> Both patches are needed to fix the issue.
>
> TIA,

Looks good. I tried Jeroen's test-case and I was not able to reproduce the crash anymore. I think it's time for a new ipipe-release. ;)

At this chance: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be ready for merge.

Jan
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote:
> [...]
> The other way is to make sure that no in-kernel preemption of the
> hardening task could occur after step 1) and until step 2) is
> performed, given that we cannot currently call schedule() with
> interrupts or preemption off. I'm on it.
>
> Could anyone interested in this issue test the following couple of patches?
>
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
> 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
>
> Both patches are needed to fix the issue.
>
> TIA,

And now, Ladies and Gentlemen, with the patches attached.

-- Philippe.

--- 2.6.15-x86/kernel/sched.c	2006-01-07 15:18:31.0 +0100
+++ 2.6.15-ipipe/kernel/sched.c	2006-01-30 15:15:27.0 +0100
@@ -2963,7 +2963,7 @@
 	 * Otherwise, whine if we are scheduling when we should not be.
 	 */
 	if (likely(!current->exit_state)) {
-		if (unlikely(in_atomic())) {
+		if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) {
 			printk(KERN_ERR "scheduling while atomic: "
 				"%s/0x%08x/%d\n",
 				current->comm, preempt_count(), current->pid);
@@ -2972,8 +2972,13 @@
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_ad
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote:
> [...]
> The other way is to make sure that no in-kernel preemption of the
> hardening task could occur after step 1) and until step 2) is
> performed, given that we cannot currently call schedule() with
> interrupts or preemption off. I'm on it.

Could anyone interested in this issue test the following couple of patches?

atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

Both patches are needed to fix the issue.

TIA,

-- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. suspended is not needed, since the gatekeeper may have a high priority, and calling schedule() is enough. In any case, the woken-up thread does not seem to be run immediately, so this rather looks like the second case. Since in xnshadow_harden, the running thread marks itself as suspended before running wake_up_interruptible_sync, the gatekeeper will run when schedule() gets called, which, in turn, depends on the CONFIG_PREEMPT* configuration. In the non-preempt case, the current thread will be suspended and the gatekeeper will run when schedule() is explicitly called in xnshadow_harden(). In the preempt case, schedule gets called when the outermost spinlock is unlocked in wake_up_interruptible_sync(). > And how does it terminate: is only the system call migrated or is the thread > allowed to continue run (at a priority level equal to the Xenomai > priority level) until it hits something of the Xenomai API (or trivially: > explicitly go to RT using th
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Philippe Gerum wrote: Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: Hello, I'm currently not at a level to participate in your discussion. Although I'm willing to supply you with stresstests, I would nevertheless like to learn more from task migration as this debugging session proceeds. In order to do so, please confirm the following statements or indicate where I went wrong. I hope others may learn from this as well. xn_shadow_harden(): This is called whenever a Xenomai thread performs a Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux kernel wake_up_interruptible_sync() call. Is this thread actually run or does it merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. Could anyone interested in this issue test the following couple of patches? atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 Both patches are needed to fix the issue. TIA, Looks good. I tried Jeroen's test-case and I was not able to reproduce the crash anymore. I think it's time for a new ipipe-release. ;) Looks like, indeed. At this chance: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be ready for merge. No issue with the ipipe side since it only touches the tracer support code. No issue either at first sight with the Xeno side, aside from the trace being frozen twice in do_schedule_event? (once in this routine, a second time in xnpod_fatal); but maybe it's wanted to freeze the situation before the stack is dumped
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
On 30/01/06, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Dmitry Adamushko wrote:
>>> ...
>>> I have not checked it yet, but my presupposition was that something as
>>> easy as:
>>>
>>>     preempt_disable();
>>>     wake_up_interruptible_sync();
>>>     schedule();
>>>     preempt_enable();
>>
>> It's a no-go: "scheduling while atomic". One of my first attempts to
>> solve it.
>
> My fault. I meant the way preempt_schedule() and preempt_schedule_irq()
> call schedule() while being non-preemptible. To this end, PREEMPT_ACTIVE
> is set up. The use of preempt_enable/disable() here is wrong.
>
>> The only way to enter schedule() without being preemptible is via
>> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
>> Kind of Gordian knot. :(
>
> Maybe I have missed something, so just for my curiosity: what does
> prevent the use of PREEMPT_ACTIVE here? We don't have a "scheduling
> while atomic" message here, as it seems to be a legal way to call
> schedule() with that flag being set up.
>
> When PREEMPT_ACTIVE is set, the task gets /preempted/ but not removed
> from the run queue - independent of its current status.
>
> Err... that's exactly the reason I have explained in my first mail for
> this thread :)

Blah.. I wish I was smoking something special before, so I could point to that as the reason for my forgetfulness.

Actually, we could use PREEMPT_ACTIVE indeed + something else (probably another flag) to distinguish between the case when PREEMPT_ACTIVE is set by Linux and the case when it's set by xnshadow_harden().

xnshadow_harden()
{
        struct task_struct *this_task = current;
        ...
        xnthread_t *thread = xnshadow_thread(this_task);

        if (!thread)
                return;
        ...
        gk->thread = thread;

+       add_preempt_count(PREEMPT_ACTIVE);  // should be checked in schedule()
+       xnthread_set_flags(thread, XNATOMIC_TRANSIT);

        set_current_state(TASK_INTERRUPTIBLE);
        wake_up_interruptible_sync(&gk->waitq);

+       schedule();
+       sub_preempt_count(PREEMPT_ACTIVE);
        ...
}

Then, something like the following code should be called from schedule():

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
        xnthread_t *thread = xnshadow_thread(task);

        if (!thread)
                return;

        if (xnthread_test_flags(thread, XNATOMIC_TRANSIT)) {
                xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
                deactivate_task(task, rq);
        }
}

- schedule.c:

        ...
        switch_count = &prev->nivcsw;
        if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
                switch_count = &prev->nvcsw;
                if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                             unlikely(signal_pending(prev))))
                        prev->state = TASK_RUNNING;
                else {
                        if (prev->state == TASK_UNINTERRUPTIBLE)
                                rq->nr_uninterruptible++;
                        deactivate_task(prev, rq);
                }
        }

+       // removes a task from the active queue if PREEMPT_ACTIVE +
+       // XNATOMIC_TRANSIT
+#ifdef CONFIG_IPIPE
+       ipipe_transit_cleanup(prev, rq);
+#endif /* CONFIG_IPIPE */
        ...

Not very graceful, maybe, but it could work - or am I missing something important?

-- 
Best regards,
Dmitry Adamushko
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Dmitry Adamushko wrote: >>> ... > >> I have not checked it yet but my presupposition that something as easy as >> : >>> preempt_disable() >>> >>> wake_up_interruptible_sync(); >>> schedule(); >>> >>> preempt_enable(); >> It's a no-go: "scheduling while atomic". One of my first attempts to >> solve it. > > > My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call > schedule() while being non-preemptible. > To this end, ACTIVE_PREEMPT is set up. > The use of preempt_enable/disable() here is wrong. > > > The only way to enter schedule() without being preemptible is via >> ACTIVE_PREEMPT. But the effect of that flag should be well-known now. >> Kind of Gordian knot. :( > > > Maybe I have missed something so just for my curiosity : what does prevent > the use of PREEMPT_ACTIVE here? > We don't have a "preempted while atomic" message here as it seems to be a > legal way to call schedule() with that flag being set up. When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from the run queue - independent of its current status. > > >>> could work... err.. and don't blame me if no, it's some one else who has >>> written that nonsense :o) >>> >>> -- >>> Best regards, >>> Dmitry Adamushko >>> >> Jan >> >> >> >> > > > -- > Best regards, > Dmitry Adamushko > > > > > > ___ > Xenomai-core mailing list > Xenomai-core@gna.org > https://mail.gna.org/listinfo/xenomai-core signature.asc Description: OpenPGP digital signature
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
>>> ...
>>> I have not checked it yet, but my presupposition was that something as
>>> easy as:
>>>
>>>     preempt_disable();
>>>     wake_up_interruptible_sync();
>>>     schedule();
>>>     preempt_enable();
>>
>> It's a no-go: "scheduling while atomic". One of my first attempts to
>> solve it.

My fault. I meant the way preempt_schedule() and preempt_schedule_irq() call schedule() while being non-preemptible. To this end, PREEMPT_ACTIVE is set up. The use of preempt_enable/disable() here is wrong.

>> The only way to enter schedule() without being preemptible is via
>> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
>> Kind of Gordian knot. :(

Maybe I have missed something, so just for my curiosity: what does prevent the use of PREEMPT_ACTIVE here? We don't have a "scheduling while atomic" message here, as it seems to be a legal way to call schedule() with that flag being set up.

>>> could work... err.. and don't blame me if no, it's some one else who has
>>> written that nonsense :o)
>>
>> --
>> Best regards,
>> Dmitry Adamushko
>
> Jan

-- 
Best regards,
Dmitry Adamushko
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Hi, well, if I'm not totally wrong, we have a design problem in the RT-thread hardening path. I dug into the crash Jeroen reported and I'm quite sure that this is the reason. So that's the bad news. The good one is that we can at least work around it by switching off CONFIG_PREEMPT for Linux (this implicitly means that it's a 2.6-only issue). @Jeroen: Did you verify that your setup also works fine without CONFIG_PREEMPT? But let's start with two assumptions my further analysis is based on: [Xenomai] o Shadow threads have only one stack, i.e. one context. If the real-time part is active (this includes being blocked on some xnsynch object or delayed), the original Linux task must NEVER EVER be executed, even if it would immediately fall asleep again. That's because the stack is in use by the real-time part at that time. And this condition is checked in do_schedule_event() [1]. [Linux] o A Linux task which has called set_current_state() will remain in the run-queue until it calls schedule() on its own. This means that it can be preempted (if CONFIG_PREEMPT is set) between set_current_state() and schedule() and then even be resumed again. Only the explicit call of schedule() will trigger deactivate_task(), which will in turn remove current from the run-queue. Ok, if this is true, let's have a look at xnshadow_harden(): After grabbing the gatekeeper sem and putting itself in gk->thread, a task going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the gatekeeper [2]. This does not include a Linux reschedule due to the _sync version of wake_up_interruptible. What can happen now? 1) No interruption until we have called schedule() [3]. All fine as we will not be removed from the run-queue before the gatekeeper starts kicking our RT part, thus no conflict in using the thread's stack. 2) Interruption by an RT IRQ. This would just delay the path described above, even if some RT threads get executed. 
Once they are finished, we continue in xnshadow_harden() - given that the RT part does not trigger the following case: 3) Interruption by some Linux IRQ. This may cause other threads to become runnable as well, but the gatekeeper has the highest prio and will therefore be the next. The problem is that the rescheduling on Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT remove it from the Linux run-queue. And now we are in real troubles: The gatekeeper will kick off our RT part which will take over the thread's stack. As soon as the RT domain falls asleep and Linux takes over again, it will continue our non-RT part as well! Actually, this seems to be the reason for the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME TIME now, thus violating my first assumption. The system gets fatally corrupted. Yep, that's it. And we may not lock out the interrupts before calling schedule to prevent that. Well, I would be happy if someone can prove me wrong here. The problem is that I don't see a solution because Linux does not provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm currently considering a hack to remove the migrating Linux thread manually from the run-queue, but this could easily break the Linux scheduler. Maybe the best way would be to provide atomic wakeup-and-schedule support into the Adeos patch for Linux tasks; previous attempts to fix this by circumventing the potential for preemption from outside of the scheduler code have all failed, and this bug is uselessly lingering for that reason. Having slept on this, I'm going to add a simple extension to the Linux scheduler available from Adeos, in order to get an atomic/unpreemptable path from the statement when the current task's state is changed for suspension (e.g. TASK_INTERRUPTIBLE), to the point where schedule() normally enters its atomic section, which looks like the sanest way to solve this issue, i.e. 
without gory hackery all over the place. Patch will follow later for testing this approach. Jan PS: Out of curiosity I also checked RTAI's migration mechanism in this regard. It's similar except for the fact that it does the gatekeeper's work in the Linux scheduler's tail (i.e. after the next context switch). And RTAI seems to suffer from the very same race. So this is either a fundamental issue - or I'm fundamentally wrong. [1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573 [2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461 [3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481 -- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >> >>> Gilles Chanteperdrix wrote: >>> Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as >>> >>> >>> >>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already >>> here - and a switch if the prio of the woken up task is higher. >>> >>> BTW, an easy way to enforce the current trouble is to remove the "_sync" >>> from wake_up_interruptible. As I understand it this _sync is just an >>> optimisation hint for Linux to avoid needless scheduler runs. >>> >> >> You could not guarantee the following execution sequence doing so >> either, i.e. >> >> 1- current wakes up the gatekeeper >> 2- current goes sleeping to exit the Linux runqueue in schedule() >> 3- the gatekeeper resumes the shadow-side of the old current >> >> The point is all about making 100% sure that current is going to be >> unlinked from the Linux runqueue before the gatekeeper processes the >> resumption request, whatever event the kernel is processing >> asynchronously in the meantime. This is the reason why, as you already >> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the >> CPU from the hardening thread whilst keeping it linked to the >> runqueue: upon return from such preemption, the gatekeeper might have >> run already, hence the newly hardened thread ends up being seen as >> runnable by both the Linux and Xeno schedulers. Rainy day indeed. >> >> We could rely on giving "current" the highest SCHED_FIFO priority in >> xnshadow_harden() before waking up the gk, until the gk eventually >> promotes it to the Xenomai scheduling mode and downgrades this >> priority back to normal, but we would pay additional latencies induced >> by each aborted rescheduling attempt that may occur during the atomic >> path we want to enforce. 
>> >> The other way is to make sure that no in-kernel preemption of the >> hardening task could occur after step 1) and until step 2) is >> performed, given that we cannot currently call schedule() with >> interrupts or preemption off. I'm on it. >> > > Could anyone interested in this issue test the following couple of patches? > > atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for > 2.6.15 > atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 > > Both patches are needed to fix the issue. > > TIA, > Looks good. I tried Jeroen's test-case and I was not able to reproduce the crash anymore. I think it's time for a new ipipe-release. ;) At this chance: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be re
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. Could anyone interested in this issue test the following couple of patches? atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 Both patches are needed to fix the issue. TIA, -- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. > Could anyone interested in this issue test the following couple of patches? > atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 > atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 > Both patches are needed to fix the issue. > TIA, And now, Ladies and Gentlemen, with the patches attached.

-- Philippe.

--- 2.6.15-x86/kernel/sched.c	2006-01-07 15:18:31.0 +0100
+++ 2.6.15-ipipe/kernel/sched.c	2006-01-30 15:15:27.0 +0100
@@ -2963,7 +2963,7 @@
 	 * Otherwise, whine if we are scheduling when we should not be.
 	 */
 	if (likely(!current->exit_state)) {
-		if (unlikely(in_atomic())) {
+		if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) {
 			printk(KERN_ERR "scheduling while atomic: "
 				"%s/0x%08x/%d\n",
 				current->comm, preempt_count(), current->pid);
@@ -2972,8 +2972,13 @@
 	profile_hit(SCHED_PROFILING, __builtin_return_ad
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken-up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper) will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe marking the running thread as suspended is not needed, since the gatekeeper may have a high priority, and calling schedule() is enough. In any case, the woken-up thread does not seem to be run immediately, so this rather looks like the second case.

Since in xnshadow_harden the running thread marks itself as suspended before running wake_up_interruptible_sync, the gatekeeper will run when schedule() gets called, which in turn depends on the CONFIG_PREEMPT* configuration. In the non-preempt case, the current thread will be suspended and the gatekeeper will run when schedule() is explicitly called in xnshadow_harden(). In the preempt case, schedule gets called when the outermost spinlock is unlocked in wake_up_interruptible_sync().

Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible.

As I understand it, this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e.

1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed.

We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce.

The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it.

> And how does it terminate: is only the system call migrated or is the
> thread allowed to continue run (at a priority level equal to the Xenomai
> priority level) until it hits something of the Xenomai API (or trivially:
> explicitly go to RT using th
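The three-step sequence above can be condensed into a toy user-space model of the invariant at stake: the Linux context and its Xenomai shadow share one stack, so the two must never be runnable at once. All names below are made up for illustration; this is not Xenomai or kernel code:

```c
#include <stdbool.h>

/* Toy model of the hardening handoff. */
struct shadow_model {
    bool linux_runnable;   /* task still linked to the Linux runqueue */
    bool shadow_runnable;  /* shadow resumed by the gatekeeper        */
};

/* Intended ordering: schedule() dequeues current (step 2) before the
 * gatekeeper resumes the shadow (step 3). Invariant holds. */
bool harden_in_order(struct shadow_model *m)
{
    m->linux_runnable = false;   /* deactivate_task() in schedule() */
    m->shadow_runnable = true;   /* gatekeeper resumes the shadow   */
    return !(m->linux_runnable && m->shadow_runnable);
}

/* Racy ordering under CONFIG_PREEMPT: preempt_schedule_irq() lets the
 * gatekeeper run while current is still on the runqueue, so the shadow
 * is resumed before the dequeue happens. Invariant broken. */
bool harden_preempted(struct shadow_model *m)
{
    m->shadow_runnable = true;   /* gatekeeper ran too early */
    return !(m->linux_runnable && m->shadow_runnable);
}
```

The model only captures the ordering argument: whichever path runs, the fix must guarantee the dequeue happens strictly before the resume.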
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
On 30/01/06, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Dmitry Adamushko wrote:
>> ...
>> I have not checked it yet but my presupposition that something as easy as:
>>
>> preempt_disable()
>>
>> wake_up_interruptible_sync();
>> schedule();
>>
>> preempt_enable();
>
> It's a no-go: "scheduling while atomic". One of my first attempts to
> solve it.

My fault. I meant the way preempt_schedule() and preempt_schedule_irq() call schedule() while being non-preemptible. To this end, PREEMPT_ACTIVE is set up. The use of preempt_enable/disable() here is wrong.

> The only way to enter schedule() without being preemptible is via
> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
> Kind of Gordian knot. :(

Maybe I have missed something so just for my curiosity: what does prevent the use of PREEMPT_ACTIVE here? We don't have a "preempted while atomic" message here as it seems to be a legal way to call schedule() with that flag being set up.

> When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from
> the run queue - independent of its current status.

Err... that's exactly the reason I have explained in my first mail for this thread :) Blah.. I wish I was smoking something special before so I would point that as the reason of my forgetfulness.

Actually, we could use PREEMPT_ACTIVE indeed + something else (probably another flag) to distinguish between the case when PREEMPT_ACTIVE is set by Linux and the case when it's set by xnshadow_harden().

xnshadow_harden()
{
	struct task_struct *this_task = current;
	...
	xnthread_t *thread = xnshadow_thread(this_task);

	if (!thread)
		return;
	...
	gk->thread = thread;
+	add_preempt_count(PREEMPT_ACTIVE); // should be checked in schedule()
+	xnthread_set_flags(thread, XNATOMIC_TRANSIT);
	set_current_state(TASK_INTERRUPTIBLE);
	wake_up_interruptible_sync(&gk->waitq);
+	schedule();
+	sub_preempt_count(PREEMPT_ACTIVE);
	...
}

Then, something like the following code should be called from schedule():

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
	xnthread_t *thread = xnshadow_thread(task);

	if (!thread)
		return;

	if (xnthread_test_flags(thread, XNATOMIC_TRANSIT)) {
		xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
		deactivate_task(task, rq);
	}
}

- sched.c:

	...
	switch_count = &prev->nivcsw;
	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
		switch_count = &prev->nvcsw;
		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
				unlikely(signal_pending(prev))))
			prev->state = TASK_RUNNING;
		else {
			if (prev->state == TASK_UNINTERRUPTIBLE)
				rq->nr_uninterruptible++;
			deactivate_task(prev, rq);
		}
	}

+	// removes a task from the active queue if PREEMPT_ACTIVE +
+	// XNATOMIC_TRANSIT
+	#ifdef CONFIG_IPIPE
+	ipipe_transit_cleanup(prev, rq);
+	#endif /* CONFIG_IPIPE */
	...

Not very graceful maybe, but it could work - or am I missing something important?

--
Best regards,
Dmitry Adamushko

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
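The decision Dmitry's ipipe_transit_cleanup() encodes can be checked in isolation: dequeue the outgoing task only when PREEMPT_ACTIVE and the proposed transit flag are both present. Below is a toy user-space model of just that predicate; the two constants are made-up stand-ins for the real PREEMPT_ACTIVE bit and the suggested XNATOMIC_TRANSIT thread flag:

```c
#include <stdbool.h>

/* Illustrative stand-ins, not the real kernel/Xenomai values. */
#define MODEL_PREEMPT_ACTIVE   0x10000000u
#define MODEL_XNATOMIC_TRANSIT 0x00000001u

/* Plain Linux preemption (PREEMPT_ACTIVE alone) must keep the task on
 * the runqueue; only a hardening task that also set the TRANSIT flag
 * may be dequeued on its behalf from within schedule(). */
bool transit_should_deactivate(unsigned int preempt_count,
                               unsigned int xnflags)
{
    return (preempt_count & MODEL_PREEMPT_ACTIVE) &&
           (xnflags & MODEL_XNATOMIC_TRANSIT);
}
```

This is exactly the disambiguation the proposal needs: PREEMPT_ACTIVE set by ordinary preemption leaves the task queued, while the combined flags mark a deliberate atomic switch.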
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Dmitry Adamushko wrote:
>>> ...
>
>>> I have not checked it yet but my presupposition that something as easy as:
>>>
>>> preempt_disable()
>>>
>>> wake_up_interruptible_sync();
>>> schedule();
>>>
>>> preempt_enable();
>>
>> It's a no-go: "scheduling while atomic". One of my first attempts to
>> solve it.
>
> My fault. I meant the way preempt_schedule() and preempt_schedule_irq() call
> schedule() while being non-preemptible.
> To this end, PREEMPT_ACTIVE is set up.
> The use of preempt_enable/disable() here is wrong.
>
>> The only way to enter schedule() without being preemptible is via
>> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
>> Kind of Gordian knot. :(
>
> Maybe I have missed something so just for my curiosity: what does prevent
> the use of PREEMPT_ACTIVE here?
> We don't have a "preempted while atomic" message here as it seems to be a
> legal way to call schedule() with that flag being set up.

When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from the run queue - independent of its current status.

>>> could work... err.. and don't blame me if no, it's some one else who has
>>> written that nonsense :o)
>>>
>>> --
>>> Best regards,
>>> Dmitry Adamushko

signature.asc Description: OpenPGP digital signature
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote:
Jan Kiszka wrote:
Hi,

well, if I'm not totally wrong, we have a design problem in the RT-thread hardening path. I dug into the crash Jeroen reported and I'm quite sure that this is the reason. So that's the bad news. The good one is that we can at least work around it by switching off CONFIG_PREEMPT for Linux (this implicitly means that it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
o Shadow threads have only one stack, i.e. one context. If the real-time part is active (this includes it is blocked on some xnsynch object or delayed), the original Linux task must NEVER EVER be executed, even if it will immediately fall asleep again. That's because the stack is in use by the real-time part at that time. And this condition is checked in do_schedule_event() [1].

[Linux]
o A Linux task which has called set_current_state() will remain in the run-queue as long as it calls schedule() on its own. This means that it can be preempted (if CONFIG_PREEMPT is set) between set_current_state() and schedule() and then even be resumed again. Only the explicit call of schedule() will trigger deactivate_task() which will in turn remove current from the run-queue.

Ok, if this is true, let's have a look at xnshadow_harden(): After grabbing the gatekeeper sem and putting itself in gk->thread, a task going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the gatekeeper [2]. This does not include a Linux reschedule due to the _sync version of wake_up_interruptible. What can happen now?

1) No interruption until we finally call schedule() [3]. All fine as we will not be removed from the run-queue before the gatekeeper starts kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by a RT IRQ. This would just delay the path described above, even if some RT threads get executed. Once they are finished, we continue in xnshadow_harden() - given that the RT part does not trigger the following case:

3) Interruption by some Linux IRQ. This may cause other threads to become runnable as well, but the gatekeeper has the highest prio and will therefore be the next. The problem is that the rescheduling on Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT remove it from the Linux run-queue. And now we are in real trouble: The gatekeeper will kick off our RT part which will take over the thread's stack. As soon as the RT domain falls asleep and Linux takes over again, it will continue our non-RT part as well! Actually, this seems to be the reason for the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME TIME now, thus violating my first assumption. The system gets fatally corrupted.

Yep, that's it. And we may not lock out the interrupts before calling schedule to prevent that.

Well, I would be happy if someone can prove me wrong here. The problem is that I don't see a solution because Linux does not provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm currently considering a hack to remove the migrating Linux thread manually from the run-queue, but this could easily break the Linux scheduler.

Maybe the best way would be to provide atomic wakeup-and-schedule support into the Adeos patch for Linux tasks; previous attempts to fix this by circumventing the potential for preemption from outside of the scheduler code have all failed, and this bug is uselessly lingering for that reason. Having slept on this, I'm going to add a simple extension to the Linux scheduler available from Adeos, in order to get an atomic/unpreemptable path from the statement where the current task's state is changed for suspension (e.g. TASK_INTERRUPTIBLE), to the point where schedule() normally enters its atomic section, which looks like the sanest way to solve this issue, i.e. without gory hackery all over the place. Patch will follow later for testing this approach.

Jan

PS: Out of curiosity I also checked RTAI's migration mechanism in this regard. It's similar except for the fact that it does the gatekeeper's work in the Linux scheduler's tail (i.e. after the next context switch). And RTAI, it seems, suffers from the very same race. So this is either a fundamental issue - or I'm fundamentally wrong.

[1] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
[2] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
[3] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481

-- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote:
Hi,

well, if I'm not totally wrong, we have a design problem in the RT-thread hardening path. I dug into the crash Jeroen reported and I'm quite sure that this is the reason. So that's the bad news. The good one is that we can at least work around it by switching off CONFIG_PREEMPT for Linux (this implicitly means that it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
o Shadow threads have only one stack, i.e. one context. If the real-time part is active (this includes it is blocked on some xnsynch object or delayed), the original Linux task must NEVER EVER be executed, even if it will immediately fall asleep again. That's because the stack is in use by the real-time part at that time. And this condition is checked in do_schedule_event() [1].

[Linux]
o A Linux task which has called set_current_state() will remain in the run-queue as long as it calls schedule() on its own. This means that it can be preempted (if CONFIG_PREEMPT is set) between set_current_state() and schedule() and then even be resumed again. Only the explicit call of schedule() will trigger deactivate_task() which will in turn remove current from the run-queue.

Ok, if this is true, let's have a look at xnshadow_harden(): After grabbing the gatekeeper sem and putting itself in gk->thread, a task going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the gatekeeper [2]. This does not include a Linux reschedule due to the _sync version of wake_up_interruptible. What can happen now?

1) No interruption until we finally call schedule() [3]. All fine as we will not be removed from the run-queue before the gatekeeper starts kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by a RT IRQ. This would just delay the path described above, even if some RT threads get executed. Once they are finished, we continue in xnshadow_harden() - given that the RT part does not trigger the following case:

3) Interruption by some Linux IRQ. This may cause other threads to become runnable as well, but the gatekeeper has the highest prio and will therefore be the next. The problem is that the rescheduling on Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT remove it from the Linux run-queue. And now we are in real trouble: The gatekeeper will kick off our RT part which will take over the thread's stack. As soon as the RT domain falls asleep and Linux takes over again, it will continue our non-RT part as well! Actually, this seems to be the reason for the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME TIME now, thus violating my first assumption. The system gets fatally corrupted.

Yep, that's it. And we may not lock out the interrupts before calling schedule to prevent that.

Well, I would be happy if someone can prove me wrong here. The problem is that I don't see a solution because Linux does not provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm currently considering a hack to remove the migrating Linux thread manually from the run-queue, but this could easily break the Linux scheduler.

Maybe the best way would be to provide atomic wakeup-and-schedule support into the Adeos patch for Linux tasks; previous attempts to fix this by circumventing the potential for preemption from outside of the scheduler code have all failed, and this bug is uselessly lingering for that reason.

Jan

PS: Out of curiosity I also checked RTAI's migration mechanism in this regard. It's similar except for the fact that it does the gatekeeper's work in the Linux scheduler's tail (i.e. after the next context switch). And RTAI, it seems, suffers from the very same race. So this is either a fundamental issue - or I'm fundamentally wrong.

[1] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
[2] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
[3] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481

-- Philippe.
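Jan's second assumption - that set_current_state() only changes the task state while the dequeue happens later, inside schedule() - is the crux of the race window. A toy user-space model of that two-step behaviour (all names prefixed model_ are made up; this is not the Linux API):

```c
#include <stdbool.h>

/* Minimal model of the Linux-side assumption. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
    enum model_state state;
    bool on_runqueue;
};

/* set_current_state() analogue: state changes, task stays queued.
 * The window between this call and model_schedule() is where
 * CONFIG_PREEMPT can preempt - and even resume - the task. */
void model_set_current_state(struct model_task *t, enum model_state s)
{
    t->state = s;
}

/* schedule() analogue: only here does a non-running task get dequeued
 * (the deactivate_task() step). */
void model_schedule(struct model_task *t)
{
    if (t->state != MODEL_RUNNING)
        t->on_runqueue = false;
}
```

The model makes the window explicit: between the two calls the task is simultaneously "going to sleep" and still runnable, which is exactly what the gatekeeper must not observe.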
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Dmitry Adamushko wrote:
> On 23/01/06, Gilles Chanteperdrix <[EMAIL PROTECTED]> wrote:
>> Jeroen Van den Keybus wrote:
>>> Hello,
>
>> [ skip-skip-skip ]
>
>> Since in xnshadow_harden, the running thread marks itself as suspended
>> before running wake_up_interruptible_sync, the gatekeeper will run when
>> schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
>> configuration. In the non-preempt case, the current thread will be
>> suspended and the gatekeeper will run when schedule() is explicitely
>> called in xnshadow_harden(). In the preempt case, schedule gets called
>> when the outermost spinlock is unlocked in wake_up_interruptible_sync().
>
> In fact, no.
>
> wake_up_interruptible_sync() doesn't set the need_resched "flag" up. That's
> why it's "sync" actually.
>
> Only if the need_resched was already set before calling
> wake_up_interruptible_sync(), then yes.
>
> The sequence is as follows:
>
> wake_up_interruptible_sync ---> wake_up_sync ---> wake_up_common(...,
> sync=1, ...) ---> ... ---> try_to_wake_up(..., sync=1)
>
> Look at the end of try_to_wake_up() to see when it calls resched_task().
> The comment there speaks for itself.
>
> So let's suppose need_resched == 0 (it's per-task of course).
> As a result of wake_up_interruptible_sync() the new task is added to the
> current active run-queue but need_resched remains to be unset in the hope
> that the waker will call schedule() on its own soon.
>
> I have CONFIG_PREEMPT set on my machine but I have never encountered a bug
> described by Jan.
>
> The catalyst of the problem, I guess, is that some IRQ interrupts a task
> between wake_up_interruptible_sync() and schedule() and its ISR, in turn,
> wakes up another task which prio is higher than the one of our waker (as a
> result, the need_resched flag is set). And now, rescheduling occurs on
> return from irq handling code (ret_from_intr -> ...-> preempt_schedule_irq()
> -> schedule()).

Yes, this is exactly what happened.

I unfortunately have not saved a related trace I took with the extended ipipe-tracer (the one I sent ends too early), but they showed a preemption right after the wake_up, first by one of the other real-time threads in Jeroen's scenario, and then, as a result of some xnshadow_relax() of that thread, a Linux preempt_schedule to the gatekeeper. We do not see this bug that often as it requires a specific load and it must hit a really small race window.

> Some events should coincide, yep. But I guess that problem does not occur
> every time?
>
> I have not checked it yet but my presupposition that something as easy as:
>
> preempt_disable()
>
> wake_up_interruptible_sync();
> schedule();
>
> preempt_enable();

It's a no-go: "scheduling while atomic". One of my first attempts to solve it.

The only way to enter schedule() without being preemptible is via PREEMPT_ACTIVE. But the effect of that flag should be well-known now. Kind of Gordian knot. :(

> could work... err.. and don't blame me if no, it's some one else who has
> written that nonsense :o)
>
> --
> Best regards,
> Dmitry Adamushko

Jan
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
On 23/01/06, Gilles Chanteperdrix <[EMAIL PROTECTED]> wrote:
> Jeroen Van den Keybus wrote:
>> Hello,
>
> [ skip-skip-skip ]
>
> Since in xnshadow_harden, the running thread marks itself as suspended
> before running wake_up_interruptible_sync, the gatekeeper will run when
> schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
> configuration. In the non-preempt case, the current thread will be
> suspended and the gatekeeper will run when schedule() is explicitely
> called in xnshadow_harden(). In the preempt case, schedule gets called
> when the outermost spinlock is unlocked in wake_up_interruptible_sync().

In fact, no.

wake_up_interruptible_sync() doesn't set the need_resched "flag" up. That's why it's "sync" actually.

Only if the need_resched was already set before calling wake_up_interruptible_sync(), then yes.

The sequence is as follows:

wake_up_interruptible_sync ---> wake_up_sync ---> wake_up_common(..., sync=1, ...) ---> ... ---> try_to_wake_up(..., sync=1)

Look at the end of try_to_wake_up() to see when it calls resched_task(). The comment there speaks for itself.

So let's suppose need_resched == 0 (it's per-task of course). As a result of wake_up_interruptible_sync() the new task is added to the current active run-queue but need_resched remains to be unset, in the hope that the waker will call schedule() on its own soon.

I have CONFIG_PREEMPT set on my machine but I have never encountered the bug described by Jan.

The catalyst of the problem, I guess, is that some IRQ interrupts a task between wake_up_interruptible_sync() and schedule() and its ISR, in turn, wakes up another task whose prio is higher than the one of our waker (as a result, the need_resched flag is set). And now, rescheduling occurs on return from the irq handling code (ret_from_intr -> ... -> preempt_schedule_irq() -> schedule()).

Some events should coincide, yep. But I guess that problem does not occur every time?

I have not checked it yet but my presupposition that something as easy as:

preempt_disable()

wake_up_interruptible_sync();
schedule();

preempt_enable();

could work... err.. and don't blame me if no, it's some one else who has written that nonsense :o)

--
Best regards,
Dmitry Adamushko
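The sync-wakeup semantics Dmitry walks through above can be summarised in a toy model: a "sync" wakeup enqueues the woken task without flagging a reschedule, trusting the waker to call schedule() itself soon. The names below are illustrative stand-ins, not the real Linux API:

```c
#include <stdbool.h>

/* Minimal model of try_to_wake_up(..., sync) as described above. */
struct model_rq {
    bool woken_enqueued;  /* woken task added to the active runqueue */
    bool need_resched;    /* reschedule requested against the waker  */
};

void model_wake_up(struct model_rq *rq, bool sync)
{
    rq->woken_enqueued = true;    /* enqueue happens either way       */
    if (!sync)
        rq->need_resched = true;  /* resched_task() on a plain wakeup */
}
```

This also restates why the race needs an external trigger: the sync wakeup itself leaves need_resched clear, so only an intervening IRQ that wakes a higher-priority task sets the flag and forces the premature reschedule.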
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Dmitry Adamushko wrote: > On 23/01/06, Gilles Chanteperdrix <[EMAIL PROTECTED]> wrote: >> Jeroen Van den Keybus wrote: >>> Hello, > > > >> [ skip-skip-skip ] >> > > >> Since in xnshadow_harden, the running thread marks itself as suspended >> before running wake_up_interruptible_sync, the gatekeeper will run when >> schedule() get called, which in turn, depend on the CONFIG_PREEMPT* >> configuration. In the non-preempt case, the current thread will be >> suspended and the gatekeeper will run when schedule() is explicitely >> called in xnshadow_harden(). In the preempt case, schedule gets called >> when the outermost spinlock is unlocked in wake_up_interruptible_sync(). > > > In fact, no. > > wake_up_interruptible_sync() doesn't set the need_resched "flag" up. That's > why it's "sync" actually. > > Only if the need_resched was already set before calling > wake_up_interruptible_sync(), then yes. > > The secuence is as follows : > > wake_up_interruptible_sync ---> wake_up_sync ---> wake_up_common(..., > sync=1, ...) ---> ... ---> try_to_wake_up(..., sync=1) > > Look at the end of try_to_wake_up() to see when it calls resched_task(). > The comment there speaks for itself. > > So let's suppose need_resched == 0 (it's per-task of course). > As a result of wake_up_interruptible_sync() the new task is added to the > current active run-queue but need_resched remains to be unset in the hope > that the waker will call schedule() on its own soon. > > I have CONFIG_PREEMPT set on my machine but I have never encountered a bug > described by Jan. > > The catalyst of the problem, I guess, is that some IRQ interrupts a task > between wake_up_interruptible_sync() and schedule() and its ISR, in turn, > wakes up another task which prio is higher than the one of our waker (as a > result, the need_resched flag is set). And now, rescheduling occurs on > return from irq handling code (ret_from_intr -> ...-> preempt_irq_schedule() > -> schedule()). Yes, this is exactly what happened. 
I unfortunately have not saved a related trace I took with the extended ipipe-tracer (the one I sent ends too early), but it showed a preemption right after the wake_up, first by one of the other real-time threads in Jeroen's scenario, and then, as a result of some xnshadow_relax() of that thread, a Linux preempt_schedule to the gatekeeper. We do not see this bug that often as it requires a specific load and it must hit a really small race window. > > Some events should coincide, yep. But I guess that problem does not occur > every time? > > I have not checked it yet, but my presupposition is that something as easy as: > > preempt_disable() > > wake_up_interruptible_sync(); > schedule(); > > preempt_enable(); It's a no-go: "scheduling while atomic". One of my first attempts to solve it. The only way to enter schedule() without being preemptible is via PREEMPT_ACTIVE. But the effect of that flag should be well-known now. Kind of Gordian knot. :( > > > could work... err... and don't blame me if not, it's someone else who has > written that nonsense :o) > > -- > Best regards, > Dmitry Adamushko > Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Gilles Chanteperdrix wrote: > Jeroen Van den Keybus wrote: > > Hello, > > > > > > I'm currently not at a level to participate in your discussion. Although > I'm > > willing to supply you with stresstests, I would nevertheless like to learn > > more from task migration as this debugging session proceeds. In order to do > > so, please confirm the following statements or indicate where I went wrong. > > I hope others may learn from this as well. > > > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > > Linux (root domain) system call (notified by Adeos ?). > > xnshadow_harden() is called whenever a thread running in secondary > mode (that is, running as a regular Linux thread, handled by the Linux > scheduler) is switching to primary mode (where it will run as a Xenomai > thread, handled by the Xenomai scheduler). Migrations occur for some system > calls. More precisely, the Xenomai skin system call tables associate a few > flags with each system call, and some of these flags cause migration of > the caller when it issues the system call. > > Each Xenomai user-space thread has two contexts, a regular Linux > thread context, and a Xenomai thread called the "shadow" thread. Both > contexts share the same stack and program counter, so that at any time, > at least one of the two contexts is seen as suspended by the scheduler > which handles it. > > Before xnshadow_harden is called, the Linux thread is running, and its > shadow is seen in suspended state with the XNRELAX bit by the Xenomai > scheduler. After xnshadow_harden, the Linux context is seen suspended > with INTERRUPTIBLE state by the Linux scheduler, and its shadow is seen as > running by the Xenomai scheduler. > > The migrating thread > > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > > wake_up_interruptible_sync() call. Is this thread actually run or does it > > merely put the thread in some Linux to-do list (I assumed the first case) ? 
> > Here, I am not sure, but it seems that when calling > wake_up_interruptible_sync the woken up task is put in the current CPU > runqueue, and this task (i.e. the gatekeeper), will not run until the > current thread (i.e. the thread running xnshadow_harden) marks itself as > suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken-up task is higher. BTW, an easy way to trigger the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it, this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. > suspended is not needed, since the gatekeeper may have a high priority, > and calling schedule() is enough. In any case, the woken-up thread does > not seem to be run immediately, so this rather looks like the second > case. > > Since in xnshadow_harden, the running thread marks itself as suspended > before running wake_up_interruptible_sync, the gatekeeper will run when > schedule() gets called, which in turn depends on the CONFIG_PREEMPT* > configuration. In the non-preempt case, the current thread will be > suspended and the gatekeeper will run when schedule() is explicitly > called in xnshadow_harden(). In the preempt case, schedule gets called > when the outermost spinlock is unlocked in wake_up_interruptible_sync(). > > > And how does it terminate: is only the system call migrated or is the > thread > > allowed to continue run (at a priority level equal to the Xenomai > > priority level) until it hits something of the Xenomai API (or trivially: > > explicitly go to RT using the API) ? > > I am not sure I follow you here. The usual case is that the thread will > remain in primary mode after the system call, but I think a system call > flag allows the other behaviour. So, if I understand the question > correctly, the answer is that it depends on the system call. 
> > > In that case, I expect the nRT thread to terminate with a schedule() > > call in the Xeno OS API code which deactivates the task so that it > > won't ever run in Linux context anymore. A top priority gatekeeper is > > in place as a software hook to catch Linux's attention right after > > that schedule(), which might otherwise schedule something else (and > > leave only interrupts for Xenomai to come back to life again). > > Here is the way I understand it. We have two threads, or rather two > "views" of the same thread, each with its own state. Switching from > secondary to primary mode, i.e. xnshadow_harden and the gatekeeper job, > means changing the two states at once. Since we can not do that, we need > an intermediate state. Since the intermediate state can not be the state > where the two threads are running (they share the same stack and > program counter), the intermediate state is a state where the two > threads are suspended, but another context needs to run: it is the > gatekeeper. > > > I have > > the impression that I cannot see this gatekeeper, nor the (n)RT > > threads using the ps command ?
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by the Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by the Xenomai scheduler). Migrations occur for some system calls. More precisely, the Xenomai skin system call tables associate a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called the "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with the XNRELAX bit by the Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by the Linux scheduler, and its shadow is seen as running by the Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. 
the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as suspended is not needed, since the gatekeeper may have a high priority, and calling schedule() is enough. In any case, the woken-up thread does not seem to be run immediately, so this rather looks like the second case. Since in xnshadow_harden, the running thread marks itself as suspended before running wake_up_interruptible_sync, the gatekeeper will run when schedule() gets called, which in turn depends on the CONFIG_PREEMPT* configuration. In the non-preempt case, the current thread will be suspended and the gatekeeper will run when schedule() is explicitly called in xnshadow_harden(). In the preempt case, schedule gets called when the outermost spinlock is unlocked in wake_up_interruptible_sync(). > And how does it terminate: is only the system call migrated or is the thread > allowed to continue run (at a priority level equal to the Xenomai > priority level) until it hits something of the Xenomai API (or trivially: > explicitly go to RT using the API) ? I am not sure I follow you here. The usual case is that the thread will remain in primary mode after the system call, but I think a system call flag allows the other behaviour. So, if I understand the question correctly, the answer is that it depends on the system call. > In that case, I expect the nRT thread to terminate with a schedule() > call in the Xeno OS API code which deactivates the task so that it > won't ever run in Linux context anymore. A top priority gatekeeper is > in place as a software hook to catch Linux's attention right after > that schedule(), which might otherwise schedule something else (and > leave only interrupts for Xenomai to come back to life again). Here is the way I understand it. We have two threads, or rather two "views" of the same thread, each with its own state. 
Switching from secondary to primary mode, i.e. xnshadow_harden and the gatekeeper job, means changing the two states at once. Since we can not do that, we need an intermediate state. Since the intermediate state can not be the state where the two threads are running (they share the same stack and program counter), the intermediate state is a state where the two threads are suspended, but another context needs to run: it is the gatekeeper. > I have > the impression that I cannot see this gatekeeper, nor the (n)RT > threads using the ps command ? The gatekeeper and Xenomai user-space threads are regular Linux contexts, you can see them using the ps command. > > Is it correct to state that the current preemption issue is due to the > gatekeeper being invoked too soon ? Could someone knowing more about the > migration technology explain what exactly goes wrong ? Jan seems to have found such an issue here. I am not sure I understood what he wrote. But if the issue is due to CONFIG_PREEMPT, it explains
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Hello, I'm currently not at a level to participate in your discussion. Although I'm willing to supply you with stresstests, I would nevertheless like to learn more from task migration as this debugging session proceeds. In order to do so, please confirm the following statements or indicate where I went wrong. I hope others may learn from this as well. xn_shadow_harden(): This is called whenever a Xenomai thread performs a Linux (root domain) system call (notified by Adeos ?). The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux kernel wake_up_interruptible_sync() call. Is this thread actually run or does it merely put the thread in some Linux to-do list (I assumed the first case) ? And how does it terminate: is only the system call migrated or is the thread allowed to continue run (at a priority level equal to the Xenomai priority level) until it hits something of the Xenomai API (or trivially: explicitly go to RT using the API) ? In that case, I expect the nRT thread to terminate with a schedule() call in the Xeno OS API code which deactivates the task so that it won't ever run in Linux context anymore. A top priority gatekeeper is in place as a software hook to catch Linux's attention right after that schedule(), which might otherwise schedule something else (and leave only interrupts for Xenomai to come back to life again). I have the impression that I cannot see this gatekeeper, nor the (n)RT threads using the ps command ? Is it correct to state that the current preemption issue is due to the gatekeeper being invoked too soon ? Could someone knowing more about the migration technology explain what exactly goes wrong ? Thanks, Jeroen.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
> Hi, > > well, if I'm not totally wrong, we have a design problem in the > RT-thread hardening path. I dug into the crash Jeroen reported > and I'm > quite sure that this is the reason. > > So that's the bad news. The good one is that we can at least > work around > it by switching off CONFIG_PREEMPT for Linux (this implicitly means that > it's a 2.6-only issue). > > > But let's start with two assumptions my further analysis is > based on: > > [Xenomai] > o Shadow threads have only one stack, i.e. one context. If the > real-time part is active (this includes it is blocked on some > xnsynch object or delayed), the original Linux task must > NEVER EVER be > executed, even if it will immediately fall asleep again. That's > because the stack is in use by the real-time part at that time. > And this condition is checked in do_schedule_event() [1]. > > [Linux] > o A Linux task which has called > set_current_state() will > remain in the run-queue as long as it calls schedule() on its > own. Yes, you are right. Let's keep in mind the following piece of code. [*]

[code]
/* from sched.c::schedule() */
...
switch_count = &prev->nivcsw;
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {   /* <-- MUST BE TRUE FOR A TASK TO BE REMOVED */
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                     unlikely(signal_pending(prev))))
                prev->state = TASK_RUNNING;
        else {
                if (prev->state == TASK_UNINTERRUPTIBLE)
                        rq->nr_uninterruptible++;
                deactivate_task(prev, rq);   /* <-- removing from the active queue */
        }
}
...
[/code]

On executing schedule(), the "current" (prev = current) task is not removed from the active queue in one of the following cases: [1] prev->state == 0, i.e. == TASK_RUNNING (since #define TASK_RUNNING 0); [2] add_preempt_count(PREEMPT_ACTIVE) has been called before calling schedule() from the task's context, i.e. from the context of the "current" task (prev = current in schedule()); [3] there is a pending signal for the "current" task. 
Keeping that in mind too, let's take a look at what happens in your "crash"-scenario. > ... > > 3) Interruption by some Linux IRQ. This may cause other > threads to become runnable as well, but the gatekeeper has > the highest prio and will therefore be the next. The problem is > that the rescheduling on Linux IRQ exit will PREEMPT our task > in xnshadow_harden(), it will NOT remove it from the Linux > run-queue. Right. But what actually happens is the following sequence of calls: ret_from_intr ---> resume_kernel ---> need_resched ---> sched.c::preempt_schedule_irq() ---> schedule() (**) As a result, schedule() is called indeed, but it does not execute the [*] code - the "current" task is not removed from the active queue. The reason is [2] (from the list above), and that's done in preempt_schedule_irq(). > And now we are in real troubles: The > gatekeeper will kick off our RT part which will take over the > thread's stack. As soon as the RT domain falls asleep and > Linux takes over again, it will continue our non-RT part as well! > Actually, this seems to be the reason for the panic in > do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG > and this check, we will run both parts AT THE SAME > TIME now, thus violating my first assumption. The system gets > fatally corrupted. > > Well, I would be happy if someone can prove me wrong here. I'm afraid you are right. > The problem is that I don't see a solution because Linux does > not provide an atomic wake-up + schedule-out under > CONFIG_PREEMPT. I'm > currently considering a hack to remove the migrating Linux > thread manually from the run-queue, but this could easily break > the Linux scheduler. I have a "stupid" idea on top of my head, but I'd prefer to test it on my own first so as not to look like a complete idiot if it's totally wrong. Err... it's difficult to look more of an idiot than I already do? :o) > Jan -- Best regards, Dmitry Adamushko
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
> Hi, > > well, if I'm not totally wrong, we have a design problem in the > RT-thread hardening path. I dug into the crash Jeroen reported > and I'm > quite sure that this is the reason. > > So that's the bad news. The good one is that we can at least > work around > it by switching off CONFIG_PREEMPT for Linux (this implicitly means that > it's a 2.6-only issue). > > > But let's start with two assumptions my further analysis is > based on: > > [Xenomai] > o Shadow threads have only one stack, i.e. one context. If the > real-time part is active (this includes it is blocked on some > xnsynch object or delayed), the original Linux task must > NEVER EVER be > executed, even if it will immediately fall asleep again. That's > because the stack is in use by the real-time part at that time. > And this condition is checked in do_schedule_event() [1]. > > [Linux] > o A Linux task which has called > set_current_state() will > remain in the run-queue as long as it calls schedule() on its > own. Yes, you are right. Let's keep in mind the following piece of code. [*] [code] from sched.c::schedule() ... switch_count = &prev->nivcsw; if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) { <--- MUST BE TRUE FOR A TASK TO BE REMOVED switch_count = &prev->nvcsw; if (unlikely((prev->state & TASK_INTERRUPTIBLE) && unlikely(signal_pending(prev prev->state = TASK_RUNNING; else { if (prev->state == TASK_UNINTERRUPTIBLE) rq->nr_uninterruptible++; deactivate_task(prev, rq); <--- removing from the active queue } } ... [/code] On executing schedule(), a "current" (prev = current) task is not removed from the active queue in one of the following cases: [1] prev->state == 0, i.e. == TASK_RUNNING (since #define TASK_RUNNING 0); [2] add_preempt_count(PREEMPT_ACTIVE) has been called before calling schedule() from the task's context i.e. from the context of the "current" task (prev = current in schedule()); [3] there is a pending signal for the "current" task. 
Keeping that in mind too, let's take a look at what happens in your "crash"-scenario. > ... > > 3) Interruption by some Linux IRQ. This may cause other > threads to become runnable as well, but the gatekeeper has > the highest prio and will therefore be the next. The problem is > that the rescheduling on Linux IRQ exit will PREEMPT our task > in xnshadow_harden(), it will NOT remove it from the Linux > run-queue. Right. But what actually happens is the following sequence of calls: ret_from_intr ---> resume_kernel ---> need_resched ---> sched.c::preempt_schedule_irq() ---> schedule() (**) As a result, schedule() is called indeed but it does not execute the [*] code - the "current" task is not removed from the active queue. The reason is [2] (from the list above) and that's done in preempt_schedule_irq(). > And now we are in real troubles: The > gatekeeper will kick off our RT part which will take over the > thread's stack. As soon as the RT domain falls asleep and > Linux takes over again, it will continue our non-RT part as well! > Actually, this seems to be the reason for the panic in > do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG > and this check, we will run both parts AT THE SAME > TIME now, thus violating my first assumption. The system gets > fatally corrupted. > > Well, I would be happy if someone can prove me wrong here. I'm afraid you are right. > The problem is that I don't see a solution because Linux does > not provide an atomic wake-up + schedule-out under > CONFIG_PREEMPT. I'm > currently considering a hack to remove the migrating Linux > thread manually from the run-queue, but this could easily break > the Linux scheduler. I have a "stupid" idea on top of my head but I'd prefer to test it on my own first so not to look as a complete idiot if it's totally wrong. Err... it's difficult to look more an idiot than I'm already? 
:o)

> Jan

--
Best regards,
Dmitry Adamushko

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
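The (**) call sequence above can also be sketched as a toy model: preempt_schedule_irq() brackets schedule() with PREEMPT_ACTIVE, so the removal branch is skipped even though the interrupted task already marked itself TASK_INTERRUPTIBLE. Again, these are made-up user-space names, not the kernel's:

```c
/* Toy model of why the (**) path never deactivates the preempted task.
 * Illustrative names only; this is not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define TASK_INTERRUPTIBLE 1
#define PREEMPT_ACTIVE     0x10000000

static unsigned int preempt_count;   /* toy stand-in for the preempt counter */

struct toy_task { int state; bool on_runqueue; };

static void toy_schedule(struct toy_task *prev)
{
    /* the [*] removal code only runs without PREEMPT_ACTIVE */
    if (prev->state && !(preempt_count & PREEMPT_ACTIVE))
        prev->on_runqueue = false;   /* deactivate_task() */
}

/* What IRQ exit effectively does to the interrupted task. */
static void toy_preempt_schedule_irq(struct toy_task *interrupted)
{
    preempt_count += PREEMPT_ACTIVE; /* add_preempt_count(PREEMPT_ACTIVE) */
    toy_schedule(interrupted);       /* schedule(): removal skipped */
    preempt_count -= PREEMPT_ACTIVE; /* sub_preempt_count(PREEMPT_ACTIVE) */
}
```

So a TASK_INTERRUPTIBLE task preempted on IRQ exit stays runnable; only its own later call to schedule() takes it off the run-queue.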
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Hannes Mayer wrote:
> Jan Kiszka wrote:
> [...]
>> PS: Out of curiosity I also checked RTAI's migration mechanism in this
>> regard. It's similar except for the fact that it does the gatekeeper's
>> work in the Linux scheduler's tail (i.e. after the next context switch).
>> And RTAI seems to suffer from the very same race. So this is either a
>> fundamental issue - or I'm fundamentally wrong.
>
> Well, most of the stuff you guys talk about in this thread is still
> beyond my level, but out of curiosity I ported the SEM example to
> RTAI (see attached sem.c).
> I couldn't come up with something similar to rt_sem_inquire and
> rt_task_inquire in RTAI (in "void output(char c)")...
> Anyway, unless I have missed something else important while
> porting, the example runs flawlessly on RTAI 3.3test3 (kernel 2.6.15).

My claim about the RTAI race is based on a quick code analysis and somewhat
outdated information about its core design. I haven't tried any code to
crash it, and I guess it will take a slightly different test design to
trigger the issue there.

As soon as someone can follow my reasoning and confirm it (don't mind if
you did not understand it - I didn't either two days ago, this is quite
heavy stuff), I will inform Paolo about this potential problem.

Jan

signature.asc
Description: OpenPGP digital signature
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote:
[...]
PS: Out of curiosity I also checked RTAI's migration mechanism in this
regard. It's similar except for the fact that it does the gatekeeper's
work in the Linux scheduler's tail (i.e. after the next context switch).
And RTAI seems to suffer from the very same race. So this is either a
fundamental issue - or I'm fundamentally wrong.

Well, most of the stuff you guys talk about in this thread is still
beyond my level, but out of curiosity I ported the SEM example to
RTAI (see attached sem.c).
I couldn't come up with something similar to rt_sem_inquire and
rt_task_inquire in RTAI (in "void output(char c)")...
Anyway, unless I have missed something else important while
porting, the example runs flawlessly on RTAI 3.3test3 (kernel 2.6.15).

Best regards,
Hannes.

/* TEST_SEM.C ported to RTAI3.3 */

/* Note: the original header names were eaten by the list archive;
   the following are the includes the code appears to need. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <sched.h>
#include <fcntl.h>
#include <unistd.h>
#include <values.h>
#include <sys/mman.h>
#include <rtai_lxrt.h>
#include <rtai_sem.h>

int fd, err;
int t0end = 1;
int t1end = 1;
SEM *s, *m;
float tmax = 1.0e9;

#define CHECK(arg) check(arg, __LINE__)

int check(int r, int n)
{
    if (r != 0)
        fprintf(stderr, "L%d: %s.\n", n, strerror(-r));
    return r;
}

void output(char c)
{
    static int cnt = 0;
    int n;
    char buf[2];

    buf[0] = c;
    if (cnt == 80) {
        buf[1] = '\n';
        n = 2;
        cnt = 0;
    } else {
        n = 1;
        cnt++;
    }
/*
    CHECK(rt_sem_inquire(&m, &seminfo));
    if (seminfo.count != 0) {
        RT_TASK_INFO taskinfo;
        CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (count=%ld) Offending task: %s\n",
                seminfo.count, taskinfo.name);
    }
*/
    if (write(fd, buf, n) != n) {
        fprintf(stderr, "File write error.\n");
        CHECK( rt_sem_signal(s) );
    }
}

static void *task0(void *args)
{
    RT_TASK *handler;

    if (!(handler = rt_task_init_schmod(nam2num("T0HDLR"), 0, 0, 0,
                                        SCHED_FIFO, 0xF))) {
        printf("CANNOT INIT HANDLER TASK > T0HDLR <\n");
        exit(1);
    }
    rt_allow_nonroot_hrt();
    mlockall(MCL_CURRENT | MCL_FUTURE);
    rt_make_hard_real_time();
    t0end = 0;
    rt_task_use_fpu(handler, TASK_USE_FPU);
    while ( !t0end ) {
        rt_sleep((float)rand()*tmax/(float)RAND_MAX);
        rt_sem_wait(m);
        output('0');
        CHECK( rt_sem_signal(m) );
    }
    rt_make_soft_real_time();
    rt_task_delete(handler);
    return 0;
}

static void *task1(void *args)
{
    RT_TASK *handler;

    if (!(handler = rt_task_init_schmod(nam2num("T1HDLR"), 0, 0, 0,
                                        SCHED_FIFO, 0xF))) {
        printf("CANNOT INIT HANDLER TASK > T1HDLR <\n");
        exit(1);
    }
    rt_allow_nonroot_hrt();
    mlockall(MCL_CURRENT | MCL_FUTURE);
    rt_make_hard_real_time();
    t1end = 0;
    rt_task_use_fpu(handler, TASK_USE_FPU);
    while ( !t1end ) {
        rt_sleep((float)rand()*tmax/(float)RAND_MAX);
        rt_sem_wait(m);
        output('1');
        CHECK( rt_sem_signal(m) );
    }
    rt_make_soft_real_time();
    rt_task_delete(handler);
    return 0;
}

void sighandler(int arg)
{
    CHECK(rt_sem_signal(s));
}

int main(int argc, char *argv[])
{
    RT_TASK *maint; //, *squaretask;
    int t0, t1;

    if ((fd = open("dump.txt", O_CREAT | O_TRUNC | O_WRONLY)) < 0)
        fprintf(stderr, "File open error.\n");
    else {
        if (argc == 2) {
            tmax = atof(argv[1]);
            if (tmax == 0.0)
                tmax = 1.0e7;
        }
        rt_set_oneshot_mode();
        start_rt_timer(0);
        m = rt_sem_init(nam2num("MSEM"), 1);
        s = rt_sem_init(nam2num("SSEM"), 0);
        signal(SIGINT, sighandler);
        if (!(maint = rt_task_init(nam2num("MAIN"), 1, 0, 0))) {
            printf("CANNOT INIT MAIN TASK > MAIN <\n");
            exit(1);
        }
        t0 = rt_thread_create(task0, NULL, 1); // create thread
        while (t0end) { // wait until thread went to hard real time
            usleep(10);
        }
        t1 = rt_thread_create(task1, NULL, 1); // create thread
        while (t1end) { // wait until thread went to hard real time
            usleep(10);
        }
        printf("Running for %.2f seconds.\n", (float)MAXLONG/1.0e9);
        rt_sem_wait(s);
        signal(SIGINT, SIG_IGN);
        t0end = 1;
        t1end = 1;
        printf("TEST ENDS\n");
        CHECK( rt_thread_join(t0) );
        CHECK( rt_thread_join(t1) );
        CHECK(rt_sem_delete(s));
        CHECK(rt_sem_delete(m));
        CHECK( rt_task_delete(maint) );
        close(fd);
    }
    return 0;
}
[Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Hi,

well, if I'm not totally wrong, we have a design problem in the RT-thread
hardening path. I dug into the crash Jeroen reported and I'm quite sure
that this is the reason.

So that's the bad news. The good one is that we can at least work around
it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without
CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
 o Shadow threads have only one stack, i.e. one context. If the real-time
   part is active (this includes being blocked on some xnsynch object or
   delayed), the original Linux task must NEVER EVER be executed, even if
   it would immediately fall asleep again. That's because the stack is in
   use by the real-time part at that time. And this condition is checked
   in do_schedule_event() [1].

[Linux]
 o A Linux task which has called set_current_state() will remain in the
   run-queue as long as it calls schedule() on its own. This means that it
   can be preempted (if CONFIG_PREEMPT is set) between set_current_state()
   and schedule() and then even be resumed again. Only the explicit call
   of schedule() will trigger deactivate_task(), which will in turn remove
   current from the run-queue.

Ok, if this is true, let's have a look at xnshadow_harden(): After
grabbing the gatekeeper sem and putting itself in gk->thread, a task
going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
gatekeeper [2]. This does not include a Linux reschedule due to the _sync
version of wake_up_interruptible. What can happen now?

1) No interruption until we have called schedule() [3]. All fine as we
will not be removed from the run-queue before the gatekeeper starts
kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by an RT IRQ. This would just delay the path described
above, even if some RT threads get executed. Once they are finished, we
continue in xnshadow_harden() - given that the RT part does not trigger
the following case:

3) Interruption by some Linux IRQ. This may cause other threads to become
runnable as well, but the gatekeeper has the highest prio and will
therefore be the next. The problem is that the rescheduling on Linux IRQ
exit will PREEMPT our task in xnshadow_harden(), it will NOT remove it
from the Linux run-queue. And now we are in real trouble: The gatekeeper
will kick off our RT part, which will take over the thread's stack. As
soon as the RT domain falls asleep and Linux takes over again, it will
continue our non-RT part as well! Actually, this seems to be the reason
for the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and
this check, we will run both parts AT THE SAME TIME now, thus violating
my first assumption. The system gets fatally corrupted.

Well, I would be happy if someone can prove me wrong here. The problem is
that I don't see a solution because Linux does not provide an atomic
wake-up + schedule-out under CONFIG_PREEMPT. I'm currently considering a
hack to remove the migrating Linux thread manually from the run-queue,
but this could easily break the Linux scheduler.

Jan

PS: Out of curiosity I also checked RTAI's migration mechanism in this
regard. It's similar except for the fact that it does the gatekeeper's
work in the Linux scheduler's tail (i.e. after the next context switch).
And RTAI seems to suffer from the very same race. So this is either a
fundamental issue - or I'm fundamentally wrong.

[1] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
[2] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
[3] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481

signature.asc
Description: OpenPGP digital signature
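The fatal interleaving described above can be condensed into a toy model: the migrating task marks itself TASK_INTERRUPTIBLE and wakes the gatekeeper, an IRQ-exit preemption (with PREEMPT_ACTIVE) leaves it on the Linux run-queue, and the gatekeeper then hands the shared stack to the RT part. Everything here is an illustrative reconstruction, not the actual Xenomai code:

```c
/* Toy reconstruction of the xnshadow_harden() race under CONFIG_PREEMPT.
 * All names and fields are made up for illustration. */
#include <assert.h>
#include <stdbool.h>

#define TASK_INTERRUPTIBLE 1
#define PREEMPT_ACTIVE     0x10000000

struct shadow {
    int  linux_state;
    bool on_linux_runqueue;
    bool stack_owned_by_rt;   /* set once the gatekeeper kicks the RT part */
};

/* The [*] removal code: skipped when PREEMPT_ACTIVE is set. */
static void toy_schedule(struct shadow *s, unsigned int preempt_count)
{
    if (s->linux_state && !(preempt_count & PREEMPT_ACTIVE))
        s->on_linux_runqueue = false;
}

/* Hardening sequence, interrupted before the voluntary schedule().
 * Returns true if both contexts may now run on the single stack. */
static bool harden_with_preemption(struct shadow *s)
{
    s->linux_state = TASK_INTERRUPTIBLE;   /* set_current_state() */
    /* wake_up_interruptible_sync(gatekeeper) would happen here */

    /* Linux IRQ fires before schedule(): IRQ exit preempts us via
     * preempt_schedule_irq(), i.e. with PREEMPT_ACTIVE set. */
    toy_schedule(s, PREEMPT_ACTIVE);

    /* gatekeeper (highest prio) runs next and starts the RT part */
    s->stack_owned_by_rt = true;

    /* violation of the one-stack assumption */
    return s->on_linux_runqueue && s->stack_owned_by_rt;
}
```

In the model the task is still runnable for Linux while the RT part owns its stack, which is exactly the condition do_schedule_event() panics on.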