Jan Kiszka wrote:
Philippe Gerum wrote:
Philippe Gerum wrote:
Jan Kiszka wrote:
Gilles Chanteperdrix wrote:
Jeroen Van den Keybus wrote:

Hello,

I'm currently not at a level to participate in your discussion. Although I'm willing to supply you with stress tests, I would nevertheless like to learn more about task migration as this debugging session proceeds. In order to do so, please confirm the following statements or indicate where I went wrong. I hope others may learn from this as well.

xnshadow_harden(): This is called whenever a Xenomai thread performs a Linux (root domain) system call (notified by Adeos?).

xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by the Linux scheduler) switches to primary mode (where it will run as a Xenomai thread, handled by the Xenomai scheduler). Migrations occur for some system calls. More precisely, each Xenomai skin's system call table associates a few flags with each system call, and some of these flags cause the caller to migrate when it issues the system call.
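
To make this concrete, here is a minimal sketch (not the actual Xenomai source) of such a table-driven demux. The flag names and the demux helper are illustrative assumptions; xnshadow_harden() and xnshadow_relax() are the real migration entry points.

    #define EXEC_PRIMARY    0x1   /* handler must run in primary mode */
    #define EXEC_SECONDARY  0x2   /* handler must run in secondary mode */

    extern void xnshadow_harden(void);  /* migrate caller to primary mode */
    extern void xnshadow_relax(void);   /* migrate caller to secondary mode */

    struct sysent {
        int (*svc)(void *args);
        unsigned long flags;
    };

    static int demux_syscall(struct sysent *entry, void *args, int relaxed)
    {
        if ((entry->flags & EXEC_PRIMARY) && relaxed)
            xnshadow_harden();          /* harden before running the handler */
        else if ((entry->flags & EXEC_SECONDARY) && !relaxed)
            xnshadow_relax();           /* relax before running the handler */

        return entry->svc(args);
    }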

Each Xenomai user-space thread has two contexts: a regular Linux thread context, and a Xenomai thread context called the "shadow" thread. Both contexts share the same stack and program counter, so that at any time at least one of the two contexts is seen as suspended by the scheduler which handles it.

Before xnshadow_harden() is called, the Linux thread is running, and its shadow is seen in suspended state, with the XNRELAX bit set, by the Xenomai scheduler. After xnshadow_harden(), the Linux context is seen suspended in INTERRUPTIBLE state by the Linux scheduler, and its shadow is seen as running by the Xenomai scheduler.
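
Summarized as a table (illustrative, using the state names above):

    /*
     *                      Linux context         shadow (Xenomai) context
     *  relaxed (before)    RUNNING               suspended, XNRELAX set
     *  hardened (after)    TASK_INTERRUPTIBLE    running, XNRELAX clear
     *
     * Since both contexts share one stack and program counter, at least
     * one side must be suspended at any instant.
     */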

The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux kernel wake_up_interruptible_sync() call. Is this thread actually run, or does it merely put the thread in some Linux to-do list (I assumed the first case)?

Here I am not sure, but it seems that when calling wake_up_interruptible_sync() the woken-up task is put in the current CPU's runqueue, and this task (i.e. the gatekeeper) will not run until the current thread (i.e. the thread running xnshadow_harden()) marks itself as suspended and calls schedule(). Maybe, marking the running thread as



Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here, and a switch if the prio of the woken-up task is higher.

BTW, an easy way to provoke the current trouble is to remove the "_sync" from the wake_up_interruptible_sync() call. As I understand it, this _sync is just an optimisation hint for Linux to avoid needless scheduler runs.
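
For reference, the handshake under discussion boils down to this simplified sketch (gk_waitq is an assumed name for the gatekeeper's wait queue):

    #include <linux/sched.h>
    #include <linux/wait.h>

    static void harden_handshake(wait_queue_head_t *gk_waitq)
    {
        set_current_state(TASK_INTERRUPTIBLE);
        wake_up_interruptible_sync(gk_waitq); /* "_sync" hints that the waker
                                               * is about to sleep, so do not
                                               * preempt it for the wakee */
        schedule();                           /* current leaves the runqueue;
                                               * only now should the
                                               * gatekeeper run */
    }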


You could not guarantee the following execution sequence by doing so either, i.e.

1- current wakes up the gatekeeper
2- current goes to sleep to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow side of the old current

The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xenomai schedulers. Rainy day indeed.
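
Spelled out as a timeline, the failure mode looks like this (illustrative):

    /*
     *  hardening thread (secondary mode)       gatekeeper
     *  ---------------------------------       ----------
     *  set_current_state(TASK_INTERRUPTIBLE)
     *  wake_up_interruptible_sync(&gk_waitq)
     *  <IRQ> preempt_schedule_irq()  ------->  runs, resumes the shadow
     *                                          (Xenomai side now running)
     *  returns, still linked to the runqueue
     *
     *  => the same thread is seen as runnable by both schedulers at once.
     */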

We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gatekeeper, until the gatekeeper eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce.
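
Sketched with the in-kernel API, the boost idea would read roughly as follows (illustrative, not what was retained):

    #include <linux/sched.h>

    static void boost_then_harden(void)
    {
        struct sched_param boost = { .sched_priority = MAX_RT_PRIO - 1 };

        /* Pin current at the top SCHED_FIFO priority so no other Linux
         * task can preempt it between the wakeup and schedule(). */
        sched_setscheduler(current, SCHED_FIFO, &boost);

        /* ... wake the gatekeeper, then schedule() as before; the
         * gatekeeper later restores the normal policy/priority once the
         * thread runs in primary mode. Each rescheduling attempt aborted
         * against the boosted priority still costs latency. */
    }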

The other way is to make sure that no in-kernel preemption of the
hardening task could occur after step 1) and until step 2) is
performed, given that we cannot currently call schedule() with
interrupts or preemption off. I'm on it.
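
In shape, the intended fix amounts to the following hypothetical sketch; a stock kernel would rightly complain about scheduling while atomic here, which is exactly the restriction the Adeos-side change has to lift:

    static void harden_atomically(wait_queue_head_t *gk_waitq)
    {
        preempt_disable();
        set_current_state(TASK_INTERRUPTIBLE);
        wake_up_interruptible_sync(gk_waitq);
        /* Assumed patched schedule(): may be entered with preemption
         * disabled and drops it atomically across the context switch, so
         * preempt_schedule_irq() cannot sneak in between the wakeup and
         * the unlinking from the runqueue. */
        schedule();
    }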


Could anyone interested in this issue test the following couple of patches?

atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15.
atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2.

Both patches are needed to fix the issue.

TIA,



Looks good. I tried Jeroen's test case and was not able to reproduce the crash anymore. I think it's time for a new ipipe release. ;)


Looks like, indeed.

While we are at it: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be ready for merge.


No issue with the ipipe side, since it only touches the tracer support code. No issue either, at first sight, with the Xenomai side, aside from the trace being frozen twice in do_schedule_event (once in that routine, and a second time in xnpod_fatal). But maybe freezing the situation before the stack is dumped is intentional; is it?

I'm queuing the ipipe side patch for 1.2, which will also provide the support we need for atomic scheduling in order to solve the migration bug.

--

Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
