On 2011-07-16 10:52, Philippe Gerum wrote:
On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote:
On 2011-07-15 15:10, Jan Kiszka wrote:
But... right now it looks like we found our primary regression:
"nucleus/shadow: shorten the uninterruptible path to secondary mode".
It opens a short window during relax where the migrated task may be
active under both schedulers. We are currently evaluating a revert
(looks good so far), and I need to work out my theory in more
Looks like this commit just made a long-standing flaw in Xenomai's
interrupt handling more visible: We reschedule over the interrupt stack
in the Xenomai interrupt handler tails, at least on x86-64. Not sure if
other archs have interrupt stacks, the point is Xenomai's design wrongly
assumes there are no such things.
Fortunately, no, this is not a design issue, no such assumption was ever
made, but the Xenomai core expects this to be handled on a per-arch
basis with the interrupt pipeline.
And that's already the problem: If Linux uses interrupt stacks, relying
on ipipe to disable this during Xenomai interrupt handler execution is
at best a workaround. A fragile one unless you increase the per-thread
stack size by the size of the interrupt stack. Lacking support for a
generic rescheduling hook became a problem by the time Linux introduced
As you pointed out, there is no way
to handle this via some generic Xenomai-only support.
ppc64 now has separate interrupt stacks, which is why I disabled
IRQSTACKS which became the builtin default at some point. Blackfin goes
through a Xenomai-defined irq tail handler as well, because it may not
reschedule over nested interrupt stacks.
How does this arch prevent xnpod_schedule in the generic interrupt
handler tail from doing its normal work?
Fact is that this pending
problem with x86_64 was overlooked since day #1 by /me.
We were lucky so far that the values
saved on this shared stack were apparently "compatible", meaning we were
overwriting them with identical or harmless values. But that's no longer
true when interrupts are hitting us in the xnpod_suspend_thread path of
a relaxing shadow.
Makes sense. It would be better to find a solution that does not make
the relax path uninterruptible again for a significant amount of time.
On low end platforms we support (i.e. non-x86* mainly), this causes
obvious latency spots.
I agree. Conceptually, the interruptible relaxation should be safe now
after recent fixes.
Likely the only possible fix is establishing a reschedule hook for
Xenomai in the interrupt exit path after the original stack is restored
- just like Linux works. Requires changes to both ipipe and Xenomai
__ipipe_run_irqtail() is in the I-pipe core for such purpose. If
instantiated properly for x86_64, and paired with xnarch_escalate() for
that arch as well, it could be an option for running the rescheduling
procedure when safe.
Nope, that doesn't work. The stack is switched later in the return path
in entry_64.S. We need a hook there, ideally a conditional one,
controlled by some per-cpu variable that is set by Xenomai on return
from its interrupt handlers to signal the rescheduling need.
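
To make the proposal above concrete, here is a minimal user-space sketch of the
idea in plain C. It is only a model under stated assumptions, not ipipe or
Xenomai code: every name in it (resched_pending, on_irq_stack,
rt_irq_handler_tail, simulate_irq, do_reschedule, NR_CPUS) is hypothetical, the
per-cpu variable is modelled as a plain array, and the stack switch is modelled
as a boolean. The handler tail only records the rescheduling need, and the
conditional hook runs once the original stack is back.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Hypothetical stand-ins for per-cpu state; a kernel would use real per-cpu data. */
static bool resched_pending[NR_CPUS];	/* set by the handler tail */
static bool on_irq_stack[NR_CPUS];	/* models "running on the interrupt stack" */
static int resched_count[NR_CPUS];	/* counts rescheduling calls, for inspection */

/* Stand-in for the actual rescheduling procedure. */
static void do_reschedule(int cpu)
{
	/* The point of the hook: never reschedule over the interrupt stack. */
	assert(!on_irq_stack[cpu]);
	resched_count[cpu]++;
}

/* Interrupt handler tail: only signal the need, do not reschedule here. */
static void rt_irq_handler_tail(int cpu, bool need_resched)
{
	if (need_resched)
		resched_pending[cpu] = true;
}

/* Models one interrupt from entry to the end of the return path. */
static void simulate_irq(int cpu, bool need_resched)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	on_irq_stack[cpu] = true;	/* entry: switch to the interrupt stack */
	rt_irq_handler_tail(cpu, need_resched);
	on_irq_stack[cpu] = false;	/* return path: original stack restored */

	/* Conditional hook, placed after the stack switch. */
	if (resched_pending[cpu]) {
		resched_pending[cpu] = false;
		do_reschedule(cpu);
	}
}
```

Calling simulate_irq(0, true) bumps resched_count[0] once and leaves
resched_pending[0] cleared, while simulate_irq(0, false) takes the fast path
and touches nothing. In the real return path the check would have to sit in
entry_64.S after the stack switch, which this model does not attempt to show.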
Xenomai-core mailing list