On Sat, 2011-07-16 at 11:15 +0200, Jan Kiszka wrote:
> On 2011-07-16 10:52, Philippe Gerum wrote:
> > On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote:
> >> On 2011-07-15 15:10, Jan Kiszka wrote:
> >>> But... right now it looks like we found our primary regression:
> >>> "nucleus/shadow: shorten the uninterruptible path to secondary mode".
> >>> It opens a short window during relax where the migrated task may be
> >>> active under both schedulers. We are currently evaluating a revert
> >>> (looks good so far), and I need to work out my theory in more
> >>> detail.
> >>
> >> Looks like this commit just made a long-standing flaw in Xenomai's
> >> interrupt handling more visible: We reschedule over the interrupt stack
> >> in the Xenomai interrupt handler tails, at least on x86-64. Not sure if
> >> other archs have interrupt stacks; the point is that Xenomai's design
> >> wrongly assumes there are no such things.
> >
> > Fortunately, no, this is not a design issue, no such assumption was ever
> > made, but the Xenomai core expects this to be handled on a per-arch
> > basis with the interrupt pipeline.
> And that's already the problem: If Linux uses interrupt stacks, relying
> on ipipe to disable this during Xenomai interrupt handler execution is
> at best a workaround. A fragile one, unless you increase the per-thread
> stack size by the size of the interrupt stack. Lacking support for a
> generic rescheduling hook became a problem by the time Linux introduced
> interrupt stacks.

Don't assume too much. What was done for ppc64 was not meant as a
general policy. Again, this is a per-arch decision.

> > As you pointed out, there is no way
> > to handle this via some generic Xenomai-only support.
> >
> > ppc64 now has separate interrupt stacks, which is why I disabled
> > IRQSTACKS, which became the built-in default at some point. Blackfin goes
> > through a Xenomai-defined irq tail handler as well, because it may not
> > reschedule over nested interrupt stacks.
> How does this arch prevent xnpod_schedule in the generic interrupt
> handler tail from doing its normal work?

It polls some hw status to know whether a rescheduling would be safe.
See xnarch_escalate().

> > Fact is that such a pending
> > problem with x86_64 was overlooked since day #1 by /me.
> >
> >>   We were lucky so far that the values
> >> saved on this shared stack were apparently "compatible", which means we
> >> were overwriting them with identical or harmless values. But that's no
> >> longer true when interrupts are hitting us in the xnpod_suspend_thread
> >> path of a relaxing shadow.
> >>
> >
> > Makes sense. It would be better to find a solution that does not make
> > the relax path uninterruptible again for a significant amount of time.
> > On low-end platforms we support (i.e. non-x86* mainly), this causes
> > obvious latency spots.
> I agree. Conceptually, the interruptible relaxation should be safe now 
> after recent fixes.
> >
> >> Likely the only possible fix is establishing a reschedule hook for
> >> Xenomai in the interrupt exit path after the original stack is restored
> >> -- just like Linux works. Requires changes to both ipipe and Xenomai,
> >> unfortunately.
> >
> > __ipipe_run_irqtail() is in the I-pipe core for such purpose. If
> > instantiated properly for x86_64, and paired with xnarch_escalate() for
> > that arch as well, it could be an option for running the rescheduling
> > procedure when safe.
> Nope, that doesn't work. The stack is switched later in the return path 
> in entry_64.S. We need a hook there, ideally a conditional one, 
> controlled by some per-cpu variable that is set by Xenomai on return 
> from its interrupt handlers to signal the rescheduling need.

Yes, makes sense. The way to make it conditional without dragging bits
of Xenomai logic into the kernel innards is not obvious though.

It is probably time to officially introduce "exo-kernel" oriented bits
into the Linux thread info. PTDs have too loose semantics to be practical
if we want to avoid thrashing the I-cache by calling probe hooks within
the dual kernel, each time we want to check some basic condition (e.g.
resched needed). A backlink to a foreign TCB there would help too.

Which leads us to killing the ad hoc kernel threads (and stacks) at some
point, which are an absolute pain.

> Jan


Xenomai-core mailing list
