Jan Kiszka wrote:
Philippe Gerum wrote:

Fixed. The cause was related to the thread migration routine to
primary mode (xnshadow_harden), which would spuriously call the Linux
rescheduling procedure from the primary domain under certain
circumstances. This bug only triggers on preemptible kernels. This
also fixes the spinlock recursion issue which is sometimes triggered
when the spinlock debug option is active.

Gasp. I've found a severe regression with this fix, so more work is
needed. More later.

End of alert. Should be ok now.

No crashes so far, looks good. But the final test, a box which always
went to hell very quickly, is still waiting in my office - more on Monday.

Anyway, there seem to be some latency issues pending. I discovered this
again with my migration test. Please give it a try on a mid- (800 MHz
Athlon in my case) to low-end box. On that Athlon I got peaks of over
100 us in the userspace latency test right on starting migration. The
Athlon does not support the NMI watchdog, but on my 1.4 GHz Notebook
there were alarms (>30 us) hitting in the native registry during
rt_task_create. I have no clue yet if anything is broken there.

I suspect that rt_registry_enter() is inherently a long operation when considered as a non-preemptible sum of reasonably short ones. Since it is always called with interrupts enabled, we should split the work in there, re-enabling interrupts in the middle. The tricky part is that we must ensure the new registration slot is never exposed in a half-baked state during the preemptible section.
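To illustrate the kind of split I mean, here is a minimal sketch in plain C. It is not the actual registry code: the demo_* names, the slot layout, and the demo_lock()/demo_unlock() stand-ins for the interrupt-masking lock are all hypothetical. The idea is to claim a slot in a short locked section, fill it in while preemptible, then publish it in a second short locked section, with lookups ignoring anything that is not fully published.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_COUNT 8
#define KEY_LEN    32

/* Hypothetical slot states; RESERVED slots are invisible to lookups. */
enum slot_state { SLOT_FREE, SLOT_RESERVED, SLOT_LIVE };

struct slot {
    enum slot_state state;
    char key[KEY_LEN];
    void *object;
};

static struct slot slots[SLOT_COUNT];
static int lock_held; /* stand-in for the real interrupt-masking lock */

static void demo_lock(void)   { assert(!lock_held); lock_held = 1; }
static void demo_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Lookups only ever match fully published (LIVE) slots. */
void *demo_registry_lookup(const char *key)
{
    void *object = NULL;
    int i;

    demo_lock();
    for (i = 0; i < SLOT_COUNT; i++) {
        if (slots[i].state == SLOT_LIVE && strcmp(slots[i].key, key) == 0) {
            object = slots[i].object;
            break;
        }
    }
    demo_unlock();
    return object;
}

/* Returns the slot index, or -1 if the key is too long or the table is full. */
int demo_registry_enter(const char *key, void *object)
{
    int i, found = -1;

    if (strlen(key) >= KEY_LEN)
        return -1;

    /* Phase 1 (short, locked): claim a free slot without exposing it. */
    demo_lock();
    for (i = 0; i < SLOT_COUNT; i++) {
        if (slots[i].state == SLOT_FREE) {
            slots[i].state = SLOT_RESERVED;
            found = i;
            break;
        }
    }
    demo_unlock();
    if (found < 0)
        return -1;

    /* Phase 2 (preemptible): the longer setup work happens here. */
    strcpy(slots[found].key, key);
    slots[found].object = object;

    /* Phase 3 (short, locked): publish the completed slot. */
    demo_lock();
    slots[found].state = SLOT_LIVE;
    demo_unlock();
    return found;
}
```

This sketch leaves out duplicate-key detection, which would have to be re-checked in the publish phase since another registration may have completed during the preemptible section.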

We need that back-tracer soon - did I mention this before? ;)

Well, we have backtrace support for detecting latency peaks, but it depends on NMI availability. The thing is that not every platform provides programmable NMI support. A possible option would be to overload the existing LTT tracepoints in order to keep an execution backtrace, so that we would not have to rely on any hardware support.
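A rough sketch of what such a software backtrace could look like, assuming each tracepoint can call a small recording hook. Nothing here is LTT or Xenomai API: the demo_trace_* names, the entry layout, and the ring depth are made up for illustration. Each hit is logged into a fixed-size ring, and the ring is frozen when a latency peak is detected so that the most recent execution history can be dumped afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 16 /* number of most recent tracepoints retained */

struct trace_entry {
    uint32_t point_id;  /* which tracepoint was hit (hypothetical id) */
    uint64_t timestamp; /* caller-supplied time stamp */
};

static struct trace_entry ring[TRACE_DEPTH];
static unsigned long trace_head; /* total number of entries recorded */
static int frozen;

/* Called from each tracepoint; overwrites the oldest entry when full. */
void demo_trace_point(uint32_t point_id, uint64_t now)
{
    if (frozen)
        return;
    ring[trace_head % TRACE_DEPTH].point_id = point_id;
    ring[trace_head % TRACE_DEPTH].timestamp = now;
    trace_head++;
}

/* Called when a latency peak is detected, to preserve the history. */
void demo_trace_freeze(void)
{
    frozen = 1;
}

/* Copies up to max of the most recent entries, oldest first. */
size_t demo_trace_dump(struct trace_entry *out, size_t max)
{
    size_t count = trace_head < TRACE_DEPTH ? trace_head : TRACE_DEPTH;
    unsigned long start = trace_head - count;
    size_t i;

    if (count > max) {
        start += count - max;
        count = max;
    }
    for (i = 0; i < count; i++)
        out[i] = ring[(start + i) % TRACE_DEPTH];
    return count;
}
```

In a real kernel the recording hook would need to be safe against preemption and SMP races (per-CPU rings, for instance), which this sketch ignores.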

BTW, a kernel timer latency test based on a RTDM device is half-done.
I'm able to dump kernel-based timed-task latencies via a patched
testsuite latency. Histograms need to be added as well as a timer
handler latency test. Will keep you posted.
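For the histogram part, something along these lines is what I have in mind. This is only a sketch with made-up names (demo_histo_add, struct latency_histo) and an assumed resolution of one bucket per microsecond, not the code of the test itself.

```c
#include <assert.h>
#include <stdint.h>

#define HISTO_BUCKETS 100 /* 1 us per bucket; the last one catches overflows */

struct latency_histo {
    unsigned long bucket[HISTO_BUCKETS];
    unsigned long samples;
    int64_t min_ns, max_ns, sum_ns;
};

/* Accounts one latency sample given in nanoseconds. */
void demo_histo_add(struct latency_histo *h, int64_t latency_ns)
{
    /* Negative latencies (early wakeups) are counted in bucket 0. */
    int64_t us = latency_ns < 0 ? 0 : latency_ns / 1000;

    if (us >= HISTO_BUCKETS)
        us = HISTO_BUCKETS - 1;
    h->bucket[us]++;

    if (h->samples == 0 || latency_ns < h->min_ns)
        h->min_ns = latency_ns;
    if (h->samples == 0 || latency_ns > h->max_ns)
        h->max_ns = latency_ns;
    h->sum_ns += latency_ns;
    h->samples++;
}
```

The structure is expected to be zero-initialized before the first sample; the average is then sum_ns / samples.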

Ack. This would also cleanly solve the "where-am-i-going-to-put-that-stuff" issue wrt the latency kernel module, which the user-space section cannot/should not have to compile anymore in 2.1. I guess that moving it to the ksrc/drivers/ section would then be the most natural thing to do.



