Philippe Gerum wrote:
> Jan Kiszka wrote:
> 
>> Philippe Gerum wrote:
>>
>>>>> ...
>>>>> Fixed. The cause was related to the thread migration routine to
>>>>> primary mode (xnshadow_harden), which would spuriously call the Linux
>>>>> rescheduling procedure from the primary domain under certain
>>>>> circumstances. This bug only triggers on preemptible kernels. This
>>>>> also fixes the spinlock recursion issue which is sometimes triggered
>>>>> when the spinlock debug option is active.
>>>>>
>>>>
>>>> Gasp. I've found a severe regression with this fix, so more work is
>>>> needed. More later.
>>>>
>>>
>>> End of alert. Should be ok now.
>>>
>>
>>
>> No crashes so far, looks good. But the final test, a box which always
>> went to hell very quickly, is still waiting in my office - more on
>> Monday.
>>
>> Anyway, there seem to be some latency issues pending. I discovered this
>> again with my migration test. Please give it a try on a mid- (800 MHz
>> Athlon in my case) to low-end box. On that Athlon I got peaks of over
>> 100 us in the userspace latency test right when migration starts. The
>> Athlon does not support the NMI watchdog, but on my 1.4 GHz notebook
>> there were alarms (>30 us) hitting in the native registry during
>> rt_task_create. I have no clue yet whether anything is broken there.
> 
> 
> I suspect that rt_registry_enter() is inherently a long operation when
> considered as a non-preemptible sum of reasonably short ones. Since it
> is always called with interrupts enabled, we should split the work in
> there, releasing interrupts in the middle. The tricky thing is that we
> must ensure that the new registration slot is not exposed in a
> half-baked state during the preemptible section.
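
Something like a two-phase pattern could probably do it. Below is a rough,
untested sketch with made-up names (this is not the actual registry code,
just to illustrate the idea of keeping the slot unreachable while interrupts
are enabled):

/* placeholder types and helpers, invented for the sketch */
typedef struct registry_slot {
        int state;                      /* SLOT_PENDING or SLOT_READY */
        const char *key;
        void *objaddr;
} registry_slot_t;

#define SLOT_PENDING 1
#define SLOT_READY   2

extern void registry_lock(unsigned long *flags);   /* the usual irq-off lock */
extern void registry_unlock(unsigned long flags);
extern registry_slot_t *registry_alloc_slot(void);
extern void registry_export(registry_slot_t *slot, const char *key, void *objaddr);
extern void registry_hash_insert(const char *key, registry_slot_t *slot);

int registry_enter_split(const char *key, void *objaddr)
{
        registry_slot_t *slot;
        unsigned long flags;

        registry_lock(&flags);          /* short section #1 */
        slot = registry_alloc_slot();   /* grab a free slot */
        slot->state = SLOT_PENDING;     /* not hashed yet -> invisible to lookups */
        registry_unlock(flags);

        /*
         * Preemptible middle: the expensive part (hashing the key,
         * /proc export, ...) runs with interrupts enabled. Since the
         * slot is not reachable through the hash table yet, nobody can
         * observe it in a half-baked state here.
         */
        registry_export(slot, key, objaddr);

        registry_lock(&flags);          /* short section #2 */
        registry_hash_insert(key, slot);
        slot->state = SLOT_READY;       /* publish */
        registry_unlock(flags);

        return 0;
}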

Yeah, I guess there are a few more such complex call chains inside the
core lock, at least when looking at the native skin. For a regression
test suite, we should define load scenarios with low-prio realtime tasks
doing some init/cleanup and communication while e.g. the latency test is
running. This should give a clearer picture of what numbers to expect in
normal application scenarios.
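
Something along these lines maybe, using the native skin (typed from
memory, so take the exact signatures with a grain of salt): a low-prio
"churn" task that keeps creating and deleting a registered object while
the latency test runs at high priority.

#include <sys/mman.h>
#include <unistd.h>
#include <native/task.h>
#include <native/sem.h>

static RT_TASK churn_task;

static void churn(void *cookie)
{
        RT_SEM sem;

        (void)cookie;

        for (;;) {
                /* one init/communicate/cleanup cycle, hitting the
                 * registry on each create/delete */
                rt_sem_create(&sem, "churn-sem", 0, S_FIFO);
                rt_sem_v(&sem);
                rt_sem_delete(&sem);
                rt_task_sleep(1000000);         /* ~1 ms, assuming ns timebase */
        }
}

int main(void)
{
        mlockall(MCL_CURRENT | MCL_FUTURE);

        /* prio 1: far below the latency test task */
        rt_task_create(&churn_task, "churn", 0, 1, 0);
        rt_task_start(&churn_task, &churn, NULL);

        pause();
        return 0;
}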

> 
>> We need
>> that back-tracer soon - did I mention this before? ;)
> 
> 
> Well, we have backtrace support for detecting latency peaks, but it's
> dependent on NMI availability. The thing is that not every platform
> provides programmable NMI support. A possible option would be to
> overload the existing LTT tracepoints in order to keep an execution
> backtrace, so that we would not have to rely on any hw support.
> 

The advantage of Fu's mcount-based tracer will be that it can also
capture functions you do not expect, e.g. accidentally called kernel
services. His patch, likely against Adeos, will enable kernel-wide
function tracing which you can use to instrument IRQ-off paths or (in a
second step or so) other things you are interested in. And it will
maintain a FULL calling history, something the NMI-based approach can't do.
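
The principle is simple (hand-wavy sketch below, certainly not Fu's actual
code): with gcc -pg every function prologue calls mcount(), so even a
trivial hook can keep the complete call history in a ring buffer, instead
of a single snapshot at the moment an NMI fires. A real implementation
would need per-CPU buffers and arch-specific asm glue, of course.

#define TRACE_DEPTH 4096                /* power of two */

struct trace_slot {
        unsigned long ip;               /* instrumented function */
        unsigned long parent;           /* its caller */
};

static struct trace_slot trace_ring[TRACE_DEPTH];  /* per-CPU in real life */
static unsigned int trace_pos;
static int tracing_enabled;

/* gcc -pg makes every function entry call mcount(); this file itself must
 * be built without -pg, or we would recurse endlessly. */
void mcount(void)
{
        unsigned int i;

        if (!tracing_enabled)
                return;

        i = trace_pos++ & (TRACE_DEPTH - 1);
        trace_ring[i].ip = (unsigned long)__builtin_return_address(0);
        trace_ring[i].parent = (unsigned long)__builtin_return_address(1);
}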

The NMI watchdog will still be useful for hard lock-ups, and LTT for a more
global view of what's happening, but the mcount instrumentation should give
deep insight into the critical timing behaviour of the core and the skins.

Jan
