On Tue, 2007-05-15 at 14:40 +0000, Daniel Schnell wrote:
> This was not the culprit. Same results.
> 
> Does Xenomai replace the memcpy() call with its own implementation? (I don't
> think so.)

No.

> 
> What about thrashing of cache lines through context switches?

Interrupts also participate in cache thrashing.

> But then if we run it on Linux alone, we should also have thrashed cache
> lines. There should not be any difference.

It depends. You are running a 2.4.25/ppc kernel IIRC, which means that
your system endures far fewer preemptions on a vanilla kernel (100 Hz
timer, no kernel preemption). Depending on the Xenomai timer frequency
and the number of RT thread switches in your app, your cache may be
under permanent pressure.

> Does the presence of a Xenomai POSIX thread maybe cause a lot of context
> switches, even if only a memcpy is executed inside the thread? Shouldn't
> Xenomai threads run totally uninterrupted if they have the highest priority?

I don't get what you mean, actually. If your thread needs no switching,
then Xenomai does no switches, period. However, if your RT thread is
continuously moving from primary to secondary mode and back, for
instance, then switches would occur at a high rate;
see /proc/xenomai/stats to check this.

2.4/ppc kernels could possibly cause secondary-mode switches for Xenomai
threads, due to on-demand mapping and COW management issues when copying
data, especially to/from large buffers. So: memcpy in primary mode ->
page_fault -> mode_transition -> internal context_switch -> back to
memcpy in secondary mode for the same thread.

High-priority threads can also be preempted by interrupts.

> 
> Please could somebody actually run this test on his hardware and see if these 
> differences between Xenomai POSIX skin and Linux native are happening there 
> as well ?
> 

FWIW, you have all the needed tools to check this yourself.

First, sampling /proc/xenomai/stats would tell you the average number of
context switches, and the number of mode transitions, on a per-thread
basis. Then, you could move to a 2.6.x kernel for the purpose of testing;
without having to change anything else runtime-wise, this would enable
the latency tracer facility (Kernel hacking -> I-pipe debugging). A
simple log showing how/by whom a given user-space memcpy has been
preempted would definitely shed some light on this issue.

> 
> Best regards,
> 
> Daniel Schnell
> 
> 
> -----Original Message-----
> From: Gilles Chanteperdrix [mailto:[EMAIL PROTECTED] 
> Sent: 15. maí 2007 12:16
> To: Daniel Schnell
> Cc: [email protected]
> Subject: Re: [Xenomai-help] memcpy performance on Xenomai
> 
> 
> Improving clock_gettime overhead by reading the tsc directly is my very next
> task. If you want to check whether the effect you measure is the result of
> clock_gettime overhead, you can measure the duration of memcpy with the
> native API service rt_timer_tsc, and convert the tsc difference with
> rt_timer_tsc2ns.
> 
-- 
Philippe.



_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
