On Tue, 2007-05-15 at 14:40 +0000, Daniel Schnell wrote:
> This was not the culprit. Same results.
>
> Does Xenomai replace the memcpy() call with its own implementation? (I
> don't think so.)
No.

> What about trashing of cache lines through context switches?

Interrupts also participate in cache trashing.

> But then if we run it on Linux alone we should also have trashed cache
> lines. There should not be any difference.

It depends. You are running a 2.4.25/ppc kernel IIRC, which means that your
system endures far fewer preemptions on a vanilla kernel (100 Hz timer, no
kernel preemption). Depending on the Xenomai timer frequency, and the number
of RT thread switches in your application, your cache may be under permanent
pressure.

> Does maybe the presence of a Xenomai POSIX thread cause a lot of ctx
> switches, even if only a memcpy is executed inside the thread? Shouldn't
> Xenomai threads run totally uninterrupted if they have the highest prio?

I don't get what you mean, actually. If your thread needs no switching, then
Xenomai does no switches, period. However, if your RT thread is continuously
moving from primary to secondary mode and back, for instance, then switches
would occur at a high rate; see /proc/xenomai/stats to check this. 2.4/ppc
kernels could possibly cause secondary mode switches for Xenomai threads,
due to on-demand mapping and COW management issues when copying data,
especially to/from large buffers. So: memcpy in primary mode -> page fault
-> mode transition -> internal context switch -> back to memcpy in secondary
mode, for the same thread. High-prio threads can also be preempted by
interrupts.

> Please could somebody actually run this test on his hardware and see if
> these differences between the Xenomai POSIX skin and native Linux are
> happening there as well?

FWIW, you have all the tools needed to check this yourself. First, sampling
/proc/xenomai/stats would tell you the average number of ctx switches, and
the number of mode transitions, on a per-thread basis.
Then, you could move to a 2.6.x kernel for the purpose of testing; without
having to change anything else runtime-wise, this would enable the latency
tracer facility (Kernel hacking -> I-pipe debugging). A simple log showing
how, and by whom, a given user-space memcpy has been preempted would
definitely shed some light on this issue.

> Best regards,
>
> Daniel Schnell
>
>
> -----Original Message-----
> From: Gilles Chanteperdrix [mailto:[EMAIL PROTECTED]
> Sent: 15. May 2007 12:16
> To: Daniel Schnell
> Cc: [email protected]
> Subject: Re: [Xenomai-help] memcpy performance on Xenomai
>
> Improving clock_gettime overhead by reading the tsc directly is my very
> next task. If you want to check whether the effect you measure is the
> result of clock_gettime overhead, you can measure the duration of memcpy
> with the native API service rt_timer_tsc, and convert the tsc difference
> with rt_timer_tsc2ns.

--
Philippe.

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
