Gilles Chanteperdrix wrote:
> On Thu, Apr 3, 2008 at 2:17 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
>> Sebastian Smolorz wrote:
>>> Gilles Chanteperdrix wrote:
>>>> On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz <[EMAIL PROTECTED]> wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Sebastian Smolorz wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> Cornelius Köpp wrote:
>>>>>>>>> I talked with Sebastian Smolorz about this and he built his own
>>>>>>>>> independent kernel-config to check. He got the same drifting-effect
>>>>>>>>> with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
>>>>>>>>> hours. His kernel-config is attached as
>>>>>>>>> 'config-2.6.24-xenomai-2.4.3__ssm'.
>>>>>>>>>
>>>>>>>>> Our kernel-configs are both based on a config used with Xenomai 2.3.4
>>>>>>>>> and Linux 220.127.116.11 without any drifting effects.
>>>>>>>>
>>>>>>>> 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
>>>>>>>> not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
>>>>>>>> values (that naturally show up when the system runs for a longer time).
>>>>>>>
>>>>>>> This hint seems to point in the right direction. I tried out a
>>>>>>> modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
>>>>>>> implementation in include/asm-generic/bits/pod.h was used. The drifting
>>>>>>> bug disappeared. So there seems to be a buggy x86-specific
>>>>>>> implementation of this routine.
>>>>>>
>>>>>> Hmm, maybe even a conceptual issue: the multiply-shift-based
>>>>>> xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
>>>>>> xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
>>>>>> may lose some bits, maybe too many bits...
>>>>>>
>>>>>> It looks like this bites us in the kernel latency tests (-t2 should
>>>>>> suffer as well). Those recalculate their timeouts each round based on
>>>>>> absolute nanoseconds. In contrast, the periodic user mode task of -t0
>>>>>> uses a periodic timer that is forwarded via a tsc-based interval.
>>>>>>
>>>>>> You (or Cornelius) could try to analyse the calculation path of the
>>>>>> involved timeouts, specifically to understand why the scheduled timeout
>>>>>> of the underlying task timer (which is tsc-based) tends to diverge from
>>>>>> the calculated one (ns-based).
>>>>>
>>>>> So here comes the explanation. The error is inside the function
>>>>> rthal_llmulshft(). It returns wrong values which are too small - the
>>>>> higher the given TSC value the bigger the error. The function
>>>>> rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
>>>>> rtdm_clock_read_monotonic() is called every time the latency kernel
>>>>> thread runs, the values reported by latency become smaller over time.
>>>>>
>>>>> In contrast, the latency task in user space uses the conversion from
>>>>> TSC to ns only once, when calling rt_timer_inquire. timer_info.date is
>>>>> too small, timer_info.tsc is right. So all calculated deltas are
>>>>> shifted to a smaller value. This value is constant during the runtime
>>>>> of latency in user space because no more conversion from TSC to ns
>>>>> occurs.
>>>>
>>>> latency does conversions from tsc to ns, but it converts time
>>>> differences, so the error is small relative to the results.
>>>
>>> Of course. I wasn't precise with my last statement. It should be: No
>>> more conversions from *absolute* TSC values to ns occur.
>>
>> This patch may do the trick: it uses the inverted tsc-to-ns function
>> instead of the frequency-based one. Be warned, it is totally untested
>> inside Xenomai, I just ran it in a user space test program. But it may
>> give an idea.
>>
>> Gilles, not sure if this is related to my quickly hacked test, but with
>> RTHAL_CPU_FREQ = 800MHz and TSC = 0x7000000000000000 (or larger) I get
>> an arithmetic exception with the rthal_llimd-based conversion to
>> nanoseconds. Is there an input range we may have to exclude for
>> rthal_llimd?
>
> rthal_llimd does a multiplication first, then a division. The
> multiplication cannot overflow, but the result of the division may not
> fit in 64 bits; you then get an exception on x86. This happens only
> with m > d.
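
As an illustration of Sebastian's diagnosis quoted above (the converted value falling further behind as the absolute TSC grows), here is a minimal user-space sketch. It is not the rthal_llmulshft() code: the helper names, the 2.4 GHz frequency and the shift of 31 are assumptions chosen for the demonstration, and unsigned __int128 is a GCC/Clang extension. It only shows how a scale factor that has been rounded down produces an error proportional to the TSC value, which may or may not be the whole story behind the observed drift.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Reference conversion: multiply first, then divide, in 128 bits. */
static uint64_t tsc_to_ns_div(uint64_t tsc, uint64_t freq_hz)
{
    assert(freq_hz != 0);
    return (uint64_t)(((unsigned __int128)tsc * NS_PER_SEC) / freq_hz);
}

/* Scale factor for the multiply-shift variant: ns per tick times
 * 2^shift, rounded DOWN. The lost fraction is the source of the error. */
static uint32_t scale_factor(uint64_t freq_hz, unsigned shift)
{
    assert(freq_hz != 0 && shift <= 34); /* keep NS_PER_SEC << shift in 64 bits */
    uint64_t f = (NS_PER_SEC << shift) / freq_hz;
    assert(f <= UINT32_MAX);
    return (uint32_t)f;
}

/* Multiply-shift conversion with a precomputed scale factor. */
static uint64_t tsc_to_ns_shift(uint64_t tsc, uint32_t mul, unsigned shift)
{
    return (uint64_t)(((unsigned __int128)tsc * mul) >> shift);
}

/* How many ns the multiply-shift result lags behind the reference. */
static uint64_t drift_ns(uint64_t tsc, uint64_t freq_hz, unsigned shift)
{
    uint32_t mul = scale_factor(freq_hz, shift);
    return tsc_to_ns_div(tsc, freq_hz) - tsc_to_ns_shift(tsc, mul, shift);
}
```

With these assumed parameters, drift_ns(tsc, 2400000000ULL, 31) is zero at TSC 0 and grows by something on the order of a microsecond per hour of uptime, while converting a short TSC difference the same way loses about a nanosecond at most. That matches the distinction made in the thread between absolute TSC values and time differences.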
OK, for tsc-to-ns this only bites us after a few hundred years of uptime - or when we have settable tsc counters (does Linux tweak them beyond aligning on SMP?).
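
The "few hundred years" estimate can be checked with a small user-space sketch. The helpers below are hypothetical and not part of the rthal API. They assume a signed 64-bit result, since the thread does not state at which exact value the x86 implementation faults, and they rely on the GCC/Clang unsigned __int128 extension for the wide intermediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does op * m / d fit in a signed 64-bit result?
 * The product is formed in 128 bits, so only the quotient can be too big. */
static bool muldiv_fits_s64(uint64_t op, uint32_t m, uint32_t d)
{
    assert(d != 0);
    unsigned __int128 q = ((unsigned __int128)op * m) / d;
    return q <= (unsigned __int128)INT64_MAX;
}

/* Uptime in years after which a signed 64-bit nanosecond count runs out
 * (assumes a year of 365.25 days). */
static double years_until_ns_overflow(void)
{
    const double seconds_per_year = 365.25 * 86400.0;
    return (double)INT64_MAX / 1e9 / seconds_per_year;
}
```

Under that assumption, muldiv_fits_s64(0x7000000000000000ULL, 1000000000u, 800000000u) reports that the 800 MHz case mentioned above does not fit, years_until_ns_overflow() comes out at roughly 292 years, and a conversion with m <= d fits whenever its input does.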
But there is also the risk the other way around: ns-to-tsc with frequency > 1GHz will fall apart (kernel oops!) when the user provides a large timeout in nanoseconds that we then try to convert to tsc. Not good. Wrong values are one thing, but oopses are even worse.
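
One conceivable direction, given only as a sketch: clamp the result instead of letting the quotient overflow. ns_to_tsc_saturating() is a hypothetical name, not an existing Xenomai function, and the unsigned __int128 intermediate is a GCC/Clang extension used here for a user-space demonstration, so a kernel version (in particular on 32-bit x86) would have to perform the range check differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guard: convert nanoseconds to TSC ticks, returning
 * INT64_MAX when the true result would not fit in a signed 64-bit value. */
static int64_t ns_to_tsc_saturating(int64_t ns, uint32_t cpu_freq_hz)
{
    assert(ns >= 0);
    /* 128-bit product, so the multiplication itself cannot overflow. */
    unsigned __int128 ticks =
        ((unsigned __int128)(uint64_t)ns * cpu_freq_hz) / 1000000000u;
    if (ticks > (unsigned __int128)INT64_MAX)
        return INT64_MAX; /* clamp: "as far in the future as we can express" */
    return (int64_t)ticks;
}
```

With a frequency above 1 GHz, a very large user-supplied timeout then yields the maximum representable tick count rather than a fault. Whether a clamped timeout is acceptable to callers is a separate question that this sketch does not answer.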
Any idea how to fix this?

Jan
_______________________________________________
Xenomai-core mailing list
Xenomaifirstname.lastname@example.org
https://mail.gna.org/listinfo/xenomai-core