Gilles Chanteperdrix wrote:
On Thu, Apr 3, 2008 at 2:17 PM, Jan Kiszka <[EMAIL PROTECTED]> wrote:
Sebastian Smolorz wrote:

Gilles Chanteperdrix wrote:

On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz

Jan Kiszka wrote:
 > Sebastian Smolorz wrote:
 >> Jan Kiszka wrote:
 >>> Cornelius Köpp wrote:
 >>>> I talked with Sebastian Smolorz about this and he builds his own
 >>>> independent kernel-config to check. He got the same
 >>>> with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over
 >>>> hours. His kernel-config is attached as
 >>>> 'config-2.6.24-xenomai-2.4.3__ssm'.
 >>>> Our kernel-configs are both based on a config used with Xenomai
 >>>> and Linux without any drifting effects.
 >>> 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it
 >>> not a PIC vs. APIC thing, but rather a rounding problem of larger
 >>> values (that naturally show up when the system runs for a longer
 >> This hint seems to point into the right direction. I tried out a
 >> modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the
 >> implementation in include/asm-generic/bits/pod.h was used. The
 >> bug disappeared. So there seems to be a buggy x86-specific
 >> implementation of this routine.
 > Hmm, maybe even a conceptual issue: the multiply-shift-based
 > xnarch_tsc_to_ns is not as precise as the still
 > xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc,
 > may lose some bits, maybe too many bits...
 > It looks like this bites us in the kernel latency tests (-t2 should
 > suffer as well). Those recalculate their timeouts each round based
 > absolute nanoseconds. In contrast, the periodic user mode task of
 > uses a periodic timer that is forwarded via a tsc-based interval.
 > You (or Cornelius) could try to analyse the calculation path of the
 > involved timeouts, specifically to understand why the scheduled
 > of the underlying task timer (which is tsc-based) tend to diverge
 > the calculated one (ns-based).

 So here comes the explanation. The error is inside the function
 rthal_llmulshft(). It returns wrong values which are too small - the
 higher the given TSC value the bigger the error. The function
 rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
 rtdm_clock_read_monotonic() is called every time the latency kernel
 thread runs [1] the values reported by latency become smaller over
 In contrast, the latency task in user space uses the conversion
 from TSC to ns only once when calling rt_timer_inquire [2]. is too small, timer_info.tsc is right. So all
 deltas in [3] are shifted to a smaller value. This value is constant
 during the runtime of latency in user space because no more conversion
 from TSC to ns occurs.

latency does conversions from tsc to ns, but it converts time
differences, so the error is small relative to the results.

Of course. I wasn't precise with my last statement. It should be: No more
conversions from *absolute* TSC values to ns occur.
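
A minimal sketch of the absolute-versus-difference point, not the actual rthal_llmulshft() code and not a diagnosis of its bug. It assumes a multiply-shift conversion whose multiplier is rounded down, and a hypothetical 3 GHz TSC. All names (tsc_to_ns_exact, mulshift_init, tsc_to_ns_mulshift) are made up for illustration, and unsigned __int128 is a GCC/Clang extension. Under these assumptions the result is never larger than the exact one and the absolute error scales with the input, so absolute TSC values drift while short differences stay within about a nanosecond.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical multiply-shift parameters: ns ~= (tsc * mul) >> shift */
struct mulshift {
    uint32_t mul;
    unsigned int shift;
};

/* Reference conversion: exact floor(tsc * 1e9 / freq) via a 128-bit product. */
static uint64_t tsc_to_ns_exact(uint64_t tsc, uint32_t freq_hz)
{
    assert(freq_hz != 0);
    return (uint64_t)(((unsigned __int128)tsc * NS_PER_SEC) / freq_hz);
}

/* Pick the largest shift (<= 31) whose rounded-down multiplier fits 32 bits. */
static struct mulshift mulshift_init(uint32_t freq_hz)
{
    struct mulshift ms;
    unsigned int shift = 31;
    uint64_t mul;

    assert(freq_hz != 0);
    mul = (NS_PER_SEC << shift) / freq_hz;
    while (mul > UINT32_MAX) {
        /* Terminates: at shift 0 the multiplier is at most 1e9. */
        shift--;
        mul = (NS_PER_SEC << shift) / freq_hz;
    }
    ms.mul = (uint32_t)mul;
    ms.shift = shift;
    return ms;
}

/* Approximate conversion; the truncated multiplier makes it err low. */
static uint64_t tsc_to_ns_mulshift(uint64_t tsc, struct mulshift ms)
{
    return (uint64_t)(((unsigned __int128)tsc * ms.mul) >> ms.shift);
}
```

With the assumed 3 GHz clock, converting one hour's worth of ticks as an absolute value comes out a few microseconds short of the exact result, while converting a 3000-tick difference is off by at most 1 ns.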

 This patch may do the trick: it uses the inverted tsc-to-ns function
instead of the frequency-based one. Be warned, it is totally untested inside
Xenomai, I just ran it in a user space test program. But it may give an

 Gilles, not sure if this is related to my quickly hacked test, but with
RTHAL_CPU_FREQ = 800MHz and TSC = 0x7000000000000000 (or larger) I get an
arithmetic exception with the rthal_llimd-based conversion to nanoseconds.
Is there an input range we may have to exclude for rthal_llimd?

rthal_llimd does a multiplication first, then a division. The
multiplication cannot overflow, but the result of the division may
not fit in 64 bits; you then get an exception on x86. This happens
only with m > d.
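
A rough sketch of that arithmetic condition, for illustration only. It does not reproduce rthal_llimd or the x86 divide exception; it only checks whether the quotient of v * m / d still fits in 64 bits. The helper name muldiv_fits_u64 is hypothetical, the operands are treated as unsigned for simplicity, and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute v * m / d with a 128-bit intermediate product.
 * Returns true and stores the quotient if it fits in 64 bits,
 * false (leaving *result untouched) if it does not.
 */
static bool muldiv_fits_u64(uint64_t v, uint32_t m, uint32_t d,
                            uint64_t *result)
{
    unsigned __int128 q;

    assert(d != 0);
    q = ((unsigned __int128)v * m) / d;  /* the product itself cannot overflow */
    if (q > UINT64_MAX)
        return false;                    /* only possible when m > d */
    *result = (uint64_t)q;
    return true;
}
```

With m <= d the quotient can never exceed v, so the check always passes. With m > d, for example an assumed ns-to-tsc conversion at 2 GHz (m = 2000000000, d = 1000000000), an input of 2^63 already yields a quotient of 2^64, which no longer fits.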

OK, for tsc-to-ns this only bites us after a few hundred years of uptime - or when we have settable tsc counters (does Linux tweak them beyond aligning on SMP?).

But there is also the risk the other way around: ns-to-tsc with frequency > 1GHz will fall apart (kernel oops!) when the user provides a large timeout in nanoseconds that we then try to convert to tsc. Not good. Wrong values are one thing, but oopses are even worse.

Any idea how to fix this?



Xenomai-core mailing list
