On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
<[EMAIL PROTECTED]> wrote:
> Jan Kiszka wrote:
> > Sebastian Smolorz wrote:
> >> Jan Kiszka wrote:
> >>> Cornelius Köpp wrote:
> >>>> Hello,
> >>>> I ran the latency test from the testsuite on several hardware and
> >>>> software configurations. Running on Xenomai 2.4.2, Linux 2.6.24, the
> >>>> results show a "strange" behavior: in kernel mode (-t1) the latencies
> >>>> decrease constantly and linearly. See attached plot
> >>>> 'drifting_latencys_in_kernelmode.png' of latency test running 48h on
> >>>> Pentium3 700. This effect could be reproduced, even on other hardware
> >>>> (Pentium-M 1400).
> >>> As our P3 boards did not support APIC-based timing (IIRC), your kernel
> >>> has correctly disabled the related kernel support. But the Pentium M
> >>> should be fine. So could you check if we are seeing some TSC clocks
> >>> vs. PIT timer rounding issue by enabling the local APIC on the Pentium M?
> >> There is no difference in enabling the local APIC on the Pentium M WRT
> >> this bug.
> >>>> The user mode test (-t0) did not show any drift, but is influenced
> >>>> by a test run in kernel mode before.
> >>> What do you mean by "is influenced"?
> >> Cornelius saw the following behaviour: If the latency test was run in
> >> user space first, no drift appeared over time. If latency was run in
> >> kernel space (with the reported negative drift) a following latency test
> >> in user space showed also negative values but with no additional drift
> >> over time.
> Correction: The initial negative drift when starting user mode latency
> does not depend on a former run of latency in kernel mode but on the
> time passed between system start and the starting point of latency -t0.
> Or, as explained below, it depends on the value of the TSC.
> >>>> I talked with Sebastian Smolorz about this and he built his own
> >>>> independent kernel config to check. He got the same drifting effect
> >>>> with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
> >>>> hours. His kernel config is attached as
> >>>> 'config-2.6.24-xenomai-2.4.3__ssm'.
> >>>> Our kernel-configs are both based on a config used with Xenomai 2.3.4
> >>>> and Linux 188.8.131.52 without any drifting effects.
> >>> 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
> >>> not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
> >>> values (that naturally show up when the system runs for a longer time).
> >> This hint seems to point into the right direction. I tried out a
> >> modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
> >> implementation in include/asm-generic/bits/pod.h was used. The drifting
> >> bug disappeared. So there seems to be a buggy x86-specific
> >> implementation of this routine.
> > Hmm, maybe even a conceptual issue: the multiply-shift-based
> > xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
> > xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
> > may lose some bits, maybe too many bits...
> > It looks like this bites us in the kernel latency tests (-t2 should
> > suffer as well). Those recalculate their timeouts each round based on
> > absolute nanoseconds. In contrast, the periodic user mode task of -t0
> > uses a periodic timer that is forwarded via a tsc-based interval.
> > You (or Cornelius) could try to analyse the calculation path of the
> > involved timeouts, specifically to understand why the scheduled timeout
> > of the underlying task timer (which is tsc-based) tends to diverge from
> > the calculated one (ns-based).
> So here comes the explanation. The error is inside the function
> rthal_llmulshft(). It returns wrong values which are too small - the
> higher the given TSC value the bigger the error. The function
> rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
> rtdm_clock_read_monotonic() is called every time the latency kernel
> thread runs, the values reported by latency become smaller over time.
> In contrast, the latency task in user space uses the conversion
> from TSC to ns only once, when calling rt_timer_inquire().
> timer_info.date is too small, timer_info.tsc is right. So all calculated
> deltas in  are shifted to a smaller value. This value is constant
> during the runtime of latency in user space because no more conversion
> from TSC to ns occurs.
latency does conversions from tsc to ns, but it converts time
differences, so the error is small relative to the results. In
contrast, subtracting conversion results is wrong. In other words, doing:

    start = rt_timer_tsc();
    stop = rt_timer_tsc();
    diffns = rt_timer_tsc2ns(stop - start);

is right, whereas doing:

    start = rt_timer_tsc2ns(rt_timer_tsc());
    stop = rt_timer_tsc2ns(rt_timer_tsc());
    diffns = stop - start;

is wrong.
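
To make the effect concrete, here is a minimal, self-contained sketch. It
assumes a hypothetical 700 MHz TSC (exactly 10/7 ns per tick) and a
deliberately coarse multiply-shift factor so the drift is easy to see; none
of the constants or helper names (demo_tsc_to_ns(), latency_via_absolute_ns(),
latency_via_tsc_delta()) come from the Xenomai sources. It models a periodic
task that always wakes up late_tsc ticks late, and computes the latency
either from an absolute converted timestamp compared against a reference
kept in nanoseconds, or from a converted TSC difference.

```c
#include <stdint.h>

/* Assumption: 700 MHz TSC, so a 100 us period is exactly 70000 ticks. */
#define PERIOD_TSC 70000ULL
#define PERIOD_NS  100000LL

/* Assumption: coarse scale factor floor((10 << 12) / 7), rounded down. */
#define DEMO_SHIFT 12
#define DEMO_MUL   (((uint64_t)10 << DEMO_SHIFT) / 7)

/* Exact reference conversion for this hypothetical clock (10/7 ns per tick). */
int64_t exact_tsc_to_ns(uint64_t tsc)
{
    return (int64_t)(tsc * 10 / 7);
}

/* Lossy multiply-shift conversion: the result is slightly too small,
 * and the absolute error grows in proportion to the tsc value. */
int64_t demo_tsc_to_ns(uint64_t tsc)
{
    return (int64_t)((tsc * DEMO_MUL) >> DEMO_SHIFT);
}

/* Pattern 1: convert the absolute "now" timestamp and subtract an
 * expected wakeup time that is advanced in nanoseconds each period. */
int64_t latency_via_absolute_ns(uint64_t start_tsc, uint64_t k, uint64_t late_tsc)
{
    int64_t expected_ns = demo_tsc_to_ns(start_tsc) + (int64_t)k * PERIOD_NS;
    uint64_t now_tsc = start_tsc + k * PERIOD_TSC + late_tsc;

    return demo_tsc_to_ns(now_tsc) - expected_ns;
}

/* Pattern 2: subtract in TSC units first, then convert the small delta. */
int64_t latency_via_tsc_delta(uint64_t start_tsc, uint64_t k, uint64_t late_tsc)
{
    uint64_t expected_tsc = start_tsc + k * PERIOD_TSC;
    uint64_t now_tsc = expected_tsc + late_tsc;

    return demo_tsc_to_ns(now_tsc - expected_tsc);
}
```

With a constant true latency of 7000 ticks (10 us), pattern 2 stays within a
nanosecond or so of the true value no matter how many periods have elapsed,
while pattern 1 goes negative and keeps decreasing linearly with k, because
the conversion error of the ever-growing absolute timestamp is compared
against a reference that does not share it. A real factor is far more
precise than the one assumed here, so the slope would be much smaller, but
the shape is the same.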
Xenomai-core mailing list