Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Sebastian Smolorz
Jan Kiszka wrote:
 Sebastian Smolorz wrote:
 Jan Kiszka wrote:
 This patch may do the trick: it uses the inverted tsc-to-ns function 
 instead of the frequency-based one. Be warned, it is totally untested 
 inside Xenomai, I just ran it in a user space test program. But it 
 may give an idea.

 Your patch needed two minor corrections (ns instead of ts in functions 
 xnarch_ns_to_tsc()) in order to compile. A short run (30 minutes) of 
 latency -t1 seems to prove your bug-fix: There seems to be no drift.
 
 That's good to hear.
 
 If I got your patch correctly, it doesn't make xnarch_tsc_to_ns more 
 precise but introduces a new function xnarch_ns_to_tsc() which is also 
 less precise than the generic xnarch_ns_to_tsc(), right?
 
 Yes. It is now precisely the inverse imprecision, so to say. :)
 
 So isn't there still the danger of getting wrong values when calling 
 xnarch_tsc_to_ns()  not in combination with xnarch_ns_to_tsc()?
 
 Only if the user decides to implement his own conversion. Xenomai with 
 all its skins and both in kernel and user space should always run 
 through the xnarch_* path.

OK, would you commit the patch?

-- 
Sebastian

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Jan Kiszka

Sebastian Smolorz wrote:

[...]

OK, would you commit the patch?


Will do unless someone else has concerns. Gilles, Philippe? ARM and 
Blackfin then need to be fixed similarly, full patch attached.


Jan
---
 ChangeLog|7 +++
 include/asm-arm/bits/init.h  |3 ++-
 include/asm-arm/bits/pod.h   |7 +++
 include/asm-blackfin/bits/init.h |3 ++-
 include/asm-blackfin/bits/pod.h  |7 +++
 include/asm-x86/bits/init_32.h   |3 ++-
 include/asm-x86/bits/init_64.h   |3 ++-
 include/asm-x86/bits/pod_32.h|7 +++
 include/asm-x86/bits/pod_64.h|7 +++
 9 files changed, 43 insertions(+), 4 deletions(-)

Index: b/include/asm-x86/bits/init_32.h
===================================================================
--- a/include/asm-x86/bits/init_32.h
+++ b/include/asm-x86/bits/init_32.h
@@ -73,7 +73,7 @@ int xnarch_calibrate_sched(void)
 
 static inline int xnarch_init(void)
 {
-	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift;
+	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift, xnarch_tsc_divide;
 	int err;
 
 	err = rthal_init();
@@ -89,6 +89,7 @@ static inline int xnarch_init(void)
 
 	xnarch_init_llmulshft(1000000000, RTHAL_CPU_FREQ,
 			      &xnarch_tsc_scale, &xnarch_tsc_shift);
+	xnarch_tsc_divide = 1 << xnarch_tsc_shift;
 
 	err = xnarch_calibrate_sched();
 
Index: b/include/asm-x86/bits/init_64.h
===================================================================
--- a/include/asm-x86/bits/init_64.h
+++ b/include/asm-x86/bits/init_64.h
@@ -70,7 +70,7 @@ int xnarch_calibrate_sched(void)
 
 static inline int xnarch_init(void)
 {
-	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift;
+	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift, xnarch_tsc_divide;
 	int err;
 
 	err = rthal_init();
@@ -86,6 +86,7 @@ static inline int xnarch_init(void)
 
 	xnarch_init_llmulshft(1000000000, RTHAL_CPU_FREQ,
 			      &xnarch_tsc_scale, &xnarch_tsc_shift);
+	xnarch_tsc_divide = 1 << xnarch_tsc_shift;
 
 	err = xnarch_calibrate_sched();
 
Index: b/include/asm-x86/bits/pod_32.h
===================================================================
--- a/include/asm-x86/bits/pod_32.h
+++ b/include/asm-x86/bits/pod_32.h
@@ -25,6 +25,7 @@
 
 unsigned xnarch_tsc_scale;
 unsigned xnarch_tsc_shift;
+unsigned xnarch_tsc_divide;
 
 long long xnarch_tsc_to_ns(long long ts)
 {
@@ -32,6 +33,12 @@ long long xnarch_tsc_to_ns(long long ts)
 }
 #define XNARCH_TSC_TO_NS
 
+long long xnarch_ns_to_tsc(long long ns)
+{
+	return xnarch_llimd(ns, xnarch_tsc_divide, xnarch_tsc_scale);
+}
+#define XNARCH_NS_TO_TSC
+
 #include <asm-generic/xenomai/bits/pod.h>
 #include <asm/xenomai/switch.h>
 
Index: b/include/asm-x86/bits/pod_64.h
===================================================================
--- a/include/asm-x86/bits/pod_64.h
+++ b/include/asm-x86/bits/pod_64.h
@@ -24,6 +24,7 @@
 
 unsigned xnarch_tsc_scale;
 unsigned xnarch_tsc_shift;
+unsigned xnarch_tsc_divide;
 
 long long xnarch_tsc_to_ns(long long ts)
 {
@@ -31,6 +32,12 @@ long long xnarch_tsc_to_ns(long long ts)
 }
 #define XNARCH_TSC_TO_NS
 
+long long xnarch_ns_to_tsc(long long ns)
+{
+	return xnarch_llimd(ns, xnarch_tsc_divide, xnarch_tsc_scale);
+}
+#define XNARCH_NS_TO_TSC
+
 #include <asm-generic/xenomai/bits/pod.h>
 #include <asm/xenomai/switch.h>
 
Index: b/include/asm-arm/bits/init.h
===================================================================
--- a/include/asm-arm/bits/init.h
+++ b/include/asm-arm/bits/init.h
@@ -67,7 +67,7 @@ int xnarch_calibrate_sched(void)
 
 static inline int xnarch_init(void)
 {
-	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift;
+	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift, xnarch_tsc_divide;
 	int err;
 
 	err = rthal_init();
@@ -77,6 +77,7 @@ static inline int xnarch_init(void)
 
 	xnarch_init_llmulshft(1000000000, RTHAL_CPU_FREQ,
 			      &xnarch_tsc_scale, &xnarch_tsc_shift);
+	xnarch_tsc_divide = 1 << xnarch_tsc_shift;

Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Gilles Chanteperdrix
On Fri, Apr 4, 2008 at 12:45 PM, Jan Kiszka [EMAIL PROTECTED] wrote:

 [...]
  OK, would you commit the patch?
 

  Will do unless someone else has concerns. Gilles, Philippe? ARM and
 Blackfin then need to be fixed similarly, full patch attached.

Well, I am sorry, but I do not like this solution;
- the aim of scaled math is to avoid divisions, and with this patch we
end up using divisions;
- with scaled math we do wrong calculations, and making a wrong
xnarch_ns_to_tsc only works for values which should be passed to
xnarch_tsc_to_ns.

So, I would like to propose my solution again: it is exact but uses
no division. Its drawback is that it needs a few more additions and
multiplications than rthal_llimd. If it happens to be slower than
llimd on some platforms (maybe x86?), I would use llimd. After all,
if the division is fast, we may be wrong to try and avoid it.

For the record, here is the code:
typedef struct {
unsigned long long frac;	/* Fractional part. */
unsigned long integ;	/* Integer part. */
} u32frac_t;

/* m/d == integ + frac / 2^64 */

static inline void precalc(u32frac_t *const f,
   const unsigned long m,
   const unsigned long d)
{
f->integ = m / d;
f->frac = div96by32(u32tou64(m % d, 0), 0, d, NULL);
}

unsigned long fast_imuldiv(unsigned long op, u32frac_t f)
{
const unsigned long tmp = (ullmul(op, f.frac >> 32)) >> 32;

if(f.integ)
return tmp + op * f.integ;

return tmp;
}

#define add64and32(h, l, s) do {\
__asm__ ("addl %2, %1\n\t"  \
 "adcl $0, %0"  \
 : "+r"(h), "+r"(l) \
 : "r"(s)); \
} while(0)

#define add96and64(l0, l1, l2, s0, s1) do { \
__asm__ ("addl %4, %2\n\t"  \
 "adcl %3, %1\n\t"  \
 "adcl $0, %0\n\t"  \
 : "+r"(l0), "+r"(l1), "+r"(l2) \
 : "r"(s0), "r"(s1));   \
} while(0)

static inline __attribute_const__ unsigned long long
mul64by64_high(const unsigned long long op, const unsigned long long m)
{
/* Compute high 64 bits of multiplication 64 bits x 64 bits. */
unsigned long long t1, t2, t3;
u_long oph, opl, mh, ml, t0, t1h, t1l, t2h, t2l, t3h, t3l;

u64tou32(op, oph, opl);
u64tou32(m, mh, ml);
t0 = ullmul(opl, ml) >> 32;
t1 = ullmul(oph, ml); u64tou32(t1, t1h, t1l);
add64and32(t1h, t1l, t0);
t2 = ullmul(opl, mh); u64tou32(t2, t2h, t2l);
t3 = ullmul(oph, mh); u64tou32(t3, t3h, t3l);
add64and32(t3h, t3l, t2h);
add96and64(t3h, t3l, t2l, t1h, t1l);

return u64fromu32(t3h, t3l);
}

static inline __attribute_const__ unsigned long long
fast_llimd(const unsigned long long op, const u32frac_t f)
{
const unsigned long long tmp = mul64by64_high(op, f.frac);

if(f.integ)
return tmp + op * f.integ;

return tmp;
}
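The helpers used above (ullmul, div96by32, u32tou64, u64tou32, u64fromu32) come from Xenomai's arithmetic support and are not shown in the message. As an illustration only, the same integer-plus-fraction idea can be sketched portably, using compiler-provided 128-bit arithmetic in place of the hand-written assembly:

```c
#include <stdint.h>

typedef struct {
    uint64_t frac;   /* fractional part of m/d, as 0.64 fixed point */
    uint32_t integ;  /* integer part of m/d */
} frac_sketch_t;

/* m/d == integ + frac / 2^64, precomputed once so that the hot path
 * needs no runtime division. */
static void precalc_sketch(frac_sketch_t *f, uint32_t m, uint32_t d)
{
    f->integ = m / d;
    f->frac = (uint64_t)((((unsigned __int128)(m % d)) << 64) / d);
}

/* op * m / d using only multiplications and additions. */
static uint64_t fast_llimd_sketch(uint64_t op, frac_sketch_t f)
{
    uint64_t tmp = (uint64_t)(((unsigned __int128)op * f.frac) >> 64);

    return f.integ ? tmp + op * f.integ : tmp;
}
```

Because frac truncates the remainder to 64 fractional bits, the result can be at most one unit below the exact floor of op * m / d.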


-- 
 Gilles



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Jan Kiszka

Gilles Chanteperdrix wrote:

[...]


Well, I am sorry, but I do not like this solution;
- the aim of scaled math is to avoid divisions, and with this patch we
end up using divisions;


Please check again, no new division due to my patch, just different 
parameters for the existing one.



- with scaled math we do wrong calculations, and making a wrong
xnarch_ns_to_tsc only works for values which should be passed to
xnarch_tsc_to_ns.


IMHO, the error is within the range of the clock's precision, if not 
even below. So struggling for mathematically precise conversion of 
imprecise physical values makes no sense to me. Therefore I once 
proposed the scaled-math optimization.


Jan





Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Jan Kiszka

Jan Kiszka wrote:

Gilles Chanteperdrix wrote:

[...]


Well, I am sorry, but I do not like this solution;
- the aim of scaled math is to avoid divisions, and with this patch we
end up using divisions;


Please check again, no new division due to my patch, just different 
parameters for the existing one.



- with scaled math we do wrong calculations, and making a wrong
xnarch_ns_to_tsc only works for values which should be passed to
xnarch_tsc_to_ns.


IMHO, the error is within the range of the clock's precision, if not 
even below. So struggling for mathematically precise conversion of 
imprecise physical values makes no sense to me. Therefore I once 
proposed the scaled-math optimization.


But this does not mean that I'm opposing an even faster division-less 
ns_to_tsc with scaled-math parameters, i.e. combining the best of both worlds!


Jan





Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Gilles Chanteperdrix
On Fri, Apr 4, 2008 at 3:25 PM, Jan Kiszka [EMAIL PROTECTED] wrote:

 [...]

  Please check again, no new division due to my patch, just different
 parameters for the existing one.

I just checked your patch rapidly, but saw that xnarch_ns_to_tsc was
using llimd, so it does use a division. My fast_llimd could be used to
replace both llimd calls, in xnarch_tsc_to_ns and in xnarch_ns_to_tsc.




  - with scaled math we do wrong calculations, and making a wrong
  xnarch_ns_to_tsc only works for values which should be passed to
  xnarch_tsc_to_ns.
 

  IMHO, the error is within the range of the clock's precision, if not even
 below. So struggling for mathematically precise conversion of imprecise
 physical values makes no sense to me. Therefore I once proposed the
 scaled-math optimization.

Now that I have understood what really happens, I disagree with this
approach. Take the implementation of clock_gettime (or
rtdm_clock_read, for that matter). They basically do
xnarch_tsc_to_ns(ipipe_read_tsc()). The relative error may be small,
but in the very frequent use case of subtracting two results of
consecutive reads of ipipe_read_tsc, the result of the subtraction is
essentially garbage, because the result of the subtraction may be of
the same order as the absolute error of the conversion. And I insist,
this use case of clock_gettime or rtdm_clock_read is a very realistic
use case.

-- 
 Gilles



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Gilles Chanteperdrix
On Fri, Apr 4, 2008 at 3:57 PM, Jan Kiszka [EMAIL PROTECTED] wrote:

 [...]

- with scaled math we do wrong calculations, and making a wrong
xnarch_ns_to_tsc only works for values which should be passed to
xnarch_tsc_to_ns.
   
   
IMHO, the error is within the range of the clock's precision, if not
 even
   below. So struggling for mathematically precise conversion of imprecise
   physical values makes no sense to me. Therefore I once proposed the
   scaled-math optimization.
  
 
  Now that I have understood what really happens, I disagree with this
  approach. Take the implementation of clock_gettime (or
  rtdm_clock_read, for that matter). They basically do
  xnarch_tsc_to_ns(ipipe_read_tsc()). The relative error may be small,
  but in the very frequent use case of subtracting two results of
  consecutive reads of ipipe_read_tsc, the result of the subtraction is
  essentially garbage, because the result of the subtraction may be of
  the same order as the absolute error of the conversion. And I insist,
  this use case of clock_gettime or rtdm_clock_read is a very realistic
  use case.
 

  This use case is valid, but I don't see the error scenario you sketch: The
 error of the conversion is only relevant for large deltas;
 tsc_to_ns(B)-tsc_to_ns(A) = tsc_to_ns(B-A) for any small B-A. Cornelius' test
 nicely showed a constantly increasing deviation, not something that jumped around.
 Essentially, we are just replacing

 xnarch_llimd(ts, 1000000000, RTHAL_CPU_FREQ);

  with

 xnarch_llimd(ts, xnarch_tsc_scale, 1 << xnarch_tsc_shift);

  which introduces a linearly increasing error of the _absolute_ results, not
 of relative ones. But if you can prove me wrong, I would take everything
 back and agree on kicking out the scaled math immediately!

Right. We are approximating a fraction with another fraction. But my
first impression remains: I do not like the idea of making
xnarch_ns_to_tsc wrong because xnarch_tsc_to_ns is wrong.

-- 
 Gilles



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Jan Kiszka

Gilles Chanteperdrix wrote:

Right. We are approximating a fraction with another fraction. But my
first impression remains: I do not like the idea of making
xnarch_ns_to_tsc wrong because xnarch_tsc_to_ns is wrong.


Well, my first impression was originally the same: If we still need 
llimd in the ns-to-tsc patch, then we should keep the precise way. But 
that was wrong as this thread demonstrated. We have to ensure that 
ns_to_tsc(tsc_to_ns(x)) remains x with only minor last-digit errors. So 
either use scaled math parameters in both ways or fall back to the 
original calculation.


Whatever the final decision for 2.5 will be (pro or contra scaled math), at 
least for 2.4.x we have to fix things now without turning things upside 
down. That means apply my patch or revert the scaled-math optimizations for 
all archs.


Jan





Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Gilles Chanteperdrix
On Fri, Apr 4, 2008 at 4:33 PM, Jan Kiszka [EMAIL PROTECTED] wrote:
 Gilles Chanteperdrix wrote:

  Right. We are approximating a fraction with another fraction. But my
  first impression remains: I do not like the idea of making
  xnarch_ns_to_tsc wrong because xnarch_tsc_to_ns is wrong.
 

  Well, my first impression was originally the same: If we still need llimd
 in the ns-to-tsc patch, then we should keep the precise way. But that was
 wrong as this thread demonstrated. We have to ensure that
 ns_to_tsc(tsc_to_ns(x)) remains x with only minor last-digit errors. So
 either use scaled math parameters in both ways or fall back to the original
 calculation.

Now that I think about it, this scaled math approach amounts to taking
an approximation of the CPU frequency, which is already an approximation.
So, I will no longer oppose your patch.


  Whatever the final decision for 2.5 will be (pro or contra scaled math), at least
 for 2.4.x we have to fix things now without turning things upside down. That
 means apply my patch or revert the scaled-math optimizations for all archs.



-- 
 Gilles



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-04 Thread Philippe Gerum
Gilles Chanteperdrix wrote:
 On Fri, Apr 4, 2008 at 4:33 PM, Jan Kiszka [EMAIL PROTECTED] wrote:
 Gilles Chanteperdrix wrote:

 Right. We are approximating a fraction with another fraction. But my
 first impression remains: I do not like the idea of making
 xnarch_ns_to_tsc wrong because xnarch_tsc_to_ns is wrong.

  Well, my first impression was originally the same: If we still need llimd
 in the ns-to-tsc patch, then we should keep the precise way. But that was
 wrong as this thread demonstrated. We have to ensure that
 ns_to_tsc(tsc_to_ns(x)) remains x with only minor last-digit errors. So
 either use scaled math parameters in both ways or fall back to the original
 calculation.
 
 Now that I think about it, this scaled math approach amounts to taking
 an approximation of the CPU frequency, which is already an approximation.
 So, I will no longer oppose your patch.


Ok, so I take this as a green light to commit too. Please commit.

  Whatever the final decision for 2.5 will be (pro or contra scaled math), at least
 for 2.4.x we have to fix things now without turning things upside down. That
 means apply my patch or revert the scaled-math optimizations for all archs.
 
 
 


-- 
Philippe.



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-03 Thread Jan Kiszka

Sebastian Smolorz wrote:

Gilles Chanteperdrix wrote:

On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
[EMAIL PROTECTED] wrote:

Jan Kiszka wrote:
  Sebastian Smolorz wrote:
  Jan Kiszka wrote:
  Cornelius Köpp wrote:
  I talked with Sebastian Smolorz about this and he builds his own
  independent kernel-config to check. He got the same 
drifting-effect

  with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
  hours. His kernel-config is attached as
  'config-2.6.24-xenomai-2.4.3__ssm'.
 
  Our kernel-configs are both based on a config used with Xenomai 
2.3.4

  and Linux 2.6.20.15 without any drifting effects.
  2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
  not a PIC vs. APIC thing, but rather a rounding problem of 
larger TSC
  values (that naturally show up when the system runs for a longer 
time).

  This hint seems to point in the right direction. I tried out a
  modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
  implementation in include/asm-generic/bits/pod.h was used. The
  drifting bug disappeared. So there seems to be a buggy x86-specific
  implementation of this routine.
 
  Hmm, maybe even a conceptional issue: the multiply-shift-based
  xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
  xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
  may lose some bits, maybe too many bits...
 
  It looks like this bites us in the kernel latency tests (-t2 should
  suffer as well). Those recalculate their timeouts each round based on
  absolute nanoseconds. In contrast, the periodic user mode task of -t0
  uses a periodic timer that is forwarded via a tsc-based interval.
 
  You (or Cornelius) could try to analyse the calculation path of the
  involved timeouts, specifically to understand why the scheduled 
timeout
  of the underlying task timer (which is tsc-based) tend to diverge 
from

  the calculated one (ns-based).

 So here comes the explanation. The error is inside the function
 rthal_llmulshft(). It returns wrong values which are too small - the
 higher the given TSC value the bigger the error. The function
 rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
 rtdm_clock_read_monotonic() is called every time the latency kernel
 thread runs [1] the values reported by latency become smaller over 
time.


 In contrast, the latency task in user space uses the conversion
 from TSC to ns only once, when calling rt_timer_inquire [2].
 timer_info.date is too small, timer_info.tsc is right. So all
 calculated deltas in [3] are shifted to a smaller value. This value is
 constant during the runtime of latency in user space because no more
 conversion from TSC to ns occurs.


latency does conversions from tsc to ns, but it converts time
differences, so the error is small relative to the results.


Of course. I wasn't precise with my last statement. It should be: No 
more conversions from *absolute* TSC values to ns occur.




This patch may do the trick: it uses the inverted tsc-to-ns function 
instead of the frequency-based one. Be warned, it is totally untested 
inside Xenomai, I just ran it in a user space test program. But it may 
give an idea.


Gilles, not sure if this is related to my quickly hacked test, but with 
RTHAL_CPU_FREQ = 800MHz and TSC = 0x7000 (or larger) I get 
an arithmetic exception with the rthal_llimd-based conversion to 
nanoseconds. Is there an input range we may have to exclude for rthal_llimd?


Jan
---
 include/asm-x86/bits/init_32.h |3 ++-
 include/asm-x86/bits/init_64.h |3 ++-
 include/asm-x86/bits/pod_32.h  |7 +++
 include/asm-x86/bits/pod_64.h  |7 +++
 4 files changed, 18 insertions(+), 2 deletions(-)

Index: b/include/asm-x86/bits/init_32.h
===================================================================
--- a/include/asm-x86/bits/init_32.h
+++ b/include/asm-x86/bits/init_32.h
@@ -73,7 +73,7 @@ int xnarch_calibrate_sched(void)
 
 static inline int xnarch_init(void)
 {
-	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift;
+	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift, xnarch_tsc_divide;
 	int err;
 
 	err = rthal_init();
@@ -89,6 +89,7 @@ static inline int xnarch_init(void)
 
 	xnarch_init_llmulshft(1000000000, RTHAL_CPU_FREQ,
 			      &xnarch_tsc_scale, &xnarch_tsc_shift);
+	xnarch_tsc_divide = 1 << xnarch_tsc_shift;
 
 	err = xnarch_calibrate_sched();
 
Index: b/include/asm-x86/bits/init_64.h
===================================================================
--- a/include/asm-x86/bits/init_64.h
+++ b/include/asm-x86/bits/init_64.h
@@ -70,7 +70,7 @@ int xnarch_calibrate_sched(void)
 
 static inline int xnarch_init(void)
 {
-	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift;
+	extern unsigned xnarch_tsc_scale, xnarch_tsc_shift, xnarch_tsc_divide;
 	int err;
 
 	err = rthal_init();
@@ -86,6 +86,7 @@ static inline int xnarch_init(void)
 
 	xnarch_init_llmulshft(1000000000, RTHAL_CPU_FREQ,
 			      &xnarch_tsc_scale, 

Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-03 Thread Gilles Chanteperdrix
On Thu, Apr 3, 2008 at 2:17 PM, Jan Kiszka [EMAIL PROTECTED] wrote:

 Sebastian Smolorz wrote:

  Gilles Chanteperdrix wrote:
 
   On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
   [EMAIL PROTECTED] wrote:
  
Jan Kiszka wrote:
  Sebastian Smolorz wrote:
  Jan Kiszka wrote:
  Cornelius Köpp wrote:
  I talked with Sebastian Smolorz about this and he builds his own
  independent kernel-config to check. He got the same
 drifting-effect
  with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over
 several
  hours. His kernel-config is attached as
  'config-2.6.24-xenomai-2.4.3__ssm'.
 
  Our kernel-configs are both based on a config used with Xenomai
 2.3.4
  and Linux 2.6.20.15 without any drifting effects.
  2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it
 is
  not a PIC vs. APIC thing, but rather a rounding problem of larger
 TSC
  values (that naturally show up when the system runs for a longer
 time).
  This hint seems to point into the right direction. I tried out a
  modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the
 old
  implementation in include/asm-generic/bits/pod.h was used. The
 drifting
  bug disappeared. So there seems so be a buggy x86-specific
  implementation of this routine.
 
  Hmm, maybe even a conceptional issue: the multiply-shift-based
  xnarch_tsc_to_ns is not as precise as the still
 multiply-divide-based
  xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc,
 we
  may loose some bits, maybe too many bits...
 
  It looks like this bites us in the kernel latency tests (-t2 should
  suffer as well). Those recalculate their timeouts each round based
 on
  absolute nanoseconds. In contrast, the periodic user mode task of
 -t0
  uses a periodic timer that is forwarded via a tsc-based interval.
 
  You (or Cornelius) could try to analyse the calculation path of the
  involved timeouts, specifically to understand why the scheduled
 timeout
  of the underlying task timer (which is tsc-based) tend to diverge
 from
  the calculated one (ns-based).
   
 So here comes the explanation. The error is inside the function
 rthal_llmulshft(). It returns wrong values which are too small - the
 higher the given TSC value the bigger the error. The function
 rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
 rtdm_clock_read_monotonic() is called every time the latency kernel
 thread runs [1] the values reported by latency become smaller over
 time.
   
 In contrast, the latency task in user space only uses the conversion
 from TSC to ns only once when calling rt_timer_inquire [2].
 timer_info.date is too small, timer_info.tsc is right. So all
 calculated
 deltas in [3] are shifted to a smaller value. This value is constant
 during the runtime of lateny in user space because no more conversion
 from TSC to ns occurs.
   
  
   latency does conversions from tsc to ns, but it converts time
   differences, so the error is small relative to the results.
  
 
  Of course. I wasn't precise with my last statement. It should be: No more
 conversions from *absolute* TSC values to ns occur.
 
 

  This patch may do the trick: it uses the inverted tsc-to-ns function
 instead of the frequency-based one. Be warned, it is totally untested inside
 Xenomai, I just ran it in a user space test program. But it may give an
 idea.

  Gilles, not sure if this is related to my quickly hacked test, but with
 RTHAL_CPU_FREQ = 800MHz and TSC = 0x7000 (or larger) I get an
 arithmetic exception with the rthal_llimd-based conversion to nanoseconds.
 Is there an input range we may have to exclude for rthal_llimd?

rthal_llimd does a multiplication first, then a division. The
multiplication cannot overflow, but the result of the division may
not fit in 64 bits; you then get an exception on x86. This happens
only with m > d.


-- 
 Gilles

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-03 Thread Jan Kiszka

Gilles Chanteperdrix wrote:

On Thu, Apr 3, 2008 at 2:17 PM, Jan Kiszka [EMAIL PROTECTED] wrote:

Sebastian Smolorz wrote:


Gilles Chanteperdrix wrote:


On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
[EMAIL PROTECTED] wrote:


Jan Kiszka wrote:
  Sebastian Smolorz wrote:
  Jan Kiszka wrote:
  Cornelius Köpp wrote:
  I talked with Sebastian Smolorz about this and he builds his own
  independent kernel-config to check. He got the same

drifting-effect

  with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over

several

  hours. His kernel-config ist attached as
  'config-2.6.24-xenomai-2.4.3__ssm'.
 
  Our kernel-configs are both based on a config used with Xenomai

2.3.4

  and Linux 2.6.20.15 without any drifting effects.
  2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it

is

  not a PIC vs. APIC thing, but rather a rounding problem of larger

TSC

  values (that naturally show up when the system runs for a longer

time).

  This hint seems to point into the right direction. I tried out a
  modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the

old

  implementation in include/asm-generic/bits/pod.h was used. The

drifting

  bug disappeared. So there seems so be a buggy x86-specific
  implementation of this routine.
 
  Hmm, maybe even a conceptional issue: the multiply-shift-based
  xnarch_tsc_to_ns is not as precise as the still

multiply-divide-based

  xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc,

we

  may loose some bits, maybe too many bits...
 
  It looks like this bites us in the kernel latency tests (-t2 should
  suffer as well). Those recalculate their timeouts each round based

on

  absolute nanoseconds. In contrast, the periodic user mode task of

-t0

  uses a periodic timer that is forwarded via a tsc-based interval.
 
  You (or Cornelius) could try to analyse the calculation path of the
  involved timeouts, specifically to understand why the scheduled

timeout

  of the underlying task timer (which is tsc-based) tend to diverge

from

  the calculated one (ns-based).

 So here comes the explanation. The error is inside the function
 rthal_llmulshft(). It returns wrong values which are too small - the
 higher the given TSC value the bigger the error. The function
 rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
 rtdm_clock_read_monotonic() is called every time the latency kernel
 thread runs [1] the values reported by latency become smaller over

time.

 In contrast, the latency task in user space only uses the conversion
 from TSC to ns only once when calling rt_timer_inquire [2].
 timer_info.date is too small, timer_info.tsc is right. So all

calculated

 deltas in [3] are shifted to a smaller value. This value is constant
 during the runtime of lateny in user space because no more conversion
 from TSC to ns occurs.


latency does conversions from tsc to ns, but it converts time
differences, so the error is small relative to the results.


Of course. I wasn't precise with my last statement. It should be: No more

conversions from *absolute* TSC values to ns occur.



 This patch may do the trick: it uses the inverted tsc-to-ns function
instead of the frequency-based one. Be warned, it is totally untested inside
Xenomai, I just ran it in a user space test program. But it may give an
idea.

 Gilles, not sure if this is related to my quickly hacked test, but with
RTHAL_CPU_FREQ = 800MHz and TSC = 0x7000 (or larger) I get an
arithmetic exception with the rthal_llimd-based conversion to nanoseconds.
Is there an input range we may have to exclude for rthal_llimd?


rthal_llimd does a multiplication first, then a division. The
multiplication cannot overflow, but the result of the division may
not fit in 64 bits; you then get an exception on x86. This happens
only with m > d.


OK, for tsc-to-ns this only bites us after a few hundred years of uptime 
- or when we have settable tsc counters (does Linux tweak them beyond 
aligning on SMP?).


But there is also the risk the other way around: ns-to-tsc with 
frequency > 1GHz will fall apart (kernel oops!) when the user provides a 
large timeout in nanoseconds that we then try to convert to tsc. Not 
good. Wrong values are one thing, but oopses are even worse.


Any idea how to fix this?

Jan





Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-03 Thread Sebastian Smolorz

Jan Kiszka wrote:

Sebastian Smolorz wrote:

Gilles Chanteperdrix wrote:

On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
[EMAIL PROTECTED] wrote:

Jan Kiszka wrote:
  Sebastian Smolorz wrote:
  Jan Kiszka wrote:
  Cornelius Köpp wrote:
  I talked with Sebastian Smolorz about this and he builds his own
  independent kernel-config to check. He got the same 
drifting-effect

  with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
  hours. His kernel-config ist attached as
  'config-2.6.24-xenomai-2.4.3__ssm'.
 
  Our kernel-configs are both based on a config used with 
Xenomai 2.3.4

  and Linux 2.6.20.15 without any drifting effects.
  2.3.x did not incorporate the new TSC-to-ns conversion. Maybe 
it is
  not a PIC vs. APIC thing, but rather a rounding problem of 
larger TSC
  values (that naturally show up when the system runs for a 
longer time).

  This hint seems to point into the right direction. I tried out a
  modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the 
old
  implementation in include/asm-generic/bits/pod.h was used. The 
drifting

  bug disappeared. So there seems so be a buggy x86-specific
  implementation of this routine.
 
  Hmm, maybe even a conceptional issue: the multiply-shift-based
  xnarch_tsc_to_ns is not as precise as the still 
multiply-divide-based
  xnarch_ns_to_tsc. So when converting from tsc over ns back to 
tsc, we

  may loose some bits, maybe too many bits...
 
  It looks like this bites us in the kernel latency tests (-t2 should
  suffer as well). Those recalculate their timeouts each round 
based on
  absolute nanoseconds. In contrast, the periodic user mode task of 
-t0

  uses a periodic timer that is forwarded via a tsc-based interval.
 
  You (or Cornelius) could try to analyse the calculation path of the
  involved timeouts, specifically to understand why the scheduled 
timeout
  of the underlying task timer (which is tsc-based) tend to diverge 
from

  the calculated one (ns-based).

 So here comes the explanation. The error is inside the function
 rthal_llmulshft(). It returns wrong values which are too small - the
 higher the given TSC value the bigger the error. The function
 rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
 rtdm_clock_read_monotonic() is called every time the latency kernel
 thread runs [1] the values reported by latency become smaller over 
time.


 In contrast, the latency task in user space only uses the conversion
 from TSC to ns only once when calling rt_timer_inquire [2].
 timer_info.date is too small, timer_info.tsc is right. So all 
calculated

  deltas in [3] are shifted to a smaller value. This value is constant
 during the runtime of lateny in user space because no more conversion
 from TSC to ns occurs.


latency does conversions from tsc to ns, but it converts time
differences, so the error is small relative to the results.


Of course. I wasn't precise with my last statement. It should be: No 
more conversions from *absolute* TSC values to ns occur.




This patch may do the trick: it uses the inverted tsc-to-ns function 
instead of the frequency-based one. Be warned, it is totally untested 
inside Xenomai, I just ran it in a user space test program. But it may 
give an idea.


Your patch needed two minor corrections (ns instead of ts in functions 
xnarch_ns_to_tsc()) in order to compile. A short run (30 minutes) of 
latency -t1 seems to prove your bug-fix: There seems to be no drift.


If I got your patch correctly, it doesn't make xnarch_tsc_to_ns more 
precise but introduces a new function xnarch_ns_to_tsc() which is also 
less precise than the generic xnarch_ns_to_tsc(), right? So isn't there 
still the danger of getting wrong values when calling xnarch_tsc_to_ns() 
not in combination with xnarch_ns_to_tsc()?


--
Sebastian
---
 include/asm-x86/bits/init_32.h |3 ++-
 include/asm-x86/bits/init_64.h |3 ++-
 include/asm-x86/bits/pod_32.h  |7 +++
 include/asm-x86/bits/pod_64.h  |7 +++
 4 files changed, 18 insertions(+), 2 deletions(-)

Index: b/include/asm-x86/bits/init_32.h
===
--- a/include/asm-x86/bits/init_32.h
+++ b/include/asm-x86/bits/init_32.h
@@ -73,7 +73,7 @@ int xnarch_calibrate_sched(void)
 
 static inline int xnarch_init(void)
 {
-   extern unsigned xnarch_tsc_scale, xnarch_tsc_shift;
+   extern unsigned xnarch_tsc_scale, xnarch_tsc_shift, xnarch_tsc_divide;
int err;
 
err = rthal_init();
@@ -89,6 +89,7 @@ static inline int xnarch_init(void)
 
xnarch_init_llmulshft(10, RTHAL_CPU_FREQ,
  xnarch_tsc_scale, xnarch_tsc_shift);
+   xnarch_tsc_divide = 1 << xnarch_tsc_shift;
 
err = xnarch_calibrate_sched();
 
Index: b/include/asm-x86/bits/init_64.h
===
--- a/include/asm-x86/bits/init_64.h
+++ b/include/asm-x86/bits/init_64.h
@@ -70,7 +70,7 @@ int 

Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Sebastian Smolorz
Jan Kiszka wrote:
 Cornelius Köpp wrote:
 Hello,
 I ran the latency test from the testsuite on several hardware and software 
 configurations. Running on Xenomai 2.4.2, Linux 2.6.24, the results 
 show a strange behavior: in kernel mode (-t1) the latencies 
 constantly decrease linearly. See attached plot 
 'drifting_latencys_in_kernelmode.png' of the latency test running 48h on 
 a Pentium3 700. This effect could be reproduced, even on other hardware 
 (Pentium-M 1400).
 
 As our P3 boards did not support APIC-based timing (IIRC), your kernel 
 has correctly disabled the related kernel support. But the Pentium M 
 should be fine. So could you check if we are seeing some TSC clocks vs. 
 PIT timer rounding issue by enabling the local APIC on the Pentium M?

There is no difference in enabling the local APIC on the Pentium M WRT 
this bug.

 The usermode (-t0) did not show a drifting, but is influenced by a 
 test ran in kernelmode before.
 
 What do you mean with is influenced?

Cornelius saw the following behaviour: If the latency test was run in 
user space first, no drift appeared over time. If latency was run in 
kernel space (with the reported negative drift), a following latency test 
in user space also showed negative values, but with no additional drift 
over time.

 I talked with Sebastian Smolorz about this and he builds his own 
 independent kernel-config to check. He got the same drifting-effect 
 with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several 
 hours. His kernel-config is attached as 
 'config-2.6.24-xenomai-2.4.3__ssm'.

 Our kernel-configs are both based on a config used with Xenomai 2.3.4 
 and Linux 2.6.20.15 without any drifting effects.
 
 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is not 
 a PIC vs. APIC thing, but rather a rounding problem of larger TSC values 
 (that naturally show up when the system runs for a longer time).

This hint seems to point in the right direction. I tried out a 
modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old 
implementation in include/asm-generic/bits/pod.h was used. The drifting 
bug disappeared. So there seems to be a buggy x86-specific 
implementation of this routine.

-- 
Sebastian



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Jan Kiszka
Sebastian Smolorz wrote:
 Jan Kiszka wrote:
 Cornelius Köpp wrote:
 Hello,
 I run the latency test from testsuite on several hard and software
 configurations. Running on Xenomai 2.4.2, Linux 2.6.24 the results
 shows a strange behavior: In Kernel mode (-t1) the latencys
 constantly linear decrease. See attached plot
 'drifting_latencys_in_kernelmode.png' of latency test running 48h on
 Pentium3 700. This effect could be reproduced, even on other hardware
 (Pentium-M 1400).

 As our P3 boards did not support APIC-based timing (IIRC), your kernel
 has correctly disabled the related kernel support. But the Pentium M
 should be fine. So could you check if we are seeing some TSC clocks
 vs. PIT timer rounding issue by enabling the local APIC on the Pentium M?
 
 There is no difference in enabling the local APIC on the Pentium M WRT
 this bug.
 
 The usermode (-t0) did not show a drifting, but is influenced by a
 test ran in kernelmode before.

 What do you mean with is influenced?
 
 Cornelius saw the following behaviour: If the latency test was run in
 user space first, no drift appeared over time. If latency was run in
 kernel space (with the reported ngeative drift) a following latency test
 in user space showed also negative values but with no additional drift
 over time.
 
 I talked with Sebastian Smolorz about this and he builds his own
 independent kernel-config to check. He got the same drifting-effect
 with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
 hours. His kernel-config ist attached as
 'config-2.6.24-xenomai-2.4.3__ssm'.

 Our kernel-configs are both based on a config used with Xenomai 2.3.4
 and Linux 2.6.20.15 without any drifting effects.

 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
 not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
 values (that naturally show up when the system runs for a longer time).
 
 This hint seems to point into the right direction. I tried out a
 modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
 implementation in include/asm-generic/bits/pod.h was used. The drifting
 bug disappeared. So there seems so be a buggy x86-specific
 implementation of this routine.

Hmm, maybe even a conceptual issue: the multiply-shift-based
xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
may lose some bits, maybe too many bits...

It looks like this bites us in the kernel latency tests (-t2 should
suffer as well). Those recalculate their timeouts each round based on
absolute nanoseconds. In contrast, the periodic user mode task of -t0
uses a periodic timer that is forwarded via a tsc-based interval.

You (or Cornelius) could try to analyse the calculation path of the
involved timeouts, specifically to understand why the scheduled timeout
of the underlying task timer (which is tsc-based) tends to diverge from
the calculated one (ns-based).

TiA,
Jan





Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Gilles Chanteperdrix
On Wed, Apr 2, 2008 at 2:28 PM, Jan Kiszka [EMAIL PROTECTED] wrote:

 Sebastian Smolorz wrote:
   Jan Kiszka wrote:
   Cornelius Köpp wrote:
   Hello,
   I run the latency test from testsuite on several hard and software
   configurations. Running on Xenomai 2.4.2, Linux 2.6.24 the results
   shows a strange behavior: In Kernel mode (-t1) the latencys
   constantly linear decrease. See attached plot
   'drifting_latencys_in_kernelmode.png' of latency test running 48h on
   Pentium3 700. This effect could be reproduced, even on other hardware
   (Pentium-M 1400).
  
   As our P3 boards did not support APIC-based timing (IIRC), your kernel
   has correctly disabled the related kernel support. But the Pentium M
   should be fine. So could you check if we are seeing some TSC clocks
   vs. PIT timer rounding issue by enabling the local APIC on the Pentium M?
  
   There is no difference in enabling the local APIC on the Pentium M WRT
   this bug.
  
   The usermode (-t0) did not show a drifting, but is influenced by a
   test ran in kernelmode before.
  
   What do you mean with is influenced?
  
   Cornelius saw the following behaviour: If the latency test was run in
   user space first, no drift appeared over time. If latency was run in
   kernel space (with the reported ngeative drift) a following latency test
   in user space showed also negative values but with no additional drift
   over time.
  
   I talked with Sebastian Smolorz about this and he builds his own
   independent kernel-config to check. He got the same drifting-effect
   with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
   hours. His kernel-config ist attached as
   'config-2.6.24-xenomai-2.4.3__ssm'.
  
   Our kernel-configs are both based on a config used with Xenomai 2.3.4
   and Linux 2.6.20.15 without any drifting effects.
  
   2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
   not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
   values (that naturally show up when the system runs for a longer time).
  
   This hint seems to point into the right direction. I tried out a
   modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
   implementation in include/asm-generic/bits/pod.h was used. The drifting
   bug disappeared. So there seems so be a buggy x86-specific
   implementation of this routine.

  Hmm, maybe even a conceptional issue: the multiply-shift-based
  xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
  xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
  may loose some bits, maybe too many bits...

If you want to know whether llmulshft implementation is broken on x86
or if there is a design issue, you can attempt to use the generic
implementation on x86.

-- 
 Gilles



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Sebastian Smolorz
Gilles Chanteperdrix wrote:
 On Wed, Apr 2, 2008 at 2:28 PM, Jan Kiszka [EMAIL PROTECTED] wrote:
 Sebastian Smolorz wrote:
   Jan Kiszka wrote:
  
   2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
   not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
   values (that naturally show up when the system runs for a longer time).
  
   This hint seems to point into the right direction. I tried out a
   modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
   implementation in include/asm-generic/bits/pod.h was used. The drifting
   bug disappeared. So there seems so be a buggy x86-specific
   implementation of this routine.

  Hmm, maybe even a conceptional issue: the multiply-shift-based
  xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
  xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
  may loose some bits, maybe too many bits...
 
 If you want to know whether llmulshft implementation is broken on x86
 or if there is a design issue, you can attempt to use the generic
 implementation on x86.
 

You mean not using rthal_llmulshft() in arith_32.h and instead using 
__rthal_generic_llmulshft()? I tried this and it's also suffering from 
the drift although it seems that the drift per time unit is smaller in 
the generic case. I will try to get some numbers to compare the values 
returned from rthal_llmulshft(), __rthal_generic_llmulshft() and 
__rthal_generic_ullimd().

-- 
Sebastian



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Sebastian Smolorz
Jan Kiszka wrote:
 Sebastian Smolorz wrote:
 Jan Kiszka wrote:
 Cornelius Köpp wrote:
 Hello,
 I run the latency test from testsuite on several hard and software
 configurations. Running on Xenomai 2.4.2, Linux 2.6.24 the results
 shows a strange behavior: In Kernel mode (-t1) the latencys
 constantly linear decrease. See attached plot
 'drifting_latencys_in_kernelmode.png' of latency test running 48h on
 Pentium3 700. This effect could be reproduced, even on other hardware
 (Pentium-M 1400).
 As our P3 boards did not support APIC-based timing (IIRC), your kernel
 has correctly disabled the related kernel support. But the Pentium M
 should be fine. So could you check if we are seeing some TSC clocks
 vs. PIT timer rounding issue by enabling the local APIC on the Pentium M?
 There is no difference in enabling the local APIC on the Pentium M WRT
 this bug.

 The usermode (-t0) did not show a drifting, but is influenced by a
 test ran in kernelmode before.
 What do you mean with is influenced?
 Cornelius saw the following behaviour: If the latency test was run in
 user space first, no drift appeared over time. If latency was run in
 kernel space (with the reported ngeative drift) a following latency test
 in user space showed also negative values but with no additional drift
 over time.

Correction: The initial negative drift when starting user mode latency 
does not depend on a former run of latency in kernel mode but on the 
time passed between system start and the starting point of latency -t0. 
Or, as explained below, it depends on the value of the TSC.


 I talked with Sebastian Smolorz about this and he builds his own
 independent kernel-config to check. He got the same drifting-effect
 with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
 hours. His kernel-config ist attached as
 'config-2.6.24-xenomai-2.4.3__ssm'.

 Our kernel-configs are both based on a config used with Xenomai 2.3.4
 and Linux 2.6.20.15 without any drifting effects.
 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
 not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
 values (that naturally show up when the system runs for a longer time).
 This hint seems to point into the right direction. I tried out a
 modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
 implementation in include/asm-generic/bits/pod.h was used. The drifting
 bug disappeared. So there seems so be a buggy x86-specific
 implementation of this routine.
 
 Hmm, maybe even a conceptional issue: the multiply-shift-based
 xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
 xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
 may loose some bits, maybe too many bits...
 
 It looks like this bites us in the kernel latency tests (-t2 should
 suffer as well). Those recalculate their timeouts each round based on
 absolute nanoseconds. In contrast, the periodic user mode task of -t0
 uses a periodic timer that is forwarded via a tsc-based interval.
 
 You (or Cornelius) could try to analyse the calculation path of the
 involved timeouts, specifically to understand why the scheduled timeout
 of the underlying task timer (which is tsc-based) tend to diverge from
 the calculated one (ns-based).

So here comes the explanation. The error is inside the function 
rthal_llmulshft(). It returns wrong values which are too small - the 
higher the given TSC value, the bigger the error. The function 
rtdm_clock_read_monotonic() calls rthal_llmulshft(). As 
rtdm_clock_read_monotonic() is called every time the latency kernel 
thread runs [1], the values reported by latency become smaller over time.

In contrast, the latency task in user space uses the conversion 
from TSC to ns only once, when calling rt_timer_inquire [2]. 
timer_info.date is too small, timer_info.tsc is right. So all calculated 
deltas in [3] are shifted to a smaller value. This value is constant 
during the runtime of latency in user space because no more conversion 
from TSC to ns occurs.


[1] 
http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/drivers/testing/timerbench.c#166
[2] 
http://www.rts.uni-hannover.de/xenomai/lxr/source/src/testsuite/latency/latency.c#076
[3] 
http://www.rts.uni-hannover.de/xenomai/lxr/source/src/testsuite/latency/latency.c#111


-- 
Sebastian



Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Gilles Chanteperdrix
On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
[EMAIL PROTECTED] wrote:

 Jan Kiszka wrote:
   Sebastian Smolorz wrote:
   Jan Kiszka wrote:
   Cornelius Köpp wrote:
   Hello,
   I run the latency test from testsuite on several hard and software
   configurations. Running on Xenomai 2.4.2, Linux 2.6.24 the results
   shows a strange behavior: In Kernel mode (-t1) the latencys
   constantly linear decrease. See attached plot
   'drifting_latencys_in_kernelmode.png' of latency test running 48h on
   Pentium3 700. This effect could be reproduced, even on other hardware
   (Pentium-M 1400).
   As our P3 boards did not support APIC-based timing (IIRC), your kernel
   has correctly disabled the related kernel support. But the Pentium M
   should be fine. So could you check if we are seeing some TSC clocks
   vs. PIT timer rounding issue by enabling the local APIC on the Pentium M?
   There is no difference in enabling the local APIC on the Pentium M WRT
   this bug.
  
   The usermode (-t0) did not show a drifting, but is influenced by a
   test ran in kernelmode before.
   What do you mean with is influenced?
   Cornelius saw the following behaviour: If the latency test was run in
   user space first, no drift appeared over time. If latency was run in
   kernel space (with the reported ngeative drift) a following latency test
   in user space showed also negative values but with no additional drift
   over time.

  Correction: The initial negative drift when starting user mode latency
  does not depend on a former run of latency in kernel mode but on the
  time passed between system start and the starting point of latency -t0.
  Or, as explained below, it depends on the value of the TSC.



  
   I talked with Sebastian Smolorz about this and he builds his own
   independent kernel-config to check. He got the same drifting-effect
   with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
   hours. His kernel-config ist attached as
   'config-2.6.24-xenomai-2.4.3__ssm'.
  
   Our kernel-configs are both based on a config used with Xenomai 2.3.4
   and Linux 2.6.20.15 without any drifting effects.
   2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
   not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
   values (that naturally show up when the system runs for a longer time).
   This hint seems to point into the right direction. I tried out a
   modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
   implementation in include/asm-generic/bits/pod.h was used. The drifting
   bug disappeared. So there seems so be a buggy x86-specific
   implementation of this routine.
  
   Hmm, maybe even a conceptional issue: the multiply-shift-based
   xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
   xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
   may loose some bits, maybe too many bits...
  
   It looks like this bites us in the kernel latency tests (-t2 should
   suffer as well). Those recalculate their timeouts each round based on
   absolute nanoseconds. In contrast, the periodic user mode task of -t0
   uses a periodic timer that is forwarded via a tsc-based interval.
  
   You (or Cornelius) could try to analyse the calculation path of the
   involved timeouts, specifically to understand why the scheduled timeout
   of the underlying task timer (which is tsc-based) tend to diverge from
   the calculated one (ns-based).

  So here comes the explanation. The error is inside the function
  rthal_llmulshft(). It returns wrong values which are too small - the
  higher the given TSC value the bigger the error. The function
  rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
  rtdm_clock_read_monotonic() is called every time the latency kernel
  thread runs [1] the values reported by latency become smaller over time.

  In contrast, the latency task in user space only uses the conversion
  from TSC to ns only once when calling rt_timer_inquire [2].
  timer_info.date is too small, timer_info.tsc is right. So all calculated
   deltas in [3] are shifted to a smaller value. This value is constant
  during the runtime of lateny in user space because no more conversion
  from TSC to ns occurs.

latency does conversions from tsc to ns, but it converts time
differences, so the error is small relative to the results. In
contrast, doing subtractions of conversion results is wrong. In other
words, doing:

start = rt_timer_tsc();
stop = rt_timer_tsc();
diffns = rt_timer_tsc2ns(stop - start);

is right. Whereas doing:

start = rt_timer_tsc2ns(rt_timer_tsc());
stop = rt_timer_tsc2ns(rt_timer_tsc());
diffns = stop - start;

is wrong.


-- 
 Gilles

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)

2008-04-02 Thread Sebastian Smolorz
Gilles Chanteperdrix wrote:
 On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
 [EMAIL PROTECTED] wrote:
 Jan Kiszka wrote:
   Sebastian Smolorz wrote:
   Jan Kiszka wrote:
   Cornelius Köpp wrote:
   I talked with Sebastian Smolorz about this and he built his own
   independent kernel config to check. He got the same drifting effect
   with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
   hours. His kernel config is attached as
   'config-2.6.24-xenomai-2.4.3__ssm'.
  
   Our kernel-configs are both based on a config used with Xenomai 2.3.4
   and Linux 2.6.20.15 without any drifting effects.
   2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
   not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
   values (that naturally show up when the system runs for a longer time).
   This hint seems to point into the right direction. I tried out a
   modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
   implementation in include/asm-generic/bits/pod.h was used. The drifting
   bug disappeared. So there seems to be a buggy x86-specific
   implementation of this routine.
  
   Hmm, maybe even a conceptual issue: the multiply-shift-based
   xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
   xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
   may lose some bits, maybe too many bits...
  
   It looks like this bites us in the kernel latency tests (-t2 should
   suffer as well). Those recalculate their timeouts each round based on
   absolute nanoseconds. In contrast, the periodic user mode task of -t0
   uses a periodic timer that is forwarded via a tsc-based interval.
  
   You (or Cornelius) could try to analyse the calculation path of the
   involved timeouts, specifically to understand why the scheduled timeout
   of the underlying task timer (which is tsc-based) tends to diverge from
   the calculated one (ns-based).

  So here comes the explanation. The error is inside the function
  rthal_llmulshft(). It returns wrong values which are too small - the
  higher the given TSC value the bigger the error. The function
  rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
  rtdm_clock_read_monotonic() is called every time the latency kernel
  thread runs [1] the values reported by latency become smaller over time.

  In contrast, the latency task in user space uses the conversion
  from TSC to ns only once, when calling rt_timer_inquire [2].
  timer_info.date is too small, timer_info.tsc is right. So all calculated
   deltas in [3] are shifted to a smaller value. This value is constant
  during the runtime of latency in user space because no more conversion
  from TSC to ns occurs.
 
 latency does conversions from tsc to ns, but it converts time
 differences, so the error is small relative to the results.

Of course. I wasn't precise with my last statement. It should be: No 
more conversions from *absolute* TSC values to ns occur.

-- 
Sebastian
