Jan Kiszka wrote:
Philippe Gerum wrote:

Here is likely why we have different levels of accuracy and performance:
firstly, my version is bluntly based on the kHz frequency; secondly, it
calculates the other way around, i.e. ns2tsc, so that TSC values are kept
in the inner code, but more efficiently converted from the ns counts passed
to the outer interface:

static unsigned long ns2cyc_scale; /* cycles per ns, in 1/1024 units */
#define NS2CYC_SCALE_FACTOR 10 /* 2^10, carefully chosen */

static inline void set_ns2cyc_scale(unsigned long cpu_khz)
{
   ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
}

static inline unsigned long long ns_2_cycles(unsigned long long ns)
{
   return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
}
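A self-contained sketch for checking the accuracy side of this trade-off: it repeats the scaled conversion above and compares it against an exact 64-bit multiply-then-divide. The helpers `ns_2_cycles_exact` and `scaled_error` are hypothetical, not from the thread. With a 1.6 GHz clock the scale truncates from 1638.4 to 1638, so 1,000,000 ns converts to 1,599,609 cycles instead of 1,600,000 (an error of about 0.024%); at exactly 1 GHz the scale is 1024 and the conversion is exact.

```c
#include <assert.h>
#include <limits.h>

/* Standalone copy of the scaled ns -> cycles conversion quoted above. */
#define NS2CYC_SCALE_FACTOR 10 /* scale = cycles per ns, in 1/1024 units */

static unsigned long ns2cyc_scale;

static void set_ns2cyc_scale(unsigned long cpu_khz)
{
    /* Guard: the shift overflows a 32-bit unsigned long above ~4.19 GHz. */
    assert(cpu_khz <= (ULONG_MAX >> NS2CYC_SCALE_FACTOR));
    ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
}

static unsigned long long ns_2_cycles(unsigned long long ns)
{
    return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
}

/* Hypothetical reference: exact conversion via a 64-bit division. */
static unsigned long long ns_2_cycles_exact(unsigned long long ns,
                                            unsigned long cpu_khz)
{
    return ns * cpu_khz / 1000000ULL;
}

/* Cycles lost by the scaled version; never negative, since both the
 * truncated scale and the final shift round down. */
static unsigned long long scaled_error(unsigned long long ns,
                                       unsigned long cpu_khz)
{
    set_ns2cyc_scale(cpu_khz);
    return ns_2_cycles_exact(ns, cpu_khz) - ns_2_cycles(ns);
}
```

For example, `scaled_error(1000000ULL, 1600000UL)` yields 391 cycles, while `scaled_error(123456ULL, 1000000UL)` yields 0. The error grows linearly with the interval, which is the price paid for replacing a division with a multiply and a shift.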

Your version performs ~50% better than mine (outperforming the original
version by a factor of 7 on a 1 GHz box, vs. 4.8 for mine). I think you
compared non-optimised code, didn't you?

Nah, I'm not that drunk!

Without -O2, I see a 15-fold improvement.

I redid the check here on a 1.6 GHz Centrino, and I still get roughly a 20x improvement (a bit better, actually). I'm using Debian/sarge gcc 3.3.5.
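A rough micro-benchmark sketch for reproducing speedup figures of this kind, to be built once with and once without -O2. It assumes the "original" conversion is a plain 64-bit multiply-then-divide; the thread does not show that code, so `ns_2_cycles_div`, `ns_2_cycles_mul` and `bench` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

#define NS2CYC_SCALE_FACTOR 10

/* Assumed baseline: exact conversion with a 64-bit division. */
static unsigned long long ns_2_cycles_div(unsigned long long ns,
                                          unsigned long cpu_khz)
{
    return ns * cpu_khz / 1000000ULL;
}

/* Scaled conversion from the thread: one multiply and one shift. */
static unsigned long long ns_2_cycles_mul(unsigned long long ns,
                                          unsigned long scale)
{
    return ns * scale >> NS2CYC_SCALE_FACTOR;
}

/*
 * Times `iters` conversions with each variant (CPU seconds via clock()).
 * Results are summed so the loops cannot be optimised away; the return
 * value is sum_div - sum_mul, i.e. the total cycles lost to scaling.
 */
static unsigned long long bench(unsigned long iters, unsigned long cpu_khz,
                                double *t_div, double *t_mul)
{
    unsigned long scale;
    unsigned long long sum_div = 0, sum_mul = 0;
    unsigned long i;
    clock_t start;

    /* Same 32-bit overflow guard as for the scale computation. */
    assert(cpu_khz <= (ULONG_MAX >> NS2CYC_SCALE_FACTOR));
    scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;

    start = clock();
    for (i = 0; i < iters; i++)
        sum_div += ns_2_cycles_div(i, cpu_khz);
    *t_div = (double)(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (i = 0; i < iters; i++)
        sum_mul += ns_2_cycles_mul(i, scale);
    *t_mul = (double)(clock() - start) / CLOCKS_PER_SEC;

    return sum_div - sum_mul;
}
```

Calling e.g. `bench(100000000UL, 1600000UL, &t_div, &t_mul)` and comparing `t_div / t_mul` gives the speedup. The ratio depends heavily on the target: on 32-bit x86, GCC typically emits a call to the libgcc helper `__udivdi3` for a 64-bit division, whereas on 64-bit targets a division by the constant 1000000 is usually turned into a multiply at -O2, which narrows the gap.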

[Gilles' variant still refuses to be benchmarked.]




Xenomai-core mailing list
