Jan Kiszka wrote:
 > Philippe Gerum wrote:
 > > Here is likely why we see different levels of accuracy and performance:
 > > first, my version is bluntly based on the kHz frequency; second, it
 > > converts the other way around, i.e. ns2tsc, so that TSC values are kept
 > > in the inner code, while ns counts passed to the outer interface are
 > > converted more efficiently:
 > > 
 > > static unsigned long ns2cyc_scale;
 > > #define NS2CYC_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 > > 
 > > static inline void set_ns2cyc_scale(unsigned long cpu_khz)
 > > {
 > >     ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
 > > }
 > > 
 > > static inline unsigned long long ns_2_cycles(unsigned long long ns)
 > > {
 > >     return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
 > > }
 > 
 > Your version performs ~50% better than mine (outperforming the original
 > version by a factor of 7 on a 1 GHz box, vs. 4.8). I think you compared
 > non-optimised code, didn't you? Without -O2, I see 15 times better
 > performance.
 > 
 > [Gilles' variant still refuses to get benchmarked.]

Since we accept a smaller range, I think you should benchmark
nodiv_imuldiv instead of nodiv_ullimd. It should perform better,
since it uses 32-bit shifts, which are not real shifts (shifting a
64-bit value right by 32 bits just selects its upper word).
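
To illustrate the point about 32-bit shifts, here is a minimal sketch.
It is not the actual nodiv_imuldiv code; the names make_frac32 and
mul_frac32 are hypothetical, and it assumes the ratio num/den is
strictly below 1 so that the fraction fits in 32 bits:

```c
#include <stdint.h>

/* Hypothetical helper (not from Xenomai): precompute num/den as a
 * 32-bit binary fraction, i.e. floor(num * 2^32 / den).
 * Assumes den != 0 and num < den, so the result fits in 32 bits. */
static inline uint32_t make_frac32(uint32_t num, uint32_t den)
{
    return (uint32_t)(((uint64_t)num << 32) / den);
}

/* Hypothetical helper (not from Xenomai): compute op * num / den
 * without a division, using the precomputed fraction. */
static inline uint32_t mul_frac32(uint32_t op, uint32_t frac)
{
    uint64_t prod = (uint64_t)op * frac;  /* 32x32 -> 64-bit multiply */

    /* Shift by exactly 32: on a 32-bit CPU this typically needs no
     * shift instruction, the compiler just uses the upper word. */
    return (uint32_t)(prod >> 32);
}
```

For example, mul_frac32(1000, make_frac32(3, 4)) gives 750. Since the
fraction is truncated, inexact ratios can come out one below the exact
quotient: with 1/3, an operand of 3000 yields 999.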

-- 


                                            Gilles Chanteperdrix.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
