Re: [Xenomai-core] ns vs. tsc as internal timer base

Philippe Gerum Tue, 13 Jun 2006 05:35:53 -0700

Jan Kiszka wrote:

Philippe Gerum wrote:

Jan Kiszka wrote:

Hi,

between some football half-times of the last days ;), I played a bit
with a hand-optimised xnarch_tsc_to_ns() for x86. Using scaled math, I
achieved between 3 (P-I 133 MHz) to 4 times (P-M 1.3 GHz) faster
conversions than with the current variant. While this optimisation only
saves a few tens of nanoseconds on high-end CPUs, slow processors can gain
several hundred nanoseconds per conversion (my P-133: -600 ns).


I did exactly the same a few weeks ago, based on Anzinger's scaled math



:) We should coordinate better.

The answer is a published roadmap + todo list, but this requires some organisation we have not been able to set up yet.

from i386/kernel/timers/timer_tsc.c. And indeed, I had x 20 performance
improvements in some cases.



Oops, that sounds like a bit too extreme optimisations. Is the original
version varying that much? I didn't observe this.

Here is my current version, BTW:

long tsc_scale;
unsigned int tsc_shift = 31;

static inline long long fast_tsc_to_ns(long long ts)
{
    long long ret;

    __asm__ (
        /* HI = HIWORD(ts) * tsc_scale */
        "mov  %%eax,%%ebx\n\t"
        "mov  %%edx,%%eax\n\t"
        "imull %2\n\t"
        "mov  %%eax,%%esi\n\t"
        "mov  %%edx,%%edi\n\t"

        /* LO = LOWORD(ts) * tsc_scale */
        "mov  %%ebx,%%eax\n\t"
        "mull %2\n\t"

        /* ret = (HI << 32) + LO */
        "add  %%esi,%%edx\n\t"
        "adc  $0,%%edi\n\t"

        /* ret = ret >> tsc_shift */
        "shrd %%cl,%%edx,%%eax\n\t"
        "shrd %%cl,%%edi,%%edx\n\t"
        : "=A"(ret)
        : "A" (ts), "m" (tsc_scale), "c" (tsc_shift)
        : "ebx", "esi", "edi");

    return ret;
}

void init_tsc(unsigned long cpu_freq)
{
    unsigned long long scale;

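    while (1) {
        /* do_div() divides its first argument in place and returns
           the remainder, so the dividend needs its own lvalue. */
        scale = 1000000000ULL << tsc_shift;
        do_div(scale, cpu_freq);
        if (scale <= 0x7FFFFFFF)
            break;
        tsc_shift--;
    }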
    tsc_scale = scale;
}

This version will use 31 (GHz cpu_freq) to 26 (~32 MHz) shifts, i.e. a
bit more than the Linux kernel's 22 bits.
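For readers without an x86 assembler at hand, the same scaled-math idea can be sketched in portable C. This is only an illustration, not the code proposed in this thread: the names tsc_conv, tsc_conv_init and tsc_to_ns are hypothetical, the CPU frequency is assumed to be a non-zero value in Hz that fits in 32 bits, and the shift is assumed to stay at or below 31 as in the loop above.

```c
#include <stdint.h>

/* Hypothetical container for the scale/shift pair; not a Xenomai type. */
struct tsc_conv {
    uint32_t scale;   /* nanoseconds per cycle, multiplied by 2^shift */
    unsigned shift;   /* number of fractional bits kept in scale */
};

/* Pick the largest shift (starting at 31) whose scale fits in 31 bits.
 * Assumes cpu_freq_hz is non-zero. */
static struct tsc_conv tsc_conv_init(uint32_t cpu_freq_hz)
{
    struct tsc_conv c = { 0, 31 };

    for (;;) {
        uint64_t scale = (UINT64_C(1000000000) << c.shift) / cpu_freq_hz;

        if (scale <= 0x7FFFFFFF) {
            c.scale = (uint32_t)scale;
            return c;
        }
        c.shift--;
    }
}

/* ns = (tsc * scale) >> shift, computed without a 128-bit product. */
static uint64_t tsc_to_ns(const struct tsc_conv *c, uint64_t tsc)
{
    uint64_t hi = (tsc >> 32) * c->scale;                    /* high word product */
    uint64_t lo = (tsc & UINT64_C(0xFFFFFFFF)) * c->scale;   /* low word product */

    /* (hi << 32) has no bits below bit 32, so for shift <= 31 the two
     * halves can be shifted separately without losing a carry. */
    return (hi << (32 - c->shift)) + (lo >> c->shift);
}
```

With a 2 GHz clock this sketch settles on shift 31 and scale 2^30, so 2000 cycles convert to 1000 ns. Like the assembly version, the result is truncated to 64 bits for very large TSC values.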

Here is likely why we have different levels of accuracy and performance: firstly, my version is bluntly based on the kHz frequency; secondly, it calculates the other way around, i.e. ns2tsc, so that TSCs are kept in the inner code, but are more efficiently converted from the ns counts passed to the outer interface:


static unsigned long ns2cyc_scale;
#define NS2CYC_SCALE_FACTOR 10 /* 2^10, carefully chosen */

static inline void set_ns2cyc_scale(unsigned long cpu_khz)
{
    ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
}

static inline unsigned long long ns_2_cycles(unsigned long long ns)
{
    return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
}
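To get a feel for what a 10-bit scale factor costs in accuracy, the conversion can be compared against a plain multiply-and-divide. The following is a small sketch under stated assumptions: the function names are hypothetical, fixed-width types replace unsigned long, and the 1.3 GHz figure is just the P-M frequency mentioned earlier in the thread.

```c
#include <stdint.h>

#define SCALE_BITS 10   /* same 2^10 factor as NS2CYC_SCALE_FACTOR */

/* Cycles per nanosecond, multiplied by 2^SCALE_BITS (truncated). */
static uint32_t make_ns2cyc_scale(uint32_t cpu_khz)
{
    return (uint32_t)(((uint64_t)cpu_khz << SCALE_BITS) / 1000000u);
}

/* Scaled conversion: one multiply and one shift, no division. */
static uint64_t scaled_ns_to_cycles(uint64_t ns, uint32_t scale)
{
    return (ns * scale) >> SCALE_BITS;
}

/* Reference conversion with a real division, for comparison only. */
static uint64_t exact_ns_to_cycles(uint64_t ns, uint32_t cpu_khz)
{
    return ns * cpu_khz / 1000000u;
}
```

For cpu_khz = 1300000 the scale truncates from 1331.2 to 1331, so a 1 ms delay converts to 1299804 cycles instead of 1300000, an error of roughly 0.015%. At exactly 1 GHz the scale is 1024 and the conversion is exact.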


TSCs are not the whole nucleus time base, but only the timer management
one. The motivation to use TSCs in nucleus/timer.c was to pick a unit
which would not require any conversion beyond the initial one in
xntimer_start.



That helps strictly periodic application timers, not aperiodic ones like
timeouts.

It depends. Periodic timers usually exhibit larger delays, so the gain is more significant with oneshot timings, which incur smaller delays and hence a higher number of calculations.

Any pitfalls down the road (except introducing regressions)?


Well, pitfalls expected from changing the core idea of time of the timer
management code... :o>


You mean turning

rthal_timer_program_shot(rthal_imuldiv(delay,RTHAL_TIMER_FREQ,RTHAL_CPU_FREQ));

into

rthal_timer_program_shot(rthal_imuldiv(delay,RTHAL_TIMER_FREQ,1000000000));

Not really, it was a general remark about changing code that might have some assumptions about using TSCs. Additionally, only x86 needs to rescale TSC values to the timer frequency; other archs use the same unit on both sides, and such a unit might even have nothing to do with any CPU accounting (e.g. blackfin uses a free-running timer, ppc uses the internal timebase, etc).

This said, it should not have that many assumptions, and in any case, they should be confined to nucleus/timer.c. I think we should give this kind of optimization a try.
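As an illustration of the x86 rescaling step discussed above, here is a minimal sketch of a multiply-then-divide helper applied to both variants, a delay in CPU cycles and a delay in nanoseconds. Everything here is an assumption made for the example: muldiv_u32 is a hypothetical stand-in and not the actual rthal_imuldiv implementation, and ASSUMED_TIMER_FREQ uses the nominal PC PIT input clock of 1193182 Hz rather than whatever RTHAL_TIMER_FREQ resolves to on a given system.

```c
#include <stdint.h>

/* Assumed timer input clock in Hz (nominal PC PIT rate). */
#define ASSUMED_TIMER_FREQ 1193182u

/* Hypothetical helper: value * mul / div with a 64-bit intermediate,
 * so the product cannot overflow before the division. */
static uint32_t muldiv_u32(uint32_t value, uint32_t mul, uint32_t div)
{
    return (uint32_t)((uint64_t)value * mul / div);
}

/* Delay expressed in CPU cycles: rescale by timer_freq / cpu_freq. */
static uint32_t cycles_to_timer_ticks(uint32_t delay_cycles, uint32_t cpu_freq_hz)
{
    return muldiv_u32(delay_cycles, ASSUMED_TIMER_FREQ, cpu_freq_hz);
}

/* Delay expressed in nanoseconds: rescale by timer_freq / 1e9. */
static uint32_t ns_to_timer_ticks(uint32_t delay_ns)
{
    return muldiv_u32(delay_ns, ASSUMED_TIMER_FREQ, 1000000000u);
}
```

Under these assumptions, a 1 ms delay yields the same 1193 timer ticks whether it is passed as 1300000 cycles on a 1.3 GHz CPU or as 1000000 ns; only the divisor changes.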


--

Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
