Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Hi,
>> between some football half-times of the last days ;), I played a bit
>> with a hand-optimised xnarch_tsc_to_ns() for x86. Using scaled math, I
>> achieved between 3 (P-I 133 MHz) to 4 times (P-M 1.3 GHz) faster
>> conversions than with the current variant. While this optimisation only
>> saves a few tens of nanoseconds on high-end machines, slow processors can
>> gain several hundred nanoseconds per conversion (my P-133: -600 ns).
> I did exactly the same a few weeks ago, based on Anzinger's scaled math

:) We should coordinate better.

> from i386/kernel/timers/timer_tsc.c. And indeed, I had 20x performance
> improvements in some cases.

Oops, that sounds like a rather extreme improvement. Does the original
version really vary that much? I didn't observe this.

Here is my current version, BTW:

long tsc_scale;
unsigned int tsc_shift = 31;

static inline long long fast_tsc_to_ns(long long ts)
{
    long long ret;

    __asm__ (
        /* HI = HIWORD(ts) * tsc_scale */
        "mov  %%eax,%%ebx\n\t"
        "mov  %%edx,%%eax\n\t"
        "imull %2\n\t"
        "mov  %%eax,%%esi\n\t"
        "mov  %%edx,%%edi\n\t"

        /* LO = LOWORD(ts) * tsc_scale */
        "mov  %%ebx,%%eax\n\t"
        "mull %2\n\t"

        /* ret = (HI << 32) + LO */
        "add  %%esi,%%edx\n\t"
        "adc  $0,%%edi\n\t"

        /* ret = ret >> tsc_shift */
        "shrd %%cl,%%edx,%%eax\n\t"
        "shrd %%cl,%%edi,%%edx\n\t"
        : "=A"(ret)
        : "A" (ts), "m" (tsc_scale), "c" (tsc_shift)
        : "ebx", "esi", "edi");

    return ret;
}

void init_tsc(unsigned long cpu_freq)
{
    unsigned long long scale;

    while (1) {
        scale = 1000000000ULL << tsc_shift;
        do_div(scale, cpu_freq); /* Linux do_div divides in place */
        if (scale <= 0x7FFFFFFF)
            break;
        tsc_shift--;
    }
    tsc_scale = scale;
}

This version uses shifts from 31 (GHz-range cpu_freq) down to 26 (~32 MHz),
i.e. a bit more than the Linux kernel's 22 bits.

>> This does not come for free: accuracy of very large values is slightly
>> worse, but that's likely negligible compared to the clock accuracy of
>> TSCs (does anyone have any real numbers on the latter, BTW?).
> We do start losing significant precision for 2 ms delays and above,
> IIRC. This could be an issue for some events in aperiodic mode, albeit
> we could use a plain divide for those. The cost of conditionally doing
> this remains to be evaluated though.

Maybe I tested (not calculated - math is too hard for me :o)) the wrong
values, but I didn't see such high regressions.

>> As we lose some bits one way, converting back still requires a "real"
>> division (i.e. the use of the existing, slower xnarch_ns_to_tsc).
>> Otherwise, we would already get significant errors for small intervals.
>> To avoid losing the optimisation again in ns_to_tsc, I thought about
>> basing the whole internal timer arithmetic on nanoseconds instead of
>> TSCs, as it is now. Although I have dug quite a lot into the current
>> timer subsystem in the last weeks, I may still overlook aspects, and
>> I'm x86-biased. Hence my question before thinking or even patching
>> further in this direction: what was the motivation to choose TSCs as
>> the internal time base?
> TSC are not the whole nucleus time base, but only the timer management
> one. The motivation to use TSCs in nucleus/timer.c was to pick a unit
> which would not require any conversion beyond the initial one in
> xntimer_start.

That helps strictly periodic application timers, not aperiodic ones like

>> Any pitfalls down the road (except introducing regressions)?
> Well, pitfalls expected from changing the core idea of time of the timer
> management code... :o>

You mean turning




e.g. ?


Xenomai-core mailing list
