Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> between some football half-times of the last days ;), I played a bit
>> with a hand-optimised xnarch_tsc_to_ns() for x86. Using scaled math, I
>> achieved conversions between 3 (P-I 133 MHz) and 4 times (P-M 1.3 GHz)
>> faster than with the current variant. While this optimisation only
>> saves a few tens of nanoseconds on high-end CPUs, slow processors can
>> gain several hundred nanoseconds per conversion (my P-133: -600 ns).
>>
> 
> I did exactly the same a few weeks ago, based on Anzinger's scaled math

:) We should coordinate better.

> from i386/kernel/timers/timer_tsc.c. And indeed, I saw 20x performance
> improvements in some cases.

Oops, that sounds like a rather extreme speed-up. Does the original
version vary that much? I didn't observe this.

Here is my current version, BTW:

long tsc_scale;
unsigned int tsc_shift = 31;

static inline long long fast_tsc_to_ns(long long ts)
{
    long long ret;

    __asm__ (
        /* HI = HIWORD(ts) * tsc_scale */
        "mov  %%eax,%%ebx\n\t"
        "mov  %%edx,%%eax\n\t"
        "imull %2\n\t"
        "mov  %%eax,%%esi\n\t"
        "mov  %%edx,%%edi\n\t"

        /* LO = LOWORD(ts) * tsc_scale */
        "mov  %%ebx,%%eax\n\t"
        "mull %2\n\t"

        /* ret = (HI << 32) + LO */
        "add  %%esi,%%edx\n\t"
        "adc  $0,%%edi\n\t"

        /* ret = ret >> tsc_shift */
        "shrd %%cl,%%edx,%%eax\n\t"
        "shrd %%cl,%%edi,%%edx\n\t"
        : "=A"(ret)
        : "A" (ts), "m" (tsc_scale), "c" (tsc_shift)
        : "ebx", "esi", "edi");

    return ret;
}

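void init_tsc(unsigned long cpu_freq)
{
    unsigned long long scale;

    /* Pick the largest shift for which the scale still fits in 31 bits. */
    while (1) {
        /* Assuming the Linux kernel's do_div(): it divides its first
           argument in place and returns the remainder, so it needs an
           lvalue and its return value is not the quotient. */
        scale = 1000000000ULL << tsc_shift;
        do_div(scale, cpu_freq);
        if (scale <= 0x7FFFFFFF)
            break;
        tsc_shift--;
    }
    tsc_scale = scale;
}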

This version uses shifts from 31 bits (GHz-class cpu_freq) down to 26
bits (~32 MHz), i.e. a bit more than the Linux kernel's 22 bits.
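
For readers who want to try the idea outside the kernel, here is a minimal
portable sketch of the same scaled math in plain C. It is not the code
above: the struct and function names are made up for illustration, it
works on unsigned values only, and it replaces the 96-bit asm arithmetic
with two 64-bit partial products.

```c
#include <stdint.h>

/* Hypothetical container for the two conversion parameters. */
struct tsc_conv {
    uint32_t scale;      /* (10^9 << shift) / cpu_freq, kept below 2^31 */
    unsigned int shift;  /* starts at 31 and is lowered until scale fits */
};

/* Same selection loop as init_tsc(); cpu_freq_hz must be non-zero. */
static struct tsc_conv tsc_conv_init(uint32_t cpu_freq_hz)
{
    struct tsc_conv c = { 0, 31 };
    uint64_t scale;

    for (;;) {
        scale = (UINT64_C(1000000000) << c.shift) / cpu_freq_hz;
        if (scale <= 0x7FFFFFFF)
            break;
        c.shift--;
    }
    c.scale = (uint32_t)scale;
    return c;
}

/* ns = (tsc * scale) >> shift, computed from two partial products.
 * Because shift <= 31, ((hi << 32) + lo) >> shift equals
 * (hi << (32 - shift)) + (lo >> shift) exactly. */
static uint64_t tsc_to_ns_scaled(const struct tsc_conv *c, uint64_t tsc)
{
    uint64_t hi = (tsc >> 32) * c->scale;          /* high word * scale */
    uint64_t lo = (tsc & 0xFFFFFFFFu) * c->scale;  /* low word * scale */

    return (hi << (32 - c->shift)) + (lo >> c->shift);
}
```

With this selection loop, a 1.3 GHz clock keeps the full 31-bit shift,
while slower clocks give up a few bits so that the scale still fits.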

> 
>> This does not come for free: accuracy of very large values is slightly
>> worse, but that's likely negligible compared to the clock accuracy of
>> TSCs (does anyone have any real numbers on the latter, BTW?).
>>
> 
> We do start losing significant precision for 2 ms delays and above,
> IIRC. This could be an issue for some events in aperiodic mode, albeit
> we could use a plain divide for those. The cost of conditionally doing
> this remains to be evaluated though.

Maybe I tested (not calculated - math is too hard for me :o)) the wrong
values, but I didn't see such large deviations.
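
One way to settle this empirically is to compare the scaled result with an
exact division for the delays in question. The following is only a sketch
with made-up helper names; it uses the same scale/shift selection as
init_tsc() above and unsigned 64-bit math throughout. If the reasoning in
the comments holds, the truncation error should never exceed
(tsc >> shift) + 1 nanoseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Exact floor(tsc * 10^9 / freq) without overflowing 64 bits:
 * whole seconds and the sub-second remainder are converted separately. */
static uint64_t exact_tsc_to_ns(uint64_t tsc, uint32_t freq)
{
    return (tsc / freq) * UINT64_C(1000000000)
         + (tsc % freq) * UINT64_C(1000000000) / freq;
}

/* Scaled conversion; optionally reports the shift that was selected. */
static uint64_t scaled_tsc_to_ns(uint64_t tsc, uint32_t freq,
                                 unsigned int *shift_out)
{
    unsigned int shift = 31;
    uint64_t scale;

    while ((scale = (UINT64_C(1000000000) << shift) / freq) > 0x7FFFFFFF)
        shift--;
    if (shift_out != NULL)
        *shift_out = shift;

    return (((tsc >> 32) * scale) << (32 - shift))
         + (((tsc & 0xFFFFFFFFu) * scale) >> shift);
}

/* Error in ns. The scale is rounded down, so the scaled result is never
 * larger than the exact one and the subtraction cannot wrap. */
static uint64_t scaled_error_ns(uint64_t tsc, uint32_t freq)
{
    return exact_tsc_to_ns(tsc, freq) - scaled_tsc_to_ns(tsc, freq, NULL);
}
```

Calling scaled_error_ns() with the tick count of a 2 ms delay, and again
with that of a much longer interval, shows directly how the error grows
with the size of the converted value.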

> 
>> As we lose some bits one way, converting back still requires "real"
>> division (i.e. the use of the existing slower xnarch_ns_to_tsc).
>> Otherwise, we would get significant errors already for small intervals.
>>
>> To avoid losing the optimisation again in ns_to_tsc, I thought about
>> basing the whole internal timer arithmetic on nanoseconds instead of
>> TSCs as it is now. Although I dug quite a lot into the current timer
>> subsystem over the last weeks, I may still overlook aspects and I'm
>> x86-biased. Therefore my question before thinking or even patching
>> further this way: What was the motivation to choose TSCs as the
>> internal time base?
> 
> TSCs are not the whole nucleus time base, but only the timer management
> one. The motivation to use TSCs in nucleus/timer.c was to pick a unit
> which would not require any conversion beyond the initial one in
> xntimer_start.

That helps strictly periodic application timers, not aperiodic ones like
timeouts.

> 
>> Any pitfalls down the road (except introducing regressions)?
> 
> Well, pitfalls expected from changing the core idea of time of the timer
> management code... :o>
> 

You mean turning

rthal_timer_program_shot(rthal_imuldiv(delay,RTHAL_TIMER_FREQ,RTHAL_CPU_FREQ));

into

rthal_timer_program_shot(rthal_imuldiv(delay,RTHAL_TIMER_FREQ,1000000000));

e.g. ?
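
To make the difference concrete, here is a small sketch of the two
variants. Everything in it is an assumption for illustration:
imuldiv_sketch() is a hypothetical stand-in for rthal_imuldiv() (computing
i * mult / div with a 64-bit intermediate), and the two frequencies are
example values, not the real RTHAL_TIMER_FREQ and RTHAL_CPU_FREQ.

```c
#include <stdint.h>

/* Hypothetical stand-in for rthal_imuldiv(): i * mult / div, using a
 * 64-bit intermediate so the product cannot overflow 32 bits. */
static uint32_t imuldiv_sketch(uint32_t i, uint32_t mult, uint32_t div)
{
    return (uint32_t)(((uint64_t)i * mult) / div);
}

#define SKETCH_TIMER_FREQ 1193180u     /* assumed: PIT input clock in Hz */
#define SKETCH_CPU_FREQ   1300000000u  /* assumed: 1.3 GHz TSC */

/* Current scheme: the delay is kept in TSC ticks. */
static uint32_t shot_from_tsc(uint32_t delay_tsc)
{
    return imuldiv_sketch(delay_tsc, SKETCH_TIMER_FREQ, SKETCH_CPU_FREQ);
}

/* Proposed scheme: the delay is kept in nanoseconds, so the divisor
 * becomes the constant 10^9 instead of the CPU frequency. */
static uint32_t shot_from_ns(uint32_t delay_ns)
{
    return imuldiv_sketch(delay_ns, SKETCH_TIMER_FREQ, 1000000000u);
}
```

Under these assumed frequencies, a 1 ms delay maps to the same number of
timer ticks whether it is passed as 1300000 TSC ticks or as 1000000 ns.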

Jan

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core