Re: [Xenomai-core] Timer optimisations, continued

2006-07-27 Thread Philippe Gerum
On Thu, 2006-07-27 at 15:54 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2006-07-27 at 14:42 +0200, Gilles Chanteperdrix wrote:
> >> Philippe Gerum wrote:
> >>  > >  o A further improvement should be achievable for scenarios 4 and 5 by
> >>  > >introducing absolute xntimers (more precisely: a flag to
> >>  > >differentiate between the mode on xntimer_start). I have an outdated
> >>  > >patch for this in my repos, needs re-basing.
> >>  > > 
> >>  > 
> >>  > Grmblm... Well, I would have preferred that we don't add that kind of
> >>  > complexity to the nucleus interface, but I must admit that some
> >>  > important use cases are definitely better served by absolute timespecs,
> >>  > so I would surrender to this requirement, provided the implementation is
> >>  > confined to xnpod_suspend_thread() + xntimer_start().
> >>
> >> It would be nice if absolute timeouts were also available when using
> >> xnsynch_sleep_on. There are a few use cases in the POSIX skin.
> > 
> > Makes sense, since xnpod_suspend_thread() and xnsynch_sleep_on() are
> > tightly integrated interfaces.
> > 
> 
> Does anyone have an idea how best to extend both function interfaces to
> differentiate between absolute and relative timeouts? I guess we need an
> additional argument to the functions, don't we?

Yes, I'm afraid we do. The alternative, making the timeout a non-scalar
value just so it can carry the rel/abs qualifier, would be overkill.
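
Just to make the idea concrete, here is a minimal sketch, plain C with
made-up names (xnticks_t, xntmode_t, the clock stub), not the actual
nucleus signatures, of how a start call could take a mode argument and
normalize both cases to the relative delay the core works with:

/* Hypothetical sketch only: the real xntimer_start()/xnpod_suspend_thread()
 * signatures differ; this just illustrates the extra mode argument. */
#include <stdio.h>

typedef unsigned long long xnticks_t;

typedef enum xntmode {
    XN_RELATIVE,    /* timeout is a delay from "now" */
    XN_ABSOLUTE     /* timeout is a date on the timer base clock */
} xntmode_t;

static xnticks_t read_clock(void)
{
    /* Stand-in for the TSC/timer base read. */
    static xnticks_t fake_now;
    return fake_now += 100;
}

/* Normalize both modes to the relative delay the timer core programs. */
static xnticks_t resolve_timeout(xnticks_t timeout, xntmode_t mode)
{
    xnticks_t now;

    if (mode == XN_RELATIVE)
        return timeout;

    now = read_clock();
    return timeout > now ? timeout - now : 0;   /* date already passed */
}

int main(void)
{
    printf("rel 500  -> delay %llu\n", resolve_timeout(500, XN_RELATIVE));
    printf("abs 5000 -> delay %llu\n", resolve_timeout(5000, XN_ABSOLUTE));
    return 0;
}

Whatever the final signatures end up looking like, the point is that only
the entry points need to know about the mode; the timer machinery behind
them keeps seeing relative delays.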

> 
> I had the weird idea of using the sign bit of the timeout value for
> this. But the potential side effects of halving the absolute time domain
> this way scare me.
> 

Same here, this looks like a very fragile solution to a general issue.

-- 
Philippe.





Re: [Xenomai-core] Timer optimisations, continued

2006-07-27 Thread Jan Kiszka
Philippe Gerum wrote:
> On Thu, 2006-07-27 at 14:42 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>  > >  o A further improvement should be achievable for scenarios 4 and 5 by
>>  > >introducing absolute xntimers (more precisely: a flag to
>>  > >differentiate between the mode on xntimer_start). I have an outdated
>>  > >patch for this in my repos, needs re-basing.
>>  > > 
>>  > 
>>  > Grmblm... Well, I would have preferred that we don't add that kind of
>>  > complexity to the nucleus interface, but I must admit that some
>>  > important use cases are definitely better served by absolute timespecs,
>>  > so I would surrender to this requirement, provided the implementation is
>>  > confined to xnpod_suspend_thread() + xntimer_start().
>>
>> It would be nice if absolute timeouts were also available when using
>> xnsynch_sleep_on. There are a few use cases in the POSIX skin.
> 
> Makes sense, since xnpod_suspend_thread() and xnsynch_sleep_on() are
> tightly integrated interfaces.
> 

Does anyone have an idea how best to extend both function interfaces to
differentiate between absolute and relative timeouts? I guess we need an
additional argument to the functions, don't we?

I had the weird idea of using the sign bit of the timeout value for
this. But the potential side effects of halving the absolute time domain
this way scare me.

Jan





Re: [Xenomai-core] Timer optimisations, continued

2006-07-27 Thread Philippe Gerum
On Thu, 2006-07-27 at 14:42 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > >  o A further improvement should be achievable for scenarios 4 and 5 by
>  > >introducing absolute xntimers (more precisely: a flag to
>  > >differentiate between the mode on xntimer_start). I have an outdated
>  > >patch for this in my repos, needs re-basing.
>  > > 
>  > 
>  > Grmblm... Well, I would have preferred that we don't add that kind of
>  > complexity to the nucleus interface, but I must admit that some
>  > important use cases are definitely better served by absolute timespecs,
>  > so I would surrender to this requirement, provided the implementation is
>  > confined to xnpod_suspend_thread() + xntimer_start().
> 
> It would be nice if absolute timeouts were also available when using
> xnsynch_sleep_on. There are a few use cases in the POSIX skin.

Makes sense, since xnpod_suspend_thread() and xnsynch_sleep_on() are
tightly integrated interfaces.

> 
-- 
Philippe.





Re: [Xenomai-core] Timer optimisations, continued

2006-07-27 Thread Gilles Chanteperdrix
Philippe Gerum wrote:
 > >  o A further improvement should be achievable for scenarios 4 and 5 by
 > >introducing absolute xntimers (more precisely: a flag to
 > >differentiate between the mode on xntimer_start). I have an outdated
 > >patch for this in my repos, needs re-basing.
 > > 
 > 
 > Grmblm... Well, I would have preferred that we don't add that kind of
 > complexity to the nucleus interface, but I must admit that some
 > important use cases are definitely better served by absolute timespecs,
 > so I would surrender to this requirement, provided the implementation is
 > confined to xnpod_suspend_thread() + xntimer_start().

It would be nice if absolute timeouts were also available when using
xnsynch_sleep_on. There are a few use cases in the POSIX skin.

-- 
Gilles Chanteperdrix.



Re: [Xenomai-core] Timer optimisations, continued

2006-07-27 Thread Philippe Gerum
On Tue, 2006-07-25 at 20:26 +0200, Jan Kiszka wrote:



> 
> To summarise these lengthy results:
> 
>  o ns-based xntimers are nice at first sight, but not at second. Most
>    use-cases (except 5) require fewer conversions when we keep the
>    abstraction as it is.
> 

The current approach was a deliberate choice to favour timer accuracy,
at the - reasonably small - expense of not optimizing the "timeout" use
case. The net result is that the core timing code is TSC-based, so no
time unit conversion occurs after a timer has been started, except when
the hw timer uses a different time unit than the TSC (this said, that
last conversion before programming the hw timer would be needed
regardless of the time unit maintained by the timing core).
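
As a rough sketch of what this means in practice (illustrative code, not
the nucleus implementation, and it assumes the hw timer is programmed in
TSC units): the ns -> tsc conversion happens once when the timer is
started, and the periodic hot path only does TSC arithmetic:

/* Minimal sketch, not the nucleus code: a TSC-based timer core where the
 * ns -> tsc conversion happens only at start time and the IRQ hot path
 * works purely in TSC units (assuming the hw timer is programmed in the
 * same unit as the TSC). */
#include <stdint.h>
#include <stdio.h>

static uint64_t fake_tsc;                       /* stand-in clock */

static uint64_t read_tsc(void)
{
    return fake_tsc;
}

static uint64_t ns_to_tsc(uint64_t ns)
{
    return ns;                                  /* pretend a 1 GHz TSC */
}

static void set_hw_timer(uint64_t delay_tsc)
{
    printf("program hw timer: %llu ticks\n", (unsigned long long)delay_tsc);
}

struct periodic_timer {
    uint64_t date;      /* next expiry date, in TSC ticks */
    uint64_t interval;  /* period, in TSC ticks */
};

/* Start path: the only place where a time unit conversion happens. */
static void timer_start_periodic(struct periodic_timer *t,
                                 uint64_t delay_ns, uint64_t interval_ns)
{
    t->interval = ns_to_tsc(interval_ns);
    t->date = read_tsc() + ns_to_tsc(delay_ns);
    set_hw_timer(t->date - read_tsc());
}

/* Hot path: pure TSC arithmetic, no unit conversion at all. */
static void timer_irq(struct periodic_timer *t)
{
    t->date += t->interval;
    set_hw_timer(t->date - read_tsc());
}

int main(void)
{
    struct periodic_timer t;

    timer_start_periodic(&t, 1000, 500);
    fake_tsc = 1000;            /* pretend the first shot just expired */
    timer_irq(&t);
    return 0;
}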

>  o Performance should be improvable by combining fast_tsc_to_ns for full
>    64-bit conversions with nodiv_imuldiv for short relative ns-to-tsc.
>    It should be ok to lose some accuracy wrt long periods given that
>    TSCs are AFAIK not very accurate themselves. Nevertheless, to keep
>    precision on 64-bit ns-to-tsc reverse conversions, those should
>    remain implemented as they are:
>    "if (ns <= ULONG_MAX) nodiv_imuldiv else xnarch_ns_to_tsc"
> 

I basically agree with that, including the 64/32 optimization on delay
ranges. IOW, we could optimize time conversions in the timing core
_locally_ (i.e. nucleus/timer.c exclusively), even at the expense of a
small loss of accuracy in the dedicated converters. In any case, we are
implicitly talking about the oneshot mode here, so it would be
acceptable to trigger an early shot once in a while due to that loss of
accuracy; the existing code would simply restart the timer until it
eventually elapses past the expected time, and this would only occur
with large delays. But: we must leave the existing converters as they
are in the xnarch layer, keeping the most accurate operations provided
there, since a lot of code depends on their accuracy.
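
A minimal sketch of the range split quoted above, assuming made-up
scaling helpers rather than Gilles' actual nodiv_imuldiv or the xnarch
converters: short delays take a division-less 32-bit fixed-point
multiply, large values fall back to the accurate 64-bit path:

/* Sketch with made-up scaling, not the actual nucleus/xnarch converters. */
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

struct ns_tsc_ratio {
    uint32_t integ;     /* integer part of cpu_hz / 1e9 */
    uint32_t frac;      /* fractional part as a 0.32 fixed-point value */
};

static struct ns_tsc_ratio compute_ratio(uint64_t cpu_hz)
{
    struct ns_tsc_ratio r;

    r.integ = (uint32_t)(cpu_hz / NSEC_PER_SEC);
    r.frac = (uint32_t)(((cpu_hz % NSEC_PER_SEC) << 32) / NSEC_PER_SEC);
    return r;
}

/* Fast path: 32-bit input, two multiplies, no runtime division. */
static uint64_t fast_ns_to_tsc(uint32_t ns, struct ns_tsc_ratio r)
{
    return (uint64_t)ns * r.integ + (((uint64_t)ns * r.frac) >> 32);
}

/* Accurate 64-bit fallback (stand-in for xnarch_ns_to_tsc()). */
static uint64_t slow_ns_to_tsc(uint64_t ns, uint64_t cpu_hz)
{
    return ns / NSEC_PER_SEC * cpu_hz
        + ns % NSEC_PER_SEC * cpu_hz / NSEC_PER_SEC;
}

static uint64_t ns_to_tsc(uint64_t ns, uint64_t cpu_hz, struct ns_tsc_ratio r)
{
    if (ns <= UINT32_MAX)       /* short delay: cheap path */
        return fast_ns_to_tsc((uint32_t)ns, r);
    return slow_ns_to_tsc(ns, cpu_hz);  /* long delay: keep accuracy */
}

int main(void)
{
    uint64_t cpu_hz = 1800000000ULL;    /* assumed 1.8 GHz clock */
    struct ns_tsc_ratio r = compute_ratio(cpu_hz);

    printf("1000 ns       -> %llu ticks\n",
           (unsigned long long)ns_to_tsc(1000ULL, cpu_hz, r));
    printf("5000000000 ns -> %llu ticks\n",
           (unsigned long long)ns_to_tsc(5000000000ULL, cpu_hz, r));
    return 0;
}

The precomputed ratio trades a last-bit rounding error for the removal
of the runtime division, which matches the "acceptable early shot"
reasoning above.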

>  o A further improvement should be achievable for scenarios 4 and 5 by
>introducing absolute xntimers (more precisely: a flag to
>differentiate between the mode on xntimer_start). I have an outdated
>patch for this in my repos, needs re-basing.
> 

Grmblm... Well, I would have preferred that we don't add that kind of
complexity to the nucleus interface, but I must admit that some
important use cases are definitely better served by absolute timespecs,
so I would surrender to this requirement, provided the implementation is
confined to xnpod_suspend_thread() + xntimer_start().

> To verify that we actually improve something with each of the changes
> above, some kind of fine-grained test suite will be required. The
> timerbench could be extended to support all 5 scenarios. But does
> someone have any quick idea how to evaluate the overall performances
> best? The new per-task statistics code is not accurate enough as it
> accounts IRQs mostly to the preempted task, not the preempting one. Mm,
> execution time of some long-running number-crunching Linux task in the
> background?

Better use a kernel-based low-priority RT task running in the
background, limiting the sampling period to a duration that Linux could
bear (maybe running multiple subsequent periods with warmup phases,
just to let the penguin breathe). The effect of TLB misses would be
much lower, and there would be no need to block the Linux IRQs using
Xenomai's I-shield.
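
Whichever task ends up doing the crunching (kernel-based RT task or
plain Linux process), the measuring principle is the same: count the
iterations a low-priority busy loop completes in a fixed window, and
compare the counts across timer configurations. A user-space sketch of
just that principle, not the kernel task suggested above:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Count how many iterations a number-crunching loop completes within a
 * fixed window; fewer iterations under otherwise identical load means
 * more CPU burnt in the timer hot path being benchmarked. */
static uint64_t crunch_for_seconds(unsigned int seconds)
{
    struct timespec start, now;
    uint64_t iterations = 0;
    volatile double x = 1.0;        /* keep the loop from being optimized out */

    clock_gettime(CLOCK_MONOTONIC, &start);
    now = start;
    do {
        x = x * 1.000001 + 0.5;     /* arbitrary floating-point work */
        if ((++iterations & 0x3ff) == 0)    /* poll the clock every 1024 loops */
            clock_gettime(CLOCK_MONOTONIC, &now);
    } while (now.tv_sec - start.tv_sec < (time_t)seconds);

    return iterations;
}

int main(void)
{
    printf("iterations in 5 s: %llu\n",
           (unsigned long long)crunch_for_seconds(5));
    return 0;
}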

> Looking forward to feedback!
> 
> Jan
> 
> 
> PS: Finally, after stabilising the xntimers again, we will see a nice
> rtdm_timer API as well. But those patches need even more re-basing then...
> 
-- 
Philippe.





[Xenomai-core] Timer optimisations, continued

2006-07-25 Thread Jan Kiszka
Hi all,

to continue the discussion about improving the timer subsystem,
specifically with respect to unit conversion overhead, I'm posting here
a (fairly long) report of my findings and considerations.

First of all, I did some benchmarking of the various optimised conversion
routines that popped up, stressing them on several different x86 platforms.
The numbers are for 1000 iterations (loop overhead compensated); the
compiler used was gcc-4.1. Just to recall the actors:

xnarch_tsc_to_ns - original accurate 64-bit division for converting TSC
                   ticks to nanoseconds (and vice versa)
fast_tsc_to_ns   - my scaled-math-based assembler variant, suffering
                   from some inaccuracy for large intervals, still
                   requires normal 64-bit muldiv for the ns-to-TSC
                   return path
ns_2_cycles      - Philippe's similar version, a bit more inaccurate
nodiv_ullimd     - Gilles' 64-bit conversion routine, only sometimes
                   varying in the last bit from the original result
nodiv_imuldiv    - Gilles' 32-bit div-less conversion for small
                   intervals (haven't checked, but I assume it's as
                   accurate as the 64-bit variant in the limited domain)

And here are the results (ugly test code available on request):

VIA C2, 600 MHz:
xnarch_tsc_to_ns:  160680 cycles / 267800 ns
fast_tsc_to_ns:    119842 cycles / 199736 ns
ns_2_cycles:        69376 cycles / 115626 ns
nodiv_ullimd:      179042 cycles / 298403 ns
nodiv_imuldiv:      41336 cycles /  68893 ns

P-III, 1 GHz:
xnarch_tsc_to_ns:  108475 cycles / 107935 ns
fast_tsc_to_ns:     24127 cycles /  24006 ns
ns_2_cycles:        21338 cycles /  21231 ns
nodiv_ullimd:       67974 cycles /  67635 ns
nodiv_imuldiv:      13269 cycles /  13202 ns

P-MMX, 266 MHz:
xnarch_tsc_to_ns:  131886 cycles / 495812 ns
fast_tsc_to_ns:     47697 cycles / 179312 ns
ns_2_cycles:        43627 cycles / 164011 ns
nodiv_ullimd:      141915 cycles / 533515 ns
nodiv_imuldiv:      44761 cycles / 168274 ns

P-M, 1.3 GHz:
xnarch_tsc_to_ns:  113219 cycles /  87091 ns
fast_tsc_to_ns:     26718 cycles /  20552 ns
ns_2_cycles:        15024 cycles /  11556 ns
nodiv_ullimd:       49620 cycles /  38169 ns
nodiv_imuldiv:      17036 cycles /  13104 ns

Opteron 275 (32-bit mode), 1.8 GHz:
xnarch_tsc_to_ns:  112507 cycles /  62503 ns
fast_tsc_to_ns:     21857 cycles /  12142 ns
ns_2_cycles:        12545 cycles /   6969 ns
nodiv_ullimd:       41175 cycles /  22875 ns
nodiv_imuldiv:       7261 cycles /   4033 ns

Clearly, working with only 32 bits is the fastest variant on all
platforms. The other variants either do not always perform well or have
limited accuracy. Unfortunately, 32-bit conversions cannot be applied in
all scenarios, as we will see below.


After hacking my fast_tsc_to_ns, my original plan was to switch the
internal timer base completely to nanoseconds in the hope of reducing
the number of conversions in the timer hot paths. Luckily, I decided to
analyse the typical scenarios first before starting to develop any
patch. I consider the following 5 scenarios for heavy timer usage. Both
TSC and nanoseconds as time base are analysed, as well as a potential
timer_start() variant that accepts absolute timeout values. The pseudo
code /should/ be self-explanatory; if not, do not hesitate to ask.


1. Periodic Timers
==================

Start once, run continuously
=> hot-path is the timer IRQ


1.1 TSC-based
-------------

task_set_periodic(start, interval)          [rarely]
    delay = start - get_time()
        get_time(): tsc -> ns               [64-bit]
    timer_start(delay, interval)
        delay: ns -> tsc                    [32-bit candidate]
        date = get_tsc() + delay
        interval: ns -> tsc                 [32-bit candidate]
        program_timer(date)
            delay = date - get_tsc()
            set_hw_timer(delay)
-or-
task_set_periodic(start, interval)          [rarely]
    timer_start_abs(start, interval)
        date: ns -> tsc                     [64-bit]
        interval: ns -> tsc                 [32-bit candidate]
        program_timer(date)
            delay = date - get_tsc()
            set_hw_timer(delay)

timer_irq()                                 [hot-path]
    date <= get_tsc()?
    date = get_tsc() + interval
    program_timer(date)
        delay = date - get_tsc()
        set_hw_timer(delay)


1.2 ns-based
------------

task_set_periodic(start, interval)          [rarely]
    delay = start - get_time()
        get_time(): tsc -> ns               [64-bit]
    timer_start(delay, interval)
        date = get_time() + delay
            get_time(): tsc -> ns           [64-bit]
        program_timer(date)
            date: ns -> tsc                 [64-bit]
            delay = date - get