Philippe Gerum wrote:
Jan Kiszka wrote:
...
I do not yet understand the portability problem of TSC values, but the
inconsistency issue on SMP systems was new to me and makes your choice
a bit more comprehensible. Does moving to nanoseconds really solve the
problem? The clocks used in aperiodic mode will not run synchronised,
nor will they be started at (almost) the same time, will they? So, all
absolute times (taken on CPU A and applied on CPU B) might still be
inconsistent.
We hold per-cpu timer lists, so TSC bases are never mixed. Timers can
be migrated explicitly, but in such a case they are always
re-initialized. IOW, outstanding timers don't migrate.
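The scheme described above can be sketched roughly like this (hypothetical names and a trivial queue, not the actual nucleus code): each CPU keeps its own timer list, and a timer moved to another CPU is stopped and re-armed against the target CPU's own TSC base, so expiry dates from different TSC timelines are never compared.

```c
#include <assert.h>

#define NR_CPUS 2

struct rt_timer {
    unsigned long long expiry_tsc; /* in the owning CPU's TSC units */
    int cpu;                       /* owning CPU, -1 if inactive */
};

struct timer_queue {
    /* real code holds a sorted list; a counter suffices for the sketch */
    int nr_timers;
};

static struct timer_queue per_cpu_queue[NR_CPUS];

/* Hypothetical per-CPU "now" readings; they may drift apart freely. */
static unsigned long long cpu_now_tsc[NR_CPUS] = { 1000000, 5000000 };

static void timer_start(struct rt_timer *t, int cpu,
                        unsigned long long delay_tsc)
{
    t->cpu = cpu;
    t->expiry_tsc = cpu_now_tsc[cpu] + delay_tsc; /* local base only */
    per_cpu_queue[cpu].nr_timers++;
}

static void timer_stop(struct rt_timer *t)
{
    if (t->cpu >= 0) {
        per_cpu_queue[t->cpu].nr_timers--;
        t->cpu = -1;
    }
}

/* Explicit migration: stop, then re-init on the target CPU. The old
 * expiry date is never carried over, so no cross-CPU TSC comparison
 * ever happens. */
static void timer_migrate(struct rt_timer *t, int new_cpu,
                          unsigned long long delay_tsc)
{
    timer_stop(t);
    timer_start(t, new_cpu, delay_tsc);
}
```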
Ok, you handle different relative times in an SMP-safe way, I'm convinced.
What about the absolute meaning of time stamps? We once measured a
drift of a few tens of microseconds per second between the TSCs of a
Pentium II (400 MHz) and a Pentium I (133 MHz). Maybe new hardware with
almost identical CPUs can provide slightly better results, but there
will be an absolute drift over time which makes time stamps taken on
one CPU incomparable with those taken on others. Are there any
re-synchronisation mechanisms in fusion? Or does the user (of
rt_timer_read, e.g.) have to tell the different clock sources apart?
I would personally vote for a more consistent way: either ticks or
nanoseconds - in both modes. Actually, I would prefer the tick
variant at the nucleus level.
What would such tick represent in aperiodic mode, time-wise?
For me, ticks are what the system timer uses internally in the
respective mode: TSCs?! Then it becomes more understandable to the
developer that one always has to convert human-readable time to
internal units.
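That conversion from human-readable time to internal units amounts to a scaled division; a rough sketch follows (CPU_FREQ_HZ is an assumed, fixed calibration constant here; a real system would use the TSC frequency measured at boot):

```c
#include <assert.h>

#define CPU_FREQ_HZ 266000000ULL /* assumed: e.g. a Pentium MMX 266 */

/* Convert a human-readable nanosecond value into TSC ticks.
 * The multiplication is split to limit 64-bit overflow:
 * ticks = ns * freq / 1e9 */
static unsigned long long ns_to_tsc(unsigned long long ns)
{
    return ns / 1000000000ULL * CPU_FREQ_HZ
         + ns % 1000000000ULL * CPU_FREQ_HZ / 1000000000ULL;
}

/* And back: ns = ticks * 1e9 / freq */
static unsigned long long tsc_to_ns(unsigned long long tsc)
{
    return tsc / CPU_FREQ_HZ * 1000000000ULL
         + tsc % CPU_FREQ_HZ * 1000000000ULL / CPU_FREQ_HZ;
}
```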
I feel that you would be doing the job of the interface, assuming poorly
written conversion routines and particular patterns of use that would
make your app issue conversion requests at 50 kHz. However, AFAIK,
POSIX's timespecs do not deal with TSCs either, likely because the
common pattern of use for timed services involves heavyweight blocking
operations, and you don't do those inside a tight loop, unless your
system is going wild and never blocks. There, my perception is that the
cost of converting nanoseconds to ticks is nothing compared to the cost
of grabbing the resource associated with a synchronization object, or
even that of the task being put to sleep.
Now, for the particular case of a periodic task with, say, a 20 us work
cycle, the timeout expressed in nanoseconds is not an issue, since the
period is converted once when the task is made periodic; the internally
converted value is then used with no further conversion needed by the
nucleus in order to wait for the next release point in the CPU timeline.
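The convert-once pattern described above could look roughly like this (hypothetical names and an assumed fixed TSC frequency, not the actual nucleus code): the nanosecond period is converted a single time when the task is made periodic, and the per-cycle hot path performs only an addition.

```c
#include <assert.h>

#define CPU_FREQ_HZ 266000000ULL /* assumed calibration constant */

struct periodic_state {
    unsigned long long period_tsc;   /* pre-converted, set once */
    unsigned long long next_release; /* next release point, TSC units */
};

static unsigned long long ns_to_tsc(unsigned long long ns)
{
    return ns / 1000000000ULL * CPU_FREQ_HZ
         + ns % 1000000000ULL * CPU_FREQ_HZ / 1000000000ULL;
}

/* Called once, e.g. when the task is made periodic: the only place
 * where a ns-to-tsc conversion happens. */
static void set_periodic(struct periodic_state *p,
                         unsigned long long now_tsc,
                         unsigned long long period_ns)
{
    p->period_tsc = ns_to_tsc(period_ns);
    p->next_release = now_tsc + p->period_tsc;
}

/* Called every cycle to compute the next release point:
 * no conversion, just one addition. */
static unsigned long long advance_release(struct periodic_state *p)
{
    unsigned long long wakeup = p->next_release;

    p->next_release += p->period_tsc;
    return wakeup;
}
```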
Well, it's indeed just a matter of perception, and I understand that
people wanting to get the maximum out of each CPU cycle on a 386 might
find my reasoning totally bogus. This said, everything is a question of
balance, and I strongly feel that there are many other, much more
time-consuming issues to solve which, unlike the conversions, are on
the critical path. What I think we buy with the nanosecond-based
timespecs is ease-of-use, portability and consistency. But hey, maybe
it's just me returning -ENOBRAIN in a wild loop?! o:) In any case, I
will have no problem recognizing that I've been wrong on this, but for
that, I would need actual facts and figures on common real-world test
cases.
Ok, here are some facts. We did some tests with recent fusion,
with ancient RTAI code, and with the clock read function of the
upcoming RTDM.
(All numbers were taken on a Pentium MMX 266, for us "advanced" low-end.)
xnarch_tsc_to_ns(): <200 ns, typical ~100 ns
inlined count2nano(): <600 ns, typical ~400 ns
xnarch_tsc_to_ns(): <400 ns, typical ~250 ns
inlined count2nano(): <600 ns, typical ~400 ns
rtdm_clock_read(): <1300 ns, typical ~500 ns
(basically implements rt_timer_read(), but always returns nanoseconds)
A typical rdtsc takes about 50 ns on that box.
I think we are measuring a lot of cache noise here. Especially the
question "inline or not inline" is hard to decide. The size of
rtdm_clock_read, due to converting both tsc and ticks to ns, is
significant enough to make it a real function, as it is now. Maybe I
will also provide it as an optional inline later, but this will first
require some measuring on a real driver with real load.
Anyway, the numbers we had in mind were a bit higher (a few
microseconds) and likely taken on a slower box (Pentium I 166). So, the
figures above are encouraging, and we can likely live quite carefree
with nanoseconds for the RTDM API.
Frankly, the really bothering issue regarding timer performance for me
is clearly not the conversions, but rather the way outstanding
aperiodic timers are linked to a single queue, so that a braindead
linear search is required to add a timer to the list. That is clearly
something we will work on as part of our scalability effort.
Good point - are there already any plans for avoiding this? Some kind
of search tree instead of a list? The problem is likely that you cannot
make many helpful assumptions about the distribution of timeout values,
can you?
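One classic candidate (only a sketch of the idea, not fusion code) is a binary min-heap keyed on the expiry date: insertion and removal cost O(log n) instead of the O(n) sorted-list insertion, and no assumption about the timeout distribution is needed.

```c
#include <assert.h>

#define HEAP_CAP 256

struct timer_heap {
    unsigned long long expiry[HEAP_CAP]; /* min-heap on expiry date */
    int size;
};

/* O(log n) insertion: sift the new date up towards the root. */
static void heap_insert(struct timer_heap *h, unsigned long long date)
{
    int i = h->size++;

    while (i > 0 && h->expiry[(i - 1) / 2] > date) {
        h->expiry[i] = h->expiry[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->expiry[i] = date;
}

/* O(1) peek at the earliest timer, used to (re)program the hardware. */
static unsigned long long heap_min(const struct timer_heap *h)
{
    return h->expiry[0];
}

/* O(log n) removal of the earliest timer: sift the last element down. */
static unsigned long long heap_pop(struct timer_heap *h)
{
    unsigned long long min = h->expiry[0];
    unsigned long long last = h->expiry[--h->size];
    int i = 0;

    for (;;) {
        int child = 2 * i + 1;

        if (child >= h->size)
            break;
        if (child + 1 < h->size && h->expiry[child + 1] < h->expiry[child])
            child++;
        if (last <= h->expiry[child])
            break;
        h->expiry[i] = h->expiry[child];
        i = child;
    }
    h->expiry[i] = last;
    return min;
}
```

The trade-off is that a heap does not keep a fully sorted order, but the timer dispatch path only ever needs the earliest expiry, which stays at the root.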
Anyway, maybe future hardware will also help. The HPET spec at least
contains the option to have multiple events registered at the same
time, not just one as with classic timers. Other architectures also
tend to come with an increasing number of event timers.
Well, I just read through xntimer_do_timers again, and I remembered it
correctly: you don't actually use nanoseconds as a tick abstraction in
aperiodic mode. Otherwise, you would store all timer dates as
nanoseconds and convert the current time to ns before comparing it
with the pending timers. So the way it is now should not provide any
advantage on SMP boxes, should it?
The advantage of using nanoseconds externally is meant for the user,
not for the timing code, which only deals with TSCs internally; add to
that the fact that per-cpu timer lists are used in SMP, so no timebase
mismatch occurs.
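That split can be sketched as follows (hypothetical names, assumed fixed TSC frequency; last_armed_tsc is only a test hook standing in for programming the hardware): the public API speaks nanoseconds and converts exactly once at the boundary, while everything below that line sees only TSC units.

```c
#include <assert.h>

#define CPU_FREQ_HZ 266000000ULL /* assumed calibration constant */

static unsigned long long last_armed_tsc; /* test hook for the sketch */

static unsigned long long ns_to_tsc(unsigned long long ns)
{
    return ns / 1000000000ULL * CPU_FREQ_HZ
         + ns % 1000000000ULL * CPU_FREQ_HZ / 1000000000ULL;
}

/* Core side: TSC-only, per-CPU, never exposed to applications. */
static void core_arm_timer(unsigned long long expiry_tsc)
{
    last_armed_tsc = expiry_tsc; /* would program the timer hardware */
}

/* User-visible side: nanoseconds in, exactly one conversion at the
 * API boundary. */
static void api_arm_timer_ns(unsigned long long expiry_ns)
{
    core_arm_timer(ns_to_tsc(expiry_ns));
}
```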
To sum up: there might be a few microseconds to save under heavy load
on really low-end boxes by pre-converting ns to tsc. But this would
break the API's portability to SMP systems. I'm convinced.
For me, the only remaining issue is that the meaning of the "official"
time unit passed to nucleus calls varies with the timer mode. But this
is something we can catch in the RTDM API: there are only 64-bit
nanoseconds now. :)
Thanks for your patience,
Jan
PS: More news on RTDM will follow soon; we just have to test that new
serial driver on real hardware and fix the remaining bugs in the RTDM
layer.