Jan Kiszka wrote:
Philippe Gerum wrote:

Jan Kiszka wrote:

...
I do not understand the portability problem of TSC values yet, but the inconsistency issue on SMP systems was new to me and makes your choice a bit more comprehensible. Does moving to nanoseconds really solve the problem? The clocks used in aperiodic mode will neither run synchronised nor be started at (almost) the same time, will they? So all absolute times (taken on CPU A and applied on CPU B) might still be inconsistent.



We hold per-CPU timer lists, so TSC bases are never mixed. Timers can be migrated explicitly, but in that case they are always re-initialized; IOW, outstanding timers don't migrate.


Ok, you handle different relative times in a SMP-safe way, I'm convinced.

What about the absolute meaning of time stamps? We once measured a drift
of a few tens of microseconds per second between the TSCs of a Pentium II
(400 MHz) and a Pentium I (133 MHz). Maybe new hardware with almost
identical CPUs can provide slightly better results, but there will be an
absolute drift over time which makes time stamps taken on one CPU
incomparable with those taken on others. Are there any re-synchronisation
mechanisms in fusion? Or does the user (of rt_timer_read, e.g.) have to
tell the different clock sources apart?


Nope.



I would personally vote for a more consistent way: either ticks or nanoseconds - in both modes. Actually, I would prefer the tick variant at nucleus level.





What would such tick represent in aperiodic mode, time-wise?




For me, ticks are what the system timer uses internally in the respective mode: TSCs?! Then it becomes more understandable to the developer that one always has to convert human-readable time to internal units.


I feel that you would be doing the job of the interface, assuming poorly written conversion routines and particular usage patterns that would make your app issue conversion requests at 50 kHz. However, AFAIK, POSIX's timespecs do not deal with TSCs either, likely because the common usage pattern for timed services involves heavyweight blocking operations, and you don't do those inside a tight loop unless your system has gone wild and never blocks. So my perception is that the cost of converting nanoseconds to ticks is nothing compared to the cost of grabbing the resource associated with a synchronization object, or even of putting the task to sleep. Now, for the particular case of a periodic task with, say, a 20 us work cycle, expressing the timeout in nanoseconds is not an issue: the period is converted once when the task is made periodic, and the internally converted value is then used with no further conversion needed by the nucleus in order to wait for the next release point on the CPU timeline.

Well, it's indeed just a matter of perception, and I understand that people wanting to squeeze the maximum out of each CPU cycle on a 386 might find my reasoning totally bogus. This said, everything is a question of balance, and I strongly feel that there are many other, much more time-consuming issues to solve which, unlike the conversions, are on the critical path. What I think we buy with the nanosecond-based timespecs is ease of use, portability and consistency. But hey, maybe it's just me returning -ENOBRAIN in a wild loop?! o:) In any case, I will have no problem recognizing that I've been wrong on this, but for that I would need actual facts and figures from common real-world test cases.


Ok, here are some facts. We did some tests with recent fusion,
with ancient RTAI code, and with the clock read function of the upcoming RTDM.

(All numbers taken on a Pentium MMX 266, "advanced" low-end for us)

xnarch_tsc_to_ns():   <200 ns, typical ~100 ns
inlined count2nano(): <600 ns, typical ~400 ns

xnarch_tsc_to_ns():   <400 ns, typical ~250 ns
inlined count2nano(): <600 ns, typical ~400 ns

rtdm_clock_read():    <1300 ns, typical ~500 ns
(basically implements rt_timer_read(), but always returns nanoseconds)

A typical rdtsc takes about 50 ns on that box.

I think we are measuring a lot of cache noise here. In particular, the question of whether or not to inline is hard to decide. rtdm_clock_read is large enough, since it converts both TSC and tick values to ns, to justify keeping it a real function as it is now. Maybe I will also provide it as an optional inline later, but that will first require some measurements with a real driver under real load.

Anyway, the numbers we had in mind were a bit higher (a few microseconds) and were likely taken on a slower box (Pentium I 166). So the figures above are encouraging, and we can likely live quite carefree with nanoseconds for the RTDM API.

Frankly, the really bothersome issue regarding timer performance for me is clearly not conversion cost, but rather the way outstanding aperiodic timers are linked into a single queue, so that a braindead linear search is required to add a timer to the list. That is clearly something we will work on as part of our scalability effort.


Good point. Any plans yet on how to avoid this?

Yes, it's part of the ongoing scalability effort. Fact is that fusion is often used to migrate code written for traditional RTOSes, and some of those apps have a slew of tasks and outstanding timers, so we need to address that. Dmitry is working on this.

Some kind of search tree instead of a list? The problem is likely that
you cannot make many helpful assumptions about the distribution of
timeout values, can you?


No, indeed. Some kind of RB-tree might perhaps help.

Anyway, maybe future hardware will also help. The HPET spec at least allows multiple events to be registered at the same time, not just one as with classic timers. Other architectures also tend to come with an increasing number of event timers.


This said, we still need a generic answer to this issue so that we can provide a fairly efficient service regardless of the hardware architecture. That should not prevent us from doing per-arch optimizations, but we need to fix the common case first.


Well, I just read through xntimer_do_timers again, and I remembered it correctly: you don't actually use nanoseconds as the tick abstraction in aperiodic mode. Otherwise, you would store all timer dates as nanoseconds and convert the current time to ns before comparing it with the pending timers. The way it is now should not provide any advantage on SMP boxes, should it?


The advantage of using nanoseconds externally is meant for the user, not for the timing code, which only deals with TSCs internally; add to that the fact that per-CPU timer lists are used on SMP, so no timebase mismatch occurs.


To sum up: there might be a few microseconds to save under heavy load on
real low-end boxes by pre-converting ns to TSC values, but this would
break the API's portability to SMP systems. I'm convinced.

For me, the only remaining issue is that the meaning of the "official"
time unit passed to nucleus calls varies with the timer mode. But this
is something we can catch in the RTDM API: there are only 64 bits of
nanoseconds now. :)

Agreed; my preferred option would have been to use ns everywhere, regardless of the mode. I kept the ticks/ns split for compatibility with RTAI/classic, and because most traditional RTOSes use ticks internally, since they only provide a periodic timing mode.


Thanks for your patience,
Jan


PS: More news on RTDM will follow soon; we just have to test the new serial driver on real hardware and fix the remaining bugs in the RTDM layer.

_______________________________________________
Rtai-dev mailing list
[EMAIL PROTECTED]
https://mail.gna.org/listinfo/rtai-dev


--

Philippe.
