Jan Kiszka wrote:
Philippe Gerum wrote:

Jan Kiszka wrote:

...
I do not understand the portability problem of TSC values yet, but the inconsistency issue on SMP systems was new to me and makes your choice a bit more comprehensible. Does moving to nanoseconds really solve the problem? The clocks used in aperiodic mode will neither run synchronised nor be started at (almost) the same time, will they? So all absolute times (taken on CPU A and applied on CPU B) might still be inconsistent.



We hold per-CPU timer lists, so TSC bases are never mixed. Timers can be migrated explicitly, but in that case they are always re-initialized; IOW, outstanding timers don't migrate.


Ok, you handle different relative times in a SMP-safe way, I'm convinced.

What about the absolute meaning of time stamps? We once measured a drift
of a few tens of microseconds per second between the TSCs of a Pentium II
(400 MHz) and a Pentium I (133 MHz). Maybe new hardware with almost
identical CPUs can provide slightly better results, but there will be an
absolute drift over time which makes time stamps taken on one CPU
incomparable with those taken on others. Are there any re-synchronisation
mechanisms in fusion? Or does the user (of rt_timer_read, e.g.) have to
tell the different clock sources apart?


Nope.



I would personally vote for a more consistent way: either ticks or nanoseconds - in both modes. Actually, I would prefer the tick variant at nucleus level.





What would such tick represent in aperiodic mode, time-wise?




For me, ticks are what the system timer uses internally in the respective mode: TSCs?! Then it becomes more understandable to the developer that one always has to convert human-readable time to internal units.


I feel that you would be doing the job of the interface, assuming poorly written conversion routines and particular usage patterns that would make your app issue conversion requests at 50 kHz. However, AFAIK, POSIX's timespecs do not deal with TSCs either, likely because the common usage pattern for timed services involves heavyweight blocking operations, and you don't do those inside a tight loop unless your system has gone wild and never blocks. So my perception is that the cost of converting nanoseconds to ticks is nothing compared to the cost of grabbing the resource associated with a synchronization object, or even of putting the task to sleep. Now, for the particular case of a periodic task with, say, a 20 us work cycle, expressing the timeout in nanoseconds is not an issue: the period is converted once when the task is made periodic, and the internally converted value is then used with no further conversion needed by the nucleus in order to wait for the next release point on the CPU timeline.

Well, it's indeed just a matter of perception, and I understand that people wanting to squeeze the maximum out of each CPU cycle on a 386 might find my reasoning totally bogus. This said, everything is a question of balance, and I strongly feel that there are many other, much more time-consuming issues to solve which, unlike the conversions, are on the critical path. What I think we buy with the nanosecond-based timespecs is ease of use, portability and consistency. But hey, maybe it's just me returning -ENOBRAIN in a wild loop?! o:) In any case, I will have no problem recognizing that I've been wrong on this, but for that I would need actual facts and figures from common real-world test cases.


Ok, here are some facts. We did some tests with recent fusion,
with ancient RTAI code, and with the clock read function of the upcoming RTDM.

(All numbers taken on a Pentium MMX 266, "advanced" low-end for us)

xnarch_tsc_to_ns():   <200 ns, typical ~100 ns
inlined count2nano(): <600 ns, typical ~400 ns

xnarch_tsc_to_ns():   <400 ns, typical ~250 ns
inlined count2nano(): <600 ns, typical ~400 ns

rtdm_clock_read():    <1300 ns, typical ~500 ns
(basically implements rt_timer_read(), but always returns nanoseconds)

A typical rdtsc takes about 50 ns on that box.

I think we are measuring a lot of cache noise here. In particular, the question of whether or not to inline is hard to decide. rtdm_clock_read is large enough, since it converts both TSC and tick values to ns, to justify keeping it a real function as it is now. Maybe I will also provide it as an optional inline later, but that will first require some measurements with a real driver under real load.

Anyway, the numbers we had in mind were a bit higher (a few microseconds) and were likely taken on a slower box (Pentium I 166). So the figures above are encouraging, and we can likely live quite carefree with nanoseconds for the RTDM API.

Frankly, the really bothersome issue regarding timer performance for me is clearly not conversion cost, but rather the way outstanding aperiodic timers are linked into a single queue, so that a braindead linear search is required to add a timer to the list. That is clearly something we will work on as part of our scalability effort.


Good point. Any plans yet on how to avoid this?

Yes, it's part of the ongoing scalability effort. Fact is that fusion is often used to migrate code written for traditional RTOSes, and some of those apps have a slew of tasks and outstanding timers, so we need to address that. Dmitry is working on this.

Some kind of search tree instead of a list? The problem is likely that
you cannot make many helpful assumptions about the distribution of
timeout values, can you?


No, indeed. Some kind of RB-tree might perhaps help.

Anyway, maybe future hardware will also help. The HPET spec at least allows multiple events to be registered at the same time, not just one as with classic timers. Other architectures also tend to come with an increasing number of event timers.


This said, we still need a generic answer to this issue so that we can provide a fairly efficient service regardless of the hardware architecture. That should not prevent us from doing per-arch optimizations, but we need to fix the common case first.


Well, I just read through xntimer_do_timers again, and I remembered it correctly: you don't actually use nanoseconds as the tick abstraction in aperiodic mode. Otherwise, you would store all timer dates as nanoseconds and convert the current time to ns before comparing it with the pending timers. The way it is now should not provide any advantage on SMP boxes, should it?


The advantage of using nanoseconds externally is meant for the user, not for the timing code, which only deals with TSCs internally; add to that the fact that per-CPU timer lists are used on SMP, so no timebase mismatch occurs.


To sum up: there might be a few microseconds to save under heavy load on
real low-end boxes by pre-converting ns to TSC values, but this would
break the API's portability to SMP systems. I'm convinced.

For me, the only remaining issue is that the meaning of the "official"
time unit passed to nucleus calls varies with the timer mode. But this
is something we can catch in the RTDM API: there are only 64 bits of
nanoseconds now. :)

Agreed; my preferred option would have been to use ns everywhere, regardless of the mode. I kept the ticks/ns split for compatibility with RTAI/classic, and because most traditional RTOSes use ticks internally, since they only provide a periodic timing mode.


Thanks for your patience,
Jan


PS: More news on RTDM will follow soon; we just have to test the new serial driver on real hardware and fix the remaining bugs in the RTDM layer.

_______________________________________________
Rtai-dev mailing list
[EMAIL PROTECTED]
https://mail.gna.org/listinfo/rtai-dev


--

Philippe.
