Philippe Gerum wrote:
Jan Kiszka wrote:
...
I do not yet understand the portability problem of TSC values, but the
inconsistency issue on SMP systems was new to me and makes your choice
a bit more comprehensible. Does moving to nanoseconds really solve the
problem? The clocks used in aperiodic mode will not run synchronised,
nor will they be started at (almost) the same time, will they? So, all
absolute times (taken on CPU A and applied on CPU B) might still be
inconsistent.
We hold per-cpu timer lists, so TSC bases are never mixed. Timers can
be migrated explicitly, but in such a case they are always
re-initialized. IOW, outstanding timers don't migrate.
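The scheme described above can be sketched roughly like this (hypothetical names and a trivial queue, not the actual nucleus code): each CPU keeps its own timer list, and a timer moved to another CPU is stopped and re-armed against the target CPU's own TSC base, so expiry dates from different TSC timelines are never compared.

```c
#include <assert.h>

#define NR_CPUS 2

struct rt_timer {
    unsigned long long expiry_tsc; /* in the owning CPU's TSC units */
    int cpu;                       /* owning CPU, -1 if inactive */
};

struct timer_queue {
    /* real code holds a sorted list; a counter suffices for the sketch */
    int nr_timers;
};

static struct timer_queue per_cpu_queue[NR_CPUS];

/* Hypothetical per-CPU "now" readings; they may drift apart freely. */
static unsigned long long cpu_now_tsc[NR_CPUS] = { 1000000, 5000000 };

static void timer_start(struct rt_timer *t, int cpu,
                        unsigned long long delay_tsc)
{
    t->cpu = cpu;
    t->expiry_tsc = cpu_now_tsc[cpu] + delay_tsc; /* local base only */
    per_cpu_queue[cpu].nr_timers++;
}

static void timer_stop(struct rt_timer *t)
{
    if (t->cpu >= 0) {
        per_cpu_queue[t->cpu].nr_timers--;
        t->cpu = -1;
    }
}

/* Explicit migration: stop, then re-init on the target CPU. The old
 * expiry date is never carried over, so no cross-CPU TSC comparison
 * ever happens. */
static void timer_migrate(struct rt_timer *t, int new_cpu,
                          unsigned long long delay_tsc)
{
    timer_stop(t);
    timer_start(t, new_cpu, delay_tsc);
}
```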
Ok, you handle different relative times in an SMP-safe way, I'm convinced.
What about the absolute meaning of time stamps? We once measured a
drift of a few tens of microseconds per second between the TSCs of a
Pentium II (400 MHz) and a Pentium I (133 MHz). Maybe new hardware with
almost identical CPUs can provide slightly better results, but there
will be an absolute drift over time which makes time stamps taken on
one CPU incomparable with those taken on others. Are there any
re-synchronisation mechanisms in fusion? Or does the user (of
rt_timer_read, e.g.) have to tell the different clock sources apart?
I would personally vote for a more consistent way: either ticks or
nanoseconds - in both modes. Actually, I would prefer the tick
variant at the nucleus level.
What would such tick represent in aperiodic mode, time-wise?
For me, ticks are what the system timer uses internally in the
respective mode: TSCs?! Then it becomes more understandable to the
developer that one always has to convert human-readable time to
internal units.
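That conversion from human-readable time to internal units amounts to a scaled division; a rough sketch follows (CPU_FREQ_HZ is an assumed, fixed calibration constant here; a real system would use the TSC frequency measured at boot):

```c
#include <assert.h>

#define CPU_FREQ_HZ 266000000ULL /* assumed: e.g. a Pentium MMX 266 */

/* Convert a human-readable nanosecond value into TSC ticks.
 * The multiplication is split to limit 64-bit overflow:
 * ticks = ns * freq / 1e9 */
static unsigned long long ns_to_tsc(unsigned long long ns)
{
    return ns / 1000000000ULL * CPU_FREQ_HZ
         + ns % 1000000000ULL * CPU_FREQ_HZ / 1000000000ULL;
}

/* And back: ns = ticks * 1e9 / freq */
static unsigned long long tsc_to_ns(unsigned long long tsc)
{
    return tsc / CPU_FREQ_HZ * 1000000000ULL
         + tsc % CPU_FREQ_HZ * 1000000000ULL / CPU_FREQ_HZ;
}
```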
I feel that you would be doing the job of the interface, assuming poorly
written conversion routines and particular patterns of use that would
make your app issue conversion requests at 50 kHz. However, AFAIK,
POSIX's timespecs do not deal with TSCs either, likely because the
common pattern of use for timed services involves heavyweight blocking
operations, and you don't do those inside a tight loop, unless your
system is going wild and never blocks. There, my perception is that the
cost of converting nanoseconds to ticks is nothing compared to the cost
of grabbing the resource associated with a synchronization object, or
even that of the task being put to sleep.
Now, for the particular case of a periodic task with, say, a 20 us work
cycle, the timeout expressed in nanoseconds is not an issue, since the
period is converted once when the task is made periodic; the internally
converted value is then used with no further conversion needed by the
nucleus in order to wait for the next release point in the CPU timeline.
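The convert-once pattern described above could look roughly like this (hypothetical names and an assumed fixed TSC frequency, not the actual nucleus code): the nanosecond period is converted a single time when the task is made periodic, and the per-cycle hot path performs only an addition.

```c
#include <assert.h>

#define CPU_FREQ_HZ 266000000ULL /* assumed calibration constant */

struct periodic_state {
    unsigned long long period_tsc;   /* pre-converted, set once */
    unsigned long long next_release; /* next release point, TSC units */
};

static unsigned long long ns_to_tsc(unsigned long long ns)
{
    return ns / 1000000000ULL * CPU_FREQ_HZ
         + ns % 1000000000ULL * CPU_FREQ_HZ / 1000000000ULL;
}

/* Called once, e.g. when the task is made periodic: the only place
 * where a ns-to-tsc conversion happens. */
static void set_periodic(struct periodic_state *p,
                         unsigned long long now_tsc,
                         unsigned long long period_ns)
{
    p->period_tsc = ns_to_tsc(period_ns);
    p->next_release = now_tsc + p->period_tsc;
}

/* Called every cycle to compute the next release point:
 * no conversion, just one addition. */
static unsigned long long advance_release(struct periodic_state *p)
{
    unsigned long long wakeup = p->next_release;

    p->next_release += p->period_tsc;
    return wakeup;
}
```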
Well, it's indeed just a matter of perception, and I understand that
people wanting to get the maximum out of each CPU cycle on a 386 might
find my reasoning totally bogus. This said, everything is a question of
balance, and I strongly feel that there are many other, much more
time-consuming issues to solve which, unlike the conversions, are on
the critical path. What I think we buy with the nanosecond-based
timespecs is ease-of-use, portability and consistency. But hey, maybe
it's just me returning -ENOBRAIN in a wild loop?! o:) In any case, I
will have no problem recognizing that I've been wrong on this, but for
that, I would need actual facts and figures on common real-world test
cases.
Ok, here are some facts. We did some tests with recent fusion,
with ancient RTAI code, and with the clock read function of the
upcoming RTDM.
(All numbers were taken on a Pentium MMX 266, for us "advanced" low-end.)
xnarch_tsc_to_ns(): <200 ns, typical ~100 ns
inlined count2nano(): <600 ns, typical ~400 ns
xnarch_tsc_to_ns(): <400 ns, typical ~250 ns
inlined count2nano(): <600 ns, typical ~400 ns
rtdm_clock_read(): <1300 ns, typical ~500 ns
(basically implements rt_timer_read(), but always returns nanoseconds)
A typical rdtsc takes about 50 ns on that box.
I think we are measuring a lot of cache noise here. Especially the
question "inline or not inline" is hard to decide. The size of
rtdm_clock_read, due to converting both tsc and ticks to ns, is
significant enough to make it a real function, as it is now. Maybe I
will also provide it as an optional inline later, but this will first
require some measuring on a real driver with real load.
Anyway, the numbers we had in mind were a bit higher (a few
microseconds) and likely taken on a slower box (Pentium I 166). So, the
figures above are encouraging, and we can likely live quite carefree
with nanoseconds for the RTDM API.
Frankly, the really bothering issue regarding timer performance for me
is clearly not the conversions, but rather the way outstanding
aperiodic timers are linked to a single queue, so that a braindead
linear search is required to add a timer to the list. That is clearly
something we will work on as part of our scalability effort.
Good point - are there already any plans for avoiding this? Some kind
of search tree instead of a list? The problem is likely that you cannot
make many helpful assumptions about the distribution of timeout values,
can you?
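One classic candidate (only a sketch of the idea, not fusion code) is a binary min-heap keyed on the expiry date: insertion and removal cost O(log n) instead of the O(n) sorted-list insertion, and no assumption about the timeout distribution is needed.

```c
#include <assert.h>

#define HEAP_CAP 256

struct timer_heap {
    unsigned long long expiry[HEAP_CAP]; /* min-heap on expiry date */
    int size;
};

/* O(log n) insertion: sift the new date up towards the root. */
static void heap_insert(struct timer_heap *h, unsigned long long date)
{
    int i = h->size++;

    while (i > 0 && h->expiry[(i - 1) / 2] > date) {
        h->expiry[i] = h->expiry[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->expiry[i] = date;
}

/* O(1) peek at the earliest timer, used to (re)program the hardware. */
static unsigned long long heap_min(const struct timer_heap *h)
{
    return h->expiry[0];
}

/* O(log n) removal of the earliest timer: sift the last element down. */
static unsigned long long heap_pop(struct timer_heap *h)
{
    unsigned long long min = h->expiry[0];
    unsigned long long last = h->expiry[--h->size];
    int i = 0;

    for (;;) {
        int child = 2 * i + 1;

        if (child >= h->size)
            break;
        if (child + 1 < h->size && h->expiry[child + 1] < h->expiry[child])
            child++;
        if (last <= h->expiry[child])
            break;
        h->expiry[i] = h->expiry[child];
        i = child;
    }
    h->expiry[i] = last;
    return min;
}
```

The trade-off is that a heap does not keep a fully sorted order, but the timer dispatch path only ever needs the earliest expiry, which stays at the root.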
Anyway, maybe future hardware will also help. The HPET spec at least
contains the option to have multiple events registered at the same
time, not just one as with classic timers. Other architectures also
tend to come with an increasing number of event timers.
Well, I just read through xntimer_do_timers again, and I remembered it
correctly: you don't actually use nanoseconds as a tick abstraction in
aperiodic mode. Otherwise, you would store all timer dates as
nanoseconds and convert the current time to ns before comparing it
with the pending timers. So the way it is now should not provide any
advantage on SMP boxes, should it?
The advantage of using nanoseconds externally is meant for the user,
not for the timing code, which only deals with TSCs internally; add to
that the fact that per-cpu timer lists are used in SMP, so no timebase
mismatch occurs.
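That split can be sketched as follows (hypothetical names, assumed fixed TSC frequency; last_armed_tsc is only a test hook standing in for programming the hardware): the public API speaks nanoseconds and converts exactly once at the boundary, while everything below that line sees only TSC units.

```c
#include <assert.h>

#define CPU_FREQ_HZ 266000000ULL /* assumed calibration constant */

static unsigned long long last_armed_tsc; /* test hook for the sketch */

static unsigned long long ns_to_tsc(unsigned long long ns)
{
    return ns / 1000000000ULL * CPU_FREQ_HZ
         + ns % 1000000000ULL * CPU_FREQ_HZ / 1000000000ULL;
}

/* Core side: TSC-only, per-CPU, never exposed to applications. */
static void core_arm_timer(unsigned long long expiry_tsc)
{
    last_armed_tsc = expiry_tsc; /* would program the timer hardware */
}

/* User-visible side: nanoseconds in, exactly one conversion at the
 * API boundary. */
static void api_arm_timer_ns(unsigned long long expiry_ns)
{
    core_arm_timer(ns_to_tsc(expiry_ns));
}
```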
To sum up: there might be a few microseconds to save under heavy load
on really low-end boxes by pre-converting ns to tsc. But this would
break the API's portability to SMP systems. I'm convinced.
For me, the only remaining issue is that the meaning of the "official"
time unit passed to nucleus calls varies with the timer mode. But this
is something we can catch in the RTDM API: there are only 64-bit
nanoseconds now. :)
Thanks for your patience,
Jan
PS: More news on RTDM will follow soon; we just have to test that new
serial driver on real hardware and fix the remaining bugs in the RTDM
layer.