[Xenomai-core] Getting the clock model right

2007-04-06 Thread Jan Kiszka
Hi,

The recent announcement of a new TSC synchronisation feature in RTAI made
me stick my nose into this and think about the whole issue of clock
synchronisation again. Well, let's not talk about RTAI details here, but
they got one thing right: as long as we cannot handle unsynchronised TSCs
on SMP, we need some detection and alarming as the bare minimum.

Why can't we handle such cases yet? First, there still seem to be some
bugs hidden in the core (one example: xntimer_start_aperiodic() uses the
local time stamp to start a remote timer). While this can be addressed
by reviewing the code and fixing what is wrong, the more severe issue is
that we cannot help the application or driver developer to cope with
unsynch'ed time stamps properly. We either have to propagate dates as a
tuple of nanoseconds (or ticks) and clock ID (TSC of CPU x, RTC, remote
clock, etc.), or we should try harder to provide consistent time across
the whole system. The former means API breakage in many places, so I'm
more and more convinced that it is the _wrong_ path. That leaves us with
option 2 (please point me to any other alternative, I don't see one).


Let's take a step back again and look at why we currently claim that
unsynch'ed per-CPU clocks are the official model in Xenomai: certain
multi-processor or multi-core systems (specifically x86 and x86_64) do
not provide synchronised TSCs across all nodes, neither with respect to
their offsets nor regarding drifts due to transient freezing of TSCs.
That's a pity for now, but it will not remain so in the long term. On
the one hand, there are alternatives, specifically HPET. On the other,
CPU manufacturers have realised that TSCs are used for timekeeping these
days and promise to fix the issue in hardware [1].

So we should really forget about designing around this shortcoming of
today's hardware and rather look for viable workarounds until the sun
breaks through again. That means we need

 A) drift detection and alarming (highest prio to-do)

 B) offset and drift compensation where feasible

 C) support for alternatives (= HPET-based clock source)

Regarding B): The issue should actually not be that tricky for most
reasonable systems. We already rely on consistent, monotonic CPU-local
TSCs (which implies, e.g., switching off power management). Thus we should
see only small drifts in reality, and those should be manageable, no?
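
To make A) a bit more concrete, here is a minimal sketch of drift
detection between two TSCs. It assumes we can somehow sample both
counters at (nearly) the same instant, twice; how those samples are
obtained is left open. All names (tsc_pair, tsc_drift_ppb,
tsc_drift_alarm) are hypothetical, nothing of this exists in the nucleus.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sample: both TSCs read at (nearly) the same instant. */
struct tsc_pair {
    uint64_t ref;    /* TSC of the reference CPU */
    uint64_t remote; /* TSC of the CPU under test */
};

/*
 * Relative drift of the remote TSC against the reference between two
 * samples, in parts per billion. A constant offset between the two
 * counters cancels out, only the rate difference remains.
 * Assumes the tick difference stays far below 2^63 / 10^9.
 */
static int64_t tsc_drift_ppb(struct tsc_pair a, struct tsc_pair b)
{
    int64_t d_ref = (int64_t)(b.ref - a.ref);
    int64_t d_rem = (int64_t)(b.remote - a.remote);

    if (d_ref <= 0)
        return 0; /* no usable interval */
    return (d_rem - d_ref) * 1000000000LL / d_ref;
}

/* Returns non-zero if the measured drift exceeds the given limit. */
static int tsc_drift_alarm(struct tsc_pair a, struct tsc_pair b,
                           int64_t limit_ppb)
{
    return llabs(tsc_drift_ppb(a, b)) > limit_ppb;
}
```

The same measurement could later feed the compensation of B), but the
alarm alone would already tell the user that time stamps from different
CPUs cannot be compared.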

Comments and thoughts are welcome. I would really like to see a clear
roadmap for this (IMHO) important issue before 2.4 gets on the road.
Also, I would like to draw a line and add things like timers to the next
RTDM revision - also before 2.4.


BTW, there is another to-do regarding the time subsystem: optimised
tsc-to-ns conversion (and vice versa), including uninlining of those
huge functions. While looking at this, it would be great to also consider
building in some means of smoothly adjusting clocks at runtime. I'm
thinking about a generic infrastructure to synchronise the Xenomai time
with external sources (= distributed clocks).
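
One common way to optimise such a conversion is to replace the division
by a precomputed multiplier and shift. The following is only a sketch
under that assumption, not the existing nucleus code; tsc_scale,
tsc_scale_init and tsc_to_ns are made-up names, and freq_hz is assumed
to be non-zero.

```c
#include <stdint.h>

/* Hypothetical scale factor: ns = (tsc * mult) >> shift */
struct tsc_scale {
    uint32_t mult;
    unsigned shift; /* always in the range 0..32 */
};

/* Pick the largest shift <= 32 for which the multiplier fits in 32 bits. */
static struct tsc_scale tsc_scale_init(uint64_t freq_hz)
{
    struct tsc_scale s;
    unsigned shift = 32;
    uint64_t mult;

    /* Terminates at shift 0 at the latest, since 10^9 fits in 32 bits. */
    while ((mult = (1000000000ULL << shift) / freq_hz) > UINT32_MAX)
        shift--;

    s.mult = (uint32_t)mult;
    s.shift = shift;
    return s;
}

/*
 * Out-of-line conversion without 128-bit arithmetic: the TSC is split
 * into two 32-bit halves so that each partial product fits in 64 bits.
 * The result is exact for the given mult/shift pair as long as it fits
 * in 64 bits.
 */
static uint64_t tsc_to_ns(struct tsc_scale s, uint64_t tsc)
{
    uint64_t hi = tsc >> 32;
    uint64_t lo = tsc & 0xffffffffu;

    return ((hi * s.mult) << (32 - s.shift)) + ((lo * s.mult) >> s.shift);
}
```

Rounding the multiplier introduces a small rate error of its own, so
the scale factor would be a natural place to hook in a runtime rate
adjustment.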

Jan

[1] http://developer.amd.com/article_print.jsp?id=92



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Getting the clock model right

2007-04-06 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Hi,
>
> recent announcement of some new TSC synchronisation feature in RTAI made
> me stick my nose into this and think about the whole issue of clock
> synchronisation again. Well, let's not talk about RTAI details here, but
> they got one thing right: as long as we cannot handle unsynch'ed TSC on
> SMP, we need some detection and alarming as the bare minimum.
>
> Why can't we handle such cases yet? First, there seems to be still some
> bugs hidden in the core (one example: xntimer_start_aperiodic() uses the
> local time stamp to start a remote timer).

Reading the code, there seem to be only two places where the local TSC
is used to set a remote timer: xntimer_start_aperiodic, and
xntimer_move_aperiodic, which is used by xntimer_migrate. So we are left
with only one bug: starting a timer on a remote CPU. This could easily
be implemented with a queue which would be handled by the timer IPI.
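
A rough sketch of that queue idea, as a single-threaded model only:
cpu_ctx, post_remote_start and timer_ipi_handler are hypothetical names,
not nucleus code, and the locking and the actual IPI are left out. The
point is that the requesting CPU never reads its own TSC; the target CPU
computes the expiry date from its local TSC when it handles the request.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REQ 8

/* Hypothetical request: start timer 'timer_id' after 'delay_ticks'. */
struct timer_req {
    int timer_id;
    uint64_t delay_ticks;
};

/* Hypothetical per-CPU state; 'tsc' stands in for the local TSC. */
struct cpu_ctx {
    uint64_t tsc;
    struct timer_req pending[MAX_REQ]; /* posted by other CPUs */
    size_t npending;
    uint64_t expiry[MAX_REQ];          /* absolute dates, by timer_id */
};

/* Requesting side: queue a relative delay only. Returns 0 or -1. */
static int post_remote_start(struct cpu_ctx *target, int timer_id,
                             uint64_t delay_ticks)
{
    if (target->npending == MAX_REQ || timer_id < 0 || timer_id >= MAX_REQ)
        return -1;
    target->pending[target->npending].timer_id = timer_id;
    target->pending[target->npending].delay_ticks = delay_ticks;
    target->npending++;
    /* A real implementation would send the timer IPI here. */
    return 0;
}

/* Target side, run from the timer IPI: arm with the local TSC. */
static void timer_ipi_handler(struct cpu_ctx *self)
{
    size_t i;

    for (i = 0; i < self->npending; i++)
        self->expiry[self->pending[i].timer_id] =
            self->tsc + self->pending[i].delay_ticks;
    self->npending = 0;
}
```

Note that in this sketch the delay is counted from the moment the target
CPU handles the request, so the IPI latency adds to the timeout.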


> (...)
> for smoothly adjusting clocks during runtime would be great. I'm
> thinking about a generic infrastructure to synchronise the Xenomai time
> on external sources (=distributed clocks).

What do you want, NTP ? Or calling xnpod_settime periodically ?

-- 
 Gilles Chanteperdrix



Re: [Xenomai-core] Getting the clock model right

2007-04-06 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> recent announcement of some new TSC synchronisation feature in RTAI made
>> me stick my nose into this and think about the whole issue of clock
>> synchronisation again. Well, let's not talk about RTAI details here, but
>> they got one thing right: as long as we cannot handle unsynch'ed TSC on
>> SMP, we need some detection and alarming as the bare minimum.
>>
>> Why can't we handle such cases yet? First, there seems to be still some
>> bugs hidden in the core (one example: xntimer_start_aperiodic() uses the
>> local time stamp to start a remote timer).
>
> Reading the code, there seem to be only two places where the local tsc
> is used to set a remote timer, it is xntimer_start_aperiodic, and
> xntimer_move_aperiodic, which is used by xntimer_migrate. So we are left
> with only one bug: starting a timer on the remote CPU, this could easily
> be implemented with a queue which would be handled by the timer IPI.

As I said: fixable based on thorough review - but only a minor part of
the problem.

 
>> (...)
>> for smoothly adjusting clocks during runtime would be great. I'm
>> thinking about a generic infrastructure to synchronise the Xenomai time
>> on external sources (=distributed clocks).
>
> What do you want, NTP ? Or calling xnpod_settime periodically ?
 

More like NTP: monotonic clocks that can be derived from custom
synchronisation signals, maybe optimised/simplified with the fact in mind
that those signals should have bounded worst-case jitter (on a hard
real-time system like Xenomai).

We just implemented such synchronisation based on CAN and serial
null-modem signals. RTnet comes with a high-quality distributed clock as
well. We have not yet implemented a smart clock adjustment (specifically
because our application only needs about 1 millisecond accuracy).

Jan





Re: [Xenomai-core] Getting the clock model right

2007-04-06 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> recent announcement of some new TSC synchronisation feature in RTAI made
>>> me stick my nose into this and think about the whole issue of clock
>>> synchronisation again. Well, let's not talk about RTAI details here, but
>>> they got one thing right: as long as we cannot handle unsynch'ed TSC on
>>> SMP, we need some detection and alarming as the bare minimum.
>>>
>>> Why can't we handle such cases yet? First, there seems to be still some
>>> bugs hidden in the core (one example: xntimer_start_aperiodic() uses the
>>> local time stamp to start a remote timer).
>>
>> Reading the code, there seem to be only two places where the local tsc
>> is used to set a remote timer, it is xntimer_start_aperiodic, and
>> xntimer_move_aperiodic, which is used by xntimer_migrate. So we are left
>> with only one bug: starting a timer on the remote CPU, this could easily
>> be implemented with a queue which would be handled by the timer IPI.
>
> As I said: fixable based on thorough review - but only a minor part of
> the problem.

I fail to see the remaining part of the problem.

-- 
 Gilles Chanteperdrix



Re: [Xenomai-core] Getting the clock model right

2007-04-06 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> recent announcement of some new TSC synchronisation feature in RTAI made
>>>> me stick my nose into this and think about the whole issue of clock
>>>> synchronisation again. Well, let's not talk about RTAI details here, but
>>>> they got one thing right: as long as we cannot handle unsynch'ed TSC on
>>>> SMP, we need some detection and alarming as the bare minimum.
>>>>
>>>> Why can't we handle such cases yet? First, there seems to be still some
>>>> bugs hidden in the core (one example: xntimer_start_aperiodic() uses the
>>>> local time stamp to start a remote timer).
>>>
>>> Reading the code, there seem to be only two places where the local tsc
>>> is used to set a remote timer, it is xntimer_start_aperiodic, and
>>> xntimer_move_aperiodic, which is used by xntimer_migrate. So we are left
>>> with only one bug: starting a timer on the remote CPU, this could easily
>>> be implemented with a queue which would be handled by the timer IPI.
>>
>> As I said: fixable based on thorough review - but only a minor part of
>> the problem.
>
> I fail to see the remaining part of the problem.
 

Consider a simple scenario consisting of a shared communication device
over which packets arrive and get time-stamped. Now, if the applications
that receive those packets sit on different, unsynchronised CPUs, they
have to know on which CPU the time stamps were taken in order to relate
them to other events correctly. This is basically the same issue you
have on distributed systems.

If we leave the user with broken local clocks, we _must_ provide the
information about the clock source. There are always scenarios where you
_cannot_ separate your applications in a way that they run totally
independently on different CPUs. And then we should provide means to
synchronise the clocks, or the user has to re-invent the wheel over and
over again. Given the latter, doing this inside the core in a
transparent manner is far smarter.
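
To illustrate what the user would have to carry around otherwise, here
is a small sketch of a time stamp tagged with its clock source. The
type, the helpers and the offset table are made up for this example;
the offsets are assumed to be known and constant, which is exactly what
we cannot promise today.

```c
#include <stdint.h>

#define NR_CLOCKS 2

/* Hypothetical tagged time stamp: a value plus the clock it came from. */
struct tagged_stamp {
    uint64_t ticks;
    int clock_id; /* e.g. index of the CPU whose TSC was read */
};

/* Assumed offset of each clock relative to clock 0, in ticks. */
static const int64_t clock_offset[NR_CLOCKS] = { 0, 250 };

/* Translate a stamp into the time base of clock 0. */
static int64_t stamp_to_ref(struct tagged_stamp s)
{
    return (int64_t)s.ticks - clock_offset[s.clock_id];
}

/* Elapsed ticks between two stamps, whatever clocks they came from. */
static int64_t stamp_delta(struct tagged_stamp from, struct tagged_stamp to)
{
    return stamp_to_ref(to) - stamp_to_ref(from);
}
```

With a packet stamped at 1250 on clock 1 and a local event at 1100 on
clock 0, the naive difference is -150 ticks, while the corrected one is
+100. Every driver and application would need this bookkeeping, which
is why I would rather hide it inside the core.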

Jan





Re: [Xenomai-core] Getting the clock model right

2007-04-06 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> recent announcement of some new TSC synchronisation feature in RTAI made
>>>> me stick my nose into this and think about the whole issue of clock
>>>> synchronisation again. Well, let's not talk about RTAI details here, but
>>>> they got one thing right: as long as we cannot handle unsynch'ed TSC on
>>>> SMP, we need some detection and alarming as the bare minimum.
>>>>
>>>> Why can't we handle such cases yet? First, there seems to be still some
>>>> bugs hidden in the core (one example: xntimer_start_aperiodic() uses the
>>>> local time stamp to start a remote timer).
>>>
>>> Reading the code, there seem to be only two places where the local tsc
>>> is used to set a remote timer, it is xntimer_start_aperiodic, and
>>> xntimer_move_aperiodic, which is used by xntimer_migrate. So we are left
>>> with only one bug: starting a timer on the remote CPU, this could easily
>>> be implemented with a queue which would be handled by the timer IPI.
>>
>> As I said: fixable based on thorough review - but only a minor part of
>> the problem.
>>
>>>> (...)
>>>> for smoothly adjusting clocks during runtime would be great. I'm
>>>> thinking about a generic infrastructure to synchronise the Xenomai time
>>>> on external sources (=distributed clocks).
>>>
>>> What do you want, NTP ? Or calling xnpod_settime periodically ?
>>
>> More like NTP: monotonic clocks that can be derived from custom
>> synchronisation signals, maybe optimised/simplified with fact in mind
>> that those signals should have bounded worst-case jitters (on a hard
>> real-time system like Xenomai).
>>
>> We just implemented such synchronisation based on CAN and serial
>> null-modem signals. RTnet comes with a high-quality distributed clock as
>> well. We have not yet implemented a smart clock adjustment (specifically
>> because our application only needs about 1 millisecond accuracy).
>
> What about emitting Adeos events when receiving NTP corrections ? This
> way, we would avoid reinventing NTP ?
 

IIRC, NTP was designed to synchronise clocks over unreliable and slow
media. The math behind it /may/ be useful (though it may also turn out
to be too heavy), but beyond that... Have you ever seen an NTP-over-CAN
implementation, for example?

And what would NTP over a standard network buy us on a Xenomai system when
the accuracy of NTP time stamps taken under Linux gets additionally
degraded by Xenomai activity? I thought about NTP for our problem for a
short while, but then quickly dropped the idea due to the lack of
guarantees.

Likely, we rather need something like IEEE 1588, but there are
unfortunate patents around that protocol.





Re: [Xenomai-core] Getting the clock model right

2007-04-06 Thread Jan Kiszka
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> What about emitting Adeos events when receiving NTP corrections ? This
>> way, we would avoid reinventing NTP ?
>
> IIRC, NTP was designed to synchronise clocks over unreliable and slow
> media. The math behind it /may/ be useful (though it may also turn out
> to be too heavy), but beyond that... Already seen a NTP-over-CAN
> realisation, e.g.?
>
> And what would NTP over standard network buy us on a Xenomai system when
> the accuracy of NTP time stamps taken under Linux gets additionally
> degraded by Xenomai activity? I thought about NTP for our problem for a
> short while, but then quickly dropped the idea due to lacking guarantees.
>
> Likely, we rather need something like IEEE 1588, but there are
> unfortunate patents around that protocol.
 

Hmm, I think I just got distracted from my original idea.

IEEE 1588, NTP, whatever: those are synchronisation protocols, designed
for specific media. What we probably need in the Xenomai nucleus is a
generic infrastructure to tune the local clock based on some offset and
drift factor, however those were obtained. That leaves the door open for
any protocol and communication medium to exchange the required sync
information.
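
Something along these lines, as a very rough sketch: adj_clock and its
helpers are invented names, not an existing nucleus interface. The clock
is rebased whenever the rate changes, so the adjusted time stays
continuous, and an offset is absorbed by slewing the rate for a while
instead of stepping the clock.

```c
#include <stdint.h>

/* Hypothetical adjustable clock on top of a raw nanosecond counter. */
struct adj_clock {
    uint64_t base_raw;  /* raw time at the last adjustment */
    uint64_t base_adj;  /* adjusted time at the last adjustment */
    int64_t drift_ppb;  /* rate correction in parts per billion */
};

/* Adjusted time for a given raw time (raw_ns >= base_raw assumed). */
static uint64_t adj_clock_read(const struct adj_clock *c, uint64_t raw_ns)
{
    uint64_t delta = raw_ns - c->base_raw;
    /* Split into seconds and remainder to avoid 64-bit overflow. */
    int64_t corr = (int64_t)(delta / 1000000000ULL) * c->drift_ppb
        + (int64_t)(delta % 1000000000ULL) * c->drift_ppb / 1000000000LL;

    return c->base_adj + delta + (uint64_t)corr;
}

/*
 * Change the rate without a jump: rebase at the current adjusted time.
 * The clock stays monotonic as long as drift_ppb > -10^9.
 */
static void adj_clock_set_drift(struct adj_clock *c, uint64_t raw_ns,
                                int64_t drift_ppb)
{
    c->base_adj = adj_clock_read(c, raw_ns);
    c->base_raw = raw_ns;
    c->drift_ppb = drift_ppb;
}

/* Rate needed to absorb offset_ns smoothly within interval_ns. */
static int64_t adj_clock_slew_ppb(int64_t offset_ns, int64_t interval_ns)
{
    return offset_ns * 1000000000LL / interval_ns;
}
```

Whatever protocol delivers the offset and drift (CAN, serial, RTnet,
...) would only have to call the two setters; the readers would not need
to know where the correction came from.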

Jan


