[Xenomai-core] Getting the clock model right
Hi,

the recent announcement of a new TSC synchronisation feature in RTAI made me stick my nose into this and think about the whole issue of clock synchronisation again. Let's not talk about RTAI details here, but they got one thing right: as long as we cannot handle unsynch'ed TSCs on SMP, we need some detection and alarming as the bare minimum.

Why can't we handle such cases yet? First, there still seem to be some bugs hidden in the core (one example: xntimer_start_aperiodic() uses the local time stamp to start a remote timer). While this can be addressed by reviewing the code and fixing what is wrong, the more severe issue is that we cannot help the application or driver developer cope with unsynch'ed time stamps properly. We either have to propagate dates as a tuple of nanoseconds (or ticks) and clock ID (TSC of CPU x, RTC, remote clock, etc.), or we should try harder to provide consistent time across the whole system. The former means API breakage in many places, so I'm more and more convinced that it is the _wrong_ path. That leaves us with option 2 (please point me to any other alternative, I don't see one).

Let's take a step back and look at why we currently claim that unsynch'ed per-CPU clocks are the official model in Xenomai: certain multi-processor or multi-core systems (specifically x86 and x86_64) do not provide synchronised TSCs across all nodes, neither with respect to their offsets nor regarding drifts due to transient freezing of TSCs. That's a pity for now, but it will not remain so in the long term. On one side there are alternatives, specifically the HPET. On the other, CPU manufacturers have realised that TSCs are used for timekeeping these days and promise to fix the issue in hardware [1]. So we should really stop designing around this shortcoming of today's hardware and rather look for viable workarounds until the sun breaks through again.
That means we need

A) drift detection and alarming (highest-priority to-do)
B) offset and drift compensation where feasible
C) support for alternatives (= an HPET-based clock source)

Regarding B): the issue should actually not be that tricky for most reasonable systems. We already rely on consistent, monotonic CPU-local TSCs (which implies switching off power management, e.g.). Thus we should only see small drifts in reality, and those should be manageable, no?

Comments and thoughts are welcome. I would really like to see a clear roadmap for this (IMHO) important issue before 2.4 gets on the road. I would also like to draw a line and add things like timers to the next RTDM revision - also before 2.4.

BTW, there is another to-do regarding the time subsystem: optimised tsc-to-ns conversion (and vice versa), including uninlining of those huge functions. While looking at this, it would be great to consider adding some means for smoothly adjusting clocks during runtime. I'm thinking about a generic infrastructure to synchronise the Xenomai time with external sources (= distributed clocks).

Jan

[1] http://developer.amd.com/article_print.jsp?id=92

signature.asc Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
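The optimised tsc-to-ns conversion mentioned above is usually done with precomputed multiply-and-shift constants instead of a 64-bit division on the hot path. A minimal sketch of that scheme follows; the names and the initialisation strategy are illustrative, not Xenomai's actual implementation:

```c
#include <stdint.h>

/* Hypothetical scaled-math converter: ns = (tsc * mult) >> shift.
 * mult and shift are computed once from the CPU frequency, so each
 * conversion costs one multiplication and one shift instead of a
 * 64-bit division. */
struct tsc2ns {
    uint32_t mult;
    uint32_t shift;
};

/* Pick the largest shift (<= 32) for which mult still fits in 32 bits,
 * so that (tsc * mult) >> shift approximates tsc * 1e9 / freq_hz. */
static void tsc2ns_init(struct tsc2ns *c, uint64_t freq_hz)
{
    uint32_t shift = 32;
    uint64_t mult;

    for (;;) {
        mult = ((uint64_t)1000000000 << shift) / freq_hz;
        if (mult <= 0xffffffffULL)
            break;
        shift--;
    }
    c->mult = (uint32_t)mult;
    c->shift = shift;
}

static uint64_t tsc_to_ns(const struct tsc2ns *c, uint64_t tsc)
{
    /* Split the 64x32-bit multiply into two halves so the
     * intermediate result cannot overflow 64 bits. */
    uint64_t hi = (tsc >> 32) * c->mult;
    uint64_t lo = (tsc & 0xffffffffULL) * c->mult;

    return (hi << (32 - c->shift)) + (lo >> c->shift);
}
```

The inverse (ns-to-tsc) direction would use a second mult/shift pair computed from the reciprocal ratio; the rounding error stays below one nanosecond per conversion for realistic CPU frequencies.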
Re: [Xenomai-core] Getting the clock model right
Jan Kiszka wrote:
> Why can't we handle such cases yet? First, there still seem to be some
> bugs hidden in the core (one example: xntimer_start_aperiodic() uses
> the local time stamp to start a remote timer).

Reading the code, there seem to be only two places where the local tsc is used to set a remote timer: xntimer_start_aperiodic and xntimer_move_aperiodic, which is used by xntimer_migrate. So we are left with only one bug: starting a timer on the remote CPU. This could easily be implemented with a queue which would be handled by the timer IPI.

> (...) for smoothly adjusting clocks during runtime would be great. I'm
> thinking about a generic infrastructure to synchronise the Xenomai
> time with external sources (= distributed clocks).

What do you want, NTP? Or calling xnpod_settime periodically?

-- 
Gilles Chanteperdrix
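The queue-plus-IPI idea sketched above could look roughly like this (a userspace simulation; all names are hypothetical, not actual nucleus code). The key point is that only the relative delay crosses the CPU boundary - the absolute deadline is computed from the target CPU's own clock inside the IPI handler:

```c
#include <stdint.h>

#define MAX_PENDING 16

struct timer_req {
    uint64_t delay_ns;   /* relative delay requested by the caller */
    uint64_t armed_at;   /* absolute date, filled in on the target CPU */
};

struct percpu_timer_queue {
    struct timer_req pending[MAX_PENDING];
    int count;
};

/* Runs on the requesting CPU: queue the request without touching
 * any absolute local timestamp, then (in real code) raise the timer
 * IPI on the target CPU. */
static int queue_remote_timer(struct percpu_timer_queue *q, uint64_t delay_ns)
{
    if (q->count >= MAX_PENDING)
        return -1;
    q->pending[q->count].delay_ns = delay_ns;
    q->pending[q->count].armed_at = 0;
    q->count++;
    return 0;
}

/* Runs in the timer IPI handler on the target CPU, which reads its
 * *own* clock to turn each relative delay into an absolute date;
 * real code would then insert the timers into the local timer queue. */
static void timer_ipi_drain(struct percpu_timer_queue *q, uint64_t local_now_ns)
{
    for (int i = 0; i < q->count; i++)
        q->pending[i].armed_at = local_now_ns + q->pending[i].delay_ns;
}
```

A real implementation would also need locking (or a lock-free queue) between the producer CPU and the IPI handler, which is omitted here.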
Re: [Xenomai-core] Getting the clock model right
Gilles Chanteperdrix wrote:
> Reading the code, there seem to be only two places where the local tsc
> is used to set a remote timer: xntimer_start_aperiodic and
> xntimer_move_aperiodic, which is used by xntimer_migrate. So we are
> left with only one bug: starting a timer on the remote CPU. This could
> easily be implemented with a queue which would be handled by the timer
> IPI.

As I said: fixable based on a thorough review - but only a minor part of the problem.

> What do you want, NTP? Or calling xnpod_settime periodically?

More like NTP: monotonic clocks that can be derived from custom synchronisation signals, maybe optimised/simplified with the fact in mind that those signals should have bounded worst-case jitters (on a hard real-time system like Xenomai). We just implemented such a synchronisation based on CAN and serial null-modem signals. RTnet comes with a high-quality distributed clock as well. We have not yet implemented smart clock adjustment, though (specifically because our application only needs about 1 millisecond of accuracy).

Jan
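A sync-signal scheme like the CAN / null-modem one mentioned above can be very simple, because a broadcast pulse reaches all nodes at effectively the same instant, so no path-delay measurement is needed. The following sketch shows the slave-side offset bookkeeping (all names hypothetical): master and slave both timestamp the same pulse with their local clocks, the master then transmits its timestamp, and the slave derives the offset.

```c
#include <stdint.h>

struct clock_sync {
    int64_t offset_ns;   /* master_time - slave_time */
};

/* Called on the slave when the master's timestamp for the most recent
 * sync pulse arrives; both sides timestamped the very same pulse. */
static void sync_update(struct clock_sync *s,
                        uint64_t master_pulse_ns,
                        uint64_t slave_pulse_ns)
{
    s->offset_ns = (int64_t)(master_pulse_ns - slave_pulse_ns);
}

/* Translate a slave-local timestamp into master time. */
static uint64_t sync_to_master(const struct clock_sync *s, uint64_t local_ns)
{
    return (uint64_t)((int64_t)local_ns + s->offset_ns);
}
```

A production version would additionally filter successive offset samples (to reject jittered pulses) and track drift, rather than applying each raw offset directly.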
Re: [Xenomai-core] Getting the clock model right
Jan Kiszka wrote:
> As I said: fixable based on a thorough review - but only a minor part
> of the problem.

I fail to see the remaining part of the problem.

-- 
Gilles Chanteperdrix
Re: [Xenomai-core] Getting the clock model right
Gilles Chanteperdrix wrote:
> I fail to see the remaining part of the problem.

Consider a simple scenario consisting of a shared communication device over which packets arrive and get time-stamped. Now, if the applications that receive those packets sit on different, unsynchronised CPUs, they have to know on which CPU the time stamps were taken in order to relate them to other events correctly - basically the same issue you have on distributed systems.

If we leave the user with broken local clocks, we _must_ provide the information about the clock source; there are always scenarios where you _cannot_ separate your applications in a way that lets them run totally independently on different CPUs. Alternatively, we should provide means to synchronise the clocks - otherwise the user has to re-invent that wheel over and over again. Given the latter, doing this inside the core in a transparent manner is far smarter.
Jan
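To make the rejected "tuple" option concrete: every timestamp would have to carry its clock ID, and every comparison between two stamps would need a per-clock offset table - exactly the burden the scenario above would push onto each application. A minimal sketch, with all names hypothetical:

```c
#include <stdint.h>

#define MAX_CLOCKS 8

/* A timestamp that cannot be interpreted without knowing which
 * clock produced it. */
struct stamp {
    uint64_t ns;
    int clock_id;    /* e.g. TSC of CPU x, RTC, remote clock */
};

/* clock_offset[i] = clock i's offset relative to the reference
 * clock (id 0), maintained by some synchronisation scheme. */
static int64_t clock_offset[MAX_CLOCKS];

/* Normalise a stamp to the reference clock before comparing. */
static int64_t stamp_to_ref(const struct stamp *s)
{
    return (int64_t)s->ns - clock_offset[s->clock_id];
}

/* Returns <0 if a happened before b, 0 if simultaneous, >0 otherwise.
 * Note that a plain comparison of the raw ns fields would give the
 * wrong answer whenever the clock IDs differ. */
static int stamp_compare(const struct stamp *a, const struct stamp *b)
{
    int64_t da = stamp_to_ref(a), db = stamp_to_ref(b);
    return (da > db) - (da < db);
}
```

Every API that today passes a plain nanosecond count would have to grow such a struct, which is the breakage the thread argues against.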
Re: [Xenomai-core] Getting the clock model right
Gilles Chanteperdrix wrote:
> What about emitting Adeos events when receiving NTP corrections? That
> way, we would avoid reinventing NTP.
IIRC, NTP was designed to synchronise clocks over unreliable and slow media. The math behind it /may/ be useful (though it may also turn out to be too heavy), but beyond that... has anyone ever seen an NTP-over-CAN realisation, for example? And what would NTP over a standard network buy us on a Xenomai system, when the accuracy of NTP time stamps taken under Linux gets additionally degraded by Xenomai activity? I thought about NTP for our problem for a short while, but quickly dropped the idea due to the lack of guarantees. More likely, we need something like IEEE 1588, but there are unfortunate patents around that protocol.

Jan
Re: [Xenomai-core] Getting the clock model right
Jan Kiszka wrote:
> I thought about NTP for our problem for a short while, but quickly
> dropped the idea due to the lack of guarantees. More likely, we need
> something like IEEE 1588, but there are unfortunate patents around
> that protocol.

Hmm, I think I just got distracted from my original idea. IEEE 1588, NTP, whatever - those are synchronisation protocols, designed for specific media. What we probably need in the Xenomai nucleus is a generic infrastructure to tune the local clock based on some offset and drift factor, however those were obtained. That leaves the door open for any protocol and communication medium to exchange the required sync information.

Jan
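The protocol-agnostic tuning layer suggested above could be as small as an offset plus a rate correction applied on every clock read. A minimal sketch, assuming drift is expressed in parts per billion to stay in integer math (all names hypothetical):

```c
#include <stdint.h>

struct tuned_clock {
    int64_t offset_ns;     /* additive correction */
    int64_t drift_ppb;     /* rate correction, parts per billion */
    uint64_t base_raw_ns;  /* raw clock reading when tuning was applied */
};

/* Called by whichever external sync protocol measured the error;
 * the nucleus does not care how offset and drift were obtained. */
static void clock_tune(struct tuned_clock *c, uint64_t raw_now_ns,
                       int64_t offset_ns, int64_t drift_ppb)
{
    c->base_raw_ns = raw_now_ns;
    c->offset_ns = offset_ns;
    c->drift_ppb = drift_ppb;
}

/* Corrected read: raw time, plus the offset, plus the rate error
 * accumulated since the last tuning. Real code would use scaled
 * mult/shift math here to avoid the division and the potential
 * overflow of elapsed * drift_ppb over very long intervals. */
static uint64_t clock_read(const struct tuned_clock *c, uint64_t raw_ns)
{
    int64_t elapsed = (int64_t)(raw_ns - c->base_raw_ns);

    return raw_ns + (uint64_t)(c->offset_ns +
                               elapsed * c->drift_ppb / 1000000000);
}
```

For smooth adjustment (no backward jumps), a refinement would slew the offset gradually instead of applying it in one step - the drift field already provides the mechanism, by temporarily biasing the rate.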