On 23 Dec, 2011, at 22:47, Paul Sobey wrote:
>>> I appreciate these may appear to be silly questions with obvious answers
>>> - I am grateful in advance for your patience, and any research sources
>>> you may direct me to.
>>
>> The best (and probably only possible) solution that does give you
>> single-digit us is to route a PPS signal to each and every server, then use
>> the network for approximate (~100 us) timing, with the PPS doing the last
>> two orders of magnitude.
>
> Our problem will be that running coax around many sites to lots of machines,
> many of which don't have serial ports (think blades), is both highly time
> consuming and maintenance intensive. If we have to do it then we will but I'd
> like a clear idea as to the whys before I start down that particular path.
>
> In particular at this stage I'm trying to understand more about the
> theoretical accuracies obtainable under ideal conditions, and most important,
> how to independently verify the results of any tweaks we might apply. Say I
> have coalescence turned on a NIC and I disable it - I'd like to be able to
> determine the effect, if any, of that change. Is it possible for ntpd (or
> ptpd) to accurately determine its own accuracy, if that makes sense? If not
> what techniques might I use to independently measure?

If you really want to do this, with either NTP (the protocol, maybe not ntpd the implementation) or PTP, then I think the place you need to start is, unfortunately, with your operating system kernels.

I have a board which implements a clock that can be synchronized to the 10 MHz and 1 PPS outputs from a GPS receiver. The board's clock resolution is about 3 ns (i.e. a 320 MHz internal clock), and the PIO interface to the board is designed so that it should be possible to transfer time from the board clock to the computer's clock with no more than +/- 10 ns or so of ambiguity (call it +/- 20 ns to be safe).

When I first used this with a stock NetBSD kernel (whose clock code I think was copied from FreeBSD at some point) I was a little surprised to find that, despite the low tens of nanoseconds of accuracy the hardware was capable of, sampling the card against system timestamps gave me a result which jittered by on the order of several microseconds. After looking at why this was, I found that the jitter was in fact coming from the system clock itself, and was caused by the way clock adjustments are applied at clock interrupt time.

I believe some of the complaints about "interrupt latency" of the serial PPS driver are in fact seeing this system clock jitter and blaming it on something else. My very brief measurement of that driver found that while the fixed interrupt latency is hundreds of nanoseconds, it is also relatively constant, with outliers which are fairly easy to filter.

Needless to say, if you can't get your system clock stable to better than microseconds, you are unlikely to be able to synchronize it to a network source at that level. I fixed this by replacing the clock code: the time is now computed as a linear function of the value of the underlying counter, and the discrete adjustments at clock interrupt time are gone altogether (except when the NTP adjustment interface is in use, though that's a whole other story). So now my system clock doesn't jitter.
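To make the "linear function of the underlying counter" idea concrete, here is a minimal user-space sketch in Python. It is not the kernel code described above. The class name, the method names, and the 1 GHz counter in the usage example are all hypothetical, chosen only for illustration. The point it shows is that a rate or offset change is applied by re-basing the line at the current counter value, so no reading is ever perturbed by a per-tick correction.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class LinearClock:
    """Hypothetical model: time = base_time + (count - base_count) * seconds_per_count."""

    base_count: int              # counter value at the reference point
    base_time: Fraction          # time (seconds) at the reference point
    seconds_per_count: Fraction  # slope of the line, including any frequency correction

    def read(self, count: int) -> Fraction:
        # Time is a pure linear function of the free-running counter.
        return self.base_time + (count - self.base_count) * self.seconds_per_count

    def _rebase(self, count: int) -> None:
        # Move the reference point to `count` so a later change to the slope
        # or offset does not alter the time already reported for `count`.
        self.base_time = self.read(count)
        self.base_count = count

    def adjust_rate(self, count: int, new_seconds_per_count: Fraction) -> None:
        # Change the slope from `count` onwards; the reading at `count` is continuous.
        self._rebase(count)
        self.seconds_per_count = new_seconds_per_count

    def step(self, count: int, offset: Fraction) -> None:
        # Apply an exact time offset at `count`; the slope is left untouched.
        self._rebase(count)
        self.base_time += offset


if __name__ == "__main__":
    nominal = Fraction(1, 10**9)            # assumed 1 GHz counter (illustrative only)
    clock = LinearClock(0, Fraction(0), nominal)
    one_second = 10**9
    clock.adjust_rate(one_second, nominal * (1 + Fraction(50, 10**6)))  # +50 ppm
    print(float(clock.read(one_second)), float(clock.read(2 * one_second)))
```

Exact rational arithmetic is used here only so the continuity property is easy to check; a real kernel would use fixed-point scaling instead.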
The second operating system issue that's useful to address, whether the data is coming from NTP or PTP, is the clock adjustment system call interface. In particular, there are huge advantages to be gained from a system call interface which allows you to make both clock frequency (i.e. rate of clock advance) and time offset adjustments, and which makes the adjustments you tell it to with great precision (or at least tells you precisely what it did).

The reason this is advantageous would require a long explanation, but the summary is that it allows you to treat the clock control process as solely a measurement process, rather than a feedback control process. This makes it possible to look at a broader variety of filtering procedures for incoming data, to try to maximize the signal while minimizing the noise, without the additional burden of having to consider the stability (in the control system sense) of the adjustment process. The adjustments can be done open-loop.

I believe that the operating system work described above, plus maybe some work on your ethernet card drivers, is necessary to achieve what you want with either NTP or PTP. With my own implementation of the NTP daemon I can generally keep a client machine within 10 us of a server (measured with one of the cards mentioned above in each machine). The two are separated by (I think) 4 gigabit ethernet switches, probably with one 10 Gbps circuit in there, carrying company network traffic, and the client uses a 16 second polling interval.

Note that I haven't tested this with ntpd yet, mostly because I don't like the way I had to jam support for the NTP system call interface into an otherwise very clean kernel time implementation, and I haven't yet had the time to try converting ntpd to use the native adjustment interface. I would note, however, that ntpd probably bears some additional burden which makes this harder.
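As a rough illustration of treating clock control as a measurement problem, the Python sketch below fits a straight line to a window of offset measurements and reads off one offset and one frequency error to hand to the kernel. The plain least-squares fit is my stand-in for whatever filtering a real daemon would use. The function name and the sign convention (offset = reference time minus local time) are assumptions made for this example.

```python
def fit_offset_and_frequency(samples):
    """Least-squares line through (local_time_s, measured_offset_s) samples.

    Assumed convention: offset = reference time - local time, in seconds.
    Returns (offset_at_last_sample_s, frequency_error), where frequency_error
    is the dimensionless slope (seconds of offset gained per second).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")

    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    spread_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread_t == 0:
        raise ValueError("samples must span a non-zero time interval")

    # Slope of the fitted line is the estimated frequency error.
    slope = sum((t - mean_t) * (o - mean_o) for t, o in samples) / spread_t

    # Evaluate the fitted line at the most recent sample time.
    t_last = samples[-1][0]
    offset_now = mean_o + slope * (t_last - mean_t)
    return offset_now, slope


if __name__ == "__main__":
    # Synthetic, noise-free data: 16 s polling, 5 us initial offset, 20 ppm drift.
    poll = 16.0
    data = [(poll * i, 5e-6 + 20e-6 * (poll * i)) for i in range(40)]
    offset_now, freq_error = fit_offset_and_frequency(data)
    print(f"offset {offset_now * 1e6:.3f} us, frequency error {freq_error * 1e6:.3f} ppm")
```

If the kernel applies exactly the offset and rate change it is given, these two numbers can be applied once and the next window measured from a known state, with no loop-stability analysis needed.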
ntpd's extra burden is this: while it operates by essentially making a series of frequency adjustments to the system clock to bring it into synchronization, it also assumes that the kernel may not implement the frequency adjustments it asks for accurately. This is the fundamental reason it implements the control process as a PLL/FLL. It assumes it needs to correct not only the errors the underlying hardware clock is making, but also the additional errors caused by unpredictably inaccurate implementation of the adjustments it is telling the kernel to make.

This is why ntpd can take hours and hours to work through a large frequency error, even though 10 or 15 minutes is enough to determine the system clock's frequency error as accurately as it can be known (according to the Allan variance typical of the process). When ntpd does get to a point where it has integrated out this correction it should track better (assuming the changes in system clock frequency are small), but I haven't been able to test whether it does as well as a more straightforward procedure that takes advantage of the fact that the operating system makes accurate adjustments.

In any case, if you want fine timing I think you need to work on your operating systems first. That is the low hanging fruit.

Dennis Ferguson
