I finally extracted a little meaning out of the sysinfo surveys I've been doing of NTP pool servers. I looked at the reported root dispersion for the 800 NTP pool servers that answered my sysinfo requests. The answer is we're just fine. Here's a histogram:
http://192.168.0.64/plotteer/url/~nelson/tmp/dispersions2.csv/histogram.png?bins=101&width=600&height=300&xlabel=Root%20Dispersion&dataColor=808000

This graph tells us that we're doing pretty well for time accuracy. The vast majority of our servers have a root dispersion under 0.1 seconds (100ms). They should be just fine for casual Internet synchronization.

We have a few clocks that look bad, including 6 with a root dispersion above 500ms. I took a look by hand at the three worst. The pool scoring/offset system had already discarded two of them; it looks like they just lost their synchronization entirely. The third one is in the pool rotation with a high score, but if I'm reading the peer scoreboard correctly it's having real trouble talking to its stratum 1 sources.

In any event, almost all our servers look just fine and are easily good for synchronization to within 100ms, maybe even better. I'm not too worried about the few bad servers; NTP is pretty good at sorting out good time even with a few bad clocks in the mix. FWIW, the histogram looked just about the same a few days ago. I may try tracking the median dispersion over time to see how it trends, though I'm not sure that'll be very interesting.
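For anyone who wants to try the median tracking themselves, here is a minimal sketch in Python. It assumes the root dispersion values have already been pulled out of the sysinfo reports as floats in seconds; the function names and the 0.1 s / 0.5 s thresholds are just my own choices for illustration, matching the cutoffs mentioned above.

```python
from collections import defaultdict
from statistics import median


def summarize(dispersions, good=0.1, bad=0.5):
    """Summarize one survey's root dispersions (seconds).

    `good` and `bad` are illustrative thresholds, not anything NTP defines.
    """
    n = len(dispersions)
    return {
        "median": median(dispersions),
        "fraction_under_good": sum(d < good for d in dispersions) / n,
        "count_over_bad": sum(d > bad for d in dispersions),
    }


def median_by_day(samples):
    """Group (day, root_dispersion) pairs by day and take each day's median.

    `day` can be any sortable label, e.g. an ISO date string.
    """
    by_day = defaultdict(list)
    for day, dispersion in samples:
        by_day[day].append(dispersion)
    return {day: median(values) for day, values in sorted(by_day.items())}
```

The median is used rather than the mean so that a handful of servers that have lost synchronization entirely don't dominate the trend.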

One detail I hope I have right: is "root dispersion" a true upper bound on time measurement error? As I understand it, root dispersion is NTP's own measure of a clock's inaccuracy, an upper bound on error that is a complex calculation of network delay and jitter up the chain to the stratum 0 source. A root dispersion of 0.1 seconds means that the time that server reports should be within 0.1 seconds of the root time. I'm a bit fuzzy on whether the reported root dispersion includes the error of the stratum 0 time source itself. If it does, then does a root dispersion of 0.1 seconds mean you can measure time accurately to within 100ms of true time?



Is this little survey interesting? What other statistics would you like to know from our pool servers? I've got sysinfo data going back almost a month now for about 800 servers. The reports, taken once every hour or two, look like this:

Sun Oct 28 23:20:09 UTC 2007
system peer:          139.78.133.139
system peer mode:     client
leap indicator:       00
stratum:              2
precision:            -20
root distance:        0.01172 s
root dispersion:      0.03511 s
reference ID:         [139.78.133.139]
reference time:       cacf9420.7a704897  Sun, Oct 28 2007 23:05:04.478
system flags:         auth monitor ntp kernel stats
jitter:               0.004410 s
stability:            0.000 ppm
broadcastdelay:       0.003998 s
authdelay:            0.000001 s
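
In case it helps anyone work with reports in this format, here is a rough Python sketch of a parser. The function names are hypothetical, and it only assumes the "name: value" layout shown above. The `error_bound` helper rests on two assumptions that should be checked: that the field labelled "root distance" is really the root delay, and that half the root delay plus the root dispersion (the root synchronization distance in the NTP specification) is the figure to treat as the error bound relative to the primary reference.

```python
import re

# Matches "field name:   value" lines; the leading timestamp line does not
# match because a digit appears before its first colon.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?):\s+(.*)$")

SAMPLE = """\
Sun Oct 28 23:20:09 UTC 2007
system peer:          139.78.133.139
stratum:              2
root distance:        0.01172 s
root dispersion:      0.03511 s
reference ID:         [139.78.133.139]
"""


def parse_sysinfo(report):
    """Return a dict mapping field name to its raw string value."""
    fields = {}
    for line in report.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def seconds(fields, name):
    """Read a value such as '0.03511 s' as a float number of seconds."""
    return float(fields[name].split()[0])


def error_bound(fields):
    """Half the round-trip delay to the root plus the root dispersion.

    ASSUMPTION: 'root distance' in this output is the root delay.
    """
    return seconds(fields, "root distance") / 2 + seconds(fields, "root dispersion")


if __name__ == "__main__":
    info = parse_sysinfo(SAMPLE)
    print(info["stratum"], seconds(info, "root dispersion"), error_bound(info))
```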



_______________________________________________
timekeepers mailing list
[email protected]
https://fortytwo.ch/mailman/cgi-bin/listinfo/timekeepers
