Paul Sobey wrote:
 Our internal testing to this point shows that a stock ntpd pointed at
 a stratum 1 clock over low-contention gigabit ethernet (stratum 1 source
 and client less than 1ms apart) reports its own accuracy at approx 200
 microseconds. Further tuning the ntp config by adding the minpoll 4,
 maxpoll 6 and burst keywords brings the ntpd-reported accuracy down to
 within 10-20 microseconds (as reported by ntpq -p and borne out by
 loopstats). Further improvements can be made by running ntpd in the RT
 priority class.

Good, you've done your homework! :-)

I've been trying! It's a challenging subject to get to grips with! A long way to go I think...
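
For reference, the tuning I described boils down to something like this in ntp.conf (the hostnames are placeholders, not our real servers):

    server time1.example.com minpoll 4 maxpoll 6 burst
    server time2.example.com minpoll 4 maxpoll 6 burst
    server time3.example.com minpoll 4 maxpoll 6 burst

minpoll 4 / maxpoll 6 pins the poll interval to 16-64 seconds, and burst sends a burst of packets at each poll so the clock filter has more samples to work with.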

 My questions to you all, if you've read through the above waffle, are:

 - what is a sensible expected accuracy for ntpd if pointed at several
 stratum 1 time sources across a low-jitter gigabit network (we'd
 probably spread them over several UK and US sites for resiliency, but
 all paths have low jitter and highly deterministic latency)

Gbit and low jitter are not quite compatible: 100 Mbit switches used cut-through, while (afaik) all Gbit and up switches use store & forward, leading to higher latency and jitter.

There are several varieties of cut-through gigabit/10GbE switch available now - but the point that store-and-forward adds latency is a good one. Presumably layer 3 switching (routing) is always store-and-forward, since we're effectively writing a new packet. It's all done in hardware though - negligible jitter.

 - are there any obvious tunables to improve accuracy other than
 minpoll/burst and process scheduling class, and how aggressive can the
 polling cycles sensibly be made?

 - can ntpd's own reported offset (ntpq -p or loopstats) be trusted
 (assuming high priority means it gets scheduled as desired)? I've quoted
 our apparent numbers to several people and the response is always 'pfft,
 you can't trust ntpd to know its own offset' - but nobody can ever tell
 me why.

You can use ntpd's internal numbers to verify the maximum possible offset (half the round-trip time), and you should be able to use statistics to show that the jitter is quite low as well.

At the risk of pushing my luck, can you expand on this?
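
The half-round-trip part I think I follow - as a sketch (made-up timestamps for a sub-millisecond LAN exchange, not ntpd's actual code):

    #include <stdio.h>

    /* t1 = client transmit, t2 = server receive,
       t3 = server transmit, t4 = client receive, all in seconds. */
    int main(void)
    {
        double t1 = 0.000000, t2 = 0.000180, t3 = 0.000190, t4 = 0.000400;

        double offset = ((t2 - t1) + (t3 - t4)) / 2.0; /* estimated clock offset   */
        double delay  = (t4 - t1) - (t3 - t2);         /* round-trip network delay */

        /* However asymmetric the path is, the true offset cannot differ from
           the estimate by more than delay/2, so delay/2 is a hard error bound. */
        printf("offset %+.1f us, delay %.1f us, worst-case error +/- %.1f us\n",
               offset * 1e6, delay * 1e6, delay * 1e6 / 2.0);
        return 0;
    }

With those numbers: offset -15 us, delay 390 us, so the true offset is guaranteed to be within +/- 195 us of the estimate, however asymmetric the path. It's the "use statistics to show the jitter is quite low" part I'd like to understand better.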

 I appreciate these may appear to be silly questions with obvious answers
 - I am grateful in advance for your patience, and any research sources
 you may direct me to.

The best (and probably only possible) solution that does give you single-digit us is to route a PPS signal to each and every server, then use the network for approximate (~100 us) timing, with the PPS doing the last two orders of magnitude.
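
If you do go that way, the ntpd side ends up roughly like the sketch below - 127.127.22.0 selects the kernel PPS (ATOM) refclock driver, and the device names and fudge flags will vary by setup:

    server 127.127.22.0 minpoll 4 maxpoll 4   # PPS (ATOM) driver, reads /dev/pps0
    fudge  127.127.22.0 flag3 1               # hand the PPS edge to the kernel discipline
    server time1.example.com iburst prefer    # a network peer still numbers the seconds

The network peer gets you to within a millisecond or so and tells you which second you're in; the PPS edge then trims away the remaining offset.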

Our problem will be that running coax around many sites to lots of machines, many of which don't have serial ports (think blades), is both highly time-consuming and maintenance-intensive. If we have to do it then we will, but I'd like a clear idea of the whys before I start down that particular path.

In particular, at this stage I'm trying to understand more about the theoretical accuracies obtainable under ideal conditions and, most importantly, how to independently verify the results of any tweaks we might apply. Say I have interrupt coalescing turned on on a NIC and I disable it - I'd like to be able to determine the effect, if any, of that change. Is it possible for ntpd (or ptpd) to accurately determine its own accuracy, if that makes sense? If not, what techniques might I use to measure it independently?
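
For before/after comparisons, something along the lines of the sketch below seems workable - it assumes the usual loopstats layout where the third field is the clock offset in seconds, and just summarises that column; run it over loopstats captured before and after a change and compare the two summaries:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double mjd, sec, off, sum = 0.0, sumsq = 0.0, worst = 0.0;
        long   n = 0;
        char   line[256];

        while (fgets(line, sizeof line, stdin) != NULL) {
            if (sscanf(line, "%lf %lf %lf", &mjd, &sec, &off) != 3)
                continue;                      /* skip malformed lines */
            sum   += off;
            sumsq += off * off;
            if (fabs(off) > worst)
                worst = fabs(off);
            n++;
        }
        if (n == 0)
            return 1;
        printf("samples %ld  mean %+.1f us  rms %.1f us  worst %.1f us\n",
               n, 1e6 * sum / n, 1e6 * sqrt(sumsq / n), 1e6 * worst);
        return 0;
    }

Something like "cc -O2 -o loopsum loopsum.c -lm && ./loopsum < /var/log/ntpstats/loopstats" (adjust the statsdir path to wherever your ntp.conf points) gives a quick mean/RMS/worst-case summary. Of course, it only tells you what ntpd thinks of itself, which is exactly the caveat above.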

On a related note, I'm aware that there are various methods for querying the system time, some of which involve a context switch and some of which can be done cheaply in userspace. I'm not sure whether the same is true for setting the time. Is anyone aware of how much of ntpd's operation involves context switches, which would obviously place quite a high ceiling on accuracy since we're at the mercy of the OS scheduler?
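
For the query side at least, I suppose a crude check is to time a tight loop of clock_gettime() calls - my understanding is that on Linux CLOCK_REALTIME reads are normally serviced from the vDSO without entering the kernel, whereas slewing the clock goes through adjtimex()/ntp_adjtime(), which is a real system call. A rough sketch:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec start, end, t;
        const long iters = 1000000;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
            clock_gettime(CLOCK_REALTIME, &t);   /* the call being measured */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                    (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per clock_gettime() call\n", ns / iters);
        return 0;
    }

(Build with something like "cc -O2 clkcost.c -o clkcost"; older glibc wants -lrt for clock_gettime.) That still wouldn't tell me how often ntpd itself crosses into the kernel per polling cycle, which is really what I'm after.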

Cheers, Paul
