On Tue, Jun 1, 2010 at 00:10, unruh <[email protected]> wrote:
>> If we assume there is a private subnet that has two GPS reference
>> clocks to synchronize the rest of the machines, what would be the
>> expected failure mode where one of the stratum 1 servers go crazy, and
>> having three GPS clocks actually makes a difference?
>
> The gps falls off the roof and is burried in shrub, but still uses its
> internal clock to deliver PPS pulses is an example.

That could of course happen, but the scenario requires a rather convenient failure in the GPS unit. If GPS lock is lost, at least the GPS units I have tested indicate that their clock is freewheeling. gpsd then discards the time as invalid and does not feed it to the NTP reference clock driver.

There are countless other possible failure scenarios, each of them more or less fatal to the application. The Ethernet interface of some computer on the network could start jamming the whole subnet by constantly broadcasting something. A more probable failure, which I have witnessed a couple of times, is a cheap Ethernet switch starting to corrupt frames randomly or flapping its links rapidly, but this usually causes only long random delays, not undetectable bad data.

In my experience, hardware or well-designed and tested software going crazy (i.e. outputting completely invalid data) without any safeguard noticing it usually requires a quite bizarre double failure in the system.

With this in mind, I sometimes wonder about the reasoning behind the high numbers of servers suggested here. It's often three or four, but some seem to suggest even five or seven servers so that quite a lot of them can fail. That may be wise with Internet servers, and adding more pool servers costs nothing, but on private subnets with their own reference clocks it seems like overkill, at least for many applications.

That said, I have seen an embedded computer whose internal clock somehow ran twice as fast as it should have.
It wasn't used with NTP, but I'd be curious to know how NTP would have handled the situation, especially if that computer had been attached to a reference clock.
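
To make the two-versus-three question quoted above concrete, here is a rough sketch of majority agreement between clocks. It is only an illustration and not ntpd's actual selection code: the function names, the source names, and the offset and error-bound numbers are all made up. Each clock is treated as an interval (offset plus or minus an error bound), and a clock is trusted only if it belongs to a strict majority of overlapping intervals.

```python
def largest_agreement(sources):
    """Return the names of the largest group of sources whose
    intervals [offset - err, offset + err] share a common point.

    sources: dict mapping a (hypothetical) source name to a tuple
    (offset_seconds, error_bound_seconds).
    """
    edges = []
    for name, (offset, err) in sources.items():
        edges.append((offset - err, 0, name))  # 0 = interval start
        edges.append((offset + err, 1, name))  # 1 = interval end
    # Sort by position; starts sort before ends at the same position,
    # so intervals that merely touch still count as overlapping.
    edges.sort()

    active, best = set(), set()
    for _, kind, name in edges:
        if kind == 0:
            active.add(name)
            if len(active) > len(best):
                best = set(active)
        else:
            active.discard(name)
    return best


def classify(sources):
    """Split sources into (trusted, rejected).

    Returns two empty sets when no strict majority agrees, i.e. when
    it is impossible to tell which clock is the bad one.
    """
    best = largest_agreement(sources)
    if 2 * len(best) > len(sources):
        return best, set(sources) - best
    return set(), set()


# Two clocks that disagree by half a second: no way to pick a winner.
two = {"gps_a": (0.0, 0.001), "gps_b": (0.5, 0.001)}
print(classify(two))

# A third clock agreeing with gps_a outvotes the freewheeling gps_b.
three = dict(two, gps_c=(0.0005, 0.001))
print(classify(three))
```

With two disagreeing clocks the sketch gives up, and with three it rejects the odd one out. Whether that extra clock is worth its cost on a private subnet is the judgment call discussed above.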
