I was recently tasked with fixing a time synchronization setup on a closed 
network.  We have two old TrueTime GPS XL receivers each connected over IRIG-B 
to two TrueTime NTS-100 (560-5151) NTP time servers (four total NTS-100s).  I 
don't know which version of the ntp daemon the NTS-100s run, but they were 
released in 1996.  They cannot peer with each other and are locked in mode 4 
(server) operation.  Each NTS-100 has an Ethernet connection.  On the client 
side, we have several hundred DEC Alpha workstations, each running Digital UNIX 
4.0d and xntpd 3.4x.  I've tested a number of synchronization configurations, 
but for this example, assume the following:

#ntp.conf

driftfile /etc/ntp.driftfile

server time001 version 3
server time002 version 3
server time003 version 3
server time004 version 3

#Some logging directives


Each client thus runs at stratum 2.  We have a problem with large offsets, 
though.  Several weeks ago, one of the NTS-100s went down due to a power 
failure.  When it came back up, it had incremented its year (IRIG-B does not 
contain year information, so on these NTS-100s, the admin must set the year via 
RS-232).  Within a few days, every daemon had detected the offset of more than 
1000 s and exited (we aren't using "-g" or any other command line option).  
The admins were not aware of this behavior and thus did not detect the 
failures, so the workstation clocks drifted uncorrected.

I've read the NTP RFCs and most of Dr. Mills' website.  From my understanding, 
the intersection algorithm, if given a sufficient number of low stratum 
samples, should eliminate falsetickers.  In this scenario, three of the time 
servers remained within a millisecond of each other, while the fourth was a 
year and a day off.  I've run a number of tests: using 8+ stratum 1 hosts per 
client, using peering between clients, adding a layer of stratum 2 servers and 
forcing clients to stratum 3, implementing a local clock driver.  Without 
exception, I observe the following behavior in ntpq after restarting xntpd 
(delete drift files and logs, run ntpdate to step time, start xntpd with no 
flags):

1) Clients select one of the four sources and synchronize their clocks.  All 
client daemons operate correctly.
2) I manually increment the year on one of the servers.
3) All client daemons detect the large offset and switch to another server.  
ntpq shows an "x" by the faulty server, indicating elimination by the 
intersection algorithm.
4) Eventually, I see an "x" by every server/peer.  xntpd writes 
"Synchronisation lost" in the syslog and then aborts.

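To make my expectation concrete, here is a rough Python sketch of how I 
understand the intersection step, loosely following the selection procedure 
described in the NTP specification.  It is not the actual xntpd code: the 
function names are my own, and the offsets and the 10 ms root distance are 
made-up values chosen to resemble the scenario above.

```python
# Sketch of the intersection (Marzullo-style) step as I understand it.
# Each source is (name, offset_seconds, root_distance_seconds); all values
# below are illustrative assumptions, not measurements.

def intersection(sources):
    """Return (low, high) of the agreed interval, or None if no majority."""
    m = len(sources)
    edges = []
    for _name, offset, dist in sources:
        edges.append((offset - dist, -1))  # lower endpoint
        edges.append((offset, 0))          # midpoint
        edges.append((offset + dist, +1))  # upper endpoint
    edges.sort()

    allow = 0  # number of falsetickers tolerated in this pass
    while 2 * allow < m:
        low = high = None
        found = 0  # midpoints seen outside the candidate interval

        chime = 0
        for edge, kind in edges:            # scan upward for the low end
            chime -= kind
            if chime >= m - allow:
                low = edge
                break
            if kind == 0:
                found += 1

        chime = 0
        for edge, kind in reversed(edges):  # scan downward for the high end
            chime += kind
            if chime >= m - allow:
                high = edge
                break
            if kind == 0:
                found += 1

        if low is not None and high is not None and found <= allow and low < high:
            return low, high
        allow += 1
    return None


def classify(sources):
    """Split source names into (truechimers, falsetickers)."""
    result = intersection(sources)
    if result is None:
        return [], [name for name, _, _ in sources]
    low, high = result
    good, bad = [], []
    for name, offset, dist in sources:
        # A source is rejected if its interval misses the agreed interval.
        if offset + dist < low or offset - dist > high:
            bad.append(name)
        else:
            good.append(name)
    return good, bad


YEAR_AND_A_DAY = 366 * 86400.0  # assumed size of the bad offset, in seconds
example = [
    ("time001", 0.0002, 0.010),
    ("time002", -0.0003, 0.010),
    ("time003", 0.0001, 0.010),
    ("time004", YEAR_AND_A_DAY, 0.010),
]
good, bad = classify(example)
print("truechimers:", good)
print("falsetickers:", bad)
```

With these inputs the sketch keeps time001 through time003 and rejects only 
time004, which is what I expected xntpd to do rather than marking every 
server with an "x".
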
I've searched the list archives and the internet for reports of large 
offsets; most sources say that xntpd detects falsetickers when given enough 
sources.  Is there a bug in this version of xntpd?  I cannot update the 
daemons due to a configuration freeze on these systems.  I've tried many 
different synchronization subtrees to no avail.  Our program cannot purchase 
new time server equipment that consistently stores year information.  Does the 
intersection algorithm fail for large offsets?  I'm at a total loss on this 
one.  Does anyone have experience with this issue?


Thanks in advance,
Spence

