Excellent description!

There is no fundamental problem I can recall with xntpd 3.4x on
Digital UNIX 4.0D, but there is one thing you might check, especially
if xntpd is running with the "-x" option. If the clients are configured
only to slew, large time jumps on the servers such as you have had may
well have caused the clients to acquire and institutionalize an absurdly
large drift rate in /etc/ntp.driftfile that they are now laboring to correct.
Even if the servers are now perfectly well behaved, this residual
problem on the clients could cause them to run persistently
fast or slow, drifting out of synch in one direction, and unable
to be stepped back to the correct time.
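Below is a simplified model of the slew/step decision Tom describes. It uses the classic reference-implementation thresholds: 128 ms step threshold, 1000 s panic threshold, 500 ppm maximum slew rate. Whether xntpd 3.4x on Digital UNIX uses exactly these values is an assumption. The model also assumes the panic check still applies in slew-only mode, and it does not model the -g option that Spence mentions.

```python
# Simplified model of an NTP daemon's clock discipline decisions.
# Threshold values are the classic reference-implementation defaults;
# whether xntpd 3.4x uses exactly these is an ASSUMPTION.

STEP_THRESHOLD_S = 0.128     # offsets above this are stepped (unless slew-only)
PANIC_THRESHOLD_S = 1000.0   # offsets above this make the daemon give up
MAX_SLEW_PPM = 500.0         # assumed maximum rate at which the clock is slewed


def clock_action(offset_s: float, slew_only: bool = False) -> str:
    """Return 'panic-exit', 'step' or 'slew' for a measured offset in seconds.

    Assumption: the panic check applies even in slew-only (-x) mode.
    """
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic-exit"
    if magnitude > STEP_THRESHOLD_S and not slew_only:
        return "step"
    return "slew"


def min_slew_time_s(offset_s: float) -> float:
    """Shortest time needed to slew away an offset at the maximum slew rate."""
    return abs(offset_s) / (MAX_SLEW_PPM * 1e-6)


if __name__ == "__main__":
    for offset in (0.05, 0.5, 30.0, 5000.0):
        print(offset, clock_action(offset), clock_action(offset, slew_only=True),
              round(min_slew_time_s(offset)))
```

For example, at 500 ppm a 30 s error takes at least 60,000 s (about 17 hours) to slew out. That figure assumes the frequency value in the drift file is not itself pushing the clock the other way.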

Look at a few of those drift files and see if they have unusually large
numbers (in the hundreds). As I recall, in that generation of the
code, you could get drift rates of 900 or so. Anything above 50 on
an Alpha is seriously wrong, and it's unusual to see anything above 20.
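The following sketch triages drift files collected from the clients against Tom's rule of thumb. It assumes the first token in each file is the frequency correction in parts per million (ppm). That is the usual convention, but it is an assumption for this version. The 20 and 50 cutoffs are Tom's Alpha-specific figures, not values from the NTP specification. The sketch needs Python 3, so it is meant to run on whatever machine the files are copied to, not on the Alphas themselves. The file layout in the usage comment is hypothetical.

```python
import sys

SECONDS_PER_DAY = 86400


def read_drift_ppm(path: str) -> float:
    """Read the frequency value from a drift file (ASSUMED: first token, in ppm)."""
    with open(path) as f:
        return float(f.read().split()[0])


def daily_error_s(drift_ppm: float) -> float:
    """Seconds per day gained or lost by a clock with this frequency error."""
    return abs(drift_ppm) * 1e-6 * SECONDS_PER_DAY


def classify(drift_ppm: float) -> str:
    """Rule of thumb for Alphas: above 50 is seriously wrong, above 20 is unusual."""
    magnitude = abs(drift_ppm)
    if magnitude > 50:
        return "seriously wrong"
    if magnitude > 20:
        return "unusual"
    return "ok"


if __name__ == "__main__":
    # Hypothetical usage: python3 checkdrift.py drift/host1 drift/host2 ...
    for path in sys.argv[1:]:
        ppm = read_drift_ppm(path)
        print(f"{path}: {ppm:+.3f} ppm (~{daily_error_s(ppm):.1f} s/day) {classify(ppm)}")
```

Under the ppm assumption, a drift of 100 corresponds to roughly 8.6 s per day, and 900 to roughly 78 s per day.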

If that's what you have, you will have to do the following on
each of the clients to get them back to normal within your lifetime. :-)

1) /sbin/init.d/xntpd stop
2) rm /etc/ntp.driftfile
3) /sbin/init.d/settime start (runs "ntpdate -b [server] [server] ...")
4) /sbin/init.d/xntpd start

-Tom

Spence Green wrote:
I was recently tasked with fixing a time synchronization setup on a closed 
network.  We have two old TrueTime GPS XL receivers each connected over IRIG-B 
to two TrueTime NTS-100 (560-5151) NTP time servers (four total NTS-100s).  I 
don't know which version of the ntp daemon the NTS-100s run, but they were 
released in 1996.  They cannot peer with each other and are locked in mode 4 
(server) operation.  Each NTS-100 has an Ethernet connection.  From the client 
side, we have several hundred DEC Alpha workstations, each running Digital UNIX 
4.0d and xntpd 3.4x.  I've tested a number of synchronization configurations, 
but for this example, assume the following:

#ntp.conf

driftfile /etc/ntp.driftfile

server time001 version 3
server time002 version 3
server time003 version 3
server time004 version 3

#Some logging directives


Each client thus runs at stratum 2.  We have a problem with large offsets, though.  
Several weeks ago, one of the NTS-100s went down due to a power failure.  When it came 
back up, it had incremented its year (IRIG-B does not contain year information, so on 
these NTS-100s, the admin must set the year via RS-232).  Within a few days, all of the 
daemons had detected the offset of more than 1000 s and committed suicide (we aren't using 
"-g" or any other command line option).  The admins were not aware of this 
behavior and thus did not detect the failures.  The machines started drifting, etc.

I've read the NTP RFCs and most of Dr. Mills' website.  From my understanding, 
the intersection algorithm, if given a sufficient number of low stratum 
samples, should eliminate falsetickers.  In this scenario, three of the time 
servers remained within a millisecond of each other, while the fourth was a 
year and a day off.  I've run a number of tests: using 8+ stratum 1 hosts per 
client, using peering between clients, adding a layer of stratum 2 servers and 
forcing clients to stratum 3, implementing a local clock driver.  Without 
exception, I observe the following behavior in ntpq after restarting xntpd 
(delete drift files and logs, run ntpdate to step time, start xntpd with no 
flags):

1) Clients select one of the four sources and synchronize their clocks.  All 
client daemons operating correctly.
2) I manually increment the year on one of the servers.
3) All client daemons detect the large offset and switch to another server.  ntpq shows 
an "x" by the faulty server, indicating elimination by the intersection 
algorithm.
4) Eventually, I see an "x" by every server/peer.  xntpd writes "Synchronisation 
lost" in the syslog and then aborts.
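Spence's expectation can be checked against the following simplified sketch of the Marzullo-style intersection step that NTP builds on. The sketch omits the midpoint test, the clustering and combining stages, and however xntpd 3.4x actually computes each server's correctness interval. The (offset, radius) values are made up for illustration.

```python
def intersect(intervals):
    """Simplified NTP-style (Marzullo) intersection.

    intervals: list of (low, high) correctness intervals, one per source,
    as offsets in seconds. Returns (low, high, f), where f is the number of
    falsetickers that had to be assumed, or None if no majority agrees.
    """
    m = len(intervals)
    # Each interval contributes a lower edge (-1) and an upper edge (+1).
    # Sorting puts lower edges before upper edges at equal values, so
    # intervals that merely touch still count as overlapping.
    edges = sorted([(lo, -1) for lo, _ in intervals] + [(hi, +1) for _, hi in intervals])
    # Try f = 0, 1, ... while the remaining m - f sources are still a majority.
    for f in range((m + 1) // 2):
        need = m - f
        low = high = None
        count = 0
        for value, kind in edges:            # scan upward for the low end
            count -= kind                    # lower edge: +1, upper edge: -1
            if count >= need:
                low = value
                break
        count = 0
        for value, kind in reversed(edges):  # scan downward for the high end
            count += kind                    # upper edge: +1, lower edge: -1
            if count >= need:
                high = value
                break
        if low is not None and high is not None and low <= high:
            return low, high, f
    return None


if __name__ == "__main__":
    year_and_a_day = 366 * 86400.0  # approximate size of the bad server's error
    # (offset, radius) per server: three truechimers within 1 ms, one far off.
    servers = [(0.0005, 0.01), (-0.0003, 0.01), (0.0001, 0.01), (year_and_a_day, 0.01)]
    print(intersect([(o - r, o + r) for o, r in servers]))
```

This model only requires the truechimers to form a majority. Three servers that agree within a millisecond therefore outvote the fourth, regardless of how large the fourth server's offset is.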

I've searched the list archives and the internet about large offsets; most 
sources say that xntpd detects falsetickers with an appropriate number of 
sources.  Is there a bug in this version of xntpd?  I cannot update the 
daemons due to a configuration freeze on these systems.  I've tried many 
different synchronization subtrees to no avail.  Our program cannot purchase 
new time server equipment that consistently stores year information.  Does this 
intersection algorithm fail for large offsets?  I'm at a total loss on this 
one.  Does anyone have experience with this issue?


Thanks in advance,
Spence


_______________________________________________
questions mailing list
[email protected]
https://lists.ntp.isc.org/mailman/listinfo/questions
