Apologies for a long post, but I was unable to make it shorter. I have been monitoring timekeeping performance in an environment which contains 3 stratum 1 clocks and 4 Cisco routers running as stratum 2. The stratum 1s use time which is derived originally from GPS, but fed to the stratum 1 clocks via IRIG.
The monitoring is carried out from a single Solaris system which takes time from all seven servers. Normally all clocks show times within +/- 4ms, but every 7-8 days I see an event where all 7 clocks drift out by about 10-18 ms over a period of 2-3 hours before they are corrected. I am interpreting this as being due to drift in the local clock on the Solaris box which is doing the monitoring, since all seven offsets move together. If the time on the stratum 1 servers were drifting due to some common-mode problem with their time reference, I would expect the stratum 2 servers to lag the stratum 1s. I am concerned about the length of time it takes before NTP starts correcting the local clock on the Solaris server.

I have a graph which you can see at
<http://www.flickr.com/photos/[EMAIL PROTECTED]/2477948892/sizes/o/in/set-72157604959850048/>

The graph shows offset against time for all seven clocks. An hour of steady-state operation is shown before the beginning of the drift event; the system had been in steady state for some days before that. The poll interval is initially 1024 seconds. The drift event starts about an hour into the graph. The offset increases by about 15ms in about 2 hours (roughly 2ppm), then a correction is applied and the clock drifts back to zero offset at about the 3.5 hour mark.

I am concerned that the drift went uncorrected for so long, and am trying to understand the cause. Is the clock-filter algorithm rejecting updated timestamps which do not have the lowest delay of the most recent eight? From my reading of the book and the RFCs, this is what should happen, but that means that the clock can drift significantly before a new timestamp passes through the clock-filter algorithm.

To illustrate, here are the timestamp values for the three stratum 1 clocks over the period of the drift and the beginning of the correction. The time base is the same as that of the graph.
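Before the data, here is my reading of the clock filter as a rough Python sketch, so that anyone can point out where I have misunderstood it. It is not ntpd's code. The assumptions are mine: the filter keeps the last eight samples, picks the one with the lowest delay, resolves ties by taking the most recent sample (a guess), and passes a sample on only if it is newer than the last one it passed. The function names are made up for illustration, and the sample values are the Stratum 1 A figures from this post.

```python
from collections import deque

# (time hh:mm:ss, offset in s, delay in s) for Stratum 1 A, copied from this post
SAMPLES_A = [
    ("00:00:00", -0.000052, 0.000600),
    ("00:17:04", 0.000394, 0.001850),
    ("00:34:08", -0.000174, 0.000630),
    ("00:51:12", 0.000908, 0.000580),
    ("01:08:16", 0.002661, 0.000630),
    ("01:25:20", 0.004790, 0.000750),
    ("01:42:24", 0.007350, 0.000600),
    ("01:59:28", 0.010072, 0.000610),
    ("02:16:32", 0.012666, 0.000600),
    ("02:33:36", 0.015004, 0.000610),
    ("02:50:40", 0.017115, 0.000600),
    ("02:59:12", 0.018362, 0.001970),
    ("03:06:00", 0.017913, 0.000600),
]


def to_seconds(hms):
    """Convert 'hh:mm:ss' on the graph's time base to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def simplified_clock_filter(samples, stages=8):
    """Yield (poll time, passed sample or None) for each poll.

    Assumed behaviour (a simplification, not the ntpd implementation):
    choose the lowest-delay sample among the last `stages` samples,
    prefer the most recent on a tie, and pass it on only if it is
    newer than the last sample passed.
    """
    register = deque(maxlen=stages)  # oldest sample drops out automatically
    last_passed = -1
    for hms, offset, delay in samples:
        register.append((to_seconds(hms), offset, delay, hms))
        best = min(register, key=lambda s: (s[2], -s[0]))
        if best[0] > last_passed:
            last_passed = best[0]
            yield hms, best
        else:
            yield hms, None


for poll, chosen in simplified_clock_filter(SAMPLES_A):
    if chosen is None:
        print(poll, "no new sample passed")
    else:
        print(poll, "passes sample from", chosen[3], "offset", chosen[1])
```

With those assumptions, the sketch passes a sample at the 00:00:00 and 00:51:12 polls, then nothing until the 02:59:12 poll, where it passes the 02:50:40 sample. That matches the long gap I am seeing, if my reading is right. The timestamp values themselves follow.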
Stratum 1 A

Time      Offset (s)  Delay (s)  Dispersion (s)
00:00:00  -0.000052   0.000600   0.000200  * Lowest delay of the most recent 8 values
00:17:04   0.000394   0.001850   0.000370
00:34:08  -0.000174   0.000630   0.000400
00:51:12   0.000908   0.000580   0.000890  * New lowest delay - drift begins about here
01:08:16   0.002661   0.000630   0.002180
01:25:20   0.004790   0.000750   0.003190
01:42:24   0.007350   0.000600   0.004120
01:59:28   0.010072   0.000610   0.004750
02:16:32   0.012666   0.000600   0.004910
02:33:36   0.015004   0.000610   0.004730
02:50:40   0.017115   0.000600   0.004390
02:59:12   0.018362   0.001970   0.003390  * The 0.000580 delay has now expired. There are
                                             three timestamps with 0.000600 delays in the
                                             shift register; which is chosen? Whichever is
                                             chosen, the offset has drifted significantly
                                             since the last timestamp was passed from the
                                             clock filter.
03:06:00   0.017913   0.000600   0.001630  * Correction has begun
03:10:16   0.017275   0.000610   0.001080
03:14:05   0.015433   0.000580   0.002120  * New lowest delay
03:16:13   0.013812   0.000630   0.002610

Stratum 1 B

Time      Offset (s)  Delay (s)  Dispersion (s)
00:12:47  -0.000637   0.010160   0.000260  * Lowest delay in shift register is 0.009900
00:29:51  -0.000810   0.010330   0.000320
00:46:55   0.000029   0.010180   0.000690
01:03:59   0.001683   0.010240   0.002000  * Drift begins about here
01:21:03   0.003762   0.010220   0.003050
01:38:07   0.006200   0.010220   0.003940
01:55:11   0.008894   0.010130   0.004610  * New lowest delay
02:12:15   0.011507   0.010030   0.004880  * New lowest delay
02:29:19   0.013935   0.010190   0.004810
02:46:23   0.016025   0.010150   0.004430
02:54:55   0.016739   0.010210   0.002870
03:03:27   0.017224   0.010160   0.001850
03:07:43   0.016871   0.010380   0.000870  * Correction has begun
03:11:59   0.016221   0.010100   0.000850  * New lowest delay
03:14:07   0.014934   0.010240   0.001620
03:16:15   0.013274   0.010150   0.002430

Stratum 1 C (selected as sync server during the whole of this time)

Time      Offset (s)  Delay (s)  Dispersion (s)
00:01:52  -0.000076   0.009250   0.000200  * Lowest delay in shift register is 0.009090
00:18:56  -0.000287   0.009230   0.000310
00:36:00  -0.000091   0.009160   0.000150
00:53:04   0.001073   0.009310   0.001190  * Delay of 0.009090 expires, new lowest delay
                                             is 0.009160. Drift begins about here
01:10:08   0.002899   0.009410   0.002400
01:27:12   0.005351   0.009630   0.003630
01:44:16   0.007630   0.009220   0.004070
02:01:20   0.010348   0.009250   0.004700
02:18:24   0.012981   0.009250   0.004910
02:35:28   0.015285   0.009200   0.004700
02:52:31   0.017373   0.009250   0.004360  * Delay of 0.009160 expires, new lowest delay
                                             is 0.009200
03:01:03   0.017929   0.009230   0.002690
03:06:32   0.018002   0.009190   0.001340  * New lowest delay
03:10:48   0.017277   0.009290   0.000990  * Correction has begun
03:13:14   0.016549   0.009190   0.001100
03:15:22   0.014858   0.009280   0.002150

Why is the polling interval maintained at 1024s for so long in the presence of the drift? Apart from reducing the maximum polling interval, what else could I do to hasten the response to this kind of clock drift?

The offsets from the set of clocks normally remain within +/- 4ms, which is sufficient for our needs, but a drift out beyond 15 ms is a cause for concern. We are hoping to be able to maintain time to within +/- 5ms of UTC on our NTP clients. The drift rate seen here is about 2ppm. If the drift rate were about 6ppm and we saw the same slow response to the drift, the clock could drift out by 50ms before the correction begins. This would definitely be regarded as poor timekeeping, and would cause alarms to be raised.

I would be grateful for any comments or advice.

Regards,
Mike

_______________________________________________
questions mailing list
[email protected]
https://lists.ntp.org/mailman/listinfo/questions
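P.S. For anyone who wants to check my drift-rate arithmetic, here is a small Python sketch using the Stratum 1 C figures above. The function names are made up for illustration. It takes the rate from two points on the drift (00:53:04 and 02:52:31), then asks how far the clock would move at a hypothetical 6ppm over the interval from the start of the drift to the peak offset at 03:06:32. It assumes the drift is linear, which is only roughly true.

```python
def hms_to_s(hms):
    """Convert 'hh:mm:ss' on the graph's time base to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def drift_ppm(t0, offset0, t1, offset1):
    """Average drift rate in ppm between two (time, offset in s) points."""
    return (offset1 - offset0) / (hms_to_s(t1) - hms_to_s(t0)) * 1e6


def excursion_ms(rate_ppm, seconds):
    """Offset in ms accumulated at a constant rate_ppm over `seconds`."""
    return rate_ppm * 1e-6 * seconds * 1e3


# Stratum 1 C: start of the drift and last 1024 s poll before the interval shortens
observed_ppm = drift_ppm("00:53:04", 0.001073, "02:52:31", 0.017373)

# Interval from the start of the drift to the peak offset for Stratum 1 C
uncorrected_s = hms_to_s("03:06:32") - hms_to_s("00:53:04")

# Hypothetical 6 ppm drift with the same slow response
projected_ms = excursion_ms(6.0, uncorrected_s)

print(f"observed drift: {observed_ppm:.2f} ppm")
print(f"uncorrected interval: {uncorrected_s} s")
print(f"projected excursion at 6 ppm: {projected_ms:.1f} ms")
```

This gives an observed rate of a little under 2.3ppm and a 6ppm excursion of about 48ms, in line with the rough figures I quoted.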
