David L. Mills wrote:
<snip>

5. If for some reason the server(s) are not reachable at startup and the applications must start, then I would assume the applications would fail, since the time is not synchronized. If the applications use the NTP system primitives, the synchronization condition is readily apparent in the return code. Since they can't run anyway, there is no harm in stepping the clock, no matter what the initial offset. Forcing a slew in this case would seem highly undesirable, unless the application can tolerate large differences between clocks and, in that case, using ntpd is probably a poor choice in the first place.
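
For concreteness, the "NTP system primitives" check Dave describes really is
just a return code on Linux/glibc: ntp_gettime() reports TIME_ERROR whenever
the kernel clock is not synchronized.  A minimal sketch of such a startup
gate (not our actual application code):

  #include <stdio.h>
  #include <sys/timex.h>

  int main(void)
  {
      struct ntptimeval ntv;

      /* ntp_gettime() fills in the kernel time state; a return of
       * TIME_ERROR means the clock is not synchronized to NTP. */
      if (ntp_gettime(&ntv) == TIME_ERROR) {
          fprintf(stderr, "clock unsynchronized; refusing to start\n");
          return 1;
      }
      printf("synchronized, maxerror = %ld us\n", ntv.maxerror);
      return 0;
  }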


I agree that no time servers being reachable at startup is the most common 
case in which a large offset will eventually be observed, and that the 
application should detect this and fail before starting up.  What concerns me 
are clock and network failure scenarios that cause an NTP client to see two 
different NTP servers with very different times.

This actually happened in a testbed for our application.  NTP stats show that 
over the course of 22 days, the offsets of the two configured NTP servers 
(both ours) serving one of our NTP clients diverged by as much as 800 
seconds.  During this time, our NTP client stepped its clock forward 940 
times and backwards 803 times, with increasing magnitudes up to ~400 seconds.  
The problem went away when someone "added an IP address to the configuration 
of one of the NTP servers."  (I am still trying to determine exactly what 
happened.)  The ntp.conf files of the NTP client, the stats, and a graph of 
the offsets are available at http://dingo.dogpad.net/ntpProblem/.

I concede that having only two NTP servers for this host made the problem 
more likely to occur.  But considering the mayhem caused by jerking the clock 
back and forth every 15 minutes for 22 days, I think it is worth 
investigating whether to eliminate stepping altogether.
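
For what it's worth, ntpd already has knobs for this: the -x command-line
option raises the step threshold to 600 s, and "tinker step 0" disables
stepping altogether so the clock is only ever slewed.  A hypothetical
ntp.conf along those lines (the hostnames are placeholders, not our real
servers):

  # Four servers give the selection algorithm enough survivors to
  # outvote a single falseticker; two servers cannot.
  server ntp1.example.com iburst
  server ntp2.example.com iburst
  server ntp3.example.com iburst
  server ntp4.example.com iburst

  # Disable stepping entirely; always slew.
  tinker step 0

The catch, and presumably why Dave calls a forced slew undesirable, is that
at the standard 500 ppm slew rate a ~400 second offset would take over nine
days to amortize.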

I still don't understand why the clock was being stepped back and forth.  One 
of the NTP servers showed up with 80f4 (unreachable) status every 15 minutes 
for the entire 22 days, with 90f4 (reject) and 96f4 (sys.peer) in between.  
Oddly, this flapping server was not even the prefer peer; the *other* of the 
two servers was.  I wonder why this peer would ever be selected as the 
sys.peer, since the prefer peer was reported unreachable only 10 times over 
the 22-day period.  Would this be because the selection algorithm finds no 
intersection?

Maybe the behavior I saw was a bug, and not the expected consequence of a 
failure scenario in which two NTP servers have diverging clocks.
