To close this parenthesis: I ran the test where the leap second is propagated by only one of three servers. Bill’s hypothesis is confirmed, with a couple of clarifications that I would like to share, as this might well turn out to be a real-life case.

a) To start off, in my test all three servers used by my one client are sync’d to the same time. One of them has a leap file modified for my test. Since UTC is defined WITH leap seconds, this is the ONLY one that is actually serving UTC, even though all the servers are sync’d. It correctly advertises the upcoming leap.

b) When the leap occurs, the server with the leap file correctly inserts the leap, as does the client. The client’s NTP correctly detects the step and after a few polls flags the UTC server as a falseticker, since the majority are consistently in disagreement with the now updated clock.

Thu Jan 1 01:06:05 CET 2015
     remote          refid           st t when poll reach    delay   offset   jitter
==============================================================================
*192.168.1.15    .GPS1.           1 u   42   64   377    0.495  999.894  534.506
+192.168.1.17    .GPS1.           1 u   39   64   377    0.564  999.899  654.645
x192.168.1.18    .GPS1.           1 u   66   64   377    0.575   -0.066    0.029

Now we have the full story: the "good" clock has been declared a falseticker because it is not part of the majority. But the story doesn't end there. A bit later the client's clock, which at that time is on UTC with the leap second, gets stepped forward 1 sec to agree with the majority. This is expected, but we now have a client which does not have good time.

Thu Jan 1 01:11:27 CET 2015
     remote          refid           st t when poll reach    delay   offset   jitter
==============================================================================
 192.168.1.15    .GPS1.           1 u   14   16     3    0.488   -0.039    0.038
*192.168.1.17    .GPS1.           1 u   22   64     1    0.516   -0.044    0.031
 192.168.1.18    .GPS1.           1 u   14   16     3    0.566  -999.99    0.052

Thu Jan 1 01:12:31 CET 2015
     remote          refid           st t when poll reach    delay   offset   jitter
==============================================================================

Final status, with the UTC server redeclared as a falseticker:

Thu Jan 1 01:15:38 CET 2015
     remote          refid           st t when poll reach    delay   offset   jitter
==============================================================================
+192.168.1.15    .GPS1.           1 u   46   64    77    0.488   -0.039    0.047
*192.168.1.17    .GPS1.           1 u   17   64    37    0.520   -0.054    0.032
x192.168.1.18    .GPS1.           1 u   47   64    77    0.575  -999.99    0.053

This test was meant to verify a worst-case scenario, but it shows that when administrators are preparing for a leap, they need to make sure that a majority of their servers will make the leap and propagate that information. This is not always easy, as query commands are routinely blocked by some internet servers.

Note: there is a possible bug, or an RFE is required somewhere, as the clock variable tai is not correctly set on the client. On the server that has the leap file we have the correct update from 35 to 36:

    mike@raspB4 ~ $ ntpq -c "rv 0 tai"
    tai=36

But on the client, which has no leap file (and probably because of this), tai has been set to 1. So I think that the server's notion of tai is not being propagated to clients.

    mike@cubieez2:~$ ntpq -c "rv 0 tai"
    tai=1

There will most likely be a leap declared for 1 Jul 2015 or, at the latest, 1 Jan 2016, so we still have a bit of time to clean up the installed base.

> On 9 Dec 2014, at 14:20, Mike Cook <michael.c...@sfr.fr> wrote:
>
> <snip>
>>
>>>
>>>> Three are fine, as long as only one dies or goes nuts.
>>>
>>> Again, define "goes nuts". You don't seem to like the term
>>> "falseticker", so how do you define "goes nuts"? If one "goes nuts" or
>>> even goes offline, if the remaining two do not agree then it is like
>>> having no server at all.
>>
>> No, it is like having two, with one being out.
>> falseticker is a term with a very specific internal definition. Thus a
>> server whose time is right on UTC could be a falseticker, because the
>> other two servers were both exactly 3 days out, with tiny jitter estimates.
>> I would say then that you had two servers going nuts, and one good, even
>> though ntpd would say there were two good and one false ticker.
>
> In fact this does not happen. I just tested the hypothesis.
> What happens depends on how the two wayward servers get their exaggerated
> offset:
>
> a) Someone or something resets the date.
> Result: ntp on both those servers crashes due to the panic_stop limit.
>
> So in this case the client has only one reference and continues using that.
> It is not flagged as a falseticker.
> That is normal.
>
> b) Someone restarts ntp on the servers with the wrong date. Here the
> server's ntpd has no way of knowing that it has bad time and so continues
> serving normally.
> On the client, the running ntp immediately sees a huge offset and huge
> jitter.
>
> Tue Dec 9 13:15:04 CET 2014
>      remote          refid           st t when poll reach    delay   offset   jitter
> ==============================================================================
> *192.168.1.15    .GPS1.           1 u  320   64   360    0.549    0.040    0.037
> +192.168.1.16    .GPS2.           1 u   37   64   377    0.606    0.006    0.028
> +192.168.1.17    .GPS1.           1 u  309   64   360    0.576    0.027    0.025
> Tue Dec 9 13:16:08 CET 2014
>      remote          refid           st t when poll reach    delay   offset   jitter
> ==============================================================================
>  192.168.1.15    .GPS1.           1 u   55   64   341    0.565    0.042  9660780
> *192.168.1.16    .GPS2.           1 u   37   64   377    0.606    0.006    0.024
>  192.168.1.17    .GPS1.           1 u   42   64   341    0.579    0.041  9660773
>
> After 5 mins the client is unable to resolve this, declares all clocks
> falsetickers and then panics. I did not have ntpd in debug mode here, but it
> is reasonable to assume that it panics due to the selected clock being too
> far out and hitting the panic limit.
>
> Tue Dec 9 13:23:37 CET 2014
>      remote          refid           st t when poll reach    delay   offset   jitter
> ==============================================================================
>  192.168.1.15    .GPS1.           1 u   45   64   377    0.596  -255600  155.539
> *192.168.1.16    .GPS2.           1 u   25   64   377    0.614    0.024    0.008
>  192.168.1.17    .GPS1.           1 u   30   64   377    0.583  -255600   52.806
> Tue Dec 9 13:24:41 CET 2014
>      remote          refid           st t when poll reach    delay   offset   jitter
> ==============================================================================
> x192.168.1.15    .GPS1.           1 u   43   64   377    0.596  -255600  179.609
> x192.168.1.16    .GPS2.           1 u   23   64   377    0.614    0.024    0.008
> x192.168.1.17    .GPS1.           1 u   27   64   377    0.618  -255599    6.009
> /usr/local/bin/ntpq: read: Connection refused
> Tue Dec 9 13:25:45 CET 2014
> /usr/local/bin/ntpq: read: Connection refused
>
> This is exactly what happens if the client is restarted.
>
> clock_filter: n 1 off -255599.997967 del 0.000662 dsp 7.937502 jit 0.000002
> select: endpoint -1 -255600.000806
> select: endpoint 1 -255599.995128
> select: survivor 192.168.1.17 0.002839
> select: combine offset -255599.997967134 jitter 0.000000000
> event at 1 192.168.1.17 903a 8a sys_peer
> clock_update: at 1 sample 1 associd 18641
> event at 1 0.0.0.0 c617 07 panic_stop -255600 s; set clock manually within
> 1000 s.
> event at 1 0.0.0.0 c61d 0d kern kernel time sync disabled
>
> So ntp does NOT continue in your test case. Your case may fare better if the
> time difference is less than the panic limit. Say, if the two servers do not
> insert a leap second, but the "correct" one does. I’ll try that for my own
> satisfaction if I can figure out how to do it.
>>
>>> Brian Utterback
>>
>> _______________________________________________
>> questions mailing list
>> questions@lists.ntp.org
>> http://lists.ntp.org/listinfo/questions
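
P.S. As a rough illustration of why the two agreeing servers outvote the one that is actually on UTC, here is a minimal Python sketch of an interval-majority vote. It is NOT ntpd's selection code, only a simplified model of the idea. The function name classify and the 5 ms half_width_ms (a stand-in for the per-peer root distance) are assumptions made up for this example. The offsets are the ones from the 01:06:05 listing in the leap second test.

```python
def classify(offsets_ms, half_width_ms=5.0):
    """Split peers into (truechimers, falsetickers) by interval overlap.

    offsets_ms maps a peer name to its measured offset in milliseconds.
    half_width_ms is a hypothetical fixed error bound per peer; the real
    ntpd derives a separate bound for every peer.
    """
    n = len(offsets_ms)

    # Each peer claims the true offset lies in [off - hw, off + hw].
    # +1 marks the start of such an interval, -1 marks its end.
    edges = []
    for off in offsets_ms.values():
        edges.append((off - half_width_ms, +1))
        edges.append((off + half_width_ms, -1))
    # Sort by position; at equal positions, starts come before ends.
    edges.sort(key=lambda e: (e[0], -e[1]))

    # Sweep left to right and remember the region covered by most intervals.
    best = depth = 0
    low = high = None
    for i, (x, kind) in enumerate(edges):
        depth += kind
        if kind == +1 and depth > best:
            best, low, high = depth, x, edges[i + 1][0]

    # Without a strict majority nobody can be trusted.
    if best * 2 <= n:
        return [], sorted(offsets_ms)

    truechimers, falsetickers = [], []
    for name, off in sorted(offsets_ms.items()):
        covers = off - half_width_ms <= low and off + half_width_ms >= high
        (truechimers if covers else falsetickers).append(name)
    return truechimers, falsetickers


if __name__ == "__main__":
    # Offsets seen by the client just after it inserted the leap second.
    after_leap = {
        "192.168.1.15": 999.894,
        "192.168.1.17": 999.899,
        "192.168.1.18": -0.066,  # the only server that applied the leap
    }
    print(classify(after_leap))
```

With these numbers the sketch puts 192.168.1.15 and 192.168.1.17 in the majority and 192.168.1.18 in the falseticker list, which matches the tally codes in the listing. The vote only measures agreement and knows nothing about which server is right.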