Ted Beatie wrote:

  server <one or more servers, external or internal>
  server <one or more other gateways, using the back-end addresses>

Add iburst to the end of each server line. This speeds up synchronization.

To all of the server lines, or just the internal-to-our-system servers?

  server <two or more gateways, using the back-end addresses>

Three servers are an absolute minimum because with only two, ntpd has no way of knowing which one is providing better information. Let's leave aside the question of what "better" means; it's a very complicated subject.

As I mentioned to Tom, what if we can't guarantee that?  As near as I
can tell, whereas more is better, the only actual requirement is for one
server.  In some cases, we're lucky if we get even one, so we either
need to believe that one, or we need to set the time manually.

Based on the above, the internal NTP server has a stratum of 2 and will almost always be used over a stratum of 4. Is that internal NTP server getting its data from a stratum 1 server, and is that server internal or external?

It is internal, and looks like it gets its time from other internal machines:

portal-01:~# ntptrace -n
127.0.0.1: stratum 3, offset 0.000006, synch distance 15.20248
10.16.4.1: stratum 2, offset -2.558634, synch distance 1.00000
10.16.4.100: stratum 2, offset -2.571121, synch distance 1.00000
10.16.100.2: stratum 2, offset -2.520537, synch distance 0.04373
132.163.4.101:  *Timeout*

By obfuscating the addresses, it's hard to know if you've also removed the tally codes, which indicate what gateway1 thinks of the servers. Since you are using the private address space for this, it really doesn't matter if they're seen. If you don't want to show the names, just add a -n and it won't translate the IP addresses.

As I mentioned in the post, the tally codes were spaces.

portal-01:~# ntpq -nc pe localhost
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
10.16.4.1       10.16.4.100      2 u   40   64  377    0.280  -2558.0   4.447
10.123.123.2    10.123.123.1     4 u  810 1024  377    0.172  -1849.0   2.014
10.123.123.3    0.0.0.0         16 u  679 1024    0    0.000    0.000 4000.00

This only has two servers and you need at least 3. As it is, gateway1 and gateway2 are at two different stratum levels. However, you need to fix the problem on the gateways first.
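
Counting the usable servers in a billboard like the one above can be done mechanically. The following is only a sketch: `count_reachable` is a made-up helper name, and it assumes the usual ten-column `ntpq -p` layout, in which the seventh column is the reach register (0 means none of the last eight polls got a reply).

```shell
# Hypothetical helper: count peers with a non-zero reach register.
# Reads `ntpq -pn` style output on stdin. The first two lines (column
# header and ===== separator) are skipped. A leading tally code such as
# '*' or '+' is glued to the address, so the column positions stay the same.
count_reachable() {
    awk 'NR > 2 && NF >= 10 && $7 + 0 > 0 { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of `ntpq -pn | count_reachable`. Fed the portal-01 billboard above, it should report 2, since the third entry has a reach of 0.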

Despite the spec, that seems to be a consistent interpretation.  If
everything internal is fully meshed, and there is only one external time
source, will everything sync up to that external source, no matter the skew?

Looking at the debugging techniques, seeing that the tally code is
a space, and delving deeper, I see:

  gateway1:~# ntpq -c as localhost
  ind assID status  conf reach auth condition  last_event cnt
  ===========================================================
  1 47900  9014   yes   yes  none    reject   reachable  1
  2 47901  9014   yes   yes  none    reject   reachable  1
  3 47902  8000   yes   yes  none    reject

  storage-node2:~# ntpq -c as localhost
  ind assID status  conf reach auth condition  last_event cnt
  ===========================================================
  1 16076  9064   yes   yes  none    reject   reachable  6
  2 16077  9064   yes   yes  none    reject   reachable  6

Usually you will see these kinds of results when the server you are looking at has just started. You really need to give it time to synchronize.

Not in this case;

portal-01:~# ps aux|grep ntp;for i in 2 51 52 53 54; do ssh -1 10.123.123.$i ps aux; done | grep ntp
root   11283  0.0  0.1  2328 2320 ?        SL   Sep30   0:05 /usr/sbin/ntpd
root   17856  0.0  0.1  2328 2320 ?        SL   Sep30   0:04 /usr/sbin/ntpd
root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     383  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     389  0.0  0.1  2328 2320 ?        SL   Jun13   0:05 /usr/sbin/ntpd -g

(the Sep30 processes are on the two gateways, the Jun13 processes are on
the servers.  I had recently manually stopped ntpd, resync'd the times,
and restarted ntpd on the gateways)

This appears to indicate it received just one packet, which is not enough to synchronize anything. How long did you wait after the server was started before interrogating it? You need to wait at least 15-20 minutes when you don't use iburst.

How long would it take with iburst set?  How can we deal with the fact
that the gateways and servers all generally come up at the same time?

            --ted

--
Ted Beatie                         Permabit, Inc.             [EMAIL PROTECTED]
Sr. Systems Engineer       One Kendall Sq, Cambridge, MA       +1-617-995-9317

_______________________________________________
questions mailing list
[email protected]
https://lists.ntp.isc.org/mailman/listinfo/questions

Like it or not, you have a dependency tree. Ntpd on the clients is not going to work until there is at least one server running and synchronized. The server is not going to synchronize with an external server until the network is up.

With iburst specified in the server statements, you can be synchronized in a couple of minutes. After the first five replies are received from the upstream server(s), ntpd has enough information to START synchronizing your clock. That's ten or twelve seconds after ntpd starts. If it's a "warm start" (you have a drift file) and the power has not been off for very long, synchronization can be very fast. If it's a cold start (e.g. you have no drift file and/or power has been off long enough for the internal temperature of the machine to change substantially), ntpd can bring your clock within twenty or thirty milliseconds in the first two or three minutes. Getting it as good as it can get will require several hours.
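
If there are many machines to touch, adding iburst to the existing server lines can be scripted. This is a rough sketch, not a tested procedure: `add_iburst` is a made-up name, and it assumes plain `server <address> [options]` lines with whitespace separators. It works as a filter, so nothing is changed in place.

```shell
# Hypothetical filter: append "iburst" to every active "server" line
# that does not already carry it. Comment lines (e.g. "#server ...") and
# all other directives pass through unchanged.
add_iburst() {
    awk '
        $1 == "server" && $0 !~ /(^|[ \t])iburst([ \t]|$)/ {
            sub(/[ \t]*$/, "")      # drop trailing whitespace
            $0 = $0 " iburst"
        }
        { print }
    '
}
```

For example, `add_iburst < /etc/ntp.conf > /tmp/ntp.conf.new` (the paths are only illustrative) leaves the original file alone so the result can be reviewed with diff before it is installed and ntpd is restarted.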

You will need to bring up the network (your routers and switches) first. Then bring up your NTP server(s). Then start your clients. Yes, it's probably going to take five to ten minutes to get all the clocks in rough synchronization (within twenty or thirty milliseconds of the correct time) this way. Most sites minimize the problem by minimizing shutdowns. If the site is only used three times a month and powered down between uses, you are just going to have to be patient and wait for things to sync up well enough to satisfy you.
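
One way to enforce that ordering in a boot script is to hold the clients back until the server has actually selected a time source. The sketch below is an illustration under assumptions: `has_sys_peer` and `wait_for_sync` are made-up names, and the retry count and 10-second interval are arbitrary. The only ntpq behavior it relies on is that the peer chosen as system peer is marked with a `*` tally code in the first column of the `ntpq -p` billboard.

```shell
# Hypothetical check: succeed if the billboard on stdin shows a system
# peer, i.e. a peer line whose first character is the '*' tally code.
has_sys_peer() {
    awk 'NR > 2 && substr($0, 1, 1) == "*" { found = 1 } END { exit !found }'
}

# Hypothetical wait loop: poll the local ntpd until it has a system peer.
# $1 is the number of attempts (default 30), spaced 10 seconds apart.
# Returns 0 once synchronized, 1 if the attempts run out.
wait_for_sync() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        if ntpq -pn 2>/dev/null | has_sys_peer; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}
```

Run on a gateway, something like `wait_for_sync 60 && echo "ok to start clients"` would gate whatever step brings up the machines behind it. Note that this only says ntpd has picked a source, not how good the resulting time is.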
