I have a question that seems somewhat similar to one that was just asked, but there are a couple of differences, so I figured I'd ask mine as well. Apologies for the long post, but I'm trying to skip the "more info please" phase. :-)
I have a product that is made up of a cluster of Linux nodes, with the cluster ranging in size from 4 to over 100 nodes. To date, we've used the version of NTP included in the OS (SLES 10) to maintain internal time synchronization in the cluster, but without associations to any external NTP servers or any hardware-based time sources. While this has worked satisfactorily, it does allow a gradual drift from UTC over time, so we'd like to extend the product to eliminate this.

In terms of requirements, this means we must still maintain a stable internal "cluster time" with sub-second tolerance. This should be trivial for NTP to maintain, as that is a rather loose tolerance compared to many others I've seen discussed. The requirement to match true UTC is even looser, as all we're trying to do is enable the use of an external reference to stop what can be a perpetual drift. Just to give it a number though, let's say we'd like it to be within 60 seconds of UTC.

The topology of our cluster has two tiers. All of the nodes are interconnected over a private network, and some subset of the nodes also have external connections to the LAN where the cluster is deployed. The subset is always at least 2 nodes, and can be as high as 25% of the total number of nodes.

Prior to extending the product to allow use of an external (to the cluster) NTP server or servers, the nodes with external connections were configured as servers to the internal cluster and as peers of each other, with all other nodes pure clients. After adding support for external NTP servers, we kept much the same config: the nodes with external connections were still servers to the internal network, and were still peers of each other, but now they were also clients of one or more external servers. I understand that requiring three or more external servers would be better, and we can do that, but we still have to ensure stability of the internal cluster time even if only a reduced set of servers (including the empty set) is reachable.
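A minimal sketch of the layout described above, not the literal files. The host names (gw1, gw2, gw3, ntp.example.com) are placeholders, and the local clock fallback at stratum 10 is shown only as an assumption about how an isolated cluster would keep a time source when no external server is reachable.

```
# ---- ntp.conf on an externally connected node (gw1 in this sketch) ----

# External reference; ntp.example.com is a placeholder name.
server ntp.example.com iburst

# The other externally connected nodes, as symmetric peers.
peer gw2
peer gw3

# Assumed fallback: the undisciplined local clock at a high stratum,
# so the node can still serve time with no external server reachable.
server 127.127.1.0
fudge  127.127.1.0 stratum 10

# ---- ntp.conf on an internal-only node (pure client) ----

# Every externally connected node, reached over the private network.
server gw1 iburst
server gw2 iburst
server gw3 iburst
```

With this shape, each externally connected node makes its own decision about whether to trust the external server, which is where the question of keeping them in agreement comes from.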
Our configuration did not work, because we were able to cause instability in the internal cluster time with perturbations in the external server, and we have to guarantee stability even with bad inputs. What happened was that some (but not all) of the externally connected nodes deemed the external server a falseticker and stopped believing it. The other externally connected nodes did not, and as a result the time diverged between members of this group. It is this divergence that I'm referring to when I speak of a lack of stability.

So before I go into configuration details, is there a known "best way" to handle the sort of requirements I described? It sounds like orphan mode might provide the functionality I'm looking for, but I figured that in parallel with empirical experimentation, I'd pursue the analytical approach and ask people who know more than I do. :)

thanks,
Tim

_______________________________________________
questions mailing list
[email protected]
https://lists.ntp.org/mailman/listinfo/questions
