At 7:53 AM +1000 2005-09-12, Joel Reicher wrote:

 The possibility exists for requesting pool servers to do a cut down
 version of this. Each server could, over time, do a traceroute to every
 other server, and report back either the number of hops or the whole
 result. Pool HQ could then do a clustering and we'd have zones of
 some real use. It's a significant data crunching exercise, however,
 and it's not obvious how you'd figure out which `zone' a client might
 be in. At least it's a minimum of bother to the client.

I don't see the real benefit of doing this. Even if you do get a full-mesh picture of the network at one particular instant, that picture is going to change in the next instant. There are many different routes between any two given points, any one of which may be better or worse than the others at any given time. And tools like traceroute aren't going to show you what happens at lower layers, such as VPNs, MPLS, or ATM clouds. Some paths may be load-balanced, so that different packets take different links between the same two endpoints, and those links might be loaded differently.

Give me a complete and total picture of the Internet. Now give it to me again. And again. And again. No matter how quickly you are able to take those complete pictures, there will be significant variations between them. Which of those variations are important and which ones are not? Moreover, past behaviour is only a fair predictor of the future, at best. It is just one of the many factors involved.

In addition, it's not clear how the clients could make use of any of this information. Assuming you did have a good picture of the overall network topology between each and every one of the servers, how does that help the client? There is an almost unlimited number of different routes on the Internet between any two points, and even a complete map of the paths between all your servers tells you nothing about the route between a given client and any of those servers.


Of course, there are also scalability problems -- as you add more servers, each server has more other servers it has to monitor, until you come to the point where each machine is overloaded just by trying to keep an eye on all the other machines -- and since the total number of server pairs grows quadratically, you'll get to that point quite rapidly.

        Such is the nature of full-mesh or even partial-mesh networks.
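
To put rough numbers on this, here is a minimal sketch that counts probes per measurement round. It assumes one probe per ordered pair of servers in the full-mesh case, and a small number of central monitors (three here, an arbitrary choice) in the centralized case. The function names are made up for illustration.

```python
def full_mesh_probes(n: int) -> int:
    """Probes per round when each of n servers checks every other server."""
    return n * (n - 1)


def central_probes(n: int, monitors: int) -> int:
    """Probes per round when a few central monitors check all n servers."""
    return monitors * n


if __name__ == "__main__":
    # Compare the two approaches for a few hypothetical pool sizes.
    for n in (10, 100, 1000):
        print(n, full_mesh_probes(n), central_probes(n, monitors=3))
```

Each server's own load in the full mesh is n - 1 probes, but the total traffic and the amount of data to collect and cluster grow with the square of n, while the centralized count grows only linearly.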


I think we need to keep our server monitoring relatively light in weight, and done from a relatively few centralized monitoring points. Yes, there are bits of information that we're going to miss by doing that, but I don't see any manageable way around that problem.

 Perhaps we should start advising client writers to reresolve names
 periodically to get new servers from the round robin. They can combine
 the new servers with the old, sort according to hops, trim, and end
 up, eventually, with a list of close servers. We could then provide
 a reverse DNS facility for such clients to ensure that their close
 servers are still in the pool, as this is still important.

Maybe if one or more of your configured servers went down, or was considered insane, it might be appropriate to see if you could replace it/them. But you'd need to track which servers you learned about from which source. If it was a pool server that died or went insane, then you could easily replace it with a different pool server -- unless you're using a pool zone that doesn't have enough pool servers in it.

But if one of your explicitly configured servers went down or insane, and there is only the one server IP address returned for that name, then you shouldn't try to replace it. You might be able to bring up additional associations in order to compensate, but that would depend on the rest of your configuration file.
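
The bookkeeping described above could look something like the following sketch. It is not how any existing NTP client works: the `Association` record, the `from_pool` flag, and `pick_replacement` are all hypothetical names, and the caller is assumed to have already re-resolved the original name to get `fresh_addresses`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Association:
    address: str      # server IP address currently in use
    source_name: str  # configured name this address was resolved from
    from_pool: bool   # True if source_name is a round-robin pool name


def pick_replacement(dead: Association,
                     in_use: Iterable[Association],
                     fresh_addresses: Iterable[str]) -> Optional[str]:
    """Return a replacement address for a dead association, or None."""
    if not dead.from_pool:
        # Explicitly configured server: never substitute another one.
        return None
    used = {a.address for a in in_use}
    for addr in fresh_addresses:
        if addr != dead.address and addr not in used:
            return addr
    # The zone did not offer any server we are not already using.
    return None
```

A `None` result covers both cases mentioned above: an explicitly configured server that must not be replaced, and a pool zone without enough servers in it.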


I know that Dr. Mills and Brian Utterback have discussed some ideas along these lines, and I think they could be useful. And I think that this is a good example of trying to keep the whole picture in mind as we think about the future of the pool.


However, overall, I don't think it would be a good idea to re-resolve this kind of information on a frequent basis. The important thing for an NTP client is consistency -- you really, really want to avoid clock-hopping -- and that would mean that you shouldn't re-resolve these kinds of names unless you've got a really good reason to do so, like shutdown & reboot, waking from sleep, a change of network configuration, or an excessive number of upstream servers being down.

Automatic re-adaptation should definitely be done, but I think that this is something that needs to be done on an exception basis, not as a general rule.
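
A minimal sketch of that exception-based approach, assuming the client can tell these events apart. The `Event` names and the 50% threshold for "excessive numbers of upstream servers down" are illustrative assumptions, not values taken from any NTP implementation.

```python
from enum import Enum, auto


class Event(Enum):
    REBOOT = auto()
    WAKE_FROM_SLEEP = auto()
    NETWORK_CHANGE = auto()
    TIMER_TICK = auto()  # ordinary periodic housekeeping


# Events that justify re-resolving names on their own.
EXCEPTIONAL = {Event.REBOOT, Event.WAKE_FROM_SLEEP, Event.NETWORK_CHANGE}


def should_reresolve(event: Event, servers_total: int, servers_down: int,
                     max_down_fraction: float = 0.5) -> bool:
    """Re-resolve only on exceptional events or when too many servers are down."""
    if event in EXCEPTIONAL:
        return True
    if servers_total > 0 and servers_down / servers_total > max_down_fraction:
        return True
    # Routine tick with a healthy server set: keep the current servers.
    return False
```

With these assumptions, a routine timer tick with one of four servers down leaves the configuration alone, while three of four down triggers a re-resolve.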

--
Brad Knowles, <[EMAIL PROTECTED]>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

    -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
    Assembly to the Governor, November 11, 1755

  SAGE member since 1995.  See <http://www.sage.org/> for more info.
_______________________________________________
timekeepers mailing list
[email protected]
https://fortytwo.ch/mailman/cgi-bin/listinfo/timekeepers
