Re: [time] What do our users expect?

Brad Knowles Sun, 11 Sep 2005 15:52:37 -0700

At 7:53 AM +1000 2005-09-12, Joel Reicher wrote:

 The possibility exists for requesting pool servers to do a cut down
 version of this. Each server could, over time, do a traceroute to every
 other server, and report back either the number of hops or the whole
 result. Pool HQ could then do a clustering and we'd have zones of
 some real use. It's a significant data crunching exercise, however,
 and it's not obvious how you'd figure out which `zone' a client might
 be in. At least it's a minimum of bother to the client.

I don't see the real benefit of doing this. Assuming you do get a full-mesh picture of the network at any one particular instant, that picture is going to change in the next instant. There are many different routes between any two given points, any one of which may be better or worse than any of the others at any given time. And things like traceroute aren't going to show you lower-level network issues, such as VPNs, MPLS, ATM clouds, etc. Some route-paths may be load-balanced, so that different packets take different routes over interfaces that share the same beginning and ending termination points, but which might have different loading.

Give me a complete and total picture of the Internet. Now give it to me again. And again. And again. No matter how quickly you are able to take those complete pictures, there will be significant variations between them. Which of those variations are important and which ones are not? Moreover, past behaviour is only a fair predictor of the future, at best. There are many other factors involved, of which past behaviour is only one.

In addition, it's not clear how the clients could make use of any of this information. Assuming you did have a good picture of the overall network topology between each and every one of the servers, how does that help the client? There are an almost infinite number of different routes on the Internet between any two points, and even with a complete map of all your servers, that doesn't tell you anything about the route between that client and any of those servers.

Of course, there are also scalability problems -- as you add more servers, each server has more other servers it has to monitor, until you come to the point where each machine is totally overloaded just by trying to keep an eye on all the other machines -- and the total number of probes grows quadratically with the number of servers, so you'll get to that point quite rapidly.

Such is the nature of full-mesh or even partial-mesh networks.
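
To put rough numbers on the scaling concern, here is a minimal sketch in Python. It assumes each server probes every other server once per round, and it compares that against a small, fixed number of central monitoring points. The function names and the choice of three monitors are purely illustrative, not anything the pool actually uses.

```python
def full_mesh_probes(n_servers: int) -> int:
    """Probes per round if every server probes every other server once."""
    return n_servers * (n_servers - 1)


def central_probes(n_servers: int, n_monitors: int) -> int:
    """Probes per round if a few central monitors each probe every server."""
    return n_servers * n_monitors


if __name__ == "__main__":
    # Three monitors is an arbitrary, hypothetical figure for comparison.
    for n in (10, 100, 1000):
        print(n, full_mesh_probes(n), central_probes(n, 3))
```

Under these assumptions the full-mesh count works out to n*(n-1) per round, while the centralized count grows only in proportion to the number of servers.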

I think we need to keep our server monitoring relatively lightweight, and done from relatively few centralized monitoring points. Yes, there are bits of information that we're going to miss by doing that, but I don't see any manageable way around that problem.

 Perhaps we should start advising client writers to reresolve names
 periodically to get new servers from the round robin. They can combine
 the new servers with the old, sort according to hops, trim, and end
 up, eventually, with a list of close servers. We could then provide
 a reverse DNS facility for such clients to ensure that their close
 servers are still in the pool, as this is still important.

Maybe if one or more of your configured servers went down, or was considered insane, it might be appropriate to see if you could replace it/them. But you'd need to track where you got what information about what servers. If it was a pool server that died or went insane, then you could easily replace it with a different pool server -- unless you're using a pool zone that doesn't have enough pool servers in it.

But if one of your explicitly configured servers went down or went insane, and there is only the one server IP address returned for that name, then you shouldn't try to replace it. You might be able to bring up additional associations in order to compensate, but that would depend on the rest of your configuration file.
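
The replacement rule described above could be sketched roughly as follows in Python. This is only an illustration: the `Association` record and its field names are hypothetical, and the choice to allow replacement when an explicitly configured name returned more than one address is an assumption, since only the single-address case is stated above.

```python
from dataclasses import dataclass


@dataclass
class Association:
    address: str          # IP address currently in use
    source_name: str      # DNS name this address was resolved from
    from_pool: bool       # True if source_name is a pool name
    name_addr_count: int  # number of addresses source_name returned


def may_replace(assoc: Association, is_down: bool, is_insane: bool) -> bool:
    """Decide whether a failed association is a candidate for replacement."""
    if not (is_down or is_insane):
        return False  # a healthy server is left alone
    if assoc.from_pool:
        return True   # another pool server can stand in for it
    # Explicitly configured name: a single-address name is never replaced.
    # Allowing replacement for multi-address names is an assumption here.
    return assoc.name_addr_count > 1
```

This is also why the client has to record where each address came from: without the source name and its pool/explicit status, it cannot tell the two cases apart.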

I know that Dr. Mills and Brian Utterback have discussed some ideas along these lines, and I think they could be useful. And I think that this is a good example of trying to keep the whole picture in mind as we think about the future of the pool.

However, overall, I don't think it would be a good idea to re-resolve this kind of information on a frequent basis. The important thing for an NTP client is consistency -- you really, really want to avoid clock-hopping -- and that would mean that you shouldn't re-resolve these kinds of names unless you've got a really good reason to do so, like shutdown and reboot, waking from sleep, change of network configuration, excessive numbers of upstream servers down, etc.

Automatic re-adaption should definitely be done, but I think that this is something that needs to be done on an exception basis, not as a general rule.
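
As a rough illustration of "exception basis", a client-side check might look something like the Python sketch below. The event names mirror the triggers listed above. The `max_down_fraction` threshold and its default of one half are hypothetical; what counts as "excessive" is left open here.

```python
from enum import Enum, auto


class Event(Enum):
    REBOOT = auto()
    WAKE_FROM_SLEEP = auto()
    NETWORK_CHANGE = auto()


def should_reresolve(events, servers_down, servers_total,
                     max_down_fraction=0.5):
    """Return True only when an exceptional condition justifies re-resolving."""
    if events:
        return True  # reboot, wake from sleep, or network change
    if servers_total == 0:
        return True  # nothing configured is usable at all
    # Hypothetical threshold for "excessive numbers of upstream servers down".
    return servers_down / servers_total > max_down_fraction
```

With no events and one of four servers down this returns False, so the client keeps its existing associations and avoids clock-hopping. It re-resolves only after a listed event or once more than the chosen fraction of servers has failed.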


--
Brad Knowles, <[EMAIL PROTECTED]>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

    -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
    Assembly to the Governor, November 11, 1755

  SAGE member since 1995.  See <http://www.sage.org/> for more info.
