At 7:53 AM +1000 2005-09-12, Joel Reicher wrote:
> The possibility exists for requesting pool servers to do a cut down
> version of this. Each server could, over time, do a traceroute to every
> other server, and report back either the number of hops or the whole
> result. Pool HQ could then do a clustering and we'd have zones of
> some real use. It's a significant data crunching exercise, however,
> and it's not obvious how you'd figure out which `zone' a client might
> be in. At least it's a minimum of bother to the client.
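The clustering Joel sketches could be prototyped along these lines. This is purely illustrative: the server names, hop counts, and threshold are all invented, and a real deployment would collect the hop matrix from actual traceroutes between pool servers.

```python
# Illustrative sketch: group pool servers into "zones" by merging any
# pair whose measured hop count falls under a threshold (union-find).
# All data here is made up for the example.

HOPS = {
    ("a.pool", "b.pool"): 3,
    ("a.pool", "c.pool"): 14,
    ("b.pool", "c.pool"): 15,
    ("c.pool", "d.pool"): 2,
    ("a.pool", "d.pool"): 13,
    ("b.pool", "d.pool"): 16,
}

def clusters(hops, threshold=8):
    """Merge any two servers closer than `threshold` hops; return zones."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), n in hops.items():
        find(a)
        find(b)          # make sure every server is registered
        if n < threshold:
            parent[find(a)] = find(b)

    zones = {}
    for server in parent:
        zones.setdefault(find(server), set()).add(server)
    return sorted(sorted(z) for z in zones.values())

print(clusters(HOPS))  # → [['a.pool', 'b.pool'], ['c.pool', 'd.pool']]
```

The hard parts Joel points to are exactly what this sketch glosses over: gathering the full hop matrix, and deciding which zone a *client* belongs to, since the client appears nowhere in the server-to-server data.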

I don't see the real benefit of doing this. Assuming you do get
a full-mesh picture of the network at any one particular instant,
that picture is going to change in the next instant. There are many
different routes between any two given points, any one of which may
be better or worse than any of the others, at any given time. And
things like traceroute aren't going to show you lower-level network
issues, such as VPNs, MPLS, ATM clouds, and so on. Some route-paths may
be load-balanced, so that different packets take different routes
over interfaces that share the same beginning and ending termination
points, but which might have different loading.

Give me a complete and total picture of the Internet. Now give
it to me again. And again. And again. No matter how quickly you
are able to take those complete pictures, there will be significant
variations between them. Which of those variations are important and
which ones are not? Moreover, past behaviour is at best only a fair
predictor of the future. There are many other factors involved, of
which past behaviour is only one.

In addition, it's not clear how the clients could make use of any
of this information. Assuming you did have a good picture of the
overall network topology between each and every one of the servers,
how does that help the client? There are an almost infinite number
of different routes on the Internet between any two points, and even
with a complete map of all your servers, that doesn't tell you
anything about the route between that client and any of those servers.

Of course, there are also scalability problems -- as you add more
servers, each server has more other servers it has to monitor, until
you come to the point where each machine is totally overloaded just
by trying to keep an eye on all the other machines. The number of
server pairs grows quadratically -- N servers mean N*(N-1)/2 paths,
so 100 servers already mean 4,950 paths to probe -- and you'll reach
that point quite rapidly.

Such is the nature of full-mesh or even partial-mesh networks.
I think we need to keep our server monitoring relatively lightweight,
and done from relatively few centralized monitoring points.
Yes, there are bits of information that we're going to miss by doing
that, but I don't see any manageable way around that problem.

> Perhaps we should start advising client writers to re-resolve names
> periodically to get new servers from the round robin. They can combine
> the new servers with the old, sort according to hops, trim, and end
> up, eventually, with a list of close servers. We could then provide
> a reverse DNS facility for such clients to ensure that their close
> servers are still in the pool, as this is still important.
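The client-side scheme Joel describes -- re-resolve, merge with the known set, sort by distance, trim -- could be sketched roughly as follows. `measure_hops` and `resolve_pool` are stand-ins invented for the example; a real client would run a traceroute (or use NTP round-trip delay) and a real DNS lookup.

```python
# Sketch of the re-resolve/merge/sort/trim scheme quoted above.
# measure_hops and resolve_pool are placeholders, not real implementations.
import random

def measure_hops(server):
    # Placeholder: pretend to measure the network distance to `server`.
    return random.randint(2, 20)

def resolve_pool(name):
    # Placeholder: a real client would do a DNS lookup on the pool name.
    # 198.51.100.0/24 is a documentation prefix (TEST-NET-2).
    return ["198.51.100.%d" % n for n in random.sample(range(1, 50), 4)]

def refresh(known, pool_name="pool.ntp.org", keep=4):
    # Combine the newly resolved servers with the ones we already know,
    # rank by measured distance, and keep only the closest few.
    candidates = set(known) | set(resolve_pool(pool_name))
    ranked = sorted(candidates, key=measure_hops)
    return ranked[:keep]

servers = refresh([])
servers = refresh(servers)  # later rounds gradually converge on close servers
```

Whether repeated rounds actually converge on good servers depends on how stable the distance measurements are, which is part of what the reply below takes issue with.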

Maybe if one or more of your configured servers went down, or was
considered insane, it might be appropriate to see if you could
replace it or them, but you'd need to track where you got what
information about which servers. If it was a pool server that died
or went insane, then you could easily replace it with a different
pool server -- unless you're using a pool zone that doesn't have
enough pool servers in it.

But if one of your explicitly configured servers went down or became
insane, and there is only the one server IP address returned for that
name, then you shouldn't try to replace it. You might be able to
bring up additional associations in order to compensate, but that
would depend on the rest of your configuration file.

I know that Dr. Mills and Brian Utterback have discussed some
ideas along these lines, and I think they could be useful. And I
think that this is a good example of trying to keep the whole picture
in mind as we think about the future of the pool.

However, overall, I don't think it would be a good idea to
re-resolve this kind of information on a frequent basis. The
important thing for an NTP client is consistency -- you really,
really want to avoid clock-hopping -- and that would mean that you
shouldn't re-resolve these kinds of names unless you've got a really
good reason to do so, such as shutdown and reboot, waking from sleep,
a change of network configuration, an excessive number of upstream
servers being down, and so on.

Automatic re-adaptation should definitely be done, but I think that
this is something that needs to be done on an exception basis, not as
a general rule.
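A minimal sketch of that exception-based policy might look like the following. The event names and the "more than half down" threshold are assumptions made up for illustration, not anything a real client is known to use.

```python
# Sketch of exception-based re-resolution: only re-resolve the pool
# name on a disruptive event, or when too many upstreams are down.
# Event names and the threshold are invented for this example.

RERESOLVE_EVENTS = {"reboot", "wake_from_sleep", "network_change"}

def should_reresolve(event=None, servers_up=0, servers_total=0):
    if event in RERESOLVE_EVENTS:
        return True
    # "Excessive numbers of upstream servers down": here, more than half.
    if servers_total and servers_up * 2 < servers_total:
        return True
    return False  # the general rule: stick with the servers you have
```

For example, `should_reresolve(event="network_change")` and `should_reresolve(servers_up=1, servers_total=4)` would trigger a re-resolve, while a client with three of four servers healthy would not, which avoids the clock-hopping that frequent re-resolution invites.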
--
Brad Knowles, <[EMAIL PROTECTED]>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
_______________________________________________
timekeepers mailing list
[email protected]
https://fortytwo.ch/mailman/cgi-bin/listinfo/timekeepers