Hi,

The problem with your approach is that you can depart from "normal" for very long periods of time. Consider my home network, running NTP to external sources. Around 4 in the afternoon all the kids get home and start streaming video. At 7 I get home and start doing the same thing. We each keep this up for 5 hours. Past midnight the BitTorrent fires up and runs through 5 AM. Midday, there's a video conference that runs from home for an hour or two.
Each of these things creates a non-zero load ahead of the NTP at some point. Given network congestion and re-transmission, the load will really pile up at various times. Given the high level of transmit/receive asymmetry in my cable modem, it will be pretty hard for me to figure out what's going on. The net result will be that my NTP hops around a bit during the day.

Bob

On Jan 1, 2013, at 8:57 PM, Dennis Ferguson <[email protected]> wrote:

> On 27 Dec, 2012, at 15:13, Magnus Danielson <[email protected]> wrote:
>> On GE, a full-length packet is about 12 us, so a single packet's
>> head-of-line blocking can be anything up to that amount; multiple
>> packets... well, it keeps adding. Knowing how switches work doesn't
>> really help, as packets arrive at a myriad of rates; they interact and
>> cross-modulate and create strange patterns and dance in interesting
>> ways that are ever changing in unpredictable fashion.
>
> I wanted to address this bit because it seems like most people base
> their expectations for NTP on this complexity, as does the argument
> being made above, but the holiday intervened. While I suspect many
> people are thoroughly bored of this topic by now, I can't resist
> completing the thought.
>
> Yes, the delay of a sample packet through an output queue will be
> proportional to the number of untransmitted bits in the queue ahead of
> it; yes, the magnitude of that delay can be large and highly variable;
> and, even, yes, the statistics governing that delay may often be
> unpredictable and non-gaussian, exhibiting dangerously heavy tails.
> The thing is, though, that this doesn't necessarily have to matter so
> much. A better approach might avoid relying on the things you can't
> know.
>
> To see how, consider a different question: what is the probability
> that any two samples sent through that queue will experience precisely
> the same delay (i.e. find precisely the same number of bits queued in
> front of them when they get there)?
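[As a quick sanity check of the "about 12 us" figure quoted above, here is the arithmetic for serializing one maximum-length Ethernet frame on GE — the frame and overhead sizes are standard Ethernet values, not taken from the post:]

```python
# Serialization time of one maximum-length Ethernet frame on gigabit
# Ethernet: the worst-case head-of-line blocking from a single queued packet.
MAX_FRAME_BYTES = 1518          # maximum untagged Ethernet frame
OVERHEAD_BYTES = 8 + 12         # preamble/SFD plus inter-frame gap
GIGE_BITS_PER_SEC = 1_000_000_000

wire_bits = (MAX_FRAME_BYTES + OVERHEAD_BYTES) * 8
delay_us = wire_bits / GIGE_BITS_PER_SEC * 1e6
print(f"{delay_us:.2f} us")     # about 12.3 us, matching the figure quoted
```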
> I think it is fairly conservative to predict that the probability that
> two samples will arrive at a non-empty output queue with exactly the
> same number of bits in front of them will be fairly small; the number
> of bits in the queue will be continuously changing, so the delay
> through a non-empty queue should have a near-continuous (and
> unpredictable) probability distribution, as you point out, and if the
> sampling is uncorrelated with the competing traffic it is unlikely
> that any pair of samples will find exactly the same point on that
> distribution.
>
> The exception to this, of course, is a queue length of precisely
> 0 bits (which is precisely why the behaviour of a switch with no
> competing traffic is interesting). The vast majority of queues in the
> vast majority of network devices in real networks are nowhere near
> continuously occupied for long periods. The time-averaged fractional
> load on the circuit a queue is feeding is also the probability of
> finding the queue not-empty. If the average load on the output circuit
> is less than 100% then multiple samples are probably going to find
> that queue precisely empty; if the average load on the output circuit
> is 50% (and that would be an unusually high number in a LAN, though
> maybe less unusual in other contexts) then 50% of the samples that
> pass through that queue are going to find it empty. Since samples that
> found the queue empty will have experienced pretty much identical
> delays, the "results" (for some value of "result") from those samples
> will cluster closely together. The results from samples which
> experienced a delay will differ from that cluster but, as discussed
> above, will also differ from each other and generally won't form a
> cluster somewhere else. The cluster marks the good spot independent of
> the precise (and precisely unknowable) nature of the statistics
> governing the distribution of samples outside the cluster.
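[The load-equals-occupancy argument above is easy to see in a toy simulation — all numbers here are illustrative, not from the post: probes that find the queue empty share one repeatable delay, while the rest scatter over a near-continuous range:]

```python
import random

random.seed(1)
LOAD = 0.5          # time-averaged utilization of the output circuit
N = 10_000          # probe samples
BASE_US = 5.0       # fixed empty-queue transit delay (illustrative)
MAX_HOL_US = 12.3   # illustrative bound on queueing delay picked up

# With probability LOAD a probe finds the queue occupied and adds an
# essentially continuous random delay; otherwise it sees exactly BASE_US.
delays = [
    BASE_US + (random.uniform(0.01, MAX_HOL_US) if random.random() < LOAD
               else 0.0)
    for _ in range(N)
]

# The fraction of probes sitting exactly on the minimum is the cluster;
# it approaches 1 - LOAD, just as the argument above predicts.
cluster_fraction = sum(1 for d in delays if d == BASE_US) / N
print(f"{cluster_fraction:.3f}")
```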
> If we can find the cluster we have a result which does not depend on
> understanding the precise behaviour of samples outside the cluster.
>
> Given this it is also worthwhile to consider "jitter", which intuition
> based on a normal-distribution assumption might suggest should be
> predictive of the quality of the result derived from a collection of
> samples. In the situation above, however, the dominant contributors to
> "jitter", however measured, are going to be the samples outside the
> cluster, since they are the ones that are "jittering" (it is that
> property we are relying on to define the cluster). If jitter mostly
> measures information about the samples the estimate doesn't rely on,
> then it tells you little about the samples the estimate does rely on,
> and hence can provide no prediction about the quality of an estimate
> derived from those samples alone. In fact, in a true perversion of
> normal intuition, high jitter and heavy-tailed probability
> distributions might even make it easier to get a good result by making
> it easier to identify the cluster. Saying "I see a lot of jitter"
> doesn't necessarily tell you anything about what is possible.
>
> While the argument gets a lot more complex in a hurry, and too much to
> attempt here (the above is too much already), I believe this general
> approach can scale to a whole large network of devices with queues
> (though even the single-switch case has real-life relevance too). That
> is, I think it is possible to find a sample "result" for which there
> is a strong tendency for "good" samples to cluster together while
> "bad" samples are unlikely to do so, with the quality of the result
> depending on the population and nature of variability of the cluster
> but hardly at all on the outliers, and with the lack of a measurable
> cluster telling you when you might be better off relying on your local
> clock rather than the network.
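[One hypothetical way to "find the cluster" is a tightest-window mode estimator of the kind robust statistics offers; this sketch is an illustration of the idea only, not anything from ntpd:]

```python
def cluster_estimate(samples, frac=0.5):
    """Average the narrowest window that contains frac of the sorted
    samples, so heavy-tailed outliers carry no weight at all.  A sketch
    of the general idea, not any particular implementation."""
    s = sorted(samples)
    k = max(2, int(len(s) * frac))
    # Index of the narrowest k-sample window.
    best = min(range(len(s) - k + 1), key=lambda i: s[i + k - 1] - s[i])
    window = s[best:best + k]
    return sum(window) / len(window)

# Heavy-tailed outliers (40, 60, 85) barely move the estimate:
data = [10.0, 10.1, 9.9, 10.05, 9.95, 40.0, 85.0, 10.02, 60.0]
print(cluster_estimate(data))   # close to 10.0
```

A plain mean of the same data would be pulled up near 26; the cluster estimate stays with the tight group, which is exactly the property the argument above asks for.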
> The approach relies on the things we do know about networks and
> networking equipment while avoiding reliance on things we can't know:
> it mostly avoids making gaussian statistical assumptions about
> distributions that may not be gaussian. The field of robust statistics
> provides tools addressing this which might be of use.
>
> I guess it is worth completing this by mentioning what it says about
> ntpd. First, ntpd knows all of the above, probably much, much better
> than I do, though it might not put it in quite the same terms. If you
> make the assumption that the stochastic delays experienced by samples
> are evenly distributed between the outbound and inbound paths (this is
> not a good match for the real world, by the way, but there are
> constraints...) then round-trip delay becomes a stand-in measure of
> "cluster", and ntpd does what it can with this. The fundamental
> constraint that limits what ntpd can do, in a couple of ways, is the
> fact that the final stage of its filter is a PLL. The integrator in a
> PLL assumes that the errors in the samples it is being fed are
> zero-mean and normally distributed, and will fail to arrive at a
> correct answer if this is not the case, so if you want to filter
> samples for which this is unlikely to be the case you need to do it
> before they get to the PLL. The problem with doing this well, however,
> is that a PLL is also destabilised by adding delays to its feedback
> path, causing errors of a different nature, so anything done before
> the PLL is severely limited in the amount of time it can spend, and
> hence the number of samples it can look at. Doing better probably
> requires replacing the PLL; the "replace it with what?" question is
> truly interesting.
>
> I suspect I've gone well off topic for this list, however, and for
> that I apologize.
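[To illustrate why the PLL's integrator is the constraint, here is a minimal PI-style discipline loop; the gains and scale are made up for the sketch and this is not ntpd's actual loop. A single heavy-tailed outlier sample is absorbed into the frequency (integral) term, and the disturbance persists long after the bad sample is gone:]

```python
KP, KI = 0.2, 0.02   # illustrative proportional/integral gains

def run_pll(measured_offsets):
    """Feed offset samples to a simple PI loop and return the trajectory
    of the loop's phase correction.  The integral (frequency) term
    accumulates every sample, which is why one outlier does lasting harm."""
    phase, freq = 0.0, 0.0
    trajectory = []
    for m in measured_offsets:
        err = m - phase
        freq += KI * err          # integral term: remembers the sample
        phase += KP * err + freq  # proportional step plus frequency drift
        trajectory.append(phase)
    return trajectory

clean = run_pll([0.0] * 50)
spiked = run_pll([0.0] * 25 + [5.0] + [0.0] * 24)  # one outlier sample
print(max(abs(a - b) for a, b in zip(clean, spiked)))
```

The spiked run rings on for many samples after the outlier, which is why filtering has to happen before the PLL — and why doing that filtering adds the feedback delay the text warns about.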
> I just wanted to make sure it was understood that there is an argument
> for the view that we do not yet know of any fundamental limits on the
> precision that NTP, or a network time protocol like NTP, might
> achieve, so any effort to build NTP servers and clients which can make
> their measurements more precisely is not a waste of time. It is
> instead what is required to make progress in understanding how to do
> this better.
>
> Dennis Ferguson
> _______________________________________________
> time-nuts mailing list -- [email protected]
> To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
> and follow the instructions there.
