I was trying to understand how "discard monitor N" works in ntpd 4.2.8p15. I don't have a high-volume server -- this was purely academic interest. In the process I think I ran across a bug in how it works. Please correct me where I'm wrong.
The academic question was: what are the valid range and units of N?

The documentation says "discard monitor N" determines the "probability of being recorded for packets that overflow the MRU list size limit". Similarly Dr Mills described it as the "probability that a packet that overflows the internal LRU list is discarded". Naively I would have expected a probability to be expressed as [0..1) or [0..100), but actually it is expressed in seconds. Internally it is mon_age, and the default is 3000.

    ntp_monitor.c:
        int mon_age = 3000; /* preemption limit */

When a packet comes in from a new client that would overflow the MRU list, that means ntpd has already checked that the MRU list is full, can't be extended, and all the entries are too young to age out (mru maxage). In this "dire" situation, the new information can be discarded, or it can be recorded over the oldest list entry. The choice is made by chance. The probability that the oldest entry will be recorded over depends on the age of the oldest entry, and is calculated as:

    oldest_age / mon_age

This means the probability of being recorded over is very low when the oldest entry is only 1 second old, and very high when the oldest entry is nearly 3000 seconds old. And an oldest entry older than the threshold will always be recorded over. The relevant code is:

    ntp_monitor.c:
        /* Preempt from the MRU list if old enough. */
        } else if (ntp_random() / (2. * FRAC) > (double)oldest_age / mon_age) {
            return ~(RES_LIMITED | RES_KOD) & flags;
        } else {
            mon_reclaim_entry(oldest);

Now here ntpd is generating a random real number by:

    ntp_random() / (2. * FRAC)

This looks like an arithmetic error to me. It returns a [0..0.25) random number where you would expect a [0..1) random number. To get a [0..1) random number, you would want

    ntp_random() * 2. / FRAC

and you do find that elsewhere in the code. FRAC represents 2^32. But ntp_random() returns a random integer in the range 0 .. 2^31 - 1, and this must be doubled (not halved) to get a [0..1) random number. As written, the quotient is at most (2^31 - 1) / 2^33, which is just under 0.25, so the "discard" branch can never be taken once oldest_age / mon_age reaches 0.25.

So contrary to what I believe is the intent, "discard monitor 3000" currently sets the age threshold to 3000 ÷ 4 = 750 s, beyond which the oldest MRU list entry is always recorded over in case of overflow.

Related Bugzilla bug 3640: ntp.conf: missing documentation for "discard monitor" default value <https://bugs.ntp.org/show_bug.cgi?id=3640>. I did not find a bug describing wrong behavior.

Cheers!
Edward

-- 
This is questions@lists.ntp.org
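
For anyone who wants to check the arithmetic above without building ntpd, here is a minimal standalone C sketch. It is not ntpd code: the helper names (scale_current, scale_proposed, preempts, preempt_probability) are hypothetical, and the values of FRAC (2^32) and the ntp_random() maximum (2^31 - 1) are assumptions taken from the description in this post rather than from the ntpd headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the post: FRAC represents 2^32. */
#define FRAC 4294967296.0

/* Assumption from the post: ntp_random() returns 0 .. 2^31 - 1. */
#define NTP_RANDOM_MAX INT32_MAX

/* Scaling as quoted from ntp_monitor.c: the result lies in [0, 0.25). */
double scale_current(int32_t r)
{
    return r / (2. * FRAC);
}

/* Scaling suggested in the post: the result lies in [0, 1). */
double scale_proposed(int32_t r)
{
    return r * 2. / FRAC;
}

/*
 * Mirrors the quoted comparison for one scaled draw u.
 * Returns 1 if the oldest entry would be recorded over,
 * 0 if the new packet's information would be discarded.
 */
int preempts(double u, double oldest_age, double mon_age)
{
    assert(mon_age > 0);
    return !(u > oldest_age / mon_age);
}

/*
 * Closed-form probability of recording over the oldest entry when the
 * scaled draw is uniform on [0, upper): min(1, (age / mon_age) / upper).
 */
double preempt_probability(double upper, double oldest_age, double mon_age)
{
    assert(upper > 0 && mon_age > 0);
    double p = (oldest_age / mon_age) / upper;
    if (p < 0.0)
        return 0.0;
    return p > 1.0 ? 1.0 : p;
}
```

Calling preempt_probability(0.25, 750, 3000) models the current scaling and gives 1.0, i.e. certain preemption at 750 s. Calling preempt_probability(1.0, 750, 3000) models the suggested scaling and gives 0.25, with certainty only reached at 3000 s.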