I was trying to understand how "discard monitor N" works in ntpd 4.2.8p15. I 
don't have a high-volume server -- this was purely academic interest. In the 
process I think I ran across a bug in how it works. Please correct me where I'm 
wrong.

The academic question was: what are the valid range and units of N?

The documentation says "discard monitor N" determines the "probability of being 
recorded for packets that overflow the MRU list size limit". Similarly, Dr Mills 
described it as the "probability that a packet that overflows the internal LRU 
list is discarded". Naively, I would have expected a probability to be expressed 
as a fraction in [0..1] or a percentage in [0..100], but N is actually a time in 
seconds. Internally it is stored in mon_age, whose default is 3000.

ntp_monitor.c:
  int     mon_age = 3000;         /* preemption limit */

When a packet from a new client would overflow the MRU list, ntpd has already 
checked that the MRU list is full, cannot be extended, and holds only entries 
too young to age out (younger than "mru maxage"). In this "dire" situation, the 
new information can either be discarded or be recorded over the oldest list 
entry. The choice is made at random. The probability that the oldest entry will 
be recorded over depends on the age of that entry, and is calculated as:

    oldest_age / mon_age

This means the probability of being recorded over is very low when the oldest 
entry is only 1 second old (1/3000), and very high when it is nearly 3000 
seconds old. An oldest entry at or beyond the threshold (oldest_age >= mon_age) 
will always be recorded over.

The relevant code is:

ntp_monitor.c:
                /* Preempt from the MRU list if old enough. */
                } else if (ntp_random() / (2. * FRAC) >
                           (double)oldest_age / mon_age) {
                        return ~(RES_LIMITED | RES_KOD) & flags;
                } else {
                        mon_reclaim_entry(oldest);

Here ntpd generates a random real number with:

    ntp_random() / (2. * FRAC)

This looks like an arithmetic error to me. It yields a random number in 
[0..0.25) where you would expect one in [0..1). To get a random number in 
[0..1), you would want

    ntp_random() * 2. / FRAC

and you do find that elsewhere in the code. FRAC represents 2^32, but 
ntp_random() returns a random integer in the range 0 .. 2^31 - 1, so it must be 
doubled (not halved) to get a random number in [0..1). Because the expression 
as written never reaches 0.25, the oldest entry is always recorded over once 
oldest_age / mon_age >= 0.25, and below that the probability is effectively 
4 × oldest_age / mon_age. So, contrary to what I believe is the intent, with 
"discard monitor 3000" the oldest MRU list entry is always recorded over in case 
of overflow once it is 3000 ÷ 4 = 750 s old, not 3000 s.

Related Bugzilla bug 3640: ntp.conf: missing documentation for "discard 
monitor" default value <https://bugs.ntp.org/show_bug.cgi?id=3640>. I did not 
find a bug describing wrong behavior.

Cheers!
Edward
-- 
This is questions@lists.ntp.org
Subscribe: questions+subscr...@lists.ntp.org
Unsubscribe: questions+unsubscr...@lists.ntp.org



