On Fri, Feb 10, 2012 at 03:07, A C <[email protected]> wrote: > So far so good, it's running with reasonable offsets and jitter (PPS not yet > enabled). > > But this is new to me in the logs: > > 10 Feb 01:41:37 ntpd[242]: ts_min 1328838097.185664676 ts_prev > 1328838097.185550676 ts 1328838097.189745353 > 10 Feb 01:41:37 ntpd[242]: sys_fuzz 114000 nsec, this fuzz 0.000099410 > 10 Feb 01:41:37 ntpd[242]: get_systime prev result 0xd2def451.30cccb9b is > 0.000780166 later than 0xd2def451.3099aa99
This is from recently-introduced code in ntp-dev that is reporting a hoped-to-be-impossible situation of reading the clock using ntpd routine get_systime() where the fuzzed result is earlier than the previous get_systime()'s result. These messages are not really intended to be scrutable by those not familiar with the internals of that routine, and my hope was they would not be seen at all. Briefly, ntpd "fuzzes" each timestamp's bits below sys_fuzz. The display is inconveniently mixed between decimal and hexadecimal representations, as those were the forms at hand and I didn't want to try to convert them and risk bugs in such conversion confusing results. Until recently, the fuzzing was below the reported precision, or minimum increment observed between subsequent clock readings. That should be no problem on typically high-precision unix system clocks, but led to predictable "time went backwards" issues with low-precision clocks such as Windows ntpd using the native system clock ticking every 0.5 or 1.0 msec. The new code measures not only the minimum increment between subsequent readings, but also the minimum time to read the clock, which is not the same with a low-precision system clock. If ntpd reads the system clock at the same value 100 times in a row, then it increments by 1 msec, the code infers the time to read the clock is 1 msec / 100 or 10 usec, and sets sys_fuzz to 10 usec. Later when reading the system clock and converting to NTP 32:32 fixed-point representation, a random value up to 10 usec will be added to the value from the OS, where previously up to 1000 usec / 1 msec would be added. At the same time fuzzing approach was changed, paranoid checks to be sure both the OS clock and the converted and fuzzed 32:32 is always advancing as well as the fuzzed conversion always increasing. So either those paranoid checks are wrong, or the code to measure the fuzz at startup is wrong, leading to the logically impossible result of the clock appearing to move backward relative to the prior reading. In this specific case, the minimum time to read the clock was measured at ntpd startup to be 114 usec, so each raw OS clock reading is expected to be no less than 114 usec later than the prior. The first line mentioning three ts_ variable names is displaying the raw clock readings from the OS, which are ordered as expected and not less than 114 usec apart, so far so good. The second line shows the measured minimum time to read the clock which sets the fuzz threshold, and the randomly selected fuzz value for this reading, which is a random number between 0 and sys_fuzz (114 usec). The last line shows the resulting fuzzed clock reading is 780 usec earlier than the prior fuzzed reading, which should not be possible. This code is green and could well be wrong. I encourage others seeing such messages in syslog or a configured ntpd log file to share a few examples here or in a bug report. Those who followed the introduction of a high-precision clock into FreeBSD many years ago might recall similar OS-level warnings reporting that time appeared to run backwards, seen for example when using processor cycle counters on SMP systems where the counters are not synchronized across processors. I hope these messages will be tamed by a bug fix and the new paranoid checks can continue to be used, with more comprehensible warning text when the checks fail. Cheers, Dave Hart Cheers, Dave Hart _______________________________________________ questions mailing list [email protected] http://lists.ntp.org/listinfo/questions
