Hal Murray <[EMAIL PROTECTED]> wrote:
>
>>On recent Linux kernels, I think the drift file is always bad after reboot.
>>HZ=100, no dynamic ticks aka tickless system (CONFIG_NO_HZ not set). I think
>>I even tried with a kernel command line option lpj= but it didn't help.
>>If the system is rebooted, ntpd stabilizes to a new different drift value.
>
>That's a bug in the TSC calibration code.
>
>grep your /var/log/messages* for "Detected". You will find things like this:
> Jan 4 11:21:49 shuksan kernel: Detected 2793.137 MHz processor.
> Jan 4 21:30:43 shuksan kernel: Detected 2793.209 MHz processor.
> Jan 22 09:32:20 shuksan kernel: Detected 2793.139 MHz processor.
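
A rough sketch of that check, assuming log lines in the "Detected ... MHz processor." format quoted above. The function names are made up for illustration. It pulls the calibrated frequency out of each line and expresses the boot-to-boot difference in ppm, the same unit ntpd uses for the drift value.

```python
import re

# Matches kernel lines such as "Detected 2793.137 MHz processor."
DETECTED = re.compile(r"Detected (\d+(?:\.\d+)?) MHz processor")


def detected_mhz(log_text):
    """Return every calibrated CPU frequency (MHz) found in the log text."""
    return [float(value) for value in DETECTED.findall(log_text)]


def ppm_difference(mhz, reference_mhz):
    """Relative difference between two calibration results, in ppm."""
    return (mhz - reference_mhz) / reference_mhz * 1e6


log = """\
Jan 4 11:21:49 shuksan kernel: Detected 2793.137 MHz processor.
Jan 4 21:30:43 shuksan kernel: Detected 2793.209 MHz processor.
Jan 22 09:32:20 shuksan kernel: Detected 2793.139 MHz processor.
"""

values = detected_mhz(log)
for value in values:
    # Compare each boot against the first one in the log.
    print(f"{value:.3f} MHz  {ppm_difference(value, values[0]):+.1f} ppm vs first boot")
```

For the three boots quoted above, the second calibration differs from the first by roughly 26 ppm, which is the same order of magnitude as a typical drift value.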

Yes, you are right. I had looked at some other lines that change on every boot, with different values even for the two separate hyper-threading "cores" of the same P4 processor:

 Jan 9 08:30:06 elektroni kernel: Calibrating delay using timer specific routine.. 6388.50 BogoMIPS (lpj=31942516)
 Jan 9 08:30:06 elektroni kernel: Calibrating delay using timer specific routine.. 6384.21 BogoMIPS (lpj=31921075)
 Jan 9 08:30:06 elektroni kernel: Total of 2 processors activated (12772.71 BogoMIPS).
 Jan 9 08:46:16 elektroni kernel: Calibrating delay using timer specific routine.. 6388.46 BogoMIPS (lpj=31942340)
 Jan 9 08:46:16 elektroni kernel: Calibrating delay using timer specific routine.. 6384.19 BogoMIPS (lpj=31920985)
 Jan 9 08:46:16 elektroni kernel: Total of 2 processors activated (12772.66 BogoMIPS).

I had already disabled this calibration and given lpj a constant value, but when I now also forced the processor MHz calibration value (actually cpu_khz in tsc.c) to a constant value, the problem vanished.

I did some tests to see how ntpd behaves in different cases. The setup was a 32-bit Linux kernel 2.6.23.14 without any patches and ntpd 4.2.4p4 without any patches. ntpd gets time from the internet (WAN).

1. Without an initial drift file, time set to a correct value with ntpdate before starting ntpd. The first frequency drift value in ntp-loopstats is 89 ppm and it grows to 92 ppm before starting to get lower again. The time offset in ntp-loopstats immediately grows to +77 ms and then starts to decrease, but overshoots badly to -112 ms, and ntpd finally steps the clock ("time reset -0.135405") four hours after it was started. The frequency is still 91 ppm. The time offset continues to decrease from zero to -20 ms (freq. 88 ppm) until it starts to move in the right direction again. It takes over 7 hours to get the offset to -10 ms and 16 hours to get it to -1 ms. I rebooted after 34 hours, when the frequency drift value was 72 ppm and the offset was about 0.1 ms.

2.
After reboot, the Linux kernel thinks the processor clock is 3192.182 MHz instead of 3192.210 MHz as before booting. So, while we have a drift file, it's not quite correct. Again, time is set to a correct value with ntpdate before starting ntpd. The frequency gets lower from the initial 72 ppm and the time offset grows (negative). After 40 minutes the time offset peaks at -9 ms, and after that it starts moving in the right direction. After nine hours, the magnitude of the offset is below 1 ms. Finally the frequency is 64 ppm. (About 8 ppm lower than in test 1. The calibrated processor frequency was 9 ppm lower than in test 1. So, the connection is clear.)

3. If I force cpu_khz in the Linux kernel to a constant value, the problem goes away. (Just for fun, I lowered the frequency given by the kernel calibration routine by the frequency offset given by the ntp drift file and put that into the cpu_khz variable. So now the drift value stabilizes very near zero.) Now the absolute value of the time offset always stays below 1 ms, even after reboot. (BTW, in linux-2.6.24 the variable has moved. It's now in the file arch/x86/kernel/tsc_32.c)

4. Even if I let the Linux kernel calibrate cpu_khz itself, I can also get good results by calibrating the drift value before starting ntpd, with a script I sent to this thread earlier. No previous drift file is needed. Basically, it stepped time with ntpdate, slept 100 seconds and stepped time again with ntpdate. From the second time adjustment, the script calculated the drift value and put that into the drift file. Again, the time offset always stays below 1 ms.

_______________________________________________
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions
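
The arithmetic behind the calibration in test 4 can be sketched as below. This is not the script sent to the thread earlier, only an illustration of the calculation, and the function names are hypothetical. It assumes the offset is the adjustment reported by the second ntpdate run (positive when the local clock had fallen behind), that a positive drift value means the local clock runs slow, and that the drift file holds a single ppm number. The 7.2 ms input is an illustrative figure, not a measurement.

```python
def drift_ppm(offset_seconds, interval_seconds):
    """Frequency error in ppm implied by an offset that accumulated over an interval.

    offset_seconds: adjustment needed at the second time step (assumed
                    positive if the local clock had fallen behind).
    interval_seconds: time slept between the two steps (100 s in test 4).
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return offset_seconds / interval_seconds * 1e6


def drift_file_text(ppm):
    """Format the value as one number in ppm (assumed drift file layout)."""
    return f"{ppm:.3f}\n"


# Illustrative input only: a 7.2 ms adjustment after a 100 s sleep.
example_ppm = drift_ppm(0.0072, 100.0)
print(drift_file_text(example_ppm), end="")
```

With a 100 second interval, each millisecond of accumulated offset corresponds to 10 ppm, so the accuracy of the result depends directly on how well ntpdate measures the second offset.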