On Sat, Apr 14, 2012 at 07:43, A C <agcarver+...@acarver.net> wrote:
> I did notice this in the current Internet only configuration, one Internet
> clock went crazy just now:
>
> peerstats:
> 56031 27012.604 130.207.165.28 9374 99542.105309349 0.000122070
> 0.015549461 99542.097097054
>
> ntpq line:
> -130.207.165.28 130.207.244.240 2 u 190 1024 377 0.122 9954210
> 9954209
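For anyone following along, the quoted peerstats record can be decoded field by field. This is only a minimal sketch, assuming the usual peerstats field order (MJD, seconds past midnight UTC, source address, hex status word, offset, delay, dispersion, jitter, with the last four in seconds); the helper name `parse_peerstats` and the dictionary keys are made up for illustration and are not part of ntpd.

```python
from datetime import datetime, timedelta, timezone

# Modified Julian Date 0 corresponds to 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def parse_peerstats(line):
    """Split one peerstats record into named fields (hypothetical helper)."""
    mjd, secs, addr, status, offset, delay, disp, jitter = line.split()
    when = MJD_EPOCH + timedelta(days=int(mjd), seconds=float(secs))
    return {
        "time": when,               # UTC timestamp of the record
        "address": addr,            # peer address
        "status": int(status, 16),  # status word, assumed to be hex
        "offset": float(offset),    # seconds
        "delay": float(delay),      # seconds
        "dispersion": float(disp),  # seconds
        "jitter": float(jitter),    # seconds
    }


# The record from the quoted message, rejoined onto one line.
rec = parse_peerstats(
    "56031 27012.604 130.207.165.28 9374 "
    "99542.105309349 0.000122070 0.015549461 99542.097097054"
)
offset_hours = rec["offset"] / 3600.0
print(rec["time"].strftime("%Y-%m-%d %H:%M:%S"), f"offset = {offset_hours:.2f} h")
```

Decoded this way, the record is stamped 07:30:12 UTC on 14 Apr 2012 and carries an offset of a little over 27 hours.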
I think it's safe to say we know a response did not arrive 27+ hours after a poll, particularly as NTP ignores responses which don't contain the most recent poll's timestamp. I regret beating this drum again, but I suspect your floating-point hardware and/or software is at fault. The SPARC architecture has partial hardware floating-point support; the rest is done via software provided by the system (OS and/or compiler).

It would be interesting to enable rawstats and find the corresponding rawstats entry when an insane peerstats offset is observed for a network source, in hopes that sheds more light on where the calculation runs off the rails.

> There was a fuzz shortly afterwards (the above was at 07:30:12) :
> 14 Apr 07:30:48 : ts_min 1334388647.362400444 ts_prev 1334388647.362295452
> ts 1334388648.362303446
> 14 Apr 07:30:48 : sys_fuzz 104992 nsec, this fuzz 0.000053869, prior
> 0.000074835

The system clock was observed at startup to take 105 usec to read, which, not coincidentally, is the difference ts_min - ts_prev. The time to read is the minimum nonzero difference between 12 successive clock readings. As ts > ts_min by 999 msec, the underlying OS clock is behaving correctly. ntpd's fuzz for this clock reading is randomly chosen from 0-105 usec and is 54 usec. The fuzz was 75 usec for the prior clock reading.

> 14 Apr 07:30:48 : get_systime prev result 0xd333a628.5d03feac is 0.000984891
> later than 0xd333a628.5cc372f1

And this is why the fuzz message was logged: somehow the sum of ts_prev and the previous fuzz, converted to NTP's timescale, appears to be nearly a millisecond later than the sum of ts and the current fuzz. In short, the output of get_systime() failed to increase over time as it should (except once at NTP era rollover every 136 years). Notice the seconds part of ts is one greater than that of ts_prev. Yet the seconds part of the previous result is the same as the seconds part of the latest result. Please provide more examples so I can see if this is a pattern.
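To make the comparison concrete, the two logged get_systime() results can be decoded by hand. The sketch below is an illustration rather than ntpd code: it assumes the logged values are 64-bit NTP timestamps (32 bits of seconds since 1900, 32 bits of fraction) and uses the standard 2208988800-second offset between the NTP and Unix epochs. The names `parse_l_fp` and `to_unix_ns` are invented for this example.

```python
JAN_1970 = 2208988800   # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)
FRAC = 2 ** 32          # fraction units per second in a 64-bit NTP timestamp


def parse_l_fp(text):
    """Parse '0xSSSSSSSS.FFFFFFFF' into one 64-bit fixed-point integer."""
    sec_hex, frac_hex = text[2:].split(".")
    return (int(sec_hex, 16) << 32) | int(frac_hex, 16)


def to_unix_ns(value):
    """Return (Unix seconds, nanoseconds) for a 64-bit NTP timestamp."""
    sec = (value >> 32) - JAN_1970
    nsec = ((value & 0xFFFFFFFF) * 10**9 + FRAC // 2) // FRAC  # rounded
    return sec, nsec


prev = parse_l_fp("0xd333a628.5d03feac")   # previous get_systime() result
cur = parse_l_fp("0xd333a628.5cc372f1")    # latest get_systime() result

# Integer subtraction avoids floating-point rounding on the large values.
step_back = (prev - cur) / FRAC

# What the log inputs predict: ts plus this fuzz, in Unix (sec, nsec).
expected_cur = (1334388648, 362303446 + 53869)

print("prev  ->", to_unix_ns(prev))
print("cur   ->", to_unix_ns(cur))
print("cur expected from ts + fuzz ->", expected_cur)
print(f"prev is {step_back:.9f} s later than cur")
```

Both results decode to the same Unix second as ts, and the previous result is later by the 0.000984891 s reported in the log, even though ts_prev was taken in the second before.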
I'm suspicious first of the conversion by ntpd between Unix and NTP timescales, and secondarily of the floating-point hardware+software.

> Note the delay, it seems rather curious that the delay is exactly the
> computed jitter of my system. That's the same value used in the jitter
> field of my PPS source:
>
> peerstats:
> 56031 27118.381 127.127.22.0 9024 0.004945328 0.000000000 0.000353156
> 0.000122070
> ntpq:
> 127.127.22.0 .PPS. 0 l 9 16 377 0.000 5.019 0.122
>
> Coincidence or did something accidentally jumble some memory addresses?

Your system took 105 usec to read the clock using get_systime() before fuzzing was enabled. ntpd converts this to the nearest larger power of two to determine the precision, which yields 2**-13 seconds or 0.000122. ntpd knows it cannot believe differences of less than 122 usec between readings of your clock, and so forces the delay and jitter values to be no less than that value. This is entirely by design and can be seen in a more exaggerated form running ntpd on Windows Vista/7 without interpolation with a 1 msec system clock, where the delays and jitter are 0.977 msec or higher. If ntpd used clock differences of less than 105 usec, it would be giving unwarranted credence to the normally-distributed random noise referred to as fuzz which is intentionally added to local clock readings.

Cheers,
Dave Hart
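P.S. The precision arithmetic described above can be checked with a few lines. This is a rough sketch that takes the description in this message literally (round the clock read time up to a power of two, then treat that as a floor for delay and jitter); ntpd's own routine may round differently. The function names are hypothetical, and the 104992 nsec input is the sys_fuzz value from the quoted log.

```python
import math


def precision_exponent(read_time_s):
    """Smallest exponent e with 2**e >= read_time_s (assumed rounding rule)."""
    return math.ceil(math.log2(read_time_s))


def floor_at_precision(value_s, precision_s):
    """Never report a delay or jitter smaller than the clock precision."""
    return max(value_s, precision_s)


read_time = 104992e-9                 # sys_fuzz from the log, in seconds
exponent = precision_exponent(read_time)
precision = 2.0 ** exponent

print(f"precision = 2**{exponent} = {precision:.9f} s")

# A measured delay below the precision is raised to it; a larger one is kept.
print(f"{floor_at_precision(0.000050, precision):.9f}")
print(f"{floor_at_precision(0.015549461, precision):.9f}")
```

With a 105 usec read time the exponent comes out as -13, which matches the 0.000122070 shown in the delay and jitter columns of the quoted peerstats lines.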