On 2/13/2012 01:53, David Lord wrote:
A C wrote:
On 2/12/2012 16:38, David Lord wrote:
A C wrote:
I'm not exactly sure what happened, but I may have caught the system in
the act of trying to run away. Below are the last two lines from the
peers log for one of the peers, and below that is the output of ntpq
showing what happened to that peer immediately after. Note the offset.
This somewhat matches what I had been seeing previously, where my
system would be reporting sane numbers and then very suddenly go
haywire.

55969 77832.471 67.23.181.241 9324 0.006359155 0.081262776 0.001634487 0.008837966
55969 77897.557 67.23.181.241 9324 399326.864663492 0.000122070 0.001346708 399326.861631493
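
For reference, in the peerstats format the columns after the date, time,
address and status word are offset, delay, dispersion and jitter, all in
seconds, so the second record shows the offset jumping from about 6 ms to
roughly 399327 s. A quick way to scan for spikes like this, assuming the
stats files live under /var/log/ntpstats (the path depends on the statsdir
setting):

    # print any peerstats record whose offset (field 5, in seconds)
    # is more than 1 s away from zero
    awk '$5 > 1 || $5 < -1' /var/log/ntpstats/peerstats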


     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.22.0    .PPS.            0 l    5   16  377    0.000    0.095   0.122
 127.127.28.0    .GPSD.           4 l   36  128  377    0.000    6.399  10.748
-67.223.228.71   18.26.4.105      2 u    4   64  377   95.580   -3.863  14.410
+74.118.152.85   69.36.224.15     2 u   20   64  377   41.318    0.492   3.692
 67.23.181.241   128.4.1.1        2 u   53   64  377    0.122  3993268  3993268
+130.207.165.28  130.207.244.240  2 u    7   64  377   77.872    2.837   0.328
*131.144.4.10    130.207.244.240  2 u    1   64  377   84.182    1.434  10.710


After a couple minutes things went back to normal as if nothing had
happened:

55969 78417.519 67.23.181.241 9024 0.005904355 0.081140840 0.005271221 0.000546629
55969 78545.470 67.23.181.241 9324 0.006413290 0.081305361 0.002089938 0.011478833


     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.22.0    .PPS.            0 l   15   16  377    0.000    0.092   0.122
 127.127.28.0    .GPSD.           4 l   94  128  377    0.000   11.432   8.820
-67.223.228.71   18.26.4.105      2 u   58   64  377   95.580   -3.863   8.613
+74.118.152.85   69.36.224.15     2 u    9   64  377   41.034    0.488   3.691
-67.23.181.241   128.4.1.1        2 u   44   64  377   81.141    5.904   0.547
+130.207.165.28  130.207.244.240  2 u   62   64  377   78.036    3.062   0.628
*131.144.4.10    130.207.244.240  2 u   59   64  377   84.182    1.434  10.717


It's the first time that I've actually been able to see this happen
while staring at the monitor. I'm not certain, but I suspect that the
last few times this happened and the system did run away, the peer
that experienced the strange behavior also happened to be the selected
system peer. I can't explain the behavior, though, because everything
looks fine on my system otherwise, and the remote server also seems to
be fine given that its offset was normal just before the update.

What remote server settings do you have?

The distances from your sources also don't fit with
having a short poll interval unless you know for
certain the routes are symmetric and have constant
delay.

Apart from that, other sites have their own problems,
so when any of my selected sources goes bad it is
usually replaced with a better source.

Ntpd usually handles the bad sources fairly well.


I posted the entire configuration in my previous email. Other than
iburst I have no settings for the servers: no minpoll or maxpoll. I'm
not selecting the 64-second poll interval; ntpd is.
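
So the server lines are essentially of this form (the host name here is
just a placeholder, the real ones were in the previous mail):

    # iburst only speeds up the initial sync; with no minpoll/maxpoll
    # options ntpd manages the poll interval itself, between its
    # defaults of 2^6 (64 s) and 2^10 (1024 s)
    server ntp.example.org iburst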

That's normal for ntpd starting up, and if there is no good
drift file it can take many hours to establish one. As the drift
value is adjusted the poll intervals normally increase and
finer adjustments are made. With a stable temperature I see
poll intervals of 1024 s.
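
For what it's worth, the drift file is a single directive in ntp.conf,
something like this (the path is only an example, distributions differ):

    # ntpd writes its learned frequency correction (in PPM) here about
    # once an hour; a sane value at startup lets the loop settle much
    # faster and the poll interval grow toward 1024 s
    driftfile /var/lib/ntp/ntp.drift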


It has been over 24 hours now of ntpd running (hooray, new record), but
the poll times are still at 64 s. The refclock seems to be clamping it,
but I don't know why. The PPM adjustment has been fairly stable for the
last 12 hours; it hovers around -78.40 and drifts up and down by a few
parts per billion as the room temperature changes.
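
In case it matters, I'm reading that number from the system variables,
roughly like this (the exact names in the billboard can vary between
versions):

    # 'frequency' is the correction ntpd is currently applying, in PPM;
    # 'tc' is the log2 of the current poll interval / time constant
    ntpq -c rv | tr ',' '\n' | grep -E 'frequency|tc='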


In this case ntpd successfully deselected the errant source and then
accepted it again after the strange behavior was noted. But I contend
that this may have happened with the system peer, and ntpd may not have
been so graceful about the sudden jump.

I use 'tos minsane 3' in my ntp.conf and have occasionally
seen my stratum jump to 16 when there has been a problem
with my internet connection.
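
For reference, it's just a single line in ntp.conf, roughly like this
(a sketch, not my full configuration):

    # require at least three survivors of the selection and clustering
    # algorithms before the clock is disciplined; with fewer usable
    # sources the daemon free-runs and the stratum can jump to 16
    tos minsane 3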

A stratum change I can see if something has gone wrong, but this is not
just a stratum change; it is a complete skewing of the clock by many
seconds. Initially it appears the skew is just an error internal to
ntpd, where it only *thinks* the clock is off by many seconds. My
suspicion, however, is that if this error happens to fall upon the
clock that has been christened the system peer, then it causes havoc
and really *does* skew the system clock to try to compensate for a
perceived drift. That sends the algorithm off the rails, and the entire
clock ends up slammed in one direction, ticking way off, as evidenced
by restarting ntpd shortly after the fatal flaw and watching it force a
clock step of many seconds.



PS: I get a fuzz report once every two hours. Is that the interval built
into the code, or is it supposed to report fuzz only when it is detected?

Sorry, I've never had a fuzz report. If you are running
ntp-dev it may have code that gives extra information.


Yeah, Dave Hart already described the fuzz code to me. I was more curious
whether it is a cyclic thing or whether it is reported on an as-detected
basis.
