On 2/11/2012 06:51, Dave Hart wrote:
On Sat, Feb 11, 2012 at 09:21, A C<[email protected]> wrote:
So ntpd has been behaving reasonably well with the snprintf fix. I had good
results with only internet servers. My PPS and SHM refclocks were set to
noselect.
I removed the noselect on the PPS refclock and left flag3 set to zero (no
kernel discipline).
Everything seemed fine and then:
Sat Feb 11 01:12:10 PST 2012
     remote           refid       st t when poll reach   delay   offset   jitter
==============================================================================
x127.127.22.0    .PPS.             0 l    -   16  377    0.000  -111.40  351.464
 127.127.28.0    .GPSD.            4 l   49  128  377    0.000  -14655.  2814.64
 169.229.70.201  169.229.128.214   3 u  103  512  377   39.347  -9274.2  6597.61
 72.14.179.211   127.67.113.92     2 u   79  512  377   57.746  -14699.  10685.0
 24.124.0.251    132.236.56.250    3 u  521  512  377   77.930  -9835.0  7451.10
 130.207.165.28  130.207.244.240   2 u  153  512  377   79.131  -9155.6  6554.15
 131.144.4.10    130.207.244.240   2 u  142  512  377   86.537  -9102.3   6526.3
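For anyone reading the billboard above, here is a minimal Python sketch that pulls out the tally code and offset for each peer. It assumes the standard `ntpq -p` column order (remote, refid, st, t, when, poll, reach, delay, offset, jitter) and the usual tally-code meanings. The `Peer` class, the `parse_row` helper and the 1000 ms threshold are made up for illustration and are not part of ntpq. Rows without a visible marker are treated as having a blank tally.

```python
from dataclasses import dataclass

# Usual ntpq tally codes (first character of each billboard row).
TALLY_CODES = {
    " ": "reject", "x": "falseticker", ".": "excess", "-": "outlier",
    "+": "candidate", "#": "backup", "*": "sys.peer", "o": "pps.peer",
}

# Rows copied from the billboard above, one peer per line.
ROWS = [
    "x127.127.22.0 .PPS. 0 l - 16 377 0.000 -111.40 351.464",
    " 127.127.28.0 .GPSD. 4 l 49 128 377 0.000 -14655. 2814.64",
    " 169.229.70.201 169.229.128.214 3 u 103 512 377 39.347 -9274.2 6597.61",
    " 72.14.179.211 127.67.113.92 2 u 79 512 377 57.746 -14699. 10685.0",
    " 24.124.0.251 132.236.56.250 3 u 521 512 377 77.930 -9835.0 7451.10",
    " 130.207.165.28 130.207.244.240 2 u 153 512 377 79.131 -9155.6 6554.15",
    " 131.144.4.10 130.207.244.240 2 u 142 512 377 86.537 -9102.3 6526.3",
]


@dataclass
class Peer:
    tally: str        # single-character tally code
    remote: str
    refid: str
    offset_ms: float  # ntpq reports offset in milliseconds
    jitter_ms: float


def parse_row(line: str) -> Peer:
    """Split one billboard row into the fields of interest."""
    if line[0] in TALLY_CODES:
        tally, rest = line[0], line[1:]
    else:
        tally, rest = " ", line  # no marker present: treat as blank tally
    f = rest.split()
    return Peer(tally, f[0], f[1], float(f[8]), float(f[9]))


peers = [parse_row(row) for row in ROWS]

# Hypothetical threshold: anything more than one second off.
far_off = [p for p in peers if abs(p.offset_ms) > 1000.0]

for p in peers:
    print(f"{p.remote:16} {TALLY_CODES[p.tally]:12} {p.offset_ms / 1000.0:8.3f} s")
```

Read this way, the PPS driver is the only source marked as a falseticker, while every other source, SHM included, reports an offset between roughly 9 and 15 seconds.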
Did you forget to mention you commented out the NMEA refclock at the
same time you removed noselect from the atom/PPS and SHM drivers?
I am a bit tired right now, so forgive me for latching onto a nit
rather than the juicy part, but I want to be as clear as possible.
You say everything was fine until you made some changes, without
specifying the previous state, and when I try to infer what that
earlier state was based on the two changes, I'm left with a setup with
no refclocks, which is obviously not particularly comparable. I'm
also hesitating to point a finger at the gpsd+SHM combo, particularly
because I suspect it's racy especially on non-x86 systems and have on
my to-do list rewriting it to use a safer shared memory access
protocol...
So first, let's be clear about what you're reporting. Was the change
from 3 refclock drivers with 2 marked noselect to 2 selectable
drivers?
No problem. SHM has been disabled by noselect for a while. It is still
currently disabled by noselect (but not commented out so I can still
observe its relative offset). During the snprintf testing from this
week, ATOM has also been disabled by noselect (also so I could continue
to observe its relative offset) so I was left with only the internet
servers (five total) as my time sources.
For an entire week I ran with ATOM and SHM in noselect and things looked
fine. Offsets for all internet servers settled down to 1-2ms and the
reported ATOM offset also stayed in that same range without straying
away (again, this is reported offset but the clock wasn't being used
because it was still noselect).
I removed the noselect from ATOM only (not SHM) so now I had the
internet servers (five) plus ATOM. Everything looked fine for a few
hours after I restarted ntpd with ATOM enabled again (allowed to be
selected). But after a few hours, the clock went crazy and started
slewing very quickly. When I restarted ntpd, it had to step the clock
backwards by 16.6 seconds to bring it into agreement. The clock gained
16 seconds in a matter of about 5 minutes (the amount of time I let ntpd
run in this crazy state).
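
To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It takes the 16.6 second step and the "about 5 minutes" above at face value, so the result is only approximate. The 500 ppm figure is ntpd's normal ceiling on frequency correction, included purely for comparison; the variable and function names are made up for this sketch.

```python
STEP_SECONDS = 16.6           # size of the backward step after the restart
ELAPSED_SECONDS = 5 * 60      # "about 5 minutes", so only approximate
NTPD_MAX_SLEW_PPM = 500.0     # ntpd's normal frequency-correction limit


def drift_ppm(error_s: float, elapsed_s: float) -> float:
    """Average rate at which the clock ran fast, in parts per million."""
    return error_s / elapsed_s * 1e6


rate = drift_ppm(STEP_SECONDS, ELAPSED_SECONDS)
ratio = rate / NTPD_MAX_SLEW_PPM

print(f"average drift: {rate:.0f} ppm")
print(f"that is about {ratio:.0f} times the 500 ppm limit")
```

A clock running fast by tens of thousands of ppm is far beyond what ntpd would normally apply as a frequency correction, which is why a step was needed to recover.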