On 2/11/2012 06:51, Dave Hart wrote:
On Sat, Feb 11, 2012 at 09:21, A C<[email protected]>  wrote:
So ntpd has been behaving reasonably well with the snprintf fix.  I had good
results with only internet servers.  My PPS and SHM refclocks were set to
noselect.

I removed the noselect on the PPS refclock and left flag3 set to zero (no
kernel discipline).

Everything seemed fine and then:

Sat Feb 11 01:12:10 PST 2012
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x127.127.22.0    .PPS.            0 l    -   16  377    0.000  -111.40 351.464
 127.127.28.0    .GPSD.           4 l   49  128  377    0.000  -14655. 2814.64
 169.229.70.201  169.229.128.214  3 u  103  512  377   39.347  -9274.2 6597.61
 72.14.179.211   127.67.113.92    2 u   79  512  377   57.746  -14699. 10685.0
 24.124.0.251    132.236.56.250   3 u  521  512  377   77.930  -9835.0 7451.10
 130.207.165.28  130.207.244.240  2 u  153  512  377   79.131  -9155.6 6554.15
 131.144.4.10    130.207.244.240  2 u  142  512  377   86.537  -9102.3 6526.3

Did you forget to mention you commented out the NMEA refclock at the
same time you removed noselect from the atom/PPS and SHM drivers?

I am a bit tired right now, so forgive me for latching onto a nit
rather than the juicy part, but I want to be as clear as possible.
You say everything was fine until you made some changes, without
specifying the previous state, and when I try to infer what that
earlier state was based on the two changes, I'm left with a setup with
no refclocks, which is obviously not particularly comparable.  I'm
also hesitating to point a finger at the gpsd+SHM combo, particularly
because I suspect it's racy especially on non-x86 systems and have on
my to-do list rewriting it to use a safer shared memory access
protocol...

So first, let's be clear about what you're reporting.  Was the change
from 3 refclock drivers with 2 marked noselect to 2 selectable
drivers?

No problem. SHM has been disabled by noselect for a while. It is still disabled by noselect (but not commented out, so I can still observe its relative offset). During this week's snprintf testing, ATOM was also disabled by noselect (again so I could keep observing its relative offset), which left only the five internet servers as my time sources.

For an entire week I ran with ATOM and SHM in noselect and things looked fine. Offsets for all internet servers settled down to 1-2 ms, and the reported ATOM offset stayed in that same range without straying (again, this is only the reported offset; the clock wasn't being used because it was still noselect).

I removed the noselect from ATOM only (not SHM), so now I had the five internet servers plus ATOM. Everything looked fine for a few hours after I restarted ntpd with ATOM enabled again (allowed to be selected). Then the clock went crazy and started slewing very quickly. When I restarted ntpd, it had to step the clock backwards by 16.6 seconds to bring it into agreement. The clock gained those 16-odd seconds in about 5 minutes (the amount of time I let ntpd run in this crazy state).
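
For scale, the figures above can be turned into an average frequency error. This is only a rough sketch: it assumes "about 5 minutes" means exactly 300 seconds and that the whole 16.6 second step accumulated in that window. The helper name drift_ppm is made up for illustration. The 500 ppm value is ntpd's normal maximum slew rate.

```python
# Rough drift estimate from the numbers reported in this message.
# Assumption: "about 5 minutes" is treated as exactly 300 seconds.
STEP_SECONDS = 16.6          # backward step needed after restarting ntpd
ELAPSED_SECONDS = 5 * 60     # approximate time spent in the runaway state
NTPD_MAX_SLEW_PPM = 500.0    # ntpd's normal maximum slew rate


def drift_ppm(error_seconds: float, elapsed_seconds: float) -> float:
    """Average frequency error in parts per million (hypothetical helper)."""
    return error_seconds / elapsed_seconds * 1e6


rate = drift_ppm(STEP_SECONDS, ELAPSED_SECONDS)
print(f"average drift: {rate:.0f} ppm")
print(f"relative to the 500 ppm slew limit: {rate / NTPD_MAX_SLEW_PPM:.0f}x")
```

Under those assumptions the result is on the order of 55,000 ppm, roughly a hundred times what ntpd would normally slew, which suggests the clock was being pushed by something other than ordinary discipline.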

_______________________________________________
questions mailing list
[email protected]
http://lists.ntp.org/listinfo/questions
