Re: ntp problems stratum 2 to 14?

2020-03-05 Thread Dewayne Geraghty
> The interrupted system calls (EINTR returned from select()) are normal.
> Notice that each time it happens, it's associated with a SIGALARM being
> delivered to ntpd.  Ntpd uses SIGALARM at 1hz to periodically get
> control and decide whether it's time to poll peers and do other
> periodic work.
>
> You say 10.0.7.6 syncs with some atomic clocks, but in your initial
> posting it was sync'd to its own LOCL clock at stratum 14, which is why
> the ntpd you were asking about refused to sync to it and also fell back
> to its own LOCL clock.  Eventually 10.0.7.6 sync'd to 203.35.83.242,
> then the system you were asking about sync'd to 10.0.7.6.
>
> -- Ian
>
>
>
Thank you for your insight; comforting to know that it's normal/expected, so
I don't need to follow that down.
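
For anyone reading the archive later, the pattern Ian describes can be
sketched roughly as below.  This is not ntpd's code, only an illustration
in Python under some assumptions: the names Tick, _on_alarm and run_ticks
are made up for the example, the interval is shortened from ntpd's one
second, and it needs a Unix-like system.  Python normally restarts
select() after a signal, so the handler raises an exception to hand
control back to the loop; that stands in for the EINTR return seen in the
ktrace.

```python
import os
import select
import signal


class Tick(Exception):
    """Raised by the SIGALRM handler so the blocked select() is abandoned."""


def _on_alarm(signum, frame):
    # Hypothetical handler: raising here plays the role of EINTR in C.
    raise Tick


def run_ticks(count, interval=0.05):
    """Block in select() and count how often the interval timer interrupts it."""
    rfd, wfd = os.pipe()  # nothing is ever written, so select() never sees I/O
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    ticks = 0
    try:
        while ticks < count:
            try:
                # No timeout: wait until I/O arrives or a signal interrupts us.
                select.select([rfd], [], [])
            except Tick:
                # A daemon such as ntpd would do its periodic work at this point.
                ticks += 1
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(
            signal.SIGALRM,
            previous if previous is not None else signal.SIG_DFL,
        )
        os.close(rfd)
        os.close(wfd)
    return ticks


if __name__ == "__main__":
    print(run_ticks(3), "timer interruptions, none of them an error")
```

Every pass through the except branch corresponds to one "Interrupted
system call" line in the debug output: the timer firing, not a fault.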

Yes, I probably caught 10.0.7.6 restarting ntp as I was trying to
determine if there was some new incompatibility or if I had changed some
firewall rule, so I may have inadvertently blocked it.  I apologise for
complicating the original situation/post.  Typically 10.0.7.6, which acts as
firewall and time server, is rebooted when the UPS needs new batteries.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ntp problems stratum 2 to 14?

2020-03-05 Thread Ian Lepore
On Fri, 2020-03-06 at 07:49 +1100, Dewayne Geraghty wrote:

Re: ntp problems stratum 2 to 14?

2020-03-05 Thread Dewayne Geraghty
Ian,  Good points.  I did remove the fudge and 127.127.1.1 lines from the
config with the same result as below.  Interestingly the clock at 10.0.7.6
isn't really unreliable, as it's been my time source since 2005, and serves
clients, so it is pretty ok. Without a local clock, named fails (it's linked
with kerberos).  Yes, I'd never seen my clock server become st 14, which
prompted me to seek help. I haven't repeated that scenario, but I
continuously get "interrupted system call" and I haven't been able to sync
while running debug or ktrace - so I don't have "what good looks like".

Thanks Peter. I've rebuilt net/ntpd in various ways including all
defaults.  Rebuilt the kernel (& world), to the latest 12 stable.
Reset almost all sysctls (I change 91 of them).  I keep getting
interrupted system call at 1 sec intervals, which I suspect is a problem.

For the reader: a stratum 2 clock, 10.0.7.6, syncs with some atomic clocks
within the city; a server, 10.0.7.91, running ntpd 4.2.8p14 on FreeBSD 12.1
r358565M syncs irregularly, usually won't sync at all, and experiences
"interrupted system calls".

The jump to stratum 14 was a surprise, but not repeatable.  Sometimes the
ntpd port starts and uses the next hop time server, but usually within 20
minutes returns to LOCL, though more often it goes straight to LOCL.  During
ktrace runs I've observed:

 66894 ntpd CALL  write(0x1,0x80078e000,0x2e)
 66894 ntpd GIO   fd 1 wrote 46 bytes
   "select() returned -1: Interrupted system call
...
   "poll_update: at 1 10.0.7.6 poll 4 burst 0 retry 2 head 14 early 2 next 16
   "
 66894 ntpd RET   write 74/0x4a
 66894 ntpd CALL  select(0x19,0x7fffde50,0,0,0)
 66894 ntpd RET   select -1 errno 4 Interrupted system call
 66894 ntpd PSIG  SIGALRM caught handler=0x80072f600 mask=0x0 code=SI_KERNEL
 66894 ntpd CALL  sigprocmask(SIG_SETMASK,0x7fffd7a4,0)
 66894 ntpd RET   sigprocmask 0
 66894 ntpd CALL  sigreturn(0x7fffd3d0)
 66894 ntpd RET   sigreturn JUSTRETURN
 66894 ntpd CALL  write(0x1,0x80078e000,0x2e)
 66894 ntpd GIO   fd 1 wrote 46 bytes
   "select() returned -1: Interrupted system call
...
   "select() returned -1: Interrupted system call
   "
 66894 ntpd RET   write 46/0x2e
 66894 ntpd CALL  select(0x19,0x7fffde50,0,0,0)
 66894 ntpd RET   select -1 errno 4 Interrupted system call
 66894 ntpd PSIG  SIGALRM caught handler=0x80072f600 mask=0x0 code=SI_KERNEL
 66894 ntpd CALL  sigprocmask(SIG_SETMASK,0x7fffd7a4,0)
 66894 ntpd RET   sigprocmask 0
 66894 ntpd CALL  sigreturn(0x7fffd3d0)
 66894 ntpd RET   sigreturn JUSTRETURN
 66894 ntpd CALL  write(0x1,0x80078e000,0x2e)
 66894 ntpd GIO   fd 1 wrote 46 bytes
   "select() returned -1: Interrupted system call
but I have no idea whether these interrupted system calls are normal.

and with debug (-D5) this is what I see

 5 Mar 18:26:50 ntpd[86274]: select(): nfound=-1, error: Interrupted system call
alarming: normal
 5 Mar 18:26:51 ntpd[86274]: select(): nfound=-1, error: Interrupted system call
poll_update: at 17 10.0.7.6 poll 4 burst 0 retry 0 head 0 early 2 next 16
sendpkt(21, dst=10.0.7.6, src=10.0.7.91, ttl=0, len=72)
peer_xmit: at 17 10.0.7.91->10.0.7.6 mode 3 keyid 232f len 72 index 0
read_network_packet: fd=21 length 72 from 10.0.7.6
fetch_timestamp: system bintime network time stamp: 1583393211.408612992
restrictions: looking up 10.0.7.6
match_restrict4_addr: Checking 127.0.0.1, port 123 ... doesn't match: ippeerlimit -4
match_restrict4_addr: Checking 127.0.0.1, port 123 ... doesn't match: ippeerlimit -1
match_restrict4_addr: Checking 10.0.7.91, port 123 ... doesn't match: ippeerlimit -4
match_restrict4_addr: Checking 10.0.7.6, port 123 ... MATCH: ippeerlimit -1
receive: at 17 10.0.7.91<-10.0.7.6 ippeerlimit -1 mode 4 iflags up,broadcast,multicast restrict nomodify,nopeer,noquery,notrap org 0xe20b283b.687ad7f6 xmt 0xe20b2838.2001aa5b
MRU: interval 16 headway 8 limit 64
receive: at 17 10.0.7.91<-10.0.7.6 mode 4/server:AM_PROCPKT keyid 232f len 72 auth 1 org 0xe20b283b.687ad7f6 xmt 0xe20b2838.2001aa5b MAC
receive: MATCH_ASSOC dispatch: mode 4/server:AM_PROCPKT
poll_update: at 17 10.0.7.6 poll 4 burst 0 retry 0 head 14 early 2 next 16
clock_filter: n 2 off -3.283397 del 0.000382 dsp 3.937561 jit 0.000551
alarming: normal
 5 Mar 18:26:52 ntpd[86274]: select(): nfound=-1, error: Interrupted system call
alarming: normal
 5 Mar 18:26:53 ntpd[86274]: select(): nfound=-1, error: Interrupted system call
alarming: normal

I have rebuilt with ntp-4.2.8p14, with no additional compile rules and
the port's defaults. I keep getting

 6 Mar 07:05:16 ntpd[98682]: select(): nfound=-1, error: Interrupted system call
alarming: normal
 6 Mar 07:05:17 ntpd[98682]: select(): nfound=-1, error: Interrupted system call
alarming: normal
 6 Mar 07:05:18 ntpd[98682]: select(): nfound=-1, error: Interrupted system call
alarming: normal
 6 Mar 07:05:19 ntpd[98682]: 

Re: ntp problems stratum 2 to 14?

2020-03-05 Thread Ian Lepore
On Wed, 2020-02-26 at 16:37 +1100, Dewayne Geraghty wrote:
> I usually run ntpd with both aslr and as user ntpd.  While testing I
> noticed that my server with a direct network cable to my main time keeper,
> jumped from the expected stratum 2 to 14 as follows (I record the date so I
> can synch with the debug log, also below):
> 
> vm.loadavg={ 0.09 0.10 0.18 }
> 
> Wed 26 Feb 2020 15:16:38 AEDT
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  10.0.7.6        203.35.83.242    2 u   44   64   377   0.147  -227.12  33.560
> *127.127.1.1     .LOCL.          14 l   59  128   377   0.000    0.000   0.000
> Wed 26 Feb 2020 15:18:46 AEDT
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  10.0.7.6        LOCAL(1)        14 u   42   64   377   0.147  -227.12  44.529
> *127.127.1.1     .LOCL.          14 l   59  128   377   0.000    0.000   0.000
> Wed 26 Feb 2020 15:20:54 AEDT
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  10.0.7.6        LOCAL(1)        14 u   42   64   377   0.147  -227.12  73.969
> *127.127.1.1     .LOCL.          14 l   59  128   377   0.000    0.000   0.000
> Wed 26 Feb 2020 15:23:02 AEDT
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *10.0.7.6        LOCAL(1)        14 u   37   64   377   0.164  -370.64  74.119
>  127.127.1.1     .LOCL.          14 l   59  128   377   0.000    0.000   0.000
> Time marches on
> Wed 26 Feb 2020 16:03:35 AEDT
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *10.0.7.6        LOCAL(1)        14 u   11   64   177   0.133   -3.148  72.295
>  127.127.1.1     .LOCL.          14 l  406  128    10   0.000    0.000   0.000
> Wed 26 Feb 2020 16:05:43 AEDT
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *10.0.7.6        203.35.83.242    2 u    7   64   377   0.164  -42.789  73.762
>  127.127.1.1     .LOCL.          14 l  534  128    20   0.000    0.000   0.000
> 
> The debug for the above is:
> 26 Feb 14:58:33 ntpd[8772]: Command line: /usr/local/sbin/ntpd -c
> /etc/ntp.conf -g -g -u ntpd --nofork
> ...
> 26 Feb 14:58:34 ntpd[8772]: 10.0.7.6 e014 84 reachable
> 26 Feb 14:58:35 ntpd[8772]: LOCAL(1) 8014 84 reachable
> 26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad
> 26 Feb 15:03:40 ntpd[8772]: 0.0.0.0 c515 05 clock_sync
> 26 Feb 15:22:25 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer  <=== Good!
> 26 Feb 15:22:25 ntpd[8772]: 0.0.0.0 0613 03 spike_detect -0.370644 s
> 26 Feb 15:30:03 ntpd[8772]: 0.0.0.0 061c 0c clock_step -0.536289 s
> 26 Feb 15:30:02 ntpd[8772]: 0.0.0.0 0615 05 clock_sync
> 26 Feb 15:30:03 ntpd[8772]: 0.0.0.0 c618 08 no_sys_peer
> 26 Feb 15:30:03 ntpd[8772]: 10.0.7.6 e014 84 reachable
> 26 Feb 15:30:07 ntpd[8772]: LOCAL(1) 8014 84 reachable
> 26 Feb 15:30:21 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer
> ...
> 26 Feb 15:46:49 ntpd[8772]: 0.0.0.0 c618 08 no_sys_peer
> 26 Feb 15:46:57 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer
> 
> ...
> 26 Feb 15:56:58 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer
> ...
> 26 Feb 16:24:33 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== and stays LOCAL
> which is now normal for this box  :(
> 
> Should the jump to stratum 14 be expected?  Anything obviously wrong with
> the ntp.conf?
> 
> I've had a few days of testing on what is usually a very stable (time-wise
> system), seems that running at prio 20 is required.
> 
> /etc/ntp.conf contains
> rlimit memlock -1
> rlimit filenum 32
> driftfile /var/db/ntp/drift
> disable bclient
> server 10.0.7.6 iburst minpoll 4 maxpoll 6 version 4 key 23057 prefer
> 
> server 127.127.1.1 minpoll 7 maxpoll 7
> fudge  127.127.1.1 stratum 14
> 
> restrict -4 default ignore
> restrict -6 default ignore
> restrict 127.0.0.1  nomodify nopeer notrap
> restrict -6 ::1 nomodify nopeer notrap
> restrict 0.0.0.0 ignore
> 
> restrict 10.0.7.6 nomodify nopeer noquery notrap ntpport
> restrict 10.169.168.91 mask 255.255.255.0 nomodify nopeer noquery notrap ntpport kod limited
> 
> 
> I'm also very surprised that the jitter on the server (under testing) is so
> poor.  The internet facing time server is
> *x.y.z.t   .ATOM.   1 u   73  5127   23.776   34.905  95.961
> but its very old and not running aslr.
> 
> Any ideas or pointers would be appreciated.  This is very, time consuming.
> :)
> 
> I'm using the following command sequence as these are all being changed
> sysctl kern.elf64.aslr.enable=1 

Re: ntp problems stratum 2 to 14?

2020-03-05 Thread Bob Bishop
Hi,

> On 5 Mar 2020, at 06:33, Peter Jeremy  wrote:
> 
> Hi Dewayne,
> 
> Sorry for the delay.  Unfortunately, I can't really suggest anything -
> it's not clear to me why ntpd would prefer a stratum 14 clock over a
> stratum 2 clock.  Have you tried looking through the debugging hints
> page (https://www.eecis.udel.edu/~mills/ntp/html/debug.html)?
> 
> I haven't seen that problem but I don't use the local clock.
> 
> During startup, it would not seem unreasonable for the local clock to
> become valid first because it will have a lower jitter.  But ntpd
> should switch to the stratum 2 clock and stay with it as the better
> time source.  One problem is that if ntpd decides to switch away from
> the clock for any reason (eg a burst of jitter), it may get stuck on
> the local clock as it drifts further from "real" time.

Yes. I’ve had exactly that happen with a stratum 1 (GPS) server with only local 
as fallback. The GPS signal dropped for a while and the box drifted off 
sufficiently that it wouldn’t reacquire the accurate clock.

The solution is to add a few pool (or whatever) servers into the config - you 
can still ‘prefer’ a local server but it ensures the box won’t drift off into 
the weeds.

> --
> Peter Jeremy

--
Bob Bishop
r...@gid.co.uk








Re: ntp problems stratum 2 to 14?

2020-03-04 Thread Peter Jeremy
Hi Dewayne,

Sorry for the delay.  Unfortunately, I can't really suggest anything -
it's not clear to me why ntpd would prefer a stratum 14 clock over a
stratum 2 clock.  Have you tried looking through the debugging hints
page (https://www.eecis.udel.edu/~mills/ntp/html/debug.html)?

I haven't seen that problem but I don't use the local clock.

During startup, it would not seem unreasonable for the local clock to
become valid first because it will have a lower jitter.  But ntpd
should switch to the stratum 2 clock and stay with it as the better
time source.  One problem is that if ntpd decides to switch away from
the clock for any reason (eg a burst of jitter), it may get stuck on
the local clock as it drifts further from "real" time.

-- 
Peter Jeremy




Re: ntp problems stratum 2 to 14?

2020-02-27 Thread Dewayne Geraghty
On Thu, 27 Feb 2020 at 06:43, Peter Jeremy  wrote:

> On 2020-Feb-26 16:37:43 +1100, Dewayne Geraghty 
> wrote:
> >I usually run ntpd with both aslr and as user ntpd.  While testing I
> >noticed that my server with a direct network cable to my main time keeper,
> >jumped from the expected stratum 2 to 14 as follows (I record the date so I
> >can synch with the debug log, also below):
> >
> >vm.loadavg={ 0.09 0.10 0.18 }
> >
> >Wed 26 Feb 2020 15:16:38 AEDT
> >     remote           refid      st t when poll reach   delay   offset  jitter
> >==============================================================================
> > 10.0.7.6        203.35.83.242    2 u   44   64   377   0.147  -227.12  33.560
> >*127.127.1.1     .LOCL.          14 l   59  128   377   0.000    0.000   0.000
>
> >26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad
>
> Why is this bad?  You've specified that this is a valid clock source so
> ntpd is free to use it if it decides it is the best source of time.
>
> >server 127.127.1.1 minpoll 7 maxpoll 7
> >fudge  127.127.1.1 stratum 14
>
> Synchronizing to the local clock (ie using 127.127.1.x as a reference) is
> almost never correct.  What external (to NTP) source is being used to
> synchronize the local clock?
>
> >I'm also very surprised that the jitter on the server (under testing) is so
> >poor.  The internet facing time server is
> >*x.y.z.t   .ATOM.   1 u   73  5127   23.776   34.905  95.961
> >but its very old and not running aslr.
>
> The 23ms distance to the peer suggests that this is over the Internet.  What
> sort of link do you have to the Internet and how heavily loaded is it?  The
> NTP protocol includes the assumption that the client-server path delay is
> symmetric - this is often untrue for SOHO connections.  And SOHO connections
> will often wind up saturated in one direction - which skews the apparent
> timestamps and shows up as high jitter values.
>
> > /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g  -u ntpd --nofork
> ...
> >I get similar results with /usr/sbin/ntpd, I've been testing both and
> >happened to record details for the port ntpd.
>
> It's probably not relevant but it would be useful for you to say up front
> which ntpd you are having problems with and which version of the port you
> have installed.
>
> --
> Peter Jeremy
>

Hi Peter, I appreciate your thoughts. Yes, using LOCL is bad because it
should only be used when the stratum 2 machine is unavailable, and it is
available (representative ping time average 0.15ms). The load is less than
10% on both devices and both the internet and internal traffic is typically
less than 50Kb. :/

The use of LOCL clock was a fix, as named failed if ntpd only used the
timeserver and it lost sync (due to some ipsec work, another story). I
suspect kerberos had a part as it uses tkey-gssapi-keytab. I should
investigate why the use of LOCL clock makes things work, but ... it's a
workaround and I'm ok with it.

I'm at my wits' end. I've systematically changed one variable from the list,
and always the system clock reverts to LOCL within 20 minutes if not
immediately. This is FreeBSD 12.1-STABLE #0 r356046M: Tue Dec 24. I think
it's time to try an earlier ntp to see if that helps (???) :(

The variables tested, one changed at a time:
- security.mac.ntpd.enabled
- kern.elf64.aslr.enable kern.elf64.aslr.stack_gap changed as a pair
- security.mac.portacl.rules
- run as root or ntpd
- use of proccontrol (which was changed with different combinations of
aslr, stack_gap)
- all off and run as root
- and of course changes to the command line (-g or -G or -g -x)

I guess everyone else is using ntpd without a problem? (or not...)
Cheers, Dewayne
PS Apologies for delay in getting back, gmail placed your reply in the spam
folder :/


Re: ntp problems stratum 2 to 14?

2020-02-26 Thread Peter Jeremy
On 2020-Feb-26 16:37:43 +1100, Dewayne Geraghty  
wrote:
>I usually run ntpd with both aslr and as user ntpd.  While testing I
>noticed that my server with a direct network cable to my main time keeper,
>jumped from the expected stratum 2 to 14 as follows (I record the date so I
>can synch with the debug log, also below):
>
>vm.loadavg={ 0.09 0.10 0.18 }
>
>Wed 26 Feb 2020 15:16:38 AEDT
>     remote           refid      st t when poll reach   delay   offset  jitter
>==============================================================================
> 10.0.7.6        203.35.83.242    2 u   44   64   377   0.147  -227.12  33.560
>*127.127.1.1     .LOCL.          14 l   59  128   377   0.000    0.000   0.000

>26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad

Why is this bad?  You've specified that this is a valid clock source so
ntpd is free to use it if it decides it is the best source of time.

>server 127.127.1.1 minpoll 7 maxpoll 7
>fudge  127.127.1.1 stratum 14

Synchronizing to the local clock (ie using 127.127.1.x as a reference) is
almost never correct.  What external (to NTP) source is being used to
synchronize the local clock?

>I'm also very surprised that the jitter on the server (under testing) is so
>poor.  The internet facing time server is
>*x.y.z.t   .ATOM.   1 u   73  5127   23.776   34.905  95.961
>but its very old and not running aslr.

The 23ms distance to the peer suggests that this is over the Internet.  What
sort of link do you have to the Internet and how heavily loaded is it?  The
NTP protocol includes the assumption that the client-server path delay is
symmetric - this is often untrue for SOHO connections.  And SOHO connections
will often wind up saturated in one direction - which skews the apparent
timestamps and shows up as high jitter values.
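
As a rough illustration of why asymmetry matters (a sketch, not code taken
from ntpd): the standard NTP on-wire calculation derives offset and
round-trip delay from four timestamps and has to assume the two one-way
delays are equal.  The helper simulate_exchange and its parameter names
are hypothetical, made up here to generate timestamps for a known clock
offset and known one-way delays.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from the four timestamps.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, out_delay, back_delay, server_hold=0.0):
    """Hypothetical helper: build the timestamps for one client/server exchange.

    true_offset is how far the server clock is ahead of the client clock;
    out_delay and back_delay are the one-way network delays, in seconds.
    """
    t1 = 100.0                               # arbitrary client send time
    t2 = t1 + out_delay + true_offset        # arrival, read on the server clock
    t3 = t2 + server_hold                    # server reply time
    t4 = t3 - true_offset + back_delay       # arrival, read on the client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    for out_delay, back_delay in ((0.010, 0.010), (0.020, 0.004)):
        offset, delay = ntp_offset_delay(
            *simulate_exchange(0.0, out_delay, back_delay)
        )
        print(f"out={out_delay * 1000:.0f}ms back={back_delay * 1000:.0f}ms "
              f"offset={offset * 1000:.1f}ms delay={delay * 1000:.1f}ms")
```

With equal 10 ms legs the computed offset is zero.  With 20 ms out and
4 ms back, two clocks that agree perfectly appear to differ by 8 ms, which
is half the asymmetry; a link that saturates in one direction makes that
error vary from poll to poll.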

> /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g  -u ntpd --nofork
...
>I get similar results with /usr/sbin/ntpd, I've been testing both and
>happened to record details for the port ntpd.

It's probably not relevant but it would be useful for you to say up front
which ntpd you are having problems with and which version of the port you
have installed.

-- 
Peter Jeremy

