Re: OpenNTPD - no constraint reply - no time sync

2015-07-18 Thread Mikolaj Kucharski
On Sat, Jul 18, 2015 at 11:44:17AM -0600, Brent Cook wrote:
 On Thu, May 28, 2015 at 5:24 PM, Mikolaj Kucharski
 miko...@kucharski.name wrote:
  minimalisic patch as below fixed the issue for me:
 
  Index: constraint.c
  ===
  RCS file: /cvs/src/usr.sbin/ntpd/constraint.c,v
  retrieving revision 1.12
  diff -u -p -u -r1.12 constraint.c
  --- constraint.c28 May 2015 21:34:36 -  1.12
  +++ constraint.c28 May 2015 23:14:47 -
  @@ -279,7 +279,7 @@ constraint_check_child(void)
  cstr-addr-ss), 
  CONSTRAINT_SCAN_INTERVAL);
  }
 
  -   if (fail || cstr-state  STATE_REPLY_RECEIVED) {
  +   if (fail || cstr-state  STATE_QUERY_SENT) {
  cstr-senderrors++;
  constraint_close(cstr-fd);
  }
 
 Thank you for the in-depth analysis. I have been running this patch
 the last couple
 of days and have not experienced any issues as well. What say you reyk?

I can confirm that I run this patch from the end of May (from the time
when I've posted this to the misc mailing list) on amd64 and i386 and I
don't have any issues with it. I did not test portable.

-- 
best regards
q#



Re: OpenNTPD - no constraint reply - no time sync

2015-07-18 Thread Brent Cook
 On Jul 18, 2015, at 1:52 PM, Mikolaj Kucharski miko...@kucharski.name wrote:
 
 On Sat, Jul 18, 2015 at 11:44:17AM -0600, Brent Cook wrote:
 On Thu, May 28, 2015 at 5:24 PM, Mikolaj Kucharski
 miko...@kucharski.name wrote:
 minimalisic patch as below fixed the issue for me:
 
 Index: constraint.c
 ===
 RCS file: /cvs/src/usr.sbin/ntpd/constraint.c,v
 retrieving revision 1.12
 diff -u -p -u -r1.12 constraint.c
 --- constraint.c28 May 2015 21:34:36 -  1.12
 +++ constraint.c28 May 2015 23:14:47 -
 @@ -279,7 +279,7 @@ constraint_check_child(void)
cstr-addr-ss), 
 CONSTRAINT_SCAN_INTERVAL);
}
 
 -   if (fail || cstr-state  STATE_REPLY_RECEIVED) {
 +   if (fail || cstr-state  STATE_QUERY_SENT) {
cstr-senderrors++;
constraint_close(cstr-fd);
}
 
 Thank you for the in-depth analysis. I have been running this patch
 the last couple
 of days and have not experienced any issues as well. What say you reyk?
 
 I can confirm that I run this patch from the end of May (from the time
 when I've posted this to the misc mailing list) on amd64 and i386 and I
 don't have any issues with it. I did not test portable.
 

Thanks, I have committed your simpler patch.



Re: OpenNTPD - no constraint reply - no time sync

2015-07-18 Thread Brent Cook
On Thu, May 28, 2015 at 5:24 PM, Mikolaj Kucharski
miko...@kucharski.name wrote:
 Hi,

 I've initially reported this problem a while ago and I thought problem
 was related to IPv6. Now I belive it is not. I did some research and
 this is what I've found. I contacted Reyk couple of days ago, but didn't
 get any reply from him yet, so decided to post here.

 I'm using OpenNTPD's constraint from the time it was introduced. However
 I found that ntpd(8) often is getting stuck at the very beginning when
 it starts and never recovers from that scenario. Usually it looks as
 follows when you hit the problem:

 # ntpctl -sa
 0/4 peers valid, clock unsynced

 peer
wt tl st  next  poll  offset   delay  jitter
 87.232.1.41 0.pool.ntp.org
 1  2  -0s0s  peer not valid 
 54.171.104.100 1.pool.ntp.org
 1  2  -0s0s  peer not valid 
 193.1.193.157 2.pool.ntp.org
 1  2  -0s0s  peer not valid 
 85.91.1.180 3.pool.ntp.org
 1  2  -0s0s  peer not valid 

 All counters are zero and all peers are not valid. For me it usually
 stays like that and never recovers until rcctl restart ntpd. Then often
 it fails like above again. Multiple restarts in a row solve the problem.

 I did some debug modifications to ntpd and this is what I've found.
 I'm running my tests on:

 # sysctl -n kern.version
 OpenBSD 5.7-current (GENERIC) #955: Thu May 28 13:09:53 MDT 2015
 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

 and to keep IPv6 out of the picture I have following settings in resolv.conf:

 # grep -w family /etc/resolv.conf
 family inet4

 The ntpd server itself is configured as follows:

 # grep -ve '^#' /etc/ntpd.conf
 server 0.pool.ntp.org
 server 1.pool.ntp.org
 server 2.pool.ntp.org
 server 3.pool.ntp.org
 constraints from www.google.com

 I'm running debug version of ntpd as follows:

 ./ntpd.debug -vd 21 | ts

 Here is example debug session when I hit the issue. Lines which contain
 XXX MK are added by me via log_debug():

 May 28 23:25:54 ntp engine ready
 May 28 23:25:54 XXX MK STATE_NONE=0
 May 28 23:25:54 XXX MK STATE_DNS_INPROGRESS=1
 May 28 23:25:54 XXX MK STATE_DNS_TEMPFAIL=2
 May 28 23:25:54 XXX MK STATE_DNS_DONE=3
 May 28 23:25:54 XXX MK STATE_QUERY_SENT=4
 May 28 23:25:54 XXX MK STATE_REPLY_RECEIVED=5
 May 28 23:25:54 XXX MK STATE_INVALID=6
 May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0
 May 28 23:25:54 XXX MK client_query() start
 May 28 23:25:54 XXX MK client_query() start
 May 28 23:25:54 XXX MK client_query() start
 May 28 23:25:54 XXX MK client_query() start
 May 28 23:25:54 XXX MK constraint_query() starts state=1
 May 28 23:25:54 XXX MK constraint_query() fail with default?
 May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1
 May 28 23:25:54 XXX MK ntp_main() about to loop with control_dispatch_msg() 
 and j=3 i=3 nfds=1
 May 28 23:25:54 XXX MK ntp_main() about to loop with 
 constraint_dispatch_msg() and j=3 i=3 nfds=1
 May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0
 May 28 23:25:54 XXX MK constraint_query() starts state=1
 May 28 23:25:54 XXX MK constraint_query() fail with default?
 May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1
 May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() 
 and j=3 i=3 nfds=0
 May 28 23:25:55 XXX MK ntp_main() about to loop with 
 constraint_dispatch_msg() and j=3 i=3 nfds=0
 May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0
 May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=3
 May 28 23:25:55 constraint request to 216.58.208.68
 May 28 23:25:55 XXX MK constraint_query() succeeded?
 May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68
 May 28 23:25:55 XXX MK httpsdate_init() with 216.58.208.68
 May 28 23:25:55 XXX MK httpsdate_request() with 216.58.208.68 port 443
 May 28 23:25:55 XXX MK httpsdate_request() date from 216.58.208.68 is Date: 
 Thu, 28 May 2015 22:25:56 GMT
 May 28 23:25:55 XXX MK httpsdate_request() success from 216.58.208.68
 May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68 done
 May 28 23:25:55 XXX MK ntp_main() function poll() returned -1, errno=4 i=4
 May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() 
 and j=3 i=4 nfds=-1
 May 28 23:25:55 XXX MK ntp_main() about to loop with 
 constraint_dispatch_msg() and j=3 i=4 nfds=-1
 May 28 23:25:55 XXX MK constraint_check_child() bummer, we bump senderrors 
 now senderrors=1 fail=0 state=4
 May 28 23:25:55 XXX MK constraint_close() are we going to set state to 
 STATE_INVALID? state=4
 May 28 23:25:55 XXX MK constraint_close() somehow we set the state to 
 STATE_INVALID state=6 senderrors=1
 May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0
 May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=6
 May 28 

Re: OpenNTPD - no constraint reply - no time sync

2015-07-04 Thread Atanas Vladimirov

On 29.05.2015 02:24, Mikolaj Kucharski wrote:

Hi,

I've initially reported this problem a while ago and I thought problem
was related to IPv6. Now I belive it is not. I did some research and
this is what I've found. I contacted Reyk couple of days ago, but 
didn't

get any reply from him yet, so decided to post here.

I'm using OpenNTPD's constraint from the time it was introduced. 
However

I found that ntpd(8) often is getting stuck at the very beginning when
it starts and never recovers from that scenario. Usually it looks as
follows when you hit the problem:

# ntpctl -sa
0/4 peers valid, clock unsynced

peer
   wt tl st  next  poll  offset   delay  jitter
87.232.1.41 0.pool.ntp.org
1  2  -0s0s  peer not valid 
54.171.104.100 1.pool.ntp.org
1  2  -0s0s  peer not valid 
193.1.193.157 2.pool.ntp.org
1  2  -0s0s  peer not valid 
85.91.1.180 3.pool.ntp.org
1  2  -0s0s  peer not valid 


Hi,
I observed this problem too. I'm following -current and after every 
reboot OpenNTPd stuck as

you describe. Only my NMEA sensor is working.

[ns]~$ ntpctl -sa
0/13 peers valid, 1/1 sensors valid, clock synced, stratum 1

peer
   wt tl st  next  poll  offset   delay  jitter
not resolved from pool 1.bg.pool.ntp.org
1  2  -0s   15s  peer not valid 
2001:470:1f0a:98e::2 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
2a01:8740::fffd::123 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
2a02:80e0:0:2::123 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
2a02:6800:0:99::2 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
77.70.121.24 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
92.247.168.103 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
87.97.157.120 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
176.12.47.129 from pool 2.bg.pool.ntp.org
1  2  -0s0s  peer not valid 
213.154.229.24 from pool 3.europe.pool.ntp.org
1  2  -0s0s  peer not valid 
79.175.127.142 from pool 3.europe.pool.ntp.org
1  2  -0s0s  peer not valid 
91.235.212.22 from pool 3.europe.pool.ntp.org
1  2  -0s0s  peer not valid 
82.197.164.46 from pool 3.europe.pool.ntp.org
1  2  -0s0s  peer not valid 

sensor
   wt gd st  next  poll  offset  correction
nmea0  GPS
 * 10  1  0   12s   15s-0.003ms 0.600ms

[ns]~$ uptime
12:08PM  up 12:18, 1 user, load averages: 0.47, 0.71, 0.55



All counters are zero and all peers are not valid. For me it usually
stays like that and never recovers until rcctl restart ntpd. Then often
it fails like above again. Multiple restarts in a row solve the 
problem.


Only rcctl restart ntpd helps.

[ns]~$ cat /etc/ntpd.conf
# $OpenBSD: ntpd.conf,v 1.13 2015/05/18 13:48:38 deraadt Exp $

# Addresses to listen on (ntpd does not listen by default)
listen on *

servers 1.bg.pool.ntp.org
servers 2.bg.pool.ntp.org
servers 3.europe.pool.ntp.org

server ntp.rootshells.eu weight 7

# use a specific local timedelta sensor (radio clock, etc)
sensor nmea0 correction 600 weight 10 refid GPS

# get the time constraint from a well-known HTTPS site
constraints from https://www.google.com;


OpenBSD 5.8-beta (GENERIC) #1059: Fri Jul  3 12:47:22 MDT 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 2130575360 (2031MB)
avail mem = 2062237696 (1966MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xfb7f0 (50 entries)
bios0: vendor American Megatrends Inc. version 080011 date 02/20/2009
bios0: Supermicro H8SSL
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC OEMB
acpi0: wakeup devices P1P2(S4) USB0(S1) USB1(S1) USB2(S1) PS2K(S4) 
PS2M(S4) SLPB(S1)

acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 146, 1995.33 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,M

MXX,FFXSR,LONG,3DNOW2,3DNOW,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 
64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully 
associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully 
associative

cpu0: AMD erratum 89 present, BIOS upgrade may be required
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
ioapic0 at 

Re: OpenNTPD - no constraint reply - no time sync

2015-05-28 Thread Mikolaj Kucharski
Hi,

I've initially reported this problem a while ago and I thought problem
was related to IPv6. Now I belive it is not. I did some research and
this is what I've found. I contacted Reyk couple of days ago, but didn't
get any reply from him yet, so decided to post here.

I'm using OpenNTPD's constraint from the time it was introduced. However
I found that ntpd(8) often is getting stuck at the very beginning when
it starts and never recovers from that scenario. Usually it looks as
follows when you hit the problem:

# ntpctl -sa
0/4 peers valid, clock unsynced

peer
   wt tl st  next  poll  offset   delay  jitter
87.232.1.41 0.pool.ntp.org 
1  2  -0s0s  peer not valid 
54.171.104.100 1.pool.ntp.org 
1  2  -0s0s  peer not valid 
193.1.193.157 2.pool.ntp.org 
1  2  -0s0s  peer not valid 
85.91.1.180 3.pool.ntp.org 
1  2  -0s0s  peer not valid 

All counters are zero and all peers are not valid. For me it usually
stays like that and never recovers until rcctl restart ntpd. Then often
it fails like above again. Multiple restarts in a row solve the problem.

I did some debug modifications to ntpd and this is what I've found.
I'm running my tests on:

# sysctl -n kern.version
OpenBSD 5.7-current (GENERIC) #955: Thu May 28 13:09:53 MDT 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

and to keep IPv6 out of the picture I have following settings in resolv.conf:

# grep -w family /etc/resolv.conf
family inet4

The ntpd server itself is configured as follows:

# grep -ve '^#' /etc/ntpd.conf
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
server 3.pool.ntp.org
constraints from www.google.com

I'm running debug version of ntpd as follows:

./ntpd.debug -vd 21 | ts 

Here is example debug session when I hit the issue. Lines which contain
XXX MK are added by me via log_debug():

May 28 23:25:54 ntp engine ready
May 28 23:25:54 XXX MK STATE_NONE=0
May 28 23:25:54 XXX MK STATE_DNS_INPROGRESS=1
May 28 23:25:54 XXX MK STATE_DNS_TEMPFAIL=2
May 28 23:25:54 XXX MK STATE_DNS_DONE=3
May 28 23:25:54 XXX MK STATE_QUERY_SENT=4
May 28 23:25:54 XXX MK STATE_REPLY_RECEIVED=5
May 28 23:25:54 XXX MK STATE_INVALID=6
May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0
May 28 23:25:54 XXX MK client_query() start
May 28 23:25:54 XXX MK client_query() start
May 28 23:25:54 XXX MK client_query() start
May 28 23:25:54 XXX MK client_query() start
May 28 23:25:54 XXX MK constraint_query() starts state=1
May 28 23:25:54 XXX MK constraint_query() fail with default?
May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1
May 28 23:25:54 XXX MK ntp_main() about to loop with control_dispatch_msg() and 
j=3 i=3 nfds=1
May 28 23:25:54 XXX MK ntp_main() about to loop with constraint_dispatch_msg() 
and j=3 i=3 nfds=1
May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0
May 28 23:25:54 XXX MK constraint_query() starts state=1
May 28 23:25:54 XXX MK constraint_query() fail with default?
May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1
May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() and 
j=3 i=3 nfds=0
May 28 23:25:55 XXX MK ntp_main() about to loop with constraint_dispatch_msg() 
and j=3 i=3 nfds=0
May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0
May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=3
May 28 23:25:55 constraint request to 216.58.208.68
May 28 23:25:55 XXX MK constraint_query() succeeded?
May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68
May 28 23:25:55 XXX MK httpsdate_init() with 216.58.208.68
May 28 23:25:55 XXX MK httpsdate_request() with 216.58.208.68 port 443
May 28 23:25:55 XXX MK httpsdate_request() date from 216.58.208.68 is Date: 
Thu, 28 May 2015 22:25:56 GMT
May 28 23:25:55 XXX MK httpsdate_request() success from 216.58.208.68
May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68 done
May 28 23:25:55 XXX MK ntp_main() function poll() returned -1, errno=4 i=4
May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() and 
j=3 i=4 nfds=-1
May 28 23:25:55 XXX MK ntp_main() about to loop with constraint_dispatch_msg() 
and j=3 i=4 nfds=-1
May 28 23:25:55 XXX MK constraint_check_child() bummer, we bump senderrors now 
senderrors=1 fail=0 state=4
May 28 23:25:55 XXX MK constraint_close() are we going to set state to 
STATE_INVALID? state=4
May 28 23:25:55 XXX MK constraint_close() somehow we set the state to 
STATE_INVALID state=6 senderrors=1
May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0
May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=6
May 28 23:25:55 XXX MK constraint_query() fail with STATE_INVALID last=850 
now=850 diff=0
May 28 23:25:55 XXX MK ntp_main() function constraint_query() returned -1 for 

OpenNTPD - no constraint reply - no time sync

2015-03-30 Thread Mikolaj Kucharski
Hi,

I have following ntpd.conf file on an OpenBSD machine:

OpenBSD 5.7-current (GENERIC) #839: Mon Mar 30 14:21:47 MDT 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

# /etc/ntpd.conf
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
server 3.pool.ntp.org
constraints from www.google.com

I've noticed that ntpd(8) often ends up with all peers in 'peer not valid'
state, never syncing the time. I've enabled debug logging and this is
what I see in the logs:

Mar 31 00:29:38 openbsd57 ntpd[12097]: ntp engine ready
Mar 31 00:29:39 openbsd57 ntpd[12097]: constraint request to 
2a00:1450:400b:800::2004
Mar 31 00:29:39 openbsd57 ntpd[12097]: constraint request to 216.58.211.132
Mar 31 00:29:39 openbsd57 ntpd[30212]: tls failed: 2a00:1450:400b:800::2004: 
connect
Mar 31 00:29:39 openbsd57 ntpd[12097]: no constraint reply from 
2a00:1450:400b:800::2004 received in time, next query 900s
Mar 31 00:34:37 openbsd57 ntpd[12097]: 0 out of 4 peers valid
Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 0.pool.ntp.org (54.194.18.100) 
Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 1.pool.ntp.org (149.157.192.5) 
Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 2.pool.ntp.org (193.1.219.116) 
Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 3.pool.ntp.org (85.91.1.164) 
Mar 31 00:44:39 openbsd57 ntpd[12097]: constraint request to 
2a00:1450:400b:800::2004
Mar 31 00:44:39 openbsd57 ntpd[12097]: constraint request to 216.58.211.132
Mar 31 00:44:39 openbsd57 ntpd[1720]: tls failed: 2a00:1450:400b:800::2004: 
connect
Mar 31 00:44:39 openbsd57 ntpd[12097]: no constraint reply from 
2a00:1450:400b:800::2004 received in time, next query 900s
Mar 31 00:59:39 openbsd57 ntpd[12097]: constraint request to 
2a00:1450:400b:800::2004
Mar 31 00:59:39 openbsd57 ntpd[12097]: constraint request to 216.58.211.132
Mar 31 00:59:39 openbsd57 ntpd[13994]: tls failed: 2a00:1450:400b:800::2004: 
connect
Mar 31 00:59:39 openbsd57 ntpd[12097]: no constraint reply from 
2a00:1450:400b:800::2004 received in time, next query 900s


That box doesn't have IPv6 configured:

# ifconfig
lo0: flags=8049UP,LOOPBACK,RUNNING,MULTICAST mtu 32768
priority: 0
groups: lo
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
inet6 ::1 prefixlen 128
inet 127.0.0.1 netmask 0xff00
vio0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500
lladdr 52:54:00:12:34:72
priority: 0
groups: egress
media: Ethernet autoselect
status: active
inet 172.16.0.52 netmask 0xff00 broadcast 172.16.0.255
enc0: flags=0
priority: 0
groups: enc
status: active


# nc -v 2a00:1450:400b:800::2004 443
nc: connect to 2a00:1450:400b:800::2004 port 443 (tcp) failed: No route to host


-- 
best regards
q#