Re: ntpd dies nightly on a server with jails

2017-03-23 Thread O. Hartmann
On Fri, 17 Mar 2017 12:20:15 -0600
Ian Lepore  wrote:

> On Fri, 2017-03-17 at 18:05 +0100, O. Hartmann wrote:
> > On Wed, 15 Mar 2017 13:12:37 -0700
> > Cy Schubert  wrote:
> >   
> > > 
> > > Hi O.Hartmann,
> > > 
> > > I'll try to answer as much as I can in the noon hour I have left.
> > > 
> > > In message <20170315071724.78bb0...@freyja.zeit4.iv.bundesimmobilien.de>,
> > > "O. Hartmann" writes:
> > > > 
> > > > Running a host with several jails on recent CURRENT (12.0-CURRENT
> > > > #8 r315187: Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on
> > > > a daily basis.
> > > > 
> > > > The box is an older two-socket Fujitsu server equipped with two
> > > > four-core Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> > > > 
> > > > The box has several jails; none of the jails runs the ntpd service.
> > > > Each jail has its own dedicated loopback interface, lo1 through lo5
> > > > (for the moment), with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if
> > > > this matters; I believe not).
> > > > 
> > > > The host itself has two main NICs, Broadcom based. bcm0 is
> > > > dedicated to the host, bcm1 is shared amongst the jails: each jail
> > > > has an IP bound to bcm1 via which the jails communicate with the
> > > > network.
> > > > 
> > > > I try to capture log information via syslog, but FreeBSD's ntpd
> > > > seems to be very, very sparse with such information, converging to
> > > > nothing. I can't see anything in the logs that explains why ntpd
> > > > dies almost every night, leaving the system with a wild reset of
> > > > time. Sometimes it is a gain of 6 hours, sometimes it is only half
> > > > an hour. I usually leave the box at 16:00 local time and check on
> > > > it again at ~7 o'clock in the morning local time.
> > > We will need to turn on debugging. Unfortunately debug code is not
> > > compiled 
> > > into the binary. We have two options. You can either update 
> > > src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's
> > > the exact 
> > > same ntp) with the DEBUG option -- this is probably simpler. Then
> > > enable 
> > > debug with -d and -D. -D increases verbosity. I just committed a
> > > debug 
> > > option to both ntp ports to assist here.
> > > 
> > > Next question: Do you see any indication of a core dump? I'd be
> > > interested 
> > > in looking at it if possible.
> > >   
> > > > 
> > > > 
> > > > When the clock is floating that wildly, in all cases ntpd isn't
> > > > running any more. I restart it with options -g and -G to adjust the
> > > > time quickly at the beginning, which works fine.
> > > This is disconcerting. If your clock is floating wildly without
> > > ntpd 
> > > running there are other issues that might be at play here. At most
> > > the 
> > > clock might drift a little, maybe a minute or two a day but not by
> > > a lot. 
> > > Does the drift cause your clocks to run fast or slow?
> > >   
> > > > 
> > > > 
> > > > Apart from possible misconfigurations of the jails (I'm quite new
> > > > to jails and their pitfalls), I was wondering what causes ntpd to
> > > > die. I can't determine exactly the time of its death, so it might
> > > > be related to diurnal/periodic processes (I use only the most
> > > > vanilla periodic configuration, except that the ZFS scrub check is
> > > > enabled).
> > > As I'm a little rushed for time, I didn't catch whether the jails 
> > > themselves were also running ntpd... just thought I'd ask. I don't
> > > see how 
> > > zfs scrubbing or any other periodic scripts could cause this.
> > >   
> > > > 
> > > > 
> > > > I haven't had the chance to check whether the hardware is
> > > > completely all right, but from a superficial point of view there is
> > > > no issue with high gain of the internal clock or other hardware
> > > > issues.
> > > It's probably a good idea to check. I don't think that would cause
> > > ntpd any 
> > > gas. I've seen RTC battery messages on my gear which haven't caused
> > > ntpd 
> > > any problem. I have two machines which complain about RTC battery
> > > being 
> > > dead, where in fact I have replaced the batteries and the messages
> > > still 
> > > are displayed at boot. I'm not sure if it's possible for a kernel
> > > to damage 
> > > the RTC. In my case that doesn't cause ntpd any problems. It's
> > > probably 
> > > good to check anyway.
> > >   
> > > > 
> > > > 
> > > > If there are known issues with jails (the problem occurs since I
> > > > use those),
> > > > advice is appreciated.    
> > > Not that I know of.
> > > 
> > >   
> > Just some strange news:
> > 
> > I left the server for the whole day with ntpd disabled and did not
> > see the RTC gain even one second, even while stressing the machine.
> > 
> > But soon after restarting ntpd, I immediately noticed the clock was
> > 30 minutes off!

Re: ntpd dies nightly on a server with jails

2017-03-17 Thread Cy Schubert
In message <1489782793.40576.185.ca...@freebsd.org>, Ian Lepore writes:
> On Fri, 2017-03-17 at 13:26 -0700, Don Lewis wrote:
> > On 17 Mar, O. Hartmann wrote:
> > 
> > > 
> > > Just some strange news:
> > > 
> > > I left the server for the whole day with ntpd disabled and did not
> > > see the RTC gain even one second, even while stressing the machine.
> > > 
> > > But soon after restarting ntpd, I immediately noticed the clock was
> > > 30 minutes off! This morning, the discrepancy was almost 5 hours. It
> > > looked more like a weird adjustment to a time base other than UTC.
> > > 
> > > Over the weekend I'll leave the server with ntpd disabled and only
> > > the RTC running. I have the strange feeling that something is
> > > intentionally readjusting the ntpd time, due to a misconfiguration
> > > or a rogue NTP server in the X.CC.pool.ntp.org pool.
> > ntpd should recognize a single bad server and ignore it in favor of
> > the other servers that are sane.
> > 
> > It sounds like something is going off the rails once ntpd starts
> > calling
> > adjtime().  What is the output of:
> > sysctl kern.clockrate
> > 
> > I'd suggest starting ntpd and running "ntpq -c pe" a few times a
> > minute
> > and capturing its output to monitor the status of ntpd as it starts
> > up
> > and try to capture things going wrong.   You should probably disable
> > iburst in ntp.conf to give more visibility in the early startup.
> > 
> > For the first few minutes ntpd should just be getting reliable
> > timestamp info and won't start trying to adjust the clock until it
> > has captured enough samples and figured out which servers are best.
> > Then the behaviour of the offset is the thing to watch.  If the
> > initial offset is large enough, ntpd will step the clock once to get
> > it close to zero, otherwise it will just use adjtime() to slowly push
> > the offset towards zero.  I think though that you will see the offset
> > start gyrating madly.
> > 
> > You might want to set /var/db/ntpd.drift to zero beforehand if there
> > is an insane value in there.  If the initial drift value is bogus,
> > ntpd will try to use it, which will push the time offset away from
> > zero so fast that it will decide to keep stepping the clock back to
> > zero before it can capture enough samples from the external servers
> > to determine the true local clock drift rate.
> 
> Do not set ntpd.drift contents to zero.  Delete the file.  There's a
> huge difference between a file that says the clock is perfect and a
> missing file which triggers ntpd to do a 15-minute frequency
> measurement to come up with the initial drift correction.

Yes. And, without debugging output and/or a dump, I don't think we'll be 
any closer to the truth. Until then the best we can do is make educated 
guesses.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ntpd dies nightly on a server with jails

2017-03-17 Thread Ian Lepore
On Fri, 2017-03-17 at 13:26 -0700, Don Lewis wrote:
> On 17 Mar, O. Hartmann wrote:
> 
> > 
> > Just some strange news:
> > 
> > I left the server for the whole day with ntpd disabled and did not
> > see the RTC gain even one second, even while stressing the machine.
> > 
> > But soon after restarting ntpd, I immediately noticed the clock was
> > 30 minutes off! This morning, the discrepancy was almost 5 hours. It
> > looked more like a weird adjustment to a time base other than UTC.
> > 
> > Over the weekend I'll leave the server with ntpd disabled and only
> > the RTC running. I have the strange feeling that something is
> > intentionally readjusting the ntpd time, due to a misconfiguration
> > or a rogue NTP server in the X.CC.pool.ntp.org pool.
> ntpd should recognize a single bad server and ignore it in favor of
> the other servers that are sane.
> 
> It sounds like something is going off the rails once ntpd starts
> calling
> adjtime().  What is the output of:
>   sysctl kern.clockrate
> 
> I'd suggest starting ntpd and running "ntpq -c pe" a few times a
> minute
> and capturing its output to monitor the status of ntpd as it starts
> up
> and try to capture things going wrong.   You should probably disable
> iburst in ntp.conf to give more visibility in the early startup.
> 
> For the first few minutes ntpd should just be getting reliable
> timestamp info and won't start trying to adjust the clock until it
> has captured enough samples and figured out which servers are best.
> Then the behaviour of the offset is the thing to watch.  If the
> initial offset is large enough, ntpd will step the clock once to get
> it close to zero, otherwise it will just use adjtime() to slowly push
> the offset towards zero.  I think though that you will see the offset
> start gyrating madly.
> 
> You might want to set /var/db/ntpd.drift to zero beforehand if there
> is an insane value in there.  If the initial drift value is bogus,
> ntpd will try to use it, which will push the time offset away from
> zero so fast that it will decide to keep stepping the clock back to
> zero before it can capture enough samples from the external servers
> to determine the true local clock drift rate.

Do not set ntpd.drift contents to zero.  Delete the file.  There's a
huge difference between a file that says the clock is perfect and a
missing file which triggers ntpd to do a 15-minute frequency
measurement to come up with the initial drift correction.
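
A minimal sketch of the "delete, don't zero" advice above, in Python. The path /var/db/ntpd.drift comes from this thread; the 500 PPM limit is an assumption based on ntpd's usual maximum frequency correction, and the function names are made up for illustration. It only shows the idea: a missing file makes ntpd measure the frequency itself, so a bogus file is removed rather than overwritten with 0.

```python
from pathlib import Path

# Assumed limit: ntpd normally cannot correct more than about 500 PPM,
# so a stored value beyond that is treated as bogus here.
MAX_DRIFT_PPM = 500.0


def drift_is_plausible(text, limit_ppm=MAX_DRIFT_PPM):
    """Return True if the first token of the drift file is a sane PPM value."""
    try:
        value = float(text.split()[0])
    except (IndexError, ValueError):
        return False  # empty or non-numeric content
    return abs(value) <= limit_ppm  # also rejects nan and inf


def reset_drift_if_bogus(path=Path("/var/db/ntpd.drift")):
    """Delete an implausible drift file; never rewrite it with zero."""
    path = Path(path)
    if not path.exists():
        return "missing"  # ntpd will run its own frequency measurement
    if drift_is_plausible(path.read_text()):
        return "kept"
    path.unlink()  # removing the file is what triggers the measurement
    return "deleted"
```

This would be run with ntpd stopped, before starting it again; whether the value on this particular box is actually bogus is still unknown.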

-- Ian



Re: ntpd dies nightly on a server with jails

2017-03-17 Thread Don Lewis
On 17 Mar, O. Hartmann wrote:

> Just some strange news:
> 
> I left the server for the whole day with ntpd disabled and did not
> see the RTC gain even one second, even while stressing the machine.
> 
> But soon after restarting ntpd, I immediately noticed the clock was
> 30 minutes off! This morning, the discrepancy was almost 5 hours. It
> looked more like a weird adjustment to a time base other than UTC.
> 
> Over the weekend I'll leave the server with ntpd disabled and only
> the RTC running. I have the strange feeling that something is
> intentionally readjusting the ntpd time, due to a misconfiguration
> or a rogue NTP server in the X.CC.pool.ntp.org pool.

ntpd should recognize a single bad server and ignore it in favor of
the other servers that are sane.

It sounds like something is going off the rails once ntpd starts calling
adjtime().  What is the output of:
sysctl kern.clockrate
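
Once that output is captured, a quick consistency check could look like the sketch below. The line format (`kern.clockrate: { hz = ..., tick = ..., profhz = ..., stathz = ... }`) and the rule that tick is 1,000,000 / hz microseconds are assumptions about FreeBSD's output, and the function names are hypothetical.

```python
import re


def parse_clockrate(line):
    """Turn a `sysctl kern.clockrate` line into a dict of integers."""
    return {key: int(value) for key, value in re.findall(r"(\w+) = (\d+)", line)}


def tick_matches_hz(rates):
    """True when tick (microseconds per tick) equals 1_000_000 // hz."""
    hz = rates.get("hz", 0)
    return hz > 0 and rates.get("tick") == 1_000_000 // hz
```

A mismatch would not prove anything by itself, but it would be worth reporting along with the raw output.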

I'd suggest starting ntpd and running "ntpq -c pe" a few times a minute
and capturing its output to monitor the status of ntpd as it starts up
and try to capture things going wrong.   You should probably disable
iburst in ntp.conf to give more visibility in the early startup.

For the first few minutes ntpd should just be getting reliable timestamp
info and won't start trying to adjust the clock until it has captured
enough samples and figured out which servers are best.  Then the
behaviour of the offset is the thing to watch.  If the initial offset is
large enough, ntpd will step the clock once to get it close to zero,
otherwise it will just use adjtime() to slowly push the offset towards
zero.  I think though that you will see the offset start gyrating madly.

You might want to set /var/db/ntpd.drift to zero beforehand if there is
an insane value in there.  If the initial drift value is bogus, ntpd will
try to use it, which will push the time offset away from zero so fast
that it will decide to keep stepping the clock back to zero before it can
capture enough samples from the external servers to determine the true
local clock drift rate.



Re: ntpd dies nightly on a server with jails

2017-03-17 Thread Ian Lepore
On Fri, 2017-03-17 at 18:05 +0100, O. Hartmann wrote:
> On Wed, 15 Mar 2017 13:12:37 -0700
> Cy Schubert  wrote:
> 
> > 
> > Hi O.Hartmann,
> > 
> > I'll try to answer as much as I can in the noon hour I have left.
> > 
> > In message <20170315071724.78bb0...@freyja.zeit4.iv.bundesimmobilien.de>,
> > "O. Hartmann" writes:
> > > 
> > > Running a host with several jails on recent CURRENT (12.0-CURRENT
> > > #8 r315187: Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on
> > > a daily basis.
> > > 
> > > The box is an older two-socket Fujitsu server equipped with two
> > > four-core Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> > > 
> > > The box has several jails; none of the jails runs the ntpd service.
> > > Each jail has its own dedicated loopback interface, lo1 through lo5
> > > (for the moment), with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if
> > > this matters; I believe not).
> > > 
> > > The host itself has two main NICs, Broadcom based. bcm0 is
> > > dedicated to the host, bcm1 is shared amongst the jails: each jail
> > > has an IP bound to bcm1 via which the jails communicate with the
> > > network.
> > > 
> > > I try to capture log information via syslog, but FreeBSD's ntpd
> > > seems to be very, very sparse with such information, converging to
> > > nothing. I can't see anything in the logs that explains why ntpd
> > > dies almost every night, leaving the system with a wild reset of
> > > time. Sometimes it is a gain of 6 hours, sometimes it is only half
> > > an hour. I usually leave the box at 16:00 local time and check on
> > > it again at ~7 o'clock in the morning local time.
> > We will need to turn on debugging. Unfortunately debug code is not
> > compiled 
> > into the binary. We have two options. You can either update 
> > src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's
> > the exact 
> > same ntp) with the DEBUG option -- this is probably simpler. Then
> > enable 
> > debug with -d and -D. -D increases verbosity. I just committed a
> > debug 
> > option to both ntp ports to assist here.
> > 
> > Next question: Do you see any indication of a core dump? I'd be
> > interested 
> > in looking at it if possible.
> > 
> > > 
> > > 
> > > When the clock is floating that wildly, in all cases ntpd isn't
> > > running any more. I restart it with options -g and -G to adjust the
> > > time quickly at the beginning, which works fine.
> > This is disconcerting. If your clock is floating wildly without
> > ntpd 
> > running there are other issues that might be at play here. At most
> > the 
> > clock might drift a little, maybe a minute or two a day but not by
> > a lot. 
> > Does the drift cause your clocks to run fast or slow?
> > 
> > > 
> > > 
> > > Apart from possible misconfigurations of the jails (I'm quite new
> > > to jails and their pitfalls), I was wondering what causes ntpd to
> > > die. I can't determine exactly the time of its death, so it might
> > > be related to diurnal/periodic processes (I use only the most
> > > vanilla periodic configuration, except that the ZFS scrub check is
> > > enabled).
> > As I'm a little rushed for time, I didn't catch whether the jails 
> > themselves were also running ntpd... just thought I'd ask. I don't
> > see how 
> > zfs scrubbing or any other periodic scripts could cause this.
> > 
> > > 
> > > 
> > > I haven't had the chance to check whether the hardware is
> > > completely all right, but from a superficial point of view there is
> > > no issue with high gain of the internal clock or other hardware
> > > issues.
> > It's probably a good idea to check. I don't think that would cause
> > ntpd any 
> > gas. I've seen RTC battery messages on my gear which haven't caused
> > ntpd 
> > any problem. I have two machines which complain about RTC battery
> > being 
> > dead, where in fact I have replaced the batteries and the messages
> > still 
> > are displayed at boot. I'm not sure if it's possible for a kernel
> > to damage 
> > the RTC. In my case that doesn't cause ntpd any problems. It's
> > probably 
> > good to check anyway.
> > 
> > > 
> > > 
> > > If there are known issues with jails (the problem occurs since I
> > > use those),
> > > advice is appreciated.  
> > Not that I know of.
> > 
> > 
> Just some strange news:
> 
> I left the server for the whole day with ntpd disabled and did not
> see the RTC gain even one second, even while stressing the machine.
> 
> But soon after restarting ntpd, I immediately noticed the clock was
> 30 minutes off! This morning, the discrepancy was almost 5 hours. It
> looked more like a weird adjustment to a time base other than UTC.
> 
> Over the weekend I'll leave the server with ntpd disabled and only
> the RTC running. I have the strange feeling that something is
> intentionally readjusting the ntpd time, due to a misconfiguration
> or a rogue NTP server in the X.CC.pool.ntp.org pool.

Re: ntpd dies nightly on a server with jails

2017-03-17 Thread O. Hartmann
On Wed, 15 Mar 2017 13:12:37 -0700
Cy Schubert  wrote:

> Hi O.Hartmann,
> 
> I'll try to answer as much as I can in the noon hour I have left.
> 
> In message <20170315071724.78bb0...@freyja.zeit4.iv.bundesimmobilien.de>,
> "O. Hartmann" writes:
> > Running a host with several jails on recent CURRENT (12.0-CURRENT
> > #8 r315187: Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on
> > a daily basis.
> > 
> > The box is an older two-socket Fujitsu server equipped with two
> > four-core Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> > 
> > The box has several jails; none of the jails runs the ntpd service.
> > Each jail has its own dedicated loopback interface, lo1 through lo5
> > (for the moment), with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if
> > this matters; I believe not).
> > 
> > The host itself has two main NICs, Broadcom based. bcm0 is
> > dedicated to the host, bcm1 is shared amongst the jails: each jail
> > has an IP bound to bcm1 via which the jails communicate with the
> > network.
> > 
> > I try to capture log information via syslog, but FreeBSD's ntpd
> > seems to be very, very sparse with such information, converging to
> > nothing. I can't see anything in the logs that explains why ntpd
> > dies almost every night, leaving the system with a wild reset of
> > time. Sometimes it is a gain of 6 hours, sometimes it is only half
> > an hour. I usually leave the box at 16:00 local time and check on
> > it again at ~7 o'clock in the morning local time.
> 
> We will need to turn on debugging. Unfortunately debug code is not compiled 
> into the binary. We have two options. You can either update 
> src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the exact 
> same ntp) with the DEBUG option -- this is probably simpler. Then enable 
> debug with -d and -D. -D increases verbosity. I just committed a debug 
> option to both ntp ports to assist here.
> 
> Next question: Do you see any indication of a core dump? I'd be interested 
> in looking at it if possible.
> 
> > 
> > When the clock is floating that wildly, in all cases ntpd isn't
> > running any more. I restart it with options -g and -G to adjust the
> > time quickly at the beginning, which works fine.
> 
> This is disconcerting. If your clock is floating wildly without ntpd 
> running there are other issues that might be at play here. At most the 
> clock might drift a little, maybe a minute or two a day but not by a lot. 
> Does the drift cause your clocks to run fast or slow?
> 
> > 
> > Apart from possible misconfigurations of the jails (I'm quite new
> > to jails and their pitfalls), I was wondering what causes ntpd to
> > die. I can't determine exactly the time of its death, so it might
> > be related to diurnal/periodic processes (I use only the most
> > vanilla periodic configuration, except that the ZFS scrub check is
> > enabled).
> 
> As I'm a little rushed for time, I didn't catch whether the jails 
> themselves were also running ntpd... just thought I'd ask. I don't see how 
> zfs scrubbing or any other periodic scripts could cause this.
> 
> > 
> > I haven't had the chance to check whether the hardware is
> > completely all right, but from a superficial point of view there is
> > no issue with high gain of the internal clock or other hardware
> > issues.
> 
> It's probably a good idea to check. I don't think that would cause ntpd any 
> gas. I've seen RTC battery messages on my gear which haven't caused ntpd 
> any problem. I have two machines which complain about RTC battery being 
> dead, where in fact I have replaced the batteries and the messages still 
> are displayed at boot. I'm not sure if it's possible for a kernel to damage 
> the RTC. In my case that doesn't cause ntpd any problems. It's probably 
> good to check anyway.
> 
> > 
> > If there are known issues with jails (the problem occurs since I use those),
> > advice is appreciated.  
> 
> Not that I know of.
> 
> 

Just some strange news:

I left the server for the whole day with ntpd disabled and did not
see the RTC gain even one second, even while stressing the machine.

But soon after restarting ntpd, I immediately noticed the clock was
30 minutes off! This morning, the discrepancy was almost 5 hours. It
looked more like a weird adjustment to a time base other than UTC.

Over the weekend I'll leave the server with ntpd disabled and only
the RTC running. I have the strange feeling that something is
intentionally readjusting the ntpd time, due to a misconfiguration
or a rogue NTP server in the X.CC.pool.ntp.org pool.

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes
or for market or opinion research (§ 28 Abs. 4 BDSG).




Re: ntpd dies nightly on a server with jails

2017-03-16 Thread Don Lewis
On 16 Mar, O. Hartmann wrote:
> On Wed, 15 Mar 2017 13:12:37 -0700
> Cy Schubert  wrote:

>> > 
>> > When the clock is floating that wildly, in all cases ntpd isn't
>> > running any more.
>> > I restart it with options -g and -G to adjust the time quickly at
>> > the beginning, which works fine.
>> 
>> This is disconcerting. If your clock is floating wildly without ntpd 
>> running there are other issues that might be at play here. At most
>> the clock might drift a little, maybe a minute or two a day but not
>> by a lot. Does the drift cause your clocks to run fast or slow?
> 
> Today, I switched off ntpd on the jail-bearing host. After an hour or
> so the clock hadn't drifted apart from my DCF77 clock, at least not
> within a granularity of minutes. So I switched ntpd on again. After a
> while, I checked the status via "service ntpd status", and I would bet
> my ass that the result was "is running with PID XXX". When I did the
> same a minute later, the clock was off by almost half an hour (always
> behind real time, never ahead!) and ntpd wasn't running. A
> coincidence? I cannot tell, since I had done a "clear" on the
> terminal :-( But that was strange.

I think that ntp might exit if it sees time going insane.  According to
this old discussion, the exit is silent:
https://forum.pfsense.org/index.php?topic=53906.0
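
If the exit really is silent, the moment of death has to be caught from outside. A rough sketch, assuming pgrep is available on the host and that the process is named exactly "ntpd"; the function names are hypothetical. It records a monotonic timestamp next to the wall-clock time, because the wall clock is the thing being stepped around.

```python
import subprocess
import time


def ntpd_running():
    """True if pgrep finds a process named exactly ntpd (exit status 0)."""
    result = subprocess.run(["pgrep", "-x", "ntpd"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def first_exit(samples):
    """Return the first sample where running flips from True to False.

    Each sample is (monotonic_seconds, wall_clock_text, running).
    """
    previous = None
    for sample in samples:
        if previous is not None and previous[2] and not sample[2]:
            return sample
        previous = sample
    return None


def watch(interval_s=10, check=ntpd_running):
    """Poll until ntpd disappears, then return the sample that saw it gone."""
    previous = None
    while True:
        sample = (time.monotonic(), time.strftime("%Y-%m-%d %H:%M:%S"), check())
        print(sample, flush=True)
        if previous is not None and first_exit([previous, sample]) is not None:
            return sample
        previous = sample
        time.sleep(interval_s)
```

Leaving watch() running overnight with its output going to a file should narrow the time of death to within one polling interval, which can then be compared against the periodic job schedule and the syslog.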

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ntpd dies nightly on a server with jails

2017-03-16 Thread O. Hartmann
On Wed, 15 Mar 2017 13:12:37 -0700
Cy Schubert  wrote:


Thank you very much for responding.

> Hi O.Hartmann,
> 
> I'll try to answer as much as I can in the noon hour I have left.
> 
> In message <20170315071724.78bb0...@freyja.zeit4.iv.bundesimmobilien.de>,
> "O. Hartmann" writes:
> > Running a host with several jails on recent CURRENT (12.0-CURRENT
> > #8 r315187: Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on
> > a daily basis.
> > 
> > The box is an older two-socket Fujitsu server equipped with two
> > four-core Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> > 
> > The box has several jails; none of the jails runs the ntpd service.
> > Each jail has its own dedicated loopback interface, lo1 through lo5
> > (for the moment), with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if
> > this matters; I believe not).
> > 
> > The host itself has two main NICs, Broadcom based. bcm0 is
> > dedicated to the host, bcm1 is shared amongst the jails: each jail
> > has an IP bound to bcm1 via which the jails communicate with the
> > network.
> > 
> > I try to capture log information via syslog, but FreeBSD's ntpd
> > seems to be very, very sparse with such information, converging to
> > nothing. I can't see anything in the logs that explains why ntpd
> > dies almost every night, leaving the system with a wild reset of
> > time. Sometimes it is a gain of 6 hours, sometimes it is only half
> > an hour. I usually leave the box at 16:00 local time and check on
> > it again at ~7 o'clock in the morning local time.
> 
> We will need to turn on debugging. Unfortunately debug code is not compiled 
> into the binary. We have two options. You can either update 
> src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the exact 
> same ntp) with the DEBUG option -- this is probably simpler. Then enable 
> debug with -d and -D. -D increases verbosity. I just committed a debug 
> option to both ntp ports to assist here.

I realised that debug code wasn't compiled in when I simply turned the
debug switch on for ntpd: the output was the same as before. So I feared
that I would have to recompile with debugging explicitly switched on ...

> 
> Next question: Do you see any indication of a core dump? I'd be interested 
> in looking at it if possible.

I have intentionally switched off core dumping. I will switch it on. In
all the logged messages I searched for "ntp", I never saw any error
resulting in a crash, but I'll look more closely tomorrow.

> 
> > 
> > When the clock is floating that wildly, in all cases ntpd isn't
> > running any more. I restart it with options -g and -G to adjust the
> > time quickly at the beginning, which works fine.
> 
> This is disconcerting. If your clock is floating wildly without ntpd 
> running there are other issues that might be at play here. At most the 
> clock might drift a little, maybe a minute or two a day but not by a lot. 
> Does the drift cause your clocks to run fast or slow?

Today, I switched off ntpd on the jail-bearing host. After an hour or so
the clock hadn't drifted apart from my DCF77 clock, at least not within
a granularity of minutes. So I switched ntpd on again. After a while, I
checked the status via "service ntpd status", and I would bet my ass
that the result was "is running with PID XXX". When I did the same a
minute later, the clock was off by almost half an hour (always behind
real time, never ahead!) and ntpd wasn't running. A coincidence? I
cannot tell, since I had done a "clear" on the terminal :-( But that was
strange.

> 
> > 
> > Apart from possible misconfigurations of the jails (I'm quite new
> > to jails and their pitfalls), I was wondering what causes ntpd to
> > die. I can't determine exactly the time of its death, so it might
> > be related to diurnal/periodic processes (I use only the most
> > vanilla periodic configuration, except that the ZFS scrub check is
> > enabled).
> 
> As I'm a little rushed for time, I didn't catch whether the jails 
> themselves were also running ntpd... just thought I'd ask. I don't see how 
> zfs scrubbing or any other periodic scripts could cause this.

The jails do not have ntpd running, since all the docs I read say that
the jail-bearing host provides the time. So I checked and double-checked
that they do not have ntpd running.

By mentioning ZFS and scrubbing I was thinking more about time-adjusting
periodic jobs like adjkerntz or friends, if there are any I'm not aware
of. I see that this was rather confusing.

> 
> > 
> > I haven't had the chance to check whether the hardware is
> > completely all right, but from a superficial point of view there is
> > no issue with high gain of the internal clock or other hardware
> > issues.
> 
> It's probably a good idea to check. I don't think that would cause ntpd any 
> gas. I've seen RTC battery messages on my gear which haven't caused ntpd 
> any problem. I have two machines which complain about RTC battery being 
> dead, 

Re: ntpd dies nightly on a server with jails

2017-03-15 Thread Cy Schubert
Hi O.Hartmann,

I'll try to answer as much as I can in the noon hour I have left.

In message <20170315071724.78bb0...@freyja.zeit4.iv.bundesimmobilien.de>,
"O. Hartmann" writes:
> Running a host with several jails on recent CURRENT (12.0-CURRENT
> #8 r315187: Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on
> a daily basis.
> 
> The box is an older two-socket Fujitsu server equipped with two
> four-core Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> 
> The box has several jails; none of the jails runs the ntpd service.
> Each jail has its own dedicated loopback interface, lo1 through lo5
> (for the moment), with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if
> this matters; I believe not).
> 
> The host itself has two main NICs, Broadcom based. bcm0 is
> dedicated to the host, bcm1 is shared amongst the jails: each jail
> has an IP bound to bcm1 via which the jails communicate with the
> network.
> 
> I try to capture log information via syslog, but FreeBSD's ntpd
> seems to be very, very sparse with such information, converging to
> nothing. I can't see anything in the logs that explains why ntpd
> dies almost every night, leaving the system with a wild reset of
> time. Sometimes it is a gain of 6 hours, sometimes it is only half
> an hour. I usually leave the box at 16:00 local time and check on
> it again at ~7 o'clock in the morning local time.

We will need to turn on debugging. Unfortunately debug code is not compiled 
into the binary. We have two options. You can either update 
src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the exact 
same ntp) with the DEBUG option -- this is probably simpler. Then enable 
debug with -d and -D. -D increases verbosity. I just committed a debug 
option to both ntp ports to assist here.

Next question: Do you see any indication of a core dump? I'd be interested 
in looking at it if possible.

> 
> When the clock is floating that wildly, in all cases ntpd isn't
> running any more. I restart it with options -g and -G to adjust the
> time quickly at the beginning, which works fine.

This is disconcerting. If your clock is floating wildly without ntpd 
running there are other issues that might be at play here. At most the 
clock might drift a little, maybe a minute or two a day but not by a lot. 
Does the drift cause your clocks to run fast or slow?
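
To put numbers on that: a minute a day is roughly 694 ppm, whereas six 
hours gained between 16:00 and about 07:00 (15 hours) would mean a clock 
running 40% fast, far beyond anything ntpd could discipline (it only 
corrects frequency errors up to 500 ppm). The sketch below just does that 
arithmetic; the function name is made up, and it assumes the offset built 
up evenly over the whole window rather than in a single jump.

```shell
#!/bin/sh
# drift_ppm OFFSET_SECONDS ELAPSED_SECONDS
# Prints the average frequency error in parts per million.
drift_ppm() {
    awk -v off="$1" -v span="$2" 'BEGIN { printf "%.0f\n", off / span * 1000000 }'
}

drift_ppm 60 86400      # one minute per day: prints 694
drift_ppm 21600 54000   # six hours over fifteen hours: prints 400000
```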

> 
> Apart from possible misconfigurations of the jails (I'm quite new to jails and
> their pitfalls), I was wondering what causes ntpd to die. I can't determine
> exactly the time of its death, so it might be related to diurnal/periodic
> processes (I use only the most vanilla configurations on periodic, except for
> having the ZFS scrub check enabled).

As I'm a little rushed for time, I didn't catch whether the jails 
themselves were also running ntpd... just thought I'd ask. I don't see how 
zfs scrubbing or any other periodic scripts could cause this.
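
Since the time of death is unknown, one cheap way to narrow it down is to 
poll ntpd's pidfile from cron and log the moment the process is gone. A 
minimal sketch; the function name and the pidfile and log paths mentioned 
below are assumptions to adapt:

```shell
#!/bin/sh
# check_pidfile PIDFILE LOGFILE
# Returns 0 if the process recorded in PIDFILE is alive. Otherwise appends
# a timestamped line to LOGFILE and returns 1.
check_pidfile() {
    pidfile=$1
    log=$2
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') process for $pidfile not running" >>"$log"
    return 1
}
```

Called once a minute from root's crontab as, say, "check_pidfile 
/var/run/ntpd.pid /var/log/ntpd-watch.log", the first logged line brackets 
the time of death to within a minute, which can then be compared against 
the periodic(8) schedule.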

> 
> I haven't had the chance to check whether the hardware is completely all right,
> but from a superficial point of view there is no issue with the internal clock
> running fast or other hardware issues.

It's probably a good idea to check, though I don't think that would cause 
ntpd any grief. I have two machines that complain at boot about a dead RTC 
battery even though I have replaced the batteries, and ntpd runs fine on 
both. I'm not sure whether it's possible for a kernel to damage the RTC.

> 
> If there are known issues with jails (the problem has occurred since I started
> using them), advice is appreciated.

Not that I know of.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


ntpd dies nightly on a server with jails

2017-03-15 Thread O. Hartmann
Running a host with several jails on recent CURRENT (12.0-CURRENT #8 r315187:
Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on a daily basis.

The box is an older two-socket Fujitsu server equipped with two four-core
Intel(R) Xeon(R) CPU L5420  @ 2.50GHz.

The box has several jails, each jail does NOT run service ntpd. Each jail has
its dedicated loopback, lo1 through lo5 (for the moment) with dedicated IPs:
127.0.1.1 - 127.0.5.1 (if this matters, I believe not).

The host itself has two main NICs, broadcom based. bcm0 is dedicated to the
host, bcm1 is shared amongst the jails: each jail has an IP bound to bcm1 via
which the jails communicate with the network.

I try to capture log information via syslog, but FreeBSD's ntpd seems to be
very, very sparse with such information, converging to nothing - I can't see
anything suitable in the logs explaining why ntpd dies almost every night,
leaving the system with a wild reset of time. Sometimes it is a gain of 6
hours, sometimes it is only half an hour. I usually leave the box at 16:00
local time and check on it again at ~7 o'clock in the morning local time.

When the clock is floating that wildly, in all cases ntpd isn't running any
more. I restart it with options -g and -G to adjust the time quickly at
startup, which works fine.

Apart from possible misconfigurations of the jails (I'm quite new to jails and
their pitfalls), I was wondering what causes ntpd to die. I can't determine
exactly the time of its death, so it might be related to diurnal/periodic
processes (I use only the most vanilla configurations on periodic, except for
having the ZFS scrub check enabled).

I haven't had the chance to check whether the hardware is completely all right,
but from a superficial point of view there is no issue with the internal clock
running fast or other hardware issues.

If there are known issues with jails (the problem has occurred since I started
using them), advice is appreciated.

Thanks in advance,

O. Hartmann