Hello Antonio

On 01.07.2012 00:47, Antonio M. Moreiras wrote:
> I had two odd events today, just after midnight (utc).
>
> My nagios monitoring went crazy (all servers really ok, but alarming on
> nagios). I think some regular expression in the ntp monitoring plugin
> didn't like some new response triggered at entering the day. Probably it
> will be OK again after today's midnight.

I also monitor ntpd with Nagios, using check_ntp_peer and check_ntp_time. I mostly monitor servers (which have the leap second file from NIST) and only one client (Mac OS X 10.6), which does not have the leap second file. Nagios raised an alarm just after the leap second (local time is CEST, i.e. UTC+2):

[01-07-2012 02:01:49] flashback CRITICAL NTP time NTP CRITICAL: Offset 0.99893713 secs
[01-07-2012 02:08:19] flashback CRITICAL NTP peer NTP CRITICAL: Offset -1.001178 secs, jitter=816.351000, stratum=1
[01-07-2012 02:16:49] flashback OK NTP time NTP OK: Offset 0.000156879425 secs
[01-07-2012 02:18:19] flashback OK NTP peer NTP OK: Offset -0.000113 secs, jitter=0.073000, stratum=1

This is also one of the systems on which I had set up my script to report the time during the leap second (using Unix date, as ntptime was not available). It was the only system that did not insert the leap second, so it was off by 1 second for a short time, which Nagios reported.
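
To compare the offsets Nagios reported over time, entries like the ones quoted above can be reduced to time, state and offset. This is only a minimal sketch: the function name summarize_offsets is made up for illustration, and it assumes the entry layout shown in the excerpt (timestamp in brackets, host, state, then "Offset <value>").

```shell
# Print "time state offset" for every Nagios log entry found on stdin.
# Assumes entries shaped like:
#   [01-07-2012 02:01:49] flashback CRITICAL NTP time NTP CRITICAL: Offset 0.99893713 secs
# Several entries on one physical line are handled as well.
summarize_offsets() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\[[0-9]/) {          # "[01-07-2012" starts a new entry
                time = $(i + 1)             # "02:01:49]"
                sub(/\]$/, "", time)        # drop the closing bracket
                state = $(i + 3)            # field after the host name
            }
            if ($i == "Offset")
                print time, state, $(i + 1)
        }
    }'
}
```

It reads the log text on standard input, e.g. `summarize_offsets < excerpt.txt` (the file name is just a placeholder).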

Output during leap second from:
for i in `/opt/local/bin/gseq 0 240`; do date ; sleep 0.5 ; done

Sun Jul  1 01:59:57 CEST 2012
Sun Jul  1 01:59:57 CEST 2012
Sun Jul  1 01:59:58 CEST 2012
Sun Jul  1 01:59:58 CEST 2012
Sun Jul  1 01:59:59 CEST 2012
Sun Jul  1 01:59:59 CEST 2012
Sun Jul  1 02:00:00 CEST 2012
Sun Jul  1 02:00:00 CEST 2012
Sun Jul  1 02:00:01 CEST 2012
Sun Jul  1 02:00:01 CEST 2012
Sun Jul  1 02:00:02 CEST 2012
Sun Jul  1 02:00:02 CEST 2012


Output during leap second on a FreeBSD 7.4 system from (on one line):
for i in `/usr/local/bin/gseq 0 240`; do ntptime | awk '/time d3/ {print $3" "$4" "$5" "$6" "$7}' | sed 's/,$//' ; sleep 0.5 ; done

Sun, Jul 1 2012 1:59:57.030
Sun, Jul 1 2012 1:59:57.540
Sun, Jul 1 2012 1:59:58.046
Sun, Jul 1 2012 1:59:58.556
Sun, Jul 1 2012 1:59:59.065
Sun, Jul 1 2012 1:59:59.578
Sun, Jul 1 2012 1:59:59.085
Sun, Jul 1 2012 1:59:59.594
Sun, Jul 1 2012 2:00:00.105
Sun, Jul 1 2012 2:00:00.615
Sun, Jul 1 2012 2:00:01.135
Sun, Jul 1 2012 2:00:01.641
Sun, Jul 1 2012 2:00:02.151
Sun, Jul 1 2012 2:00:02.907

The output on a recent Gentoo system was similar. Even a second script running on these systems with Unix date reported second :59 twice (four times in the output) and did not report the leap second as :60.
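
The repeated :59 second shows up as a sample whose time of day is earlier than the one before it. A minimal sketch to flag such backward steps in sampled output follows. The function name find_backward_steps is made up for illustration, and it assumes the time of day is the last field of each line, as in the ntptime samples above. It does not handle samples that cross local midnight.

```shell
# Report every sample whose time of day is earlier than the previous one.
# Assumes input lines shaped like "Sun, Jul 1 2012 1:59:59.578".
find_backward_steps() {
    awk '{
        split($NF, t, ":")                     # t[1]=hour, t[2]=minute, t[3]=seconds
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since local midnight
        if (NR > 1 && now < prev)
            printf "step back at sample %d: %s -> %s\n", NR, last, $NF
        prev = now
        last = $NF
    }'
}
```

The output of the sampling loop can be piped straight into it, or saved to a file first and fed in afterwards.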

> Two and a half hours later, one of my stratum one servers just crashed:
> version="ntpd [email protected] Mon Jan  9 15:21:05 UTC 2012 (1)",
> processor="x86_64", system="Linux/2.6.33.7-rt29-pps536-ts22". I don't
> believe in coincidences, and think that it could be related to the
> treatment of the leap second in the linux kernel.

I also have one 'older' Linux system in my network, which crashed one hour after the leap second. This seems to be the known Linux kernel bug (prior to 2.6.29) that is mentioned in this Slashdot posting [1].

[1] http://it.slashdot.org/story/12/06/30/2123248/the-leap-second-is-here-are-your-systems-ready


bye
Fabian
_______________________________________________
pool mailing list
[email protected]
http://lists.ntp.org/listinfo/pool