On 07/03/2012 07:10 PM, myron wrote:
On Jul 3, 2012, at 1:00 PM, Bill Cole wrote:
On 3 Jul 2012, at 12:07, myron wrote:
On Jul 3, 2012, at 11:52 AM, Ralf Hildebrandt wrote:
* myron <kowal...@cs.moravian.edu>:
This morning I came in to find no new mail in my inbox. I ended up
rebooting the server. These were the log entries that seem to indicate
when and what the problem was. I checked the time on the server and it
was not off at all. Can someone suggest from these entries what the
problem was?
Jul 2 20:52:11 errol dovecot: dovecot: Fatal: Time just moved
backwards by 9 seconds. This might cause a lot of problems, so I'll
just kill myself now. http://wiki.dovecot.org/TimeMovedBackwards
Dovecot killed itself because the time moved backwards by 9 seconds.
BTW: why? Use NTP!
FWIW, using NTP (badly) can be the proximate cause of such a
backwards jump. It's probably the most common one behind a human
manually changing the system time. An unregulated clock chip jumping
backwards is definitionally defective.
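
To illustrate the kind of backwards step being discussed, here is a rough sketch in Python. It is not Dovecot's actual check; the names `backward_step` and `watch_clock` are made up for this example, and the clock and sleep functions are passed in so the logic can be exercised without touching the real system time. It compares successive wall-clock readings and records any negative step:

```python
import time
from typing import Callable, List


def backward_step(prev: float, now: float, tolerance: float = 0.0) -> float:
    """Return how many seconds the clock moved backwards, or 0.0 if it did not."""
    delta = prev - now
    return delta if delta > tolerance else 0.0


def watch_clock(
    read_clock: Callable[[], float] = time.time,
    samples: int = 5,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Sample the wall clock and collect every backwards step that was seen."""
    steps = []
    prev = read_clock()
    for _ in range(samples):
        sleep(interval)
        now = read_clock()
        step = backward_step(prev, now)
        if step:
            steps.append(step)
        prev = now
    return steps
```

A daemon that cannot tolerate such steps has to decide what to do when one is seen; per the log message above, Dovecot's choice is to exit.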
I am (was) using ntp. I don't know why the time moved backwards. It
wasn't off when I checked the time before rebooting.
root@errol:/var/log# ps -ef | grep ntpd
ntp 1049 1 0 08:25 ? 00:00:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 107:116
root 4457 1996 0 12:03 pts/0 00:00:00 grep --color=auto ntpd
running ntpd != functional ntpd.
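
One way to tell the two apart is to ask ntpd whether it has actually selected a peer, rather than just checking that the process exists. A minimal sketch in Python, assuming the usual `ntpq -p` peer listing in which the selected system peer is flagged with a leading `*`; the function name is made up for this example, and the listing text has to be captured separately (for instance from `ntpq -pn`):

```python
def has_system_peer(ntpq_output: str) -> bool:
    """Return True if any peer line carries the '*' (system peer) tally code."""
    for line in ntpq_output.splitlines():
        # The header and the "=====" separator never start with '*'.
        if line.startswith("*"):
            return True
    return False
```

A `False` result with ntpd running would point at the "running but not functional" case.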
It used to be that firewalls and trivial configurations were the most
common causes of NTP running without actually keeping the system
clock stable and accurate. These days it is more common for
virtualization to be the cause of trouble, which can be highly opaque.
Subjectively, I have also been convinced that server RTC chips have
had a huge drop in accuracy and stability in the past decade, causing
the drift assumptions in traditionally good-enough ntpd
configurations to break down. It also doesn't help that apparently a
lot of software can't handle leap seconds, and that caused lots of
trouble at 00:00 GMT 7/1.
A deep discussion of solid time synch would be off-topic, but as you
can see it is absolutely critical when using Dovecot SASL to either
solve the time synch problem or catch the well-documented failure
mode of Dovecot shooting itself in the head to avoid doing bad things.
It definitely reset more than a few seconds. It looks like I lost the
internet connection to the ntp servers somewhere down the line.
That has absolutely no impact on the accuracy or continuity of the
*local ntp server*.
Keep in mind that ntpd contains thousands of lines of code to ensure
that the time keeps running ACCURATELY, even in prolonged absence of
higher-stratum peers.
syslog.1:Jul 2 20:52:11 errol ntpd[24983]: time reset -10.407874 s
syslog.1:Jul 2 20:52:11 errol ntpd[24983]: kernel time sync status change 6001
syslog.1:Jul 2 20:57:56 errol ntpd[24983]: synchronized to 91.189.94.4, stratum 2
syslog.1:Jul 2 20:57:56 errol ntpd[24983]: kernel time sync status change 2001
syslog.1:Jul 2 21:07:31 errol ntpd[24983]: no servers reachable
syslog.1:Jul 2 21:14:59 errol ntpd[24983]: synchronized to 91.189.94.4, stratum 2
syslog.1:Jul 2 21:16:13 errol ntpd[24983]: time reset +10.403478 s
syslog.1:Jul 2 21:26:33 errol ntpd[24983]: synchronized to 91.189.94.4, stratum 2
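
For anyone wanting to check their own logs for the same pattern, a small sketch in Python that pulls the step sizes out of ntpd "time reset" entries. The regular expression is an assumption based only on the message format visible in the lines above, and `time_resets` is a made-up name:

```python
import re
from typing import List

# Matches e.g. "ntpd[24983]: time reset -10.407874 s"
RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]\d+\.\d+) s")


def time_resets(log_text: str) -> List[float]:
    """Return every clock step (in seconds) that ntpd logged, in log order."""
    return [float(m.group(1)) for m in RESET_RE.finditer(log_text)]
```

Run over the entries above, this gives -10.407874 and +10.403478: a step back of roughly ten seconds, followed 24 minutes later by a step forward of almost exactly the same size.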
Instead, I suspect this is one more symptom of the leap second bug that
keeps cropping up this week.
Consult The Interwebs for your version of ntpd and see if others report
the same.
--
J.