On 07/03/2012 07:10 PM, myron wrote:
On Jul 3, 2012, at 1:00 PM, Bill Cole wrote:

On 3 Jul 2012, at 12:07, myron wrote:

On Jul 3, 2012, at 11:52 AM, Ralf Hildebrandt wrote:

* myron <kowal...@cs.moravian.edu>:
This morning I came in to find no new mail in my inbox. I ended up
rebooting the server. These log entries seem to indicate when the
problem occurred and what it was. I checked the time on the server
and it was not off at all. Can someone suggest from these entries
what the problem was?

Jul  2 20:52:11 errol dovecot: dovecot: Fatal: Time just moved
backwards by 9 seconds. This might cause a lot of problems, so I'll
just kill myself now. http://wiki.dovecot.org/TimeMovedBackwards

Dovecot killed itself because the time moved backwards by 9 seconds.
BTW: why? Use NTP!

FWIW, using NTP (badly) can be the proximate cause of such a backwards jump. It's probably the most common cause after a human manually changing the system time. An unregulated clock chip jumping backwards is by definition defective.

I am (was) using ntp. I don't know why the time moved backwards. The time wasn't off when I checked it before rebooting.

root@errol:/var/log# ps -ef | grep ntpd
ntp 1049 1 0 08:25 ? 00:00:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 107:116
root      4457  1996  0 12:03 pts/0    00:00:00 grep --color=auto ntpd

running ntpd != functional ntpd.

It used to be that firewalls and trivial misconfigurations were the most common causes of NTP running without actually keeping the system clock stable and accurate. These days it is more common for virtualization to be the cause of trouble, which can be highly opaque. Subjectively, I have also become convinced that server RTC chips have had a huge drop in accuracy and stability in the past decade, causing the drift assumptions in traditionally good-enough ntpd configurations to break down. It also doesn't help that apparently a lot of software can't handle leap seconds, and that caused lots of trouble at 00:00 GMT 7/1.

A deep discussion of solid time synch would be off-topic, but as you can see it is absolutely critical when using Dovecot SASL to either solve the time synch problem or catch the well-documented failure mode of Dovecot shooting itself in the head to avoid doing bad things.
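
One common way to tell a merely running ntpd from a functional one is to look at the peer table printed by `ntpq -pn`: the peer ntpd is currently synchronized to is marked with a leading `*`. The Python sketch below parses such output. The function name `selected_peer` and the sample peer table are made up for illustration, and the column layout is assumed to be the usual one (remote, refid, st, t, when, poll, reach, delay, offset, jitter).

```python
def selected_peer(ntpq_output):
    """Return (remote, offset_ms) for the peer marked '*', or None if there is none.

    Assumes the usual `ntpq -pn` column layout, where offset (in milliseconds)
    is the second-to-last column of each peer line.
    """
    for line in ntpq_output.splitlines():
        if not line.startswith("*"):
            continue  # header, separator, or a peer that is not the sync source
        fields = line[1:].split()
        return fields[0], float(fields[-2])
    return None


# Hypothetical sample output, not taken from the server discussed in this thread.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*91.189.94.4     192.0.2.1        2 u   33   64  377   85.123    1.204   0.512
+192.0.2.10      .GPS.            1 u   40   64  377   20.000   -0.300   0.100
"""

print(selected_peer(SAMPLE))
```

If the function returns None, ntpd has no selected sync source at that moment, even though the process shows up in `ps`.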

It definitely reset by more than a few seconds. It looks like I lost the internet connection to the NTP servers somewhere down the line.


That has absolutely no impact on the accuracy or continuity of the *local ntp server*.

Keep in mind that ntpd contains thousands of lines of code to ensure that the time keeps running ACCURATELY, even in prolonged absence of higher-stratum peers.


syslog.1:Jul  2 20:52:11 errol ntpd[24983]: time reset -10.407874 s
syslog.1:Jul  2 20:52:11 errol ntpd[24983]: kernel time sync status change 6001
syslog.1:Jul  2 20:57:56 errol ntpd[24983]: synchronized to 91.189.94.4, stratum 2
syslog.1:Jul  2 20:57:56 errol ntpd[24983]: kernel time sync status change 2001
syslog.1:Jul  2 21:07:31 errol ntpd[24983]: no servers reachable
syslog.1:Jul 2 21:14:59 errol ntpd[24983]: synchronized to 91.189.94.4, stratum 2
syslog.1:Jul  2 21:16:13 errol ntpd[24983]: time reset +10.403478 s
syslog.1:Jul 2 21:26:33 errol ntpd[24983]: synchronized to 91.189.94.4, stratum 2
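
To find clock steps like the two above in a larger log, something along these lines could be used. It is only a sketch that assumes the syslog line format shown in these entries; the function name `find_time_resets` and the one-second threshold are illustrative choices, not anything provided by ntpd or Dovecot.

```python
import re

# Matches ntpd step messages such as:
#   Jul  2 20:52:11 errol ntpd[24983]: time reset -10.407874 s
RESET_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+ntpd\[\d+\]:\s+"
    r"time reset (?P<offset>[+-]\d+\.\d+) s"
)


def find_time_resets(lines, min_abs_seconds=1.0):
    """Return (timestamp, offset_seconds) for each ntpd step at or above the threshold."""
    hits = []
    for line in lines:
        match = RESET_RE.search(line)
        if match is None:
            continue  # not a "time reset" entry
        offset = float(match.group("offset"))
        if abs(offset) >= min_abs_seconds:
            hits.append((match.group("stamp"), offset))
    return hits


if __name__ == "__main__":
    sample = [
        "syslog.1:Jul  2 20:52:11 errol ntpd[24983]: time reset -10.407874 s",
        "syslog.1:Jul  2 21:07:31 errol ntpd[24983]: no servers reachable",
        "syslog.1:Jul  2 21:16:13 errol ntpd[24983]: time reset +10.403478 s",
    ]
    for stamp, offset in find_time_resets(sample):
        direction = "backwards" if offset < 0 else "forwards"
        print(f"{stamp}: clock stepped {direction} by {abs(offset):.3f} s")
```

A negative offset is a backwards step, which is the case that makes Dovecot exit.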


Instead, I suspect this is one more symptom of the leap second bug that keeps cropping up this week.

Consult The Interwebs for your version of ntpd and see if others report the same.


--
J.
