Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-09-01 Thread Larry Rosenman via dovecot
I've added the patch to the FreeBSD port pending an official release.

On Sun, Sep 1, 2024 at 11:36 AM Larry Rosenman  wrote:

> that sounds promising Michael.  @Timo Sirainen   any chance
> of a release with this included?
>
> On Sun, Sep 1, 2024 at 9:32 AM Michael Grimm via dovecot <
> dovecot@dovecot.org> wrote:
>
>> Timo Sirainen via dovecot  wrote:
>> >
>> > On 30. Aug 2024, at 19.00, dco2024--- via dovecot 
>> wrote:
>>
>> >> This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel
>> is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is
>> logged once every 12-15 hours.
>> >>
>> >> Syslog:
>> >> 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved
>> forwards by 0.100068 seconds - adjusting timeouts.
>> >> 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved
>> forwards by 0.100063 seconds - adjusting timeouts.
>> >> [snip]
>> >> Chrony ntp keeps the time in sync and the time has been in sync to
>> within 30us of UTC for many days. I noticed that it reports that the
>> unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for
>> dovecot's 12 hour warning interval:
>> >>  12 hours * 3600 secs/hour * 2.3/100 = 0.0998 seconds.
>> >> Could it be that dovecot is effectively measuring intervals of the
>> uncorrected system clock time instead of the longer term adjusted time, and
>> it complains when the accumulated NTP adjustments sum to 0.1 seconds.
>> >
>> > I don't see how that would be possible. The check is using only just
>> generated timestamps, not anything from a long time ago.
>> >
>> > I wonder if this kind of a simple patch would be good enough of a fix:
>> >
>> > [snip]
>>
>> I did apply your patch to dovecot-2.3.21.1 on 14.1-STABLE FreeBSD.
>>
>> Now, after 24 hours, dovecot doesn't complain about "Time moved forwards"
>> any longer.
>>
>> Before, I had had between 10 and 250 complaints every day.
>>
>> HTH and regards,
>> Michael
>>
>>
>>
>> ___
>> dovecot mailing list -- dovecot@dovecot.org
>> To unsubscribe send an email to dovecot-le...@dovecot.org
>>
>
>
> --
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
> US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
>


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-09-01 Thread Larry Rosenman via dovecot
that sounds promising Michael.  @Timo Sirainen   any chance of
a release with this included?

On Sun, Sep 1, 2024 at 9:32 AM Michael Grimm via dovecot <
dovecot@dovecot.org> wrote:

> Timo Sirainen via dovecot  wrote:
> >
> > On 30. Aug 2024, at 19.00, dco2024--- via dovecot 
> wrote:
>
> >> This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel
> is 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is
> logged once every 12-15 hours.
> >>
> >> Syslog:
> >> 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved
> forwards by 0.100068 seconds - adjusting timeouts.
> >> 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved
> forwards by 0.100063 seconds - adjusting timeouts.
> >> [snip]
> >> Chrony ntp keeps the time in sync and the time has been in sync to
> within 30us of UTC for many days. I noticed that it reports that the
> unadjusted system clock is about 2.31 ppm fast of UTC. Doing the math for
> dovecot's 12 hour warning interval:
> >>  12 hours * 3600 secs/hour * 2.3/100 = 0.0998 seconds.
> >> Could it be that dovecot is effectively measuring intervals of the
> uncorrected system clock time instead of the longer term adjusted time, and
> it complains when the accumulated NTP adjustments sum to 0.1 seconds.
> >
> > I don't see how that would be possible. The check is using only just
> generated timestamps, not anything from a long time ago.
> >
> > I wonder if this kind of a simple patch would be good enough of a fix:
> >
> > [snip]
>
> I did apply your patch to dovecot-2.3.21.1 on 14.1-STABLE FreeBSD.
>
> Now, after 24 hours, dovecot doesn't complain about "Time moved forwards"
> any longer.
>
> Before, I had had between 10 and 250 complaints every day.
>
> HTH and regards,
> Michael
>
>
>
> ___
> dovecot mailing list -- dovecot@dovecot.org
> To unsubscribe send an email to dovecot-le...@dovecot.org
>


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-09-01 Thread Michael Grimm via dovecot
Timo Sirainen via dovecot  wrote:
> 
> On 30. Aug 2024, at 19.00, dco2024--- via dovecot  wrote:

>> This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 
>> 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged 
>> once every 12-15 hours. 
>> 
>> Syslog:
>> 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards 
>> by 0.100068 seconds - adjusting timeouts.
>> 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards 
>> by 0.100063 seconds - adjusting timeouts.
>> [snip]
>> Chrony ntp keeps the time in sync and the time has been in sync to within 
>> 30us of UTC for many days. I noticed that it reports that the unadjusted 
>> system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 
>> hour warning interval:
>>  12 hours * 3600 secs/hour * 2.3/100 = 0.0998 seconds.
>> Could it be that dovecot is effectively measuring intervals of the 
>> uncorrected system clock time instead of the longer term adjusted time, and 
>> it complains when the accumulated NTP adjustments sum to 0.1 seconds.
> 
> I don't see how that would be possible. The check is using only just 
> generated timestamps, not anything from a long time ago.
> 
> I wonder if this kind of a simple patch would be good enough of a fix:
> 
> [snip]

I did apply your patch to dovecot-2.3.21.1 on 14.1-STABLE FreeBSD.

Now, after 24 hours, dovecot doesn't complain about "Time moved forwards" any 
longer.

Before, I had had between 10 and 250 complaints every day.

HTH and regards,
Michael



___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-30 Thread Timo Sirainen via dovecot
On 30. Aug 2024, at 19.00, dco2024--- via dovecot  wrote:
> 
> This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 
> 6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged 
> once every 12-15 hours. 
> 
> Syslog:
> 2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100068 seconds - adjusting timeouts.
> 2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100063 seconds - adjusting timeouts.
> 2024-08-26 06:52:16 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100062 seconds - adjusting timeouts.
> 2024-08-26 18:57:54 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100068 seconds - adjusting timeouts.
> 2024-08-27 07:24:34 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100061 seconds - adjusting timeouts.
> 2024-08-27 19:38:48 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100060 seconds - adjusting timeouts.
> 2024-08-28 20:21:44 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100071 seconds - adjusting timeouts.
> 2024-08-29 08:41:44 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100070 seconds - adjusting timeouts.
> 2024-08-29 21:04:37 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100071 seconds - adjusting timeouts.
> 2024-08-30 09:30:36 UTC myhost dovecot: master: Warning: Time moved forwards 
> by 0.100066 seconds - adjusting timeouts.
> 
> Chrony ntp keeps the time in sync and the time has been in sync to within 
> 30us of UTC for many days. I noticed that it reports that the unadjusted 
> system clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 
> hour warning interval:
>   12 hours * 3600 secs/hour * 2.3/100 = 0.0998 seconds.
> Could it be that dovecot is effectively measuring intervals of the 
> uncorrected system clock time instead of the longer term adjusted time, and 
> it complains when the accumulated NTP adjustments sum to 0.1 seconds.

I don't see how that would be possible. The check is using only just generated 
timestamps, not anything from a long time ago.

I wonder if this kind of a simple patch would be good enough of a fix:

diff --git a/src/lib/ioloop.c b/src/lib/ioloop.c
index 98c2dc2bf4..a63f861330 100644
--- a/src/lib/ioloop.c
+++ b/src/lib/ioloop.c
@@ -18,6 +18,7 @@
Dovecot generally doesn't have very important short timeouts, so to avoid
logging many warnings about this, use a rather high value. */
 #define IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS (10)
+#define IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS_LARGE (100)
 
 time_t ioloop_time = 0;
 struct timeval ioloop_timeval;
@@ -654,9 +655,13 @@ static void io_loop_handle_timeouts_real(struct ioloop 
*ioloop)
/* the callback may have slept, so check the time again. */
i_gettimeofday(&ioloop_timeval);
} else {
+   int max_diff = diff_usecs < 
IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS_LARGE ?
+   IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS :
+   IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS_LARGE;
+
diff_usecs = timeval_diff_usecs(&ioloop->next_max_time,
&ioloop_timeval);
-   if (unlikely(-diff_usecs >= 
IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS)) {
+   if (unlikely(-diff_usecs >= max_diff)) {
io_loops_timeouts_update(-diff_usecs);
/* time moved forward */
ioloop->time_moved_callback(&ioloop->next_max_time,


___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-30 Thread dco2024--- via dovecot
This is not limited to FreeBSD. I'm seeing it on Gentoo Linux. Kernel is 
6.6.47-gentoo-x86_64, dovecot 2.3.21.1 (d492236fa0). The warning is logged once 
every 12-15 hours. 

Syslog:
2024-08-24 18:03:49 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100068 seconds - adjusting timeouts.
2024-08-25 06:18:49 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100063 seconds - adjusting timeouts.
2024-08-26 06:52:16 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100062 seconds - adjusting timeouts.
2024-08-26 18:57:54 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100068 seconds - adjusting timeouts.
2024-08-27 07:24:34 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100061 seconds - adjusting timeouts.
2024-08-27 19:38:48 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100060 seconds - adjusting timeouts.
2024-08-28 20:21:44 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100071 seconds - adjusting timeouts.
2024-08-29 08:41:44 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100070 seconds - adjusting timeouts.
2024-08-29 21:04:37 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100071 seconds - adjusting timeouts.
2024-08-30 09:30:36 UTC myhost dovecot: master: Warning: Time moved forwards by 
0.100066 seconds - adjusting timeouts.

Chrony ntp keeps the time in sync and the time has been in sync to within 30us 
of UTC for many days. I noticed that it reports that the unadjusted system 
clock is about 2.31 ppm fast of UTC. Doing the math for dovecot's 12 hour 
warning interval:
   12 hours * 3600 secs/hour * 2.3/100 = 0.0998 seconds.
Could it be that dovecot is effectively measuring intervals of the uncorrected 
system clock time instead of the longer term adjusted time, and it complains 
when the accumulated NTP adjustments sum to 0.1 seconds.
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-26 Thread Timo Sirainen via dovecot
On 24. Aug 2024, at 5.06, Jochen Bern via dovecot  wrote:
> 
> On 21.08.24 11:35, Timo Sirainen wrote:
>>> [Lots and lots of "but my NTP sync is much more precise than that" in
>>> the FreeBSD thread]
>> The way Dovecot works is:
>>  - It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
>>  - Then it calls kqueue() to wait for I/O for max 5 milliseconds
>>  - Then it notices that it actually returned more than 105 milliseconds
>>later, and then logs a warning about it.
> 
> I think that more information is needed to pinpoint possible causes, and one 
> of the open questions is: What clock does dovecot look at to determine how 
> long it *actually* stayed dormant? On Linux, software that has need of a 
> monotonously increasing "time" to derive guaranteed unique IDs from often 
> looks at the kernel uptime - which is essentially a count of ticks since 
> bootup, and *not* being corrected by NTP.

Dovecot is doing simply gettimeofday() calls before and after epoll/kqueue/etc. 
It would be possible to use e.g. clock_gettime(CLOCK_MONOTONIC) to see whether 
there really was a time change, but this seems a bit excessive since Dovecot 
needs the real time in any case, so the current checks are "free", while doing 
calls to get monotonic time would only be useful to handle issues with time 
changes.

Another possibility would be to start using timerfd API when it's supported. 
Looks like it exists also in FreeBSD. This might be a good idea, although some 
parts of Dovecot can create/update a lot of timeouts, so I wonder how efficient 
it is to have syscalls updating the timers all the time. But I guess it would 
be fine.

> Similarly, it should be determined whether the timeouts of I/O function 
> called (i.e., kqueue()) are or aren't influenced by NTP's corrections to 
> system time.

I doubt clock changes affect those calls, since they ask to wait for N 
microseconds, not until a specific time.

>> Also, this is kind of a problem when it does happen. Since Dovecot
>> thinks the time moved e.g. 100ms forward, it adjusts all timeouts to
>> happen 100ms backwards. If this wasn't a true time jump, then these
>> timeouts now happen 100ms earlier.
> 
> That is, of course, a dangerous approach if you do *not* have a guarantee 
> that the timeouts of the I/O function called are *otherwise* true to the 
> requested duration. But shouldn't those other concurrently-running timeouts 
> notice an actual discontinuity of the timescale just the same as the first 
> one did? Maybe some sort of "N 'nay's needed for a vote of nonconfidence" 
> mechanism would be safer ...

There's only one timeout concurrently running per process. In theory the 
processes could talk to each other to find out whether there is such a time 
jump in more processes, but that would be very complicated.
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-24 Thread Harlan Stenn via dovecot



On 8/24/2024 12:13 PM, Jochen Bern via dovecot wrote:

On 24.08.24 05:04, Harlan Stenn wrote:

On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:

(As an example for why this is relevant: Several hundred deviations of
100 ms or more per day sum up to several 10+ seconds per day, if only
they all are in the same direction, or several 115+ ppm.


Forward step or slew adjustments should be no problem.


Well, define "problem". The OP has a problem with the many log messages 
that *say* that the timescale slipped 100+ ms into the future (whether 
those actually happen to the system's *clock*, and if so, are triggered 
by the system's NTP sync, is unlikely-but-still-unclear IMHO), and Timo 
said that dovecot should then also try to counteract the offset for 
other still-running timeouts, which sounds like a problem waiting to 
happen *there* to me ...


I have no info/opinion about this specific situation.

I will say that I prefer:

- learning about problems "sooner" rather than "later"
- identifying and fixing problems at the "right" place
- - not too soon (often causes over-reach)
- - not too late (more expensive and leaves places where
there is still a problem)


ntpd refuses to do *slews* correcting by more than 500 ppm;


This is news to me.


H. I checked that and a couple ntpd manpages, while the more current 
versions blame the 500 ppm limit on Unix kernels, the older ones say that


You're talking about frequency adjustments, not phase adjustments. 
Stepping and slewing address phase issues.



The maximum slew rate possible is limited to 500 parts-per-million (PPM)
as a consequence of the correctness principles on which the NTP protocol
and algorithm design are based.


I remember that *some* limit - not necessarily 500 ppm, but that was the 
value chosen - was a *necessary* requirement for the proof, and that 
ntpd used to be a stickler for protocol and limits as written. Did the 
latter change ... ?


No.  But "choose your poison".

It takes over 30 minutes to slew a 1s correction.  By default, that 
correction will be applied at 500ppm.


(Anyway, I once had to sysadmin hardware that would flip-flop between 
IIRC -450 and +550 ppm from one bootup to the next, so I *am* perfectly 
sure that that was beyond the back-then ntpd's capabilities to adjust 
for. Its offset *kept growing* when they asked me to "fix its clock 
problem".)


That means your system clock was running at the wrong rate, and the boot 
code did a poor job of understanding the base clock frequency.


Dave Mills chose 500ppm as the limit for reasons including:

- a bound is required to make sure the algorithms will converge
  to correct time within a usefully bounded period of time
- by observation, the worst "useful" clocks kept time to within
  200ppm.  Since two "worst useful" clocks could have one running at
  250ppm fast and the other at 250ppm slow, the net result is a 500ppm
  range for the correction.

Many years' ago I wrote the 'calc_tickadj' program, which lives in 
ntp/scripts/calc_tickadj/ .  It is a major piece of what you want to do. 
 It will tell you how much your "tick" needs to be adjusted to get your 
system clock running the best it can.


Please note:

- that script aims to produce a tick value requiring the smallest 
possible *positive* ongoing slew adjustment.  If an ongoing negative 
slew adjustment is required and that is not given, the system clock will 
be running faster than "correct" time and any step correction to fix 
that will break ACID.


- the script was written before tickless kernels showed up.  I haven't 
looked at this case, and I would hope/expect there is a way to address 
this for a tickless kernel, too.


- if the system time is still horrible (for example, a VM that gets 
"stunned" a lot, or any other case where the flow of time is more 
random) this approach will not help much.



If what we offer does not satisfy your requirements, please let me know
and we'll find a way to improve things.


(Just to be perfectly clear here: No complaints from *me*. I'm firmly in 
the "hardware outside the ±100 ppm corridor is defective" camp.)


Kind regards,


___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-24 Thread Jochen Bern via dovecot

On 24.08.24 05:04, Harlan Stenn wrote:

On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:

(As an example for why this is relevant: Several hundred deviations of
100 ms or more per day sum up to several 10+ seconds per day, if only
they all are in the same direction, or several 115+ ppm.


Forward step or slew adjustments should be no problem.


Well, define "problem". The OP has a problem with the many log messages 
that *say* that the timescale slipped 100+ ms into the future (whether 
those actually happen to the system's *clock*, and if so, are triggered 
by the system's NTP sync, is unlikely-but-still-unclear IMHO), and Timo 
said that dovecot should then also try to counteract the offset for 
other still-running timeouts, which sounds like a problem waiting to 
happen *there* to me ...



ntpd refuses to do *slews* correcting by more than 500 ppm;


This is news to me.


H. I checked that and a couple ntpd manpages, while the more current 
versions blame the 500 ppm limit on Unix kernels, the older ones say that



The maximum slew rate possible is limited to 500 parts-per-million (PPM)
as a consequence of the correctness principles on which the NTP protocol
and algorithm design are based.


I remember that *some* limit - not necessarily 500 ppm, but that was the 
value chosen - was a *necessary* requirement for the proof, and that 
ntpd used to be a stickler for protocol and limits as written. Did the 
latter change ... ?


(Anyway, I once had to sysadmin hardware that would flip-flop between 
IIRC -450 and +550 ppm from one bootup to the next, so I *am* perfectly 
sure that that was beyond the back-then ntpd's capabilities to adjust 
for. Its offset *kept growing* when they asked me to "fix its clock 
problem".)



If what we offer does not satisfy your requirements, please let me know
and we'll find a way to improve things.


(Just to be perfectly clear here: No complaints from *me*. I'm firmly in 
the "hardware outside the ±100 ppm corridor is defective" camp.)


Kind regards,
--
Jochen Bern
Systemingenieur

Binect GmbH
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-23 Thread Harlan Stenn via dovecot
I'm speaking as st...@ntp.org here, but I'm not subscribed to this list 
via that email address.


On 8/23/2024 7:06 PM, Jochen Bern via dovecot wrote:

On 21.08.24 11:35, Timo Sirainen wrote:

[Lots and lots of "but my NTP sync is much more precise than that" in
the FreeBSD thread]


The way Dovecot works is:
  - It finds the next timeout, sees that it happens in e.g. 5 
milliseconds.

  - Then it calls kqueue() to wait for I/O for max 5 milliseconds
  - Then it notices that it actually returned more than 105 milliseconds
    later, and then logs a warning about it.


I think that more information is needed to pinpoint possible causes, and 
one of the open questions is: What clock does dovecot look at to 
determine how long it *actually* stayed dormant? On Linux, software that 
has need of a monotonously increasing "time" to derive guaranteed unique 
IDs from often looks at the kernel uptime - which is essentially a count 
of ticks since bootup, and *not* being corrected by NTP.


Similarly, it should be determined whether the timeouts of I/O function 
called (i.e., kqueue()) are or aren't influenced by NTP's corrections to 
system time.


The third information I'd like to have is what client software provides 
that NTP sync to the machine; ntpd, chronyd, something else?


(As an example for why this is relevant: Several hundred deviations of 
100 ms or more per day sum up to several 10+ seconds per day, if only 
they all are in the same direction, or several 115+ ppm.


Forward step or slew adjustments should be no problem.

Backward adjustments must be slewed, to keep time monotonic.

ntpd refuses to 
do *slews* correcting by more than 500 ppm;


This is news to me.

See 
https://www.ntp.org/documentation/4.2.8-series/ntpd/#command-line-options 
for more information.


See the docs for -g and -x, for example.

Also see https://www.ntp.org/documentation/4.2.8-series/ntp.conf/ and 
the 'panic', 'step', and 'stepback' options.


If what we offer does not satisfy your requirements, please let me know 
and we'll find a way to improve things.


if the OS clock's frequency 
error exceeds that, ntpd would need to do *steps* every now and then, 
and in a default configuration, an ntpd will refuse to do a *second* 
step and *die* instead.



That is not ntpd's default behavior, but it does happen if the -g option 
is present.  I have ideas on how to address this, probably in the 
upcoming ntp-4.4 release.


Again, forward steps should not be a problem for dovecot, and backward 
adjustments can be forced to be slewed.


Or, if the reference clock sways *back and 
forth*, ntpd should very likely complain about its sources' jitter in 
the logs. chronyd, however, is more ruthless in whacking the local clock 
into "sync" with the external sources, and much more inclined to define 
"sync" as "low difference", rather than also taking frequency stability 
into account like ntpd.)


My understanding of what Miroslav told me is that chronyd picks a source 
of time and tracks it as best and quickly as it can, and at some point 
may pick a new source.


Ntpd identifies "correct time" as best it can, from a useful number of 
qualified sources.  It does this *well*, and ntpd will take its time to 
make sure this happens in a stable and predictable way.  Ntpd drives to 
"correct time", which may be in the "middle" of the set of qualified 
targets.



Also, this is kind of a problem when it does happen. Since Dovecot
thinks the time moved e.g. 100ms forward, it adjusts all timeouts to
happen 100ms backwards. If this wasn't a true time jump, then these
timeouts now happen 100ms earlier.


That is, of course, a dangerous approach if you do *not* have a 
guarantee that the timeouts of the I/O function called are *otherwise* 
true to the requested duration. But shouldn't those other concurrently- 
running timeouts notice an actual discontinuity of the timescale just 
the same as the first one did? Maybe some sort of "N 'nay's needed for a 
vote of nonconfidence" mechanism would be safer ...


Important stuff, and Difficult to do with current APIs.


Kind regards,


H
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-23 Thread Jochen Bern via dovecot

On 21.08.24 11:35, Timo Sirainen wrote:

[Lots and lots of "but my NTP sync is much more precise than that" in
the FreeBSD thread]


The way Dovecot works is:
  - It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
  - Then it calls kqueue() to wait for I/O for max 5 milliseconds
  - Then it notices that it actually returned more than 105 milliseconds
later, and then logs a warning about it.


I think that more information is needed to pinpoint possible causes, and 
one of the open questions is: What clock does dovecot look at to 
determine how long it *actually* stayed dormant? On Linux, software that 
has need of a monotonously increasing "time" to derive guaranteed unique 
IDs from often looks at the kernel uptime - which is essentially a count 
of ticks since bootup, and *not* being corrected by NTP.


Similarly, it should be determined whether the timeouts of I/O function 
called (i.e., kqueue()) are or aren't influenced by NTP's corrections to 
system time.


The third information I'd like to have is what client software provides 
that NTP sync to the machine; ntpd, chronyd, something else?


(As an example for why this is relevant: Several hundred deviations of 
100 ms or more per day sum up to several 10+ seconds per day, if only 
they all are in the same direction, or several 115+ ppm. ntpd refuses to 
do *slews* correcting by more than 500 ppm; if the OS clock's frequency 
error exceeds that, ntpd would need to do *steps* every now and then, 
and in a default configuration, an ntpd will refuse to do a *second* 
step and *die* instead. Or, if the reference clock sways *back and 
forth*, ntpd should very likely complain about its sources' jitter in 
the logs. chronyd, however, is more ruthless in whacking the local clock 
into "sync" with the external sources, and much more inclined to define 
"sync" as "low difference", rather than also taking frequency stability 
into account like ntpd.)



Also, this is kind of a problem when it does happen. Since Dovecot
thinks the time moved e.g. 100ms forward, it adjusts all timeouts to
happen 100ms backwards. If this wasn't a true time jump, then these
timeouts now happen 100ms earlier.


That is, of course, a dangerous approach if you do *not* have a 
guarantee that the timeouts of the I/O function called are *otherwise* 
true to the requested duration. But shouldn't those other 
concurrently-running timeouts notice an actual discontinuity of the 
timescale just the same as the first one did? Maybe some sort of "N 
'nay's needed for a vote of nonconfidence" mechanism would be safer ...


Kind regards,
--
Jochen Bern
Systemingenieur

Binect GmbH
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-21 Thread Timo Sirainen via dovecot
On 21. Aug 2024, at 12.35, Timo Sirainen  wrote:
> 
> The way Dovecot works is:
> - It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
> - Then it calls kqueue() to wait for I/O for max 5 milliseconds
> - Then it notices that it actually returned more than 105 milliseconds later, 
> and then logs a warning about it.
> 
> So kqueue() apparently isn't very accurate in its timeout handling.

Actually another guess: Some people were saying it happens mainly on idle 
hours. Maybe kqueue() is accurate with low timeout values, but not accurate on 
high timeout values? So if Dovecot asked kqueue() to wait for <100ms, it would 
be very accurate. But if it asks to wait for 1ms, kqueue() would think it's 
okay to return after 10100ms. If that's the case, this check could be changed 
to allow higher time jumps only on higher timeout waits.

___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org


Re: maintainer-feedback requested: [Bug 280929] mail/dovecot move bogus warning "Time moved forwards" to debug

2024-08-21 Thread Timo Sirainen via dovecot
The way Dovecot works is:
 - It finds the next timeout, sees that it happens in e.g. 5 milliseconds.
 - Then it calls kqueue() to wait for I/O for max 5 milliseconds
 - Then it notices that it actually returned more than 105 milliseconds later, 
and then logs a warning about it.

So kqueue() apparently isn't very accurate in its timeout handling.

With some googling I found 
https://lists.freebsd.org/pipermail/freebsd-arch/2012-March/012416.html which 
suggests this could happen at least if kern.hz is set to 20 or less. Could that 
be the case?

I guess we could increase IOLOOP_TIME_MOVED_FORWARDS_MIN_USECS higher than 100 
ms, but that might start hiding problems. Nowadays some people use rather short 
timeouts in e.g. some HTTP requests (auth, push-notifications). It could be 
difficult to understand why 100ms timeout happens only at 200ms without this 
warning message. Although if it happens only rarely, I guess it's not much of a 
problem.

Anyway, would be good to understand first why this happens in FreeBSD before 
growing the warning time.

Also, this is kind of a problem when it does happen. Since Dovecot thinks the 
time moved e.g. 100ms forward, it adjusts all timeouts to happen 100ms 
backwards. If this wasn't a true time jump, then these timeouts now happen 
100ms earlier. So e.g. a HTTP request with <100ms timeout can actually trigger 
an immediate timeout. Hiding the log message makes debugging this also more 
difficult. So I don't think it's a good solution to simply hide it or change it 
to debug level, as it may mask real problems.

> On 19. Aug 2024, at 19.11, Larry Rosenman via dovecot  
> wrote:
> 
> Comments from the dovecot community?
> 
> Aug 19, 2024 11:07:30 AM bugzilla-nore...@freebsd.org:
> 
> 
> Bugzilla Automation  has asked Larry Rosenman
>  for maintainer-feedback:
> Bug 280929: mail/dovecot move bogus warning "Time moved forwards" to debug
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280929
> 
> 
> 
> --- Description ---
> Dovecot complains about time moving forward, which seems to be due to a broken
> mechanism (on FreeBSD) used to measure timeouts. This warning spams the 
> maillog
> up to several hundred times per day.
> 
> There's an ongoing thread about this issue in the freebsd forums:
> https://forums.freebsd.org/threads/dovecot-time-moved-forwards.82886
> 
> In post #33 RypPn points out the offending line in main.c and in post #34
> msplsh suggests instead of completely removing/commenting out the line, it
> would be more sensible to move it from 'warning' to 'debug'.
> This is what this patch does: change the log facility to 'debug' to mute that
> bogus message for standard configurations, but keep it in 'debug' for anyone
> who might want to debug that issue in the future.
> 
> I tested the patch as a local patch in poudriere and it builds fine on
> 13.3-RELEASE with the quarterly and latest branch.
> ___
> dovecot mailing list -- dovecot@dovecot.org
> To unsubscribe send an email to dovecot-le...@dovecot.org

___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org