Package: monit
Version: 1:5.27.2-1
Severity: serious
Forwarded: 
https://bitbucket.org/tildeslash/monit/pull-requests/110/use-an-epsilon-when-doing-the-reboot-boot

Hi!

On Linux the current method to retrieve the boot timestamp is racy,
which means that the reboot checks (the daemon start delay), and the
state machinery can be affected. The former is an annoyance as monit
will not respond to commands during the set amount of time. But the
latter means that services set to «onreboot nostart» and managed f.ex.
by a HA system will lose their state on «monit restart», which can be
rather bad.

I notice that Hurd patch modifies the Linux _getStartTime()
implementation making it need way more syscalls, which can exacerbate
this problem, and IMO should be either reverted for bullseye, or
modified so that the Hurd has its own sysdep file with that change.

I'm attaching the patch I've submitted upstream, which fixes the
problem for us.

To reproduce I added a Log_debug() entry to see the exact timestamps,
but this can be easily seen anyway by adding a start delay and
checking whether the delay gets skipped or taken into account after
each «service monit restart».

Thanks,
Guillem
From dc50d6cca9a69922f75f98fe73d95bb2f1067cfa Mon Sep 17 00:00:00 2001
From: Guillem Jover <gjo...@sipwise.com>
Date: Tue, 20 Jul 2021 20:40:54 +0200
Subject: [PATCH] Use an epsilon when doing the reboot boot timestamp
 comparisons

At least on Linux, the current method to retrieve the boot timestamp is
racy, as we first fetch the system uptime (seconds since the epoch),
then subtract that from the current time. But if between these two
syscalls a new or more seconds elapse, then the result can change
depending on the invocation compared to the stored state. This is
trivial to trigger at least on a VirtualBox machine.

To avoid this problem we refactor the reboot checks to use
State_reboot(), and change it to use a small epsilon value, either
positive or negative, to account for a possible mismatch in the
current timestamp or the state one.
---
 src/state.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/state.c b/src/state.c
index 05c3b64e..986202b7 100644
--- a/src/state.c
+++ b/src/state.c
@@ -243,7 +243,7 @@ static void _updateStart(Service_T S, int nstart, int ncycle) {
 
 
 static void _updateMonitor(Service_T S, Monitor_State monitor) {
-        if (systeminfo.booted == booted || S->onreboot == Onreboot_Laststate) {
+        if (!State_reboot() || S->onreboot == Onreboot_Laststate) {
                 // Monit reload or restart within the same boot session OR persistent state => restore the monitoring state
                 if (monitor == Monitor_Not)
                         S->monitor = Monitor_Not;
@@ -664,6 +664,9 @@ void State_restore() {
 
 
 bool State_reboot() {
-        return systeminfo.booted == booted ? false : true;
+        /* We need to use a small epsilon when comparing the boot timestamps,
+         * as at least on Linux its gathering is racy and can diverge from the
+         * real boot timestamp. */
+        return abs(systeminfo.booted - booted) < 2 ? false : true;
 }
 
-- 
2.32.0

  • Bug#991334: monit: Race condition on reboot timestamp checks Guillem Jover

Reply via email to