Bug#991334: monit: Race condition on reboot timestamp checks

2021-07-29 Thread Fabio A. De Muzio Tobich
First thanks for reporting this, Guillem.

I was waiting for an upstream response to the pull request and, as Wookey
pointed out, was rejected.

Let me quote upstream reply here for the sake of information:

"Thanks for pointing the problem out and for the patch, i think the solution
is however shaky, as the time between the syscalls may vary and the hardcoded
epsilon may be insufficient. The patch also changes the behaviour on all
platforms, not just linux and may break cases where two reboots will occur
within consecutive seconds (not very likely, but possible in the future).
We need to fix the root cause (race condition in _getStartTime on linux)."

So, I will follow upstream here and I will not accept the patch, but I will
keep the bug open until a more appropriate solution to the problem is found.

Also, as pointed out by Wookey, this bug should is not serious, the correct
severity for it is normal, so I changed the severity to normal.

Cheers.

-- 
⢀⣴⠾⠻⢶⣦
⣾⠁⢠⠒⠀⣿⡁   Fabio Augusto De Muzio Tobich
⢿⡄⠘⠷⠚⠋⠀   9730 4066 E5AE FAC2 2683  D03D 4FB3 B4D3 7EF6 3B2E
⠈⠳⣄


signature.asc
Description: PGP signature


Bug#991334:

2021-07-29 Thread Rowan Wookey
Severity: normal

This is not a serious bug and is currently slated to remove monit from testing. 
Please check for severities here 
https://www.debian.org/Bugs/Developer#severities

It's been rejected upstream I'll let the maintainers here comment on if it 
should be accepted or rejected here.



Bug#991334: monit: Race condition on reboot timestamp checks

2021-07-20 Thread Guillem Jover
Package: monit
Version: 1:5.27.2-1
Severity: serious
Forwarded: 
https://bitbucket.org/tildeslash/monit/pull-requests/110/use-an-epsilon-when-doing-the-reboot-boot

Hi!

On Linux the current method to retrieve the boot timestamp is racy,
which means that the reboot checks (the daemon start delay), and the
state machinery can be affected. The former is an annoyance as monit
will not respond to commands during the set amount of time. But the
latter means that services set to «onreboot nostart» and managed f.ex.
by a HA system will lose their state on «monit restart», which can be
rather bad.

I notice that Hurd patch modifies the Linux _getStartTime()
implementation making it need way more syscalls, which can exacerbate
this problem, and IMO should be either reverted for bullseye, or
modified so that the Hurd has its own sysdep file with that change.

I'm attaching the patch I've submitted upstream, which fixes the
problem for us.

To reproduce I added a Log_debug() entry to see the exact timestamps,
but this can be easily seen anyway by adding a start delay and
checking whether the delay gets skipped or taken into account after
each «service monit restart».

Thanks,
Guillem
From dc50d6cca9a69922f75f98fe73d95bb2f1067cfa Mon Sep 17 00:00:00 2001
From: Guillem Jover 
Date: Tue, 20 Jul 2021 20:40:54 +0200
Subject: [PATCH] Use an epsilon when doing the reboot boot timestamp
 comparisons

At least on Linux, the current method to retrieve the boot timestamp is
racy, as we first fetch the system uptime (seconds since the epoch),
then subtract that from the current time. But if between these two
syscalls a new or more seconds elapse, then the result can change
depending on the invocation compared to the stored state. This is
trivial to trigger at least on a VirtualBox machine.

To avoid this problem we refactor the reboot checks to use
State_reboot(), and change it to use a small epsilon value, either
positive or negative, to account for a possible mismatch in the
current timestamp or the state one.
---
 src/state.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/state.c b/src/state.c
index 05c3b64e..986202b7 100644
--- a/src/state.c
+++ b/src/state.c
@@ -243,7 +243,7 @@ static void _updateStart(Service_T S, int nstart, int ncycle) {
 
 
 static void _updateMonitor(Service_T S, Monitor_State monitor) {
-if (systeminfo.booted == booted || S->onreboot == Onreboot_Laststate) {
+if (!State_reboot() || S->onreboot == Onreboot_Laststate) {
 // Monit reload or restart within the same boot session OR persistent state => restore the monitoring state
 if (monitor == Monitor_Not)
 S->monitor = Monitor_Not;
@@ -664,6 +664,9 @@ void State_restore() {
 
 
 bool State_reboot() {
-return systeminfo.booted == booted ? false : true;
+/* We need to use a small epsilon when comparing the boot timestamps,
+ * as at least on Linux its gathering is racy and can diverge from the
+ * real boot timestamp. */
+return abs(systeminfo.booted - booted) < 2 ? false : true;
 }
 
-- 
2.32.0