Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

Thomas Munro Sun, 12 Mar 2023 20:12:23 -0700

On Sat, Mar 11, 2023 at 11:49 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
> > I think this is the minimal back-patchable change.  I propose to go
> > ahead and do that, and then to kick the ideas about latch API changes
> > into a new thread for the next commitfest.
>
> OK by me, but then again 4753ef37 wasn't my patch.


I'll wait another day to see if Stephen or anyone else who hasn't hit
Monday yet wants to object.

Here also are those other minor tweaks, for master only.  I see now
that nanosleep() has already been proposed before:

https://www.postgresql.org/message-id/flat/CABQrizfxpBLZT5mZeE0js5oCh1tqEWvcGF3vMRCv5P-RwUY5dQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/4902.1552349020%40sss.pgh.pa.us

There I see the question of whether it should loop on EINTR to keep
waiting the remaining time.  Generally it seems like a job for
something higher level to deal with interruption policy, and of course
all the race condition and portability problems inherent with signals
are fixed by using latches instead, so I don't think there really is a
good answer to that question -- if you loop, you break our programming
rules by wilfully ignoring eg global barriers, but if you don't loop,
it implies you're relying on the interrupt to cause you to do
something and yet you might have missed it if it was delivered just
before the syscall.  At the time of the earlier thread, maybe it was
more acceptable as it could only delay cancel for that backend, but
now it might even delay arbitrary other backends, and neither answer
to that question can fix that in a race-free way.  Also, back then
latches had a SIGUSR1 handler on common systems, but now they don't,
so (racy unreliable) latch responsiveness has decreased since then.
So I think we should just leave the interface as it is, and build
better things and then eventually retire it.  This general topic is
also currently being discussed at:

https://www.postgresql.org/message-id/flat/20230209205929.GA720594%40nathanxps13

I propose to go ahead and make this small improvement anyway because
it'll surely be a while before we delete the last pg_usleep() call,
and it's good to spring-clean old confusing commentary about signals
and portability.

From 03c7bd1a8df9f3bd3b6b00ad86c8ca734b45f9bd Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Sat, 11 Mar 2023 10:42:20 +1300
Subject: [PATCH 1/3] Fix fractional vacuum_cost_delay.

Commit 4753ef37 changed vacuum_delay_point() to use the WaitLatch() API,
to fix the problem that vacuum could keep running for a very long time
after the postmaster died.

Unfortunately, that broke commit caf626b2's support for fractional
vacuum_cost_delay, which shipped in PostgreSQL 12.  WaitLatch() works in
whole milliseconds.

For now, revert the change from commit 4753ef37, but add an explicit
check for postmaster death.  That's an extra system call on systems
other than Linux and FreeBSD, but that overhead doesn't matter much
considering that we willingly went to sleep and woke up again.  (In
later work, we might add higher resolution timeouts to the latch API so
that we could do this in the standard programming pattern, but that
wouldn't be back-patched.)

Back-patch to 14, where commit 4753ef37 arrived.

Reported-by: Melanie Plageman <melanieplage...@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
---
 src/backend/commands/vacuum.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2e12baf8eb..c54360a6a0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -50,6 +50,7 @@
 #include "postmaster/bgworker_internals.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2232,11 +2233,18 @@ vacuum_delay_point(void)
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;

-		(void) WaitLatch(MyLatch,
-						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 msec,
-						 WAIT_EVENT_VACUUM_DELAY);
-		ResetLatch(MyLatch);
+		pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
+		pg_usleep(msec * 1000);
+		pgstat_report_wait_end();
+
+		/*
+		 * We don't want to ignore postmaster death during very long vacuums
+		 * with vacuum_cost_delay configured.  We can't use the usual
+		 * WaitLatch() approach here because we want microsecond-based sleep
+		 * durations above.
+		 */
+		if (IsUnderPostmaster && !PostmasterIsAlive())
+			exit(1);

 		VacuumCostBalance = 0;

-- 
2.39.2

From f0c3b6506e040933d2466ee4061f29a9a946a155 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 13 Mar 2023 09:50:06 +1300
Subject: [PATCH 2/3] Update obsolete comment about pg_usleep() accuracy.

There are still some systems that use traditional tick or jiffy-based
sleep timing, but many including Linux (see "man 7 time"), FreeBSD and
macOS are limited only by the accuracy of the available timer hardware
and system load.  Update our comment about that, which previously said
that most Unix systems had that design.  Also mention that Windows is
like the older Unixen in this respect.

Reported-by: Nathan Bossart <nathandboss...@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BogAon8_V223Ldv6taPR2uKH3X_UJ_A7LJAf3-VRARPA%40mail.gmail.com
---
 src/port/pgsleep.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c
index 8a709cd01d..dcba7eda44 100644
--- a/src/port/pgsleep.c
+++ b/src/port/pgsleep.c
@@ -26,8 +26,9 @@
  * pg_usleep --- delay the specified number of microseconds.
  *
  * NOTE: although the delay is specified in microseconds, the effective
- * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
- * the requested delay to be rounded up to the next resolution boundary.
+ * resolution is only 1/HZ on systems that use periodic kernel ticks to wake
+ * up.  This may cause sleeps to be rounded up by 1-20 milliseconds on older
+ * Unixen and Windows.
  *
  * On machines where "long" is 32 bits, the maximum delay is ~2000 seconds.
  *
-- 
2.39.2

From 1f0bd35300dd0c669d4bbf1f50857f17ae64b08c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 13 Mar 2023 10:45:14 +1300
Subject: [PATCH 3/3] Use nanosleep() to implement pg_usleep().

The previous coding based on select() required a lot of commentary about
historical portability concerns.  We can just get rid of all that and
use the POSIX nanosleep() function.  The newer clock_nanosleep() variant
might be better, because it can take a CLOCK_MONOTONIC argument to avoid
the chance that ntpd adjustments affect sleep times, but macOS hasn't
caught up with that yet.
---
 src/port/pgsleep.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c
index dcba7eda44..3f42021596 100644
--- a/src/port/pgsleep.c
+++ b/src/port/pgsleep.c
@@ -12,9 +12,7 @@
  */
 #include "c.h"
 
-#include <unistd.h>
-#include <sys/select.h>
-#include <sys/time.h>
+#include <time.h>
 
 /*
  * In a Windows backend, we don't use this implementation, but rather
@@ -32,15 +30,11 @@
  *
  * On machines where "long" is 32 bits, the maximum delay is ~2000 seconds.
  *
- * CAUTION: the behavior when a signal arrives during the sleep is platform
- * dependent.  On most Unix-ish platforms, a signal does not terminate the
- * sleep; but on some, it will (the Windows implementation also allows signals
- * to terminate pg_usleep).  And there are platforms where not only does a
- * signal not terminate the sleep, but it actually resets the timeout counter
- * so that the sleep effectively starts over!  It is therefore rather hazardous
- * to use this for long sleeps; a continuing stream of signal events could
- * prevent the sleep from ever terminating.  Better practice for long sleeps
- * is to use WaitLatch() with a timeout.
+ * CAUTION: if interrupted by a signal, this function will return, but its
+ * interface doesn't report that.  It's not a good idea to use this
+ * for long sleeps in the backend, because backends are expected to respond to
+ * interrupts promptly.  Better practice for long sleeps is to use WaitLatch()
+ * with a timeout.
  */
 void
 pg_usleep(long microsec)
@@ -48,11 +42,11 @@ pg_usleep(long microsec)
 	if (microsec > 0)
 	{
 #ifndef WIN32
-		struct timeval delay;
+		struct timespec delay;
 
 		delay.tv_sec = microsec / 1000000L;
-		delay.tv_usec = microsec % 1000000L;
-		(void) select(0, NULL, NULL, NULL, &delay);
+		delay.tv_nsec = (microsec % 1000000L) * 1000;
+		(void) nanosleep(&delay, NULL);
 #else
 		SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
 #endif
-- 
2.39.2

Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

Reply via email to