On Sat, Mar 11, 2023 at 11:49 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.mu...@gmail.com> writes: > > I think this is the minimal back-patchable change. I propose to go > > ahead and do that, and then to kick the ideas about latch API changes > > into a new thread for the next commitfest. > > OK by me, but then again 4753ef37 wasn't my patch.
I'll wait another day to see if Stephen or anyone else who hasn't hit Monday yet wants to object. Here also are those other minor tweaks, for master only. I see now that nanosleep() has already been proposed before: https://www.postgresql.org/message-id/flat/CABQrizfxpBLZT5mZeE0js5oCh1tqEWvcGF3vMRCv5P-RwUY5dQ%40mail.gmail.com https://www.postgresql.org/message-id/flat/4902.1552349020%40sss.pgh.pa.us There I see the question of whether it should loop on EINTR to keep waiting the remaining time. Generally it seems like a job for something higher level to deal with interruption policy, and of course all the race condition and portability problems inherent with signals are fixed by using latches instead, so I don't think there really is a good answer to that question -- if you loop, you break our programming rules by wilfully ignoring eg global barriers, but if you don't loop, it implies you're relying on the interrupt to cause you to do something and yet you might have missed it if it was delivered just before the syscall. At the time of the earlier thread, maybe it was more acceptable as it could only delay cancel for that backend, but now it might even delay arbitrary other backends, and neither answer to that question can fix that in a race-free way. Also, back then latches had a SIGUSR1 handler on common systems, but now they don't, so (racy unreliable) latch responsiveness has decreased since then. So I think we should just leave the interface as it is, and build better things and then eventually retire it. This general topic is also currently being discussed at: https://www.postgresql.org/message-id/flat/20230209205929.GA720594%40nathanxps13 I propose to go ahead and make this small improvement anyway because it'll surely be a while before we delete the last pg_usleep() call, and it's good to spring-clean old confusing commentary about signals and portability.
From 03c7bd1a8df9f3bd3b6b00ad86c8ca734b45f9bd Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Sat, 11 Mar 2023 10:42:20 +1300 Subject: [PATCH 1/3] Fix fractional vacuum_cost_delay. Commit 4753ef37 changed vacuum_delay_point() to use the WaitLatch() API, to fix the problem that vacuum could keep running for a very long time after the postmaster died. Unfortunately, that broke commit caf626b2's support for fractional vacuum_cost_delay, which shipped in PostgreSQL 12. WaitLatch() works in whole milliseconds. For now, revert the change from commit 4753ef37, but add an explicit check for postmaster death. That's an extra system call on systems other than Linux and FreeBSD, but that overhead doesn't matter much considering that we willingly went to sleep and woke up again. (In later work, we might add higher resolution timeouts to the latch API so that we could do this in the standard programming pattern, but that wouldn't be back-patched.) Back-patch to 14, where commit 4753ef37 arrived. Reported-by: Melanie Plageman <melanieplage...@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com --- src/backend/commands/vacuum.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 2e12baf8eb..c54360a6a0 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -50,6 +50,7 @@ #include "postmaster/bgworker_internals.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" +#include "storage/pmsignal.h" #include "storage/proc.h" #include "storage/procarray.h" #include "utils/acl.h" @@ -2232,11 +2233,18 @@ vacuum_delay_point(void) if (msec > VacuumCostDelay * 4) msec = VacuumCostDelay * 4; - (void) WaitLatch(MyLatch, - WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, - msec, - WAIT_EVENT_VACUUM_DELAY); - ResetLatch(MyLatch); + pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY); + pg_usleep(msec * 1000); + pgstat_report_wait_end(); + + /* + * We don't want to ignore postmaster death during very long vacuums + * with vacuum_cost_delay configured. We can't use the usual + * WaitLatch() approach here because we want microsecond-based sleep + * durations above. + */ + if (IsUnderPostmaster && !PostmasterIsAlive()) + exit(1); VacuumCostBalance = 0; -- 2.39.2
From f0c3b6506e040933d2466ee4061f29a9a946a155 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Mon, 13 Mar 2023 09:50:06 +1300 Subject: [PATCH 2/3] Update obsolete comment about pg_usleep() accuracy. There are still some systems that use traditional tick or jiffy-based sleep timing, but many including Linux (see "man 7 time"), FreeBSD and macOS are limited only by the accuracy of the available timer hardware and system load. Update our comment about that, which previously said that most Unix systems had that design. Also mention that Windows is like the older Unixen in this respect. Reported-by: Nathan Bossart <nathandboss...@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BogAon8_V223Ldv6taPR2uKH3X_UJ_A7LJAf3-VRARPA%40mail.gmail.com --- src/port/pgsleep.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c index 8a709cd01d..dcba7eda44 100644 --- a/src/port/pgsleep.c +++ b/src/port/pgsleep.c @@ -26,8 +26,9 @@ * pg_usleep --- delay the specified number of microseconds. * * NOTE: although the delay is specified in microseconds, the effective - * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect - * the requested delay to be rounded up to the next resolution boundary. + * resolution is only 1/HZ on systems that use periodic kernel ticks to wake + * up. This may cause sleeps to be rounded up by 1-20 milliseconds on older + * Unixen and Windows. * * On machines where "long" is 32 bits, the maximum delay is ~2000 seconds. * -- 2.39.2
From 1f0bd35300dd0c669d4bbf1f50857f17ae64b08c Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Mon, 13 Mar 2023 10:45:14 +1300 Subject: [PATCH 3/3] Use nanosleep() to implement pg_usleep(). The previous coding based on select() required a lot of commentary about historical portability concerns. We can just get rid of all that and use the POSIX nanosleep() function. The newer clock_nanosleep() variant might be better, because it can take a CLOCK_MONOTONIC argument to avoid the chance that ntpd adjustments affect sleep times, but macOS hasn't caught up with that yet. --- src/port/pgsleep.c | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c index dcba7eda44..3f42021596 100644 --- a/src/port/pgsleep.c +++ b/src/port/pgsleep.c @@ -12,9 +12,7 @@ */ #include "c.h" -#include <unistd.h> -#include <sys/select.h> -#include <sys/time.h> +#include <time.h> /* * In a Windows backend, we don't use this implementation, but rather @@ -32,15 +30,11 @@ * * On machines where "long" is 32 bits, the maximum delay is ~2000 seconds. * - * CAUTION: the behavior when a signal arrives during the sleep is platform - * dependent. On most Unix-ish platforms, a signal does not terminate the - * sleep; but on some, it will (the Windows implementation also allows signals - * to terminate pg_usleep). And there are platforms where not only does a - * signal not terminate the sleep, but it actually resets the timeout counter - * so that the sleep effectively starts over! It is therefore rather hazardous - * to use this for long sleeps; a continuing stream of signal events could - * prevent the sleep from ever terminating. Better practice for long sleeps - * is to use WaitLatch() with a timeout. + * CAUTION: if interrupted by a signal, this function will return, but its + * interface doesn't report that. It's not a good idea to use this + * for long sleeps in the backend, because backends are expected to respond to + * interrupts promptly. Better practice for long sleeps is to use WaitLatch() + * with a timeout. */ void pg_usleep(long microsec) @@ -48,11 +42,11 @@ pg_usleep(long microsec) if (microsec > 0) { #ifndef WIN32 - struct timeval delay; + struct timespec delay; delay.tv_sec = microsec / 1000000L; - delay.tv_usec = microsec % 1000000L; - (void) select(0, NULL, NULL, NULL, &delay); + delay.tv_nsec = (microsec % 1000000L) * 1000; + (void) nanosleep(&delay, NULL); #else SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE); #endif -- 2.39.2