Re: EINTR in ftruncate()

2022-11-08 Thread Thomas Munro
On Wed, Aug 17, 2022 at 7:51 AM Thomas Munro  wrote:
> On Sat, Jul 16, 2022 at 5:18 PM Thomas Munro  wrote:
> > On Sat, Jul 16, 2022 at 1:28 AM Tom Lane  wrote:
> > > Thomas Munro  writes:
> > > > On Fri, Jul 15, 2022 at 9:34 AM Tom Lane  wrote:
> > > >> (Someday we oughta go ahead and make our Windows signal API look more
> > > >> like POSIX, as I suggested back in 2015.  I'm still not taking
> > > >> point on that, though.)
> > >
> > > > For the sigprocmask() part, here's a patch that passes CI.  Only the
> > > > SIG_SETMASK case is actually exercised by our current code, though.
> > >
> > > Passes an eyeball check, but I can't actually test it.
> >
> > Thanks.  Pushed.
> >
> > I'm not brave enough to try to write a replacement sigaction() yet,
> > but it does appear that we could rip more ugliness and inconsistencies
> > that way, eg sa_mask.
>
> Here's a draft patch that adds a minimal sigaction() implementation
> for Windows.  It doesn't implement stuff we're not using, for sample
> sa_sigaction functions, but it has the sa_mask feature so we can
> harmonize the stuff that I believe you were talking about.

Pushed.

As discussed before, a much better idea would probably be to use
latch-based wakeup instead of putting postmaster logic in signal
handlers, but in the meantime this gets rid of the extra
Windows-specific noise.




Re: EINTR in ftruncate()

2022-09-30 Thread Andres Freund
Hi,

On 2022-09-30 13:53:45 -0500, Justin Pryzby wrote:
> On Wed, Aug 17, 2022 at 07:51:34AM +1200, Thomas Munro wrote:
> > Here's a draft patch that adds a minimal sigaction() implementation
> > for Windows.  It doesn't implement stuff we're not using, for sample
> > sa_sigaction functions, but it has the sa_mask feature so we can
> > harmonize the stuff that I believe you were talking about.
> 
> Did you see that this paniced ?
> 
> https://cirrus-ci.com/task/4975957546106880
> https://api.cirrus-ci.com/v1/artifact/task/4975957546106880/testrun/build/testrun/recovery/027_stream_regress/log/027_stream_regress_standby_1.log
> 
> 2022-09-30 09:13:03.496 GMT [7312][startup] PANIC:  
> hash_xlog_split_allocate_page: failed to acquire cleanup lock
> 2022-09-30 09:13:03.496 GMT [7312][startup] CONTEXT:  WAL redo at 0/7AF6FA8 
> for Hash/SPLIT_ALLOCATE_PAGE: new_bucket 63, meta_page_masks_updated F, 
> issplitpoint_changed F; blkref #0: rel 1663/16384/23784, blk 45; blkref #1: 
> rel 1663/16384/23784, blk 78; blkref #2: rel 1663/16384/23784, blk 0

This looks like broken code in hash, independent of any recent changes:
https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-09-30 Thread Justin Pryzby
On Wed, Aug 17, 2022 at 07:51:34AM +1200, Thomas Munro wrote:
> Here's a draft patch that adds a minimal sigaction() implementation
> for Windows.  It doesn't implement stuff we're not using, for sample
> sa_sigaction functions, but it has the sa_mask feature so we can
> harmonize the stuff that I believe you were talking about.

Did you see that this paniced ?

https://cirrus-ci.com/task/4975957546106880
https://api.cirrus-ci.com/v1/artifact/task/4975957546106880/testrun/build/testrun/recovery/027_stream_regress/log/027_stream_regress_standby_1.log

2022-09-30 09:13:03.496 GMT [7312][startup] PANIC:  
hash_xlog_split_allocate_page: failed to acquire cleanup lock
2022-09-30 09:13:03.496 GMT [7312][startup] CONTEXT:  WAL redo at 0/7AF6FA8 for 
Hash/SPLIT_ALLOCATE_PAGE: new_bucket 63, meta_page_masks_updated F, 
issplitpoint_changed F; blkref #0: rel 1663/16384/23784, blk 45; blkref #1: rel 
1663/16384/23784, blk 78; blkref #2: rel 1663/16384/23784, blk 0




Re: EINTR in ftruncate()

2022-08-16 Thread Thomas Munro
On Sat, Jul 16, 2022 at 5:18 PM Thomas Munro  wrote:
> On Sat, Jul 16, 2022 at 1:28 AM Tom Lane  wrote:
> > Thomas Munro  writes:
> > > On Fri, Jul 15, 2022 at 9:34 AM Tom Lane  wrote:
> > >> (Someday we oughta go ahead and make our Windows signal API look more
> > >> like POSIX, as I suggested back in 2015.  I'm still not taking
> > >> point on that, though.)
> >
> > > For the sigprocmask() part, here's a patch that passes CI.  Only the
> > > SIG_SETMASK case is actually exercised by our current code, though.
> >
> > Passes an eyeball check, but I can't actually test it.
>
> Thanks.  Pushed.
>
> I'm not brave enough to try to write a replacement sigaction() yet,
> but it does appear that we could rip more ugliness and inconsistencies
> that way, eg sa_mask.

Here's a draft patch that adds a minimal sigaction() implementation
for Windows.  It doesn't implement stuff we're not using, for sample
sa_sigaction functions, but it has the sa_mask feature so we can
harmonize the stuff that I believe you were talking about.
From f00f169e10b884dab337b73b31de89577fb662de Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 17 Aug 2022 07:36:53 +1200
Subject: [PATCH] Provide sigaction for Windows.

Harmonize Unix and Windows code by modernizing our signal emulation to
use sigaction() instead of signal().

Discussion: https://postgr.es/m/CA%2BhUKGK5ax1AoFMvXksLROrxwB1%3D-EsT%3D7%2B6oUtCn47GbC9zjA%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c|  9 -
 src/backend/port/win32/signal.c | 41 +++--
 src/backend/postmaster/postmaster.c | 57 ++---
 src/include/libpq/pqsignal.h| 14 +++
 src/port/pqsignal.c | 22 +--
 5 files changed, 56 insertions(+), 87 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 26ed671d1a..1ab34c5214 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -110,16 +110,10 @@ pqinitmask(void)
  * postmaster ever unblocks signals.
  *
  * pqinitmask() must have been invoked previously.
- *
- * On Windows, this function is just an alias for pqsignal()
- * (and note that it's calling the code in src/backend/port/win32/signal.c,
- * not src/port/pqsignal.c).  On that platform, the postmaster's signal
- * handlers still have to block signals for themselves.
  */
 pqsigfunc
 pqsignal_pm(int signo, pqsigfunc func)
 {
-#ifndef WIN32
 	struct sigaction act,
 oact;
 
@@ -142,7 +136,4 @@ pqsignal_pm(int signo, pqsigfunc func)
 	if (sigaction(signo, , ) < 0)
 		return SIG_ERR;
 	return oact.sa_handler;
-#else			/* WIN32 */
-	return pqsignal(signo, func);
-#endif
 }
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index 53b93a50b2..d533de1bc6 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -34,7 +34,7 @@ HANDLE		pgwin32_initial_signal_pipe = INVALID_HANDLE_VALUE;
 static CRITICAL_SECTION pg_signal_crit_sec;
 
 /* Note that array elements 0 are unused since they correspond to signal 0 */
-static pqsigfunc pg_signal_array[PG_SIGNAL_COUNT];
+static struct sigaction pg_signal_array[PG_SIGNAL_COUNT];
 static pqsigfunc pg_signal_defaults[PG_SIGNAL_COUNT];
 
 
@@ -85,7 +85,9 @@ pgwin32_signal_initialize(void)
 
 	for (i = 0; i < PG_SIGNAL_COUNT; i++)
 	{
-		pg_signal_array[i] = SIG_DFL;
+		pg_signal_array[i].sa_handler = SIG_DFL;
+		pg_signal_array[i].sa_mask = 0;
+		pg_signal_array[i].sa_flags = 0;
 		pg_signal_defaults[i] = SIG_IGN;
 	}
 	pg_signal_mask = 0;
@@ -131,15 +133,27 @@ pgwin32_dispatch_queued_signals(void)
 			if (exec_mask & sigmask(i))
 			{
 /* Execute this signal */
-pqsigfunc	sig = pg_signal_array[i];
+struct sigaction *act = _signal_array[i];
+pqsigfunc	sig = act->sa_handler;
 
 if (sig == SIG_DFL)
 	sig = pg_signal_defaults[i];
 pg_signal_queue &= ~sigmask(i);
 if (sig != SIG_ERR && sig != SIG_IGN && sig != SIG_DFL)
 {
+	sigset_t	block_mask;
+	sigset_t	save_mask;
+
 	LeaveCriticalSection(_signal_crit_sec);
+
+	block_mask = act->sa_mask;
+	if ((act->sa_flags & SA_NODEFER) == 0)
+		block_mask |= sigmask(i);
+
+	sigprocmask(SIG_BLOCK, _mask, _mask);
 	sig(i);
+	sigprocmask(SIG_SETMASK, _mask, NULL);
+
 	EnterCriticalSection(_signal_crit_sec);
 	break;		/* Restart outer loop, in case signal mask or
  * queue has been modified inside signal
@@ -187,22 +201,25 @@ pqsigprocmask(int how, const sigset_t *set, sigset_t *oset)
 	return 0;
 }
 
-
 /*
  * Unix-like signal handler installation
  *
  * Only called on main thread, no sync required
  */
-pqsigfunc
-pqsignal(int signum, pqsigfunc handler)
+int
+pqsigaction(int signum, const struct sigaction *act,
+			struct sigaction *oldact)
 {
-	pqsigfunc	prevfunc;
-
 	if (signum >= PG_SIGNAL_COUNT || signum < 0)
-		return SIG_ERR;
-	prevfunc = pg_signal_array[signum];
-	pg_signal_array[signum] = handler;
-	return prevfunc;
+	{
+		errno = EINVAL;

Re: EINTR in ftruncate()

2022-07-15 Thread Thomas Munro
On Sat, Jul 16, 2022 at 1:28 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Fri, Jul 15, 2022 at 9:34 AM Tom Lane  wrote:
> >> (Someday we oughta go ahead and make our Windows signal API look more
> >> like POSIX, as I suggested back in 2015.  I'm still not taking
> >> point on that, though.)
>
> > For the sigprocmask() part, here's a patch that passes CI.  Only the
> > SIG_SETMASK case is actually exercised by our current code, though.
>
> Passes an eyeball check, but I can't actually test it.

Thanks.  Pushed.

I'm not brave enough to try to write a replacement sigaction() yet,
but it does appear that we could rip more ugliness and inconsistencies
that way, eg sa_mask.




Re: EINTR in ftruncate()

2022-07-15 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Jul 15, 2022 at 9:34 AM Tom Lane  wrote:
>> (Someday we oughta go ahead and make our Windows signal API look more
>> like POSIX, as I suggested back in 2015.  I'm still not taking
>> point on that, though.)

> For the sigprocmask() part, here's a patch that passes CI.  Only the
> SIG_SETMASK case is actually exercised by our current code, though.

Passes an eyeball check, but I can't actually test it.

regards, tom lane




Re: EINTR in ftruncate()

2022-07-14 Thread Thomas Munro
On Fri, Jul 15, 2022 at 9:34 AM Tom Lane  wrote:
> (Someday we oughta go ahead and make our Windows signal API look more
> like POSIX, as I suggested back in 2015.  I'm still not taking
> point on that, though.)

For the sigprocmask() part, here's a patch that passes CI.  Only the
SIG_SETMASK case is actually exercised by our current code, though.

One weird thing about our PG_SETMASK() macro is that you couldn't have
used its return value portably: on Windows we were returning the old
mask (like sigsetmask(), which has no way to report errors), and on
Unix we were returning 0/-1 (from setprocmask(), ie the error we never
checked).
From b21140e9b3f593b46c3bb4782c6bf84ca248dc15 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 15 Jul 2022 14:35:01 +1200
Subject: [PATCH] Emulate sigprocmask(), not sigsetmask(), on Windows.

Since commit a65e0864, we've required Unix systems to have
sigprocmask().  For Windows we still emulated the old deprecated 4.2BSD
function sigsetmask().  Emulate sigprocmask() instead, for consistency.

Discussion: https://postgr.es/m/3153247.1657834482%40sss.pgh.pa.us
---
 src/backend/port/win32/signal.c | 29 +++--
 src/include/libpq/pqsignal.h| 11 +++
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index b71164d8db..53b93a50b2 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -112,7 +112,7 @@ pgwin32_signal_initialize(void)
 /*
  * Dispatch all signals currently queued and not blocked
  * Blocked signals are ignored, and will be fired at the time of
- * the pqsigsetmask() call.
+ * the pqsigprocmask() call.
  */
 void
 pgwin32_dispatch_queued_signals(void)
@@ -154,12 +154,29 @@ pgwin32_dispatch_queued_signals(void)
 
 /* signal masking. Only called on main thread, no sync required */
 int
-pqsigsetmask(int mask)
+pqsigprocmask(int how, const sigset_t *set, sigset_t *oset)
 {
-	int			prevmask;
+	if (oset)
+		*oset = pg_signal_mask;
 
-	prevmask = pg_signal_mask;
-	pg_signal_mask = mask;
+	if (!set)
+		return 0;
+
+	switch (how)
+	{
+		case SIG_BLOCK:
+			pg_signal_mask |= *set;
+			break;
+		case SIG_UNBLOCK:
+			pg_signal_mask &= ~*set;
+			break;
+		case SIG_SETMASK:
+			pg_signal_mask = *set;
+			break;
+		default:
+			errno = EINVAL;
+			return -1;
+	}
 
 	/*
 	 * Dispatch any signals queued up right away, in case we have unblocked
@@ -167,7 +184,7 @@ pqsigsetmask(int mask)
 	 */
 	pgwin32_dispatch_queued_signals();
 
-	return prevmask;
+	return 0;
 }
 
 
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 41227a30e2..d17ddb787e 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -15,15 +15,18 @@
 
 #include 
 
-#ifndef WIN32
 #define PG_SETMASK(mask)	sigprocmask(SIG_SETMASK, mask, NULL)
-#else
+
+#ifdef WIN32
 /* Emulate POSIX sigset_t APIs on Windows */
 typedef int sigset_t;
 
-extern int	pqsigsetmask(int mask);
+extern int	pqsigprocmask(int how, const sigset_t *set, sigset_t *oset);
 
-#define PG_SETMASK(mask)		pqsigsetmask(*(mask))
+#define SIG_BLOCK1
+#define SIG_UNBLOCK2
+#define SIG_SETMASK3
+#define sigprocmask(how, set, oset) pqsigprocmask((how), (set), (oset))
 #define sigemptyset(set)		(*(set) = 0)
 #define sigfillset(set)			(*(set) = ~0)
 #define sigaddset(set, signum)	(*(set) |= (sigmask(signum)))
-- 
2.36.1



Re: EINTR in ftruncate()

2022-07-14 Thread Tom Lane
Thomas Munro  writes:
> So why would I add another wrapper like PG_SETMASK and leave it
> unimplemented for now on Windows, when I could just use sigprocmask()
> directly and leave it unimplemented for now on Windows?

Fair enough, I guess.  No objection to this patch.

(Someday we oughta go ahead and make our Windows signal API look more
like POSIX, as I suggested back in 2015.  I'm still not taking
point on that, though.)

regards, tom lane




Re: EINTR in ftruncate()

2022-07-14 Thread Thomas Munro
On Fri, Jul 15, 2022 at 3:27 AM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > ISTM it would be cleaner to patch PG_SETMASK to have a second argument
> > and to return the original mask if that's not NULL.  This is more
> > invasive, but there aren't that many callsites of that macro.
>
> [ shoulda read your message before replying ]
>
> Given that this needs back-patched, I think changing PG_SETMASK
> is a bad idea: there might be outside callers.  However, we could
> add another macro with the additional argument.  PG_GET_AND_SET_MASK?

It's funny though, the reason we had PG_SETMASK in the first place is
not for Windows.  Ancient commit 47937403676 added that for long gone
pre-POSIX systems like NeXTSTEP which only had single-argument
sigsetmask(), not sigprocmask().  In general on Windows we're
emulating POSIX signal interfaces with normal names like sigemptyset()
etc, it just so happens that we chose to emulate that pre-standard
sigsetmask() interface (as you complained about in the commit message
for a65e0864).

So why would I add another wrapper like PG_SETMASK and leave it
unimplemented for now on Windows, when I could just use sigprocmask()
directly and leave it unimplemented for now on Windows?

The only reason I can think of for a wrapper is to provide a place to
check the return code and ereport (panic?).  That seems to be of
limited value (how can it fail ... bad "how" value, or a sandbox
denying some system calls, ...?).  I did make sure to preserve the
errno though; even though we're assuming these calls can't fail by
long standing tradition, I didn't feel like additionally assuming that
successful calls don't clobber errno.

I guess, coded like this, it should also be safe to do it in the
postmaster, but I think maybe we should leave it conditional, rather
than relying on BlockSig being initialised and sane during early
postmaster initialisation.
From 17eb588af303873dbd76d0f24b536f74747cb3bf Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 15 Jul 2022 09:00:12 +1200
Subject: [PATCH] Make dsm_impl_resize more future-proof.

Commit 4518c798 blocks signals for a short region of code, but it
assumed that whatever calls it had the mask set to UnBlockSig on entry.
That may be true today, but it would be more resilient to save and
restore the caller's signal mask.

Previously we couldn't do this because our macro for abstracting signal
mask changes was based on the old pre-POSIX sigsetmask(), due to 1990s
portability concerns.  Since this is a POSIX-only code path, and since
a65e0864 established that supported Unix systems must have
sigprocmask(), we can just use sigprocmask() directly.  It would also be
implementable for Windows, but that's not needed for this.

Back-patch to all releases, like 4518c798 and 80845b7c.

Reviewed-by: Alvaro Herrera 
Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/CA%2BhUKGKx6Biq7_UuV0kn9DW%2B8QWcpJC1qwhizdtD9tN-fn0H0g%40mail.gmail.com
---
 src/backend/storage/ipc/dsm_impl.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 0247d13a91..69c6df75b4 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -49,6 +49,7 @@
 #include "postgres.h"
 
 #include 
+#include 
 #include 
 #ifndef WIN32
 #include 
@@ -62,7 +63,7 @@
 #endif
 
 #include "common/file_perm.h"
-#include "libpq/pqsignal.h"		/* for PG_SETMASK macro */
+#include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "portability/mem.h"
@@ -355,6 +356,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 {
 	int			rc;
 	int			save_errno;
+	sigset_t	save_sigmask;
 
 	/*
 	 * Block all blockable signals, except SIGQUIT.  posix_fallocate() can run
@@ -363,7 +365,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 	 * conflicts), the retry loop might never succeed.
 	 */
 	if (IsUnderPostmaster)
-		PG_SETMASK();
+		sigprocmask(SIG_SETMASK, , _sigmask);
 
 	pgstat_report_wait_start(WAIT_EVENT_DSM_ALLOCATE);
 #if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
@@ -402,7 +404,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 	if (IsUnderPostmaster)
 	{
 		save_errno = errno;
-		PG_SETMASK();
+		sigprocmask(SIG_SETMASK, _sigmask, NULL);
 		errno = save_errno;
 	}
 
-- 
2.30.2



Re: EINTR in ftruncate()

2022-07-14 Thread Tom Lane
Thomas Munro  writes:
> ... but now I'm wondering if we should be more defensive and possibly
> even save/restore the mask.

+1, sounds like a more future-proof solution.

> Originally I discounted that because I
> thought I had to go through PG_SETMASK for portability reasons, but on
> closer inspection, I don't see any reason not to use sigprocmask
> directly in Unix-only code.

Seems like the thing to do is to add a suitable operation to the
pqsignal.h API.  We could leave it unimplemented for now on Windows,
and then worry what to do if we ever need it in that context.

regards, tom lane




Re: EINTR in ftruncate()

2022-07-14 Thread Tom Lane
Alvaro Herrera  writes:
> ISTM it would be cleaner to patch PG_SETMASK to have a second argument
> and to return the original mask if that's not NULL.  This is more
> invasive, but there aren't that many callsites of that macro.

[ shoulda read your message before replying ]

Given that this needs back-patched, I think changing PG_SETMASK
is a bad idea: there might be outside callers.  However, we could
add another macro with the additional argument.  PG_GET_AND_SET_MASK?

regards, tom lane




Re: EINTR in ftruncate()

2022-07-14 Thread Alvaro Herrera
On 2022-Jul-15, Thomas Munro wrote:

> I checked that this throw-away assertion doesn't fail currently:
> 
> if (IsUnderPostmaster)
> +   {
> +   sigset_t old;
> +   sigprocmask(SIG_SETMASK, NULL, );
> +   Assert(memcmp(, , sizeof(UnBlockSig)) == 0);
> PG_SETMASK();
> +   }
> 
> ... but now I'm wondering if we should be more defensive and possibly
> even save/restore the mask.

Yeah, that sounds better to me.

> Originally I discounted that because I thought I had to go through
> PG_SETMASK for portability reasons, but on closer inspection, I don't
> see any reason not to use sigprocmask directly in Unix-only code.

ISTM it would be cleaner to patch PG_SETMASK to have a second argument
and to return the original mask if that's not NULL.  This is more
invasive, but there aren't that many callsites of that macro.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: EINTR in ftruncate()

2022-07-14 Thread Thomas Munro
On Fri, Jul 15, 2022 at 1:02 AM Thomas Munro  wrote:
> On Fri, Jul 15, 2022 at 12:15 AM Thomas Munro  wrote:
> > Yeah.  Done, and pushed.  0002 not back-patched.
>
> Hmm, there were a couple of hard to understand build farm failures.
> My first thought is that the signal mask stuff should only be done if
> IsUnderPostmaster, otherwise it clobbers the postmaster's signal mask
> when reached from dsm_postmaster_startup().  Looking into that.

I pushed that change, and I hope that clears up the problems seen on
eg curculio.  It does raise the more general question of why it's safe
to assume the signal mask is UnBlockSig on entry in regular backends.
I expect it to be in released branches, but things get more
complicated as we use DSM in more ways and it's not ideal to bet on
that staying true.  I checked that this throw-away assertion doesn't
fail currently:

if (IsUnderPostmaster)
+   {
+   sigset_t old;
+   sigprocmask(SIG_SETMASK, NULL, );
+   Assert(memcmp(, , sizeof(UnBlockSig)) == 0);
PG_SETMASK();
+   }

... but now I'm wondering if we should be more defensive and possibly
even save/restore the mask.  Originally I discounted that because I
thought I had to go through PG_SETMASK for portability reasons, but on
closer inspection, I don't see any reason not to use sigprocmask
directly in Unix-only code.




Re: EINTR in ftruncate()

2022-07-14 Thread Thomas Munro
On Fri, Jul 15, 2022 at 12:15 AM Thomas Munro  wrote:
> Yeah.  Done, and pushed.  0002 not back-patched.

Hmm, there were a couple of hard to understand build farm failures.
My first thought is that the signal mask stuff should only be done if
IsUnderPostmaster, otherwise it clobbers the postmaster's signal mask
when reached from dsm_postmaster_startup().  Looking into that.




Re: EINTR in ftruncate()

2022-07-14 Thread Thomas Munro
On Tue, Jul 12, 2022 at 5:45 AM Andres Freund  wrote:
> Hm - given that we've observed ftruncate failing with strace, and that
> stracing to find problems isn't insane, shouldn't we retry the ftruncate too?
> It's kind of obsoleted by your next patch, but not really, because it's not
> unconceivable that other OSs behave similarly? And IIUC you're planning to not
> backpatch 0002?

Yeah.  Done, and pushed.  0002 not back-patched.

> > + pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
>
> (not new in this patch, just moved around) - FILL_ZERO_WRITE is imo an odd
> description of what's happening... In my understanding this isn't doing any
> writing, just allocating. But whatever...

True.  Fixed in master.




Re: EINTR in ftruncate()

2022-07-11 Thread Andres Freund
Hi,

On 2022-07-07 17:58:10 +1200, Thomas Munro wrote:
> Even if we go with this approach now, I think it's plausible that we
> might want to reconsider this yet again one day, perhaps allocating
> memory with some future asynchronous infrastructure while still
> processing interrupts.  It's not very nice to hold up recovery or
> ProcSignalBarrier for long operations.

I think the next improvement would be to do the fallocate in smaller chunks,
and accept interrupts inbetween.


> I'm a little unclear about ftruncate() here.  I don't expect it to
> report EINTR in other places where we use it (ie to make a local file
> on a non-"slow device" smaller), because I expect that to be like
> read(), write() etc which we don't wrap in EINTR loops.  Here you've
> observed EINTR when messing around with a debugger*.  It seems
> inconsistent to put posix_fallocate() in an EINTR retry loop for the
> benefit of debuggers, but not ftruncate().  But perhaps that's good
> enough, on the theory that posix_fallocate(1GB) is a very large target
> and you have a decent chance of hitting it.

> *It's funny that ftruncate() apparently doesn't automatically restart
> for ptrace SIGCONT on Linux according to your report, while poll()
> does according to my experiments, even though the latter is one of the
> never-restart functions (it doesn't on other OSes I hack on, and you
> feel the difference when debugging missing wakeup type bugs...).
> Random implementation details...

My test was basically while (true); {if (!ftruncate()) bleat(); if
(!fallocate()) bleat();} with a SIGALRM triggering regularly in the
background. The ftruncate failed, rarely, when attaching / detaching strace
-p. Rarely enough that I had already written you an IM saying that I couldn't
make it fail...  So it's hard to be confident this can't otherwise be
hit. With that caveat: I didn't hit it with a "real" file on a "real"
filesystem in a few minutes of trying. But unsurprisingly it triggers when
putting the file on a tmpfs.


> @@ -362,6 +355,14 @@ dsm_impl_posix_resize(int fd, off_t size)
>  {
>   int rc;
>  
> + /*
> +  * Block all blockable signals, except SIGQUIT.  posix_fallocate() can 
> run
> +  * for quite a long time, and is an all-or-nothing operation.  If we
> +  * allowed SIGUSR1 to interrupt us repeatedly (for example, due to 
> recovery
> +  * conflicts), the retry loop might never succeed.
> +  */
> + PG_SETMASK();
> +
>   /* Truncate (or extend) the file to the requested size. */
>   rc = ftruncate(fd, size);
>

Hm - given that we've observed ftruncate failing with strace, and that
stracing to find problems isn't insane, shouldn't we retry the ftruncate too?
It's kind of obsoleted by your next patch, but not really, because it's not
unconceivable that other OSs behave similarly? And IIUC you're planning to not
backpatch 0002?


> + pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);

(not new in this patch, just moved around) - FILL_ZERO_WRITE is imo an odd
description of what's happening... In my understanding this isn't doing any
writing, just allocating. But whatever...


Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-11 Thread Alvaro Herrera
On 2022-Jul-07, Thomas Munro wrote:

> On Thu, Jul 7, 2022 at 9:05 AM Thomas Munro  wrote:
> > On Thu, Jul 7, 2022 at 9:03 AM Andres Freund  wrote:
> > > On 2022-07-07 08:56:33 +1200, Thomas Munro wrote:
> > > > On Thu, Jul 7, 2022 at 8:39 AM Andres Freund  wrote:
> > > > > So I think we need: 1) block most signals, 2) a retry loop *without*
> > > > > interrupt checks.
> 
> Here's a draft patch that tries to explain all this in the commit
> message and comments.

I gave 0001 a try.  I agree with the approach, and it seems to work as
intended; or at least I couldn't break it under GDB.

I didn't look at 0002, but I wish that the pgstat_report_wait calls were
moved to cover both syscalls in a separate commit, just so we still have
them even if we decide not to do 0002.

Do you intend to get it pushed before the next minors?  I have an
interest in that happening.  Thanks for working on this.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Here's a general engineering tip: if the non-fun part is too complex for you
to figure out, that might indicate the fun part is too ambitious." (John Naylor)
https://postgr.es/m/CAFBsxsG4OWHBbSDM%3DsSeXrQGOtkPiOEOuME4yD7Ce41NtaAD9g%40mail.gmail.com




Re: EINTR in ftruncate()

2022-07-06 Thread Thomas Munro
On Thu, Jul 7, 2022 at 9:05 AM Thomas Munro  wrote:
> On Thu, Jul 7, 2022 at 9:03 AM Andres Freund  wrote:
> > On 2022-07-07 08:56:33 +1200, Thomas Munro wrote:
> > > On Thu, Jul 7, 2022 at 8:39 AM Andres Freund  wrote:
> > > > So I think we need: 1) block most signals, 2) a retry loop *without*
> > > > interrupt checks.

Here's a draft patch that tries to explain all this in the commit
message and comments.

Even if we go with this approach now, I think it's plausible that we
might want to reconsider this yet again one day, perhaps allocating
memory with some future asynchronous infrastructure while still
processing interrupts.  It's not very nice to hold up recovery or
ProcSignalBarrier for long operations.

I'm a little unclear about ftruncate() here.  I don't expect it to
report EINTR in other places where we use it (ie to make a local file
on a non-"slow device" smaller), because I expect that to be like
read(), write() etc which we don't wrap in EINTR loops.  Here you've
observed EINTR when messing around with a debugger*.  It seems
inconsistent to put posix_fallocate() in an EINTR retry loop for the
benefit of debuggers, but not ftruncate().  But perhaps that's good
enough, on the theory that posix_fallocate(1GB) is a very large target
and you have a decent chance of hitting it.

Another observation while staring at that ftruncate():  It's entirely
redundant on Linux, because we only ever call dsm_impl_posix_resize()
to make the segment bigger.  Before commit 3c60d0fa (v12) it was
possible to resize a segment to be smaller with dsm_resize(), so you
needed one or t'other depending on the requested size and we just
called both, but dsm_resize() wasn't ever used AFAIK and didn't even
work on all DSM implementations, among other problems, so we ripped it
out.  So... on master at least, we could also change the #ifdef to be
either-or.  While refactoring like that, I think we might as well also
rearrange the code so that the wait event is reported also for other
OSes, just in case it takes a long time.  See 0002 patch.

*It's funny that ftruncate() apparently doesn't automatically restart
for ptrace SIGCONT on Linux according to your report, while poll()
does according to my experiments, even though the latter is one of the
never-restart functions (it doesn't on other OSes I hack on, and you
feel the difference when debugging missing wakeup type bugs...).
Random implementation details...
From d22c937914906135fddc8635792641089ba3e2cd Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 7 Jul 2022 15:15:11 +1200
Subject: [PATCH 1/2] Block signals while allocating DSM memory.

On Linux, we call posix_fallocate() on shm_open()'d memory to avoid
later potential SIGBUS (see commit 899bd785).

Based on field reports of systems stuck in an EINTR retry loop there,
there, we made it possible to break out of that loop via slightly odd
coding where the CHECK_FOR_INTERRUPTS() call was somewhat removed from
the loop (see commit 422952ee).

On further reflection, that was not a great choice for at least two
reasons:

1.  If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing
and the EINTR error would be surfaced to the user.

2.  If EINTR was reported but neither QueryCancelPending nor
ProcDiePending was set, then we'd dutifully retry, but with a bit more
understanding of how posix_fallocate() works, it's now clear that you
can get into a loop that never terminates.  posix_fallocate() is not a
function that can do some of the job and tell you about progress if it's
interrupted, it has to undo what it's done so far and report EINTR, and
if signals keep arriving faster than it can complete (cf recovery
conflict signals), you're stuck.

Therefore, for now, we'll simply block most signals to guarantee
progress.  SIGQUIT is not blocked (see InitPostmasterChild()), because
its expected handler doesn't return, and unblockable signals like
SIGCONT are not expected to arrive at a high rate.  For good measure,
we'll include the ftruncate() call in the blocked region.

Back-patch to all supported releases.

Reported-by: Alvaro Herrera 
Reported-by: Nicola Contu 
Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
---
 src/backend/storage/ipc/dsm_impl.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 323862a3d2..aa47e5de70 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -62,6 +62,7 @@
 #endif
 
 #include "common/file_perm.h"
+#include "libpq/pqsignal.h"		/* for PG_SETMASK macro */
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "portability/mem.h"
@@ -306,14 +307,6 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		shm_unlink(name);
 		errno = save_errno;
 
-		/*
-		 * If we received a query cancel or termination signal, we will have
-		 * EINTR set here.  If the caller said that errors are 

Re: EINTR in ftruncate()

2022-07-06 Thread Thomas Munro
On Thu, Jul 7, 2022 at 9:03 AM Andres Freund  wrote:
> On 2022-07-07 08:56:33 +1200, Thomas Munro wrote:
> > On Thu, Jul 7, 2022 at 8:39 AM Andres Freund  wrote:
> > > So I think we need: 1) block most signals, 2) a retry loop *without*
> > > interrupt checks.
> >
> > Yeah.  I was also wondering about wrapping the whole function in
> > PG_SETMASK(), PG_SETMASK(), but also leaving the
> > while (rc == EINTR) loop there (without the check for *Pending
> > variables), only because otherwise when you attach a debugger and
> > continue you'll get a spurious EINTR and it'll interfere with program
> > execution.  All blockable signals would be blocked *except* SIGQUIT,
> > which means that fast shutdown/crash will still work.  It seems nice
> > to leave that way to interrupt it without resorting to SIGKILL.
>
> Fast shutdown shouldn't use SIGQUIT - did you mean immediate? I think
> it's fine to allow immediate shutdowns, but I don't think we should
> allow fast shutdowns to interrupt it.

Err, yeah, that one.




Re: EINTR in ftruncate()

2022-07-06 Thread Andres Freund
Hi,

On 2022-07-07 08:56:33 +1200, Thomas Munro wrote:
> On Thu, Jul 7, 2022 at 8:39 AM Andres Freund  wrote:
> > So I think we need: 1) block most signals, 2) a retry loop *without*
> > interrupt checks.
> 
> Yeah.  I was also wondering about wrapping the whole function in
> PG_SETMASK(), PG_SETMASK(), but also leaving the
> while (rc == EINTR) loop there (without the check for *Pending
> variables), only because otherwise when you attach a debugger and
> continue you'll get a spurious EINTR and it'll interfere with program
> execution.  All blockable signals would be blocked *except* SIGQUIT,
> which means that fast shutdown/crash will still work.  It seems nice
> to leave that way to interrupt it without resorting to SIGKILL.

Fast shutdown shouldn't use SIGQUIT - did you mean immediate? I think
it's fine to allow immediate shutdowns, but I don't think we should
allow fast shutdowns to interrupt it.

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-06 Thread Thomas Munro
On Thu, Jul 7, 2022 at 8:39 AM Andres Freund  wrote:
> On 2022-07-06 21:29:41 +0200, Alvaro Herrera wrote:
> > On 2022-Jul-05, Andres Freund wrote:
> >
> > > I think we'd be better off disabling at least some signals during
> > > dsm_impl_posix_resize(). I'm afraid we'll otherwise just find another
> > > variation of these problems. I haven't checked the source of ftruncate, 
> > > but
> > > what Thomas dug up for fallocate makes it pretty clear that our current
> > > approach of just retrying again and again isn't good enough. It's a bit 
> > > more
> > > obvious that it's a problem for fallocate, but I don't think it's worth 
> > > having
> > > different solutions for the two.
> >
> > So what if we move the retry loop one level up?  As in the attached.
> > Here, if we get EINTR then we retry both syscalls.
>
> Doesn't really seem to address the problem to me. posix_fallocate()
> takes some time (~1s for 3GB roughly), so if we signal at a higher rate,
> we'll just get stuck.
>
> I hacked a bit on a test program from Thomas, and it's pretty clearly
> that with a 5ms timer interval you'll pretty much not make
> progress. It's much easier to get fallocate() to get interrupted than
> ftruncate(), but the latter gets interrupted e.g. when you do a strace
> in the "wrong" moment (afaics SIGSTOP/SIGCONT trigger EINTR in
> situations that are retried otherwise).
>
> So I think we need: 1) block most signals, 2) a retry loop *without*
> interrupt checks.

Yeah.  I was also wondering about wrapping the whole function in
PG_SETMASK(), PG_SETMASK(), but also leaving the
while (rc == EINTR) loop there (without the check for *Pending
variables), only because otherwise when you attach a debugger and
continue you'll get a spurious EINTR and it'll interfere with program
execution.  All blockable signals would be blocked *except* SIGQUIT,
which means that fast shutdown/crash will still work.  It seems nice
to leave that way to interrupt it without resorting to SIGKILL.




Re: EINTR in ftruncate()

2022-07-06 Thread Andres Freund
Hi,

On 2022-07-06 21:29:41 +0200, Alvaro Herrera wrote:
> On 2022-Jul-05, Andres Freund wrote:
> 
> > I think we'd be better off disabling at least some signals during
> > dsm_impl_posix_resize(). I'm afraid we'll otherwise just find another
> > variation of these problems. I haven't checked the source of ftruncate, but
> > what Thomas dug up for fallocate makes it pretty clear that our current
> > approach of just retrying again and again isn't good enough. It's a bit more
> > obvious that it's a problem for fallocate, but I don't think it's worth 
> > having
> > different solutions for the two.
> 
> So what if we move the retry loop one level up?  As in the attached.
> Here, if we get EINTR then we retry both syscalls.

Doesn't really seem to address the problem to me. posix_fallocate()
takes some time (~1s for 3GB roughly), so if we signal at a higher rate,
we'll just get stuck.

I hacked a bit on a test program from Thomas, and it's pretty clearly
that with a 5ms timer interval you'll pretty much not make
progress. It's much easier to get fallocate() to get interrupted than
ftruncate(), but the latter gets interrupted e.g. when you do a strace
in the "wrong" moment (afaics SIGSTOP/SIGCONT trigger EINTR in
situations that are retried otherwise).

So I think we need: 1) block most signals, 2) a retry loop *without*
interrupt checks.

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-06 Thread Alvaro Herrera
On 2022-Jul-05, Andres Freund wrote:

> I think we'd be better off disabling at least some signals during
> dsm_impl_posix_resize(). I'm afraid we'll otherwise just find another
> variation of these problems. I haven't checked the source of ftruncate, but
> what Thomas dug up for fallocate makes it pretty clear that our current
> approach of just retrying again and again isn't good enough. It's a bit more
> obvious that it's a problem for fallocate, but I don't think it's worth having
> different solutions for the two.

So what if we move the retry loop one level up?  As in the attached.
Here, if we get EINTR then we retry both syscalls.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"
>From 67d4447bc8075b9eb249e7543afb136b2cfaad9b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 1 Jul 2022 17:16:33 +0200
Subject: [PATCH v4] rework retry loop for dsm_impl_op

---
 src/backend/storage/ipc/dsm_impl.c | 101 +
 1 file changed, 58 insertions(+), 43 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 323862a3d2..4f3284b66b 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -295,30 +295,57 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		}
 		request_size = st.st_size;
 	}
-	else if (dsm_impl_posix_resize(fd, request_size) != 0)
+	else
 	{
-		int			save_errno;
+		int			rc;
 
-		/* Back out what's already been done. */
-		save_errno = errno;
-		close(fd);
-		ReleaseExternalFD();
-		shm_unlink(name);
-		errno = save_errno;
+		for (;;)
+		{
+			rc = dsm_impl_posix_resize(fd, request_size);
+			if (rc == 0)
+break;			/* all good */
 
-		/*
-		 * If we received a query cancel or termination signal, we will have
-		 * EINTR set here.  If the caller said that errors are OK here, check
-		 * for interrupts immediately.
-		 */
-		if (errno == EINTR && elevel >= ERROR)
-			CHECK_FOR_INTERRUPTS();
+			/*
+			 * If there's a signal that we need to handle, bail out here to
+			 * clean up and handle it.
+			 */
+			if (QueryCancelPending || ProcDiePending)
+break;
 
-		ereport(elevel,
-(errcode_for_dynamic_shared_memory(),
- errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
-		name, request_size)));
-		return false;
+			/*
+			 * If we were interrupted but it wasn't a Postgres-level signal,
+			 * try again.
+			 */
+			if (rc == EINTR)
+continue;
+			break;
+		}
+
+		if (rc != 0)
+		{
+			int			save_errno;
+
+			/* Back out what's already been done. */
+			save_errno = errno;
+			close(fd);
+			ReleaseExternalFD();
+			shm_unlink(name);
+			errno = save_errno;
+
+			/*
+			 * If we received a query cancel or termination signal, we will have
+			 * EINTR set here.  If the caller said that errors are OK here, check
+			 * for interrupts immediately.
+			 */
+			if (errno == EINTR && elevel >= ERROR)
+CHECK_FOR_INTERRUPTS();
+
+			ereport(elevel,
+	(errcode_for_dynamic_shared_memory(),
+	 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
+			name, request_size)));
+			return false;
+		}
 	}
 
 	/* Map it. */
@@ -355,15 +382,20 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
  * If necessary, also ensure that virtual memory is actually allocated by the
  * operating system, to avoid nasty surprises later.
  *
- * Returns non-zero if either truncation or allocation fails, and sets errno.
+ * Returns non-zero if either truncation or allocation fails, and errno is set.
  */
 static int
 dsm_impl_posix_resize(int fd, off_t size)
 {
 	int			rc;
 
-	/* Truncate (or extend) the file to the requested size. */
+	/*
+	 * Truncate (or extend) the file to the requested size.
+	 */
+	errno = 0;
 	rc = ftruncate(fd, size);
+	if (rc != 0)
+		return errno;
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
@@ -374,27 +406,10 @@ dsm_impl_posix_resize(int fd, off_t size)
 	 * SIGBUS later.
 	 */
 #if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
-	if (rc == 0)
-	{
-		/*
-		 * We may get interrupted.  If so, just retry unless there is an
-		 * interrupt pending.  This avoids the possibility of looping forever
-		 * if another backend is repeatedly trying to interrupt us.
-		 */
-		pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
-		do
-		{
-			rc = posix_fallocate(fd, 0, size);
-		} while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
-		pgstat_report_wait_end();
-
-		/*
-		 * The caller expects errno to be set, but posix_fallocate() doesn't
-		 * set it.  Instead it returns error numbers directly.  So set errno,
-		 * even though we'll also return rc to indicate success or failure.
-		 */
-		errno = rc;
-	}
+	pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
+	rc = posix_fallocate(fd, 0, 

Re: EINTR in ftruncate()

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-04 13:07:50 +0200, Alvaro Herrera wrote:
> On 2022-Jul-01, Andres Freund wrote:
> 
> > On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > > On 2022-Jul-01, Andres Freund wrote:
> 
> > > > What is the reason for the || ProcDiePending || QueryCancelPending bit? 
> > > > What
> > > > if there's dsm operations intentionally done while QueryCancelPending?
> > > 
> > > That mirrors the test for the other block in that function, which was
> > > added by 63efab4ca139, whose commit message explains:
> 
> > That whole approach seems quite wrong to me. At the absolute very least the
> > code needs to check if interrupts are being processed in the current context
> > before just giving up due to ProcDiePending || QueryCancelPending.
> 
> For the time being, I can just push the addition of the EINTR retry
> without testing ProcDiePending || QueryCancelPending.

I think we'd be better off disabling at least some signals during
dsm_impl_posix_resize(). I'm afraid we'll otherwise just find another
variation of these problems. I haven't checked the source of ftruncate, but
what Thomas dug up for fallocate makes it pretty clear that our current
approach of just retrying again and again isn't good enough. It's a bit more
obvious that it's a problem for fallocate, but I don't think it's worth having
different solutions for the two.

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-04 Thread Alvaro Herrera
On 2022-Jul-01, Andres Freund wrote:

> On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > On 2022-Jul-01, Andres Freund wrote:

> > > What is the reason for the || ProcDiePending || QueryCancelPending bit? 
> > > What
> > > if there's dsm operations intentionally done while QueryCancelPending?
> > 
> > That mirrors the test for the other block in that function, which was
> > added by 63efab4ca139, whose commit message explains:

> That whole approach seems quite wrong to me. At the absolute very least the
> code needs to check if interrupts are being processed in the current context
> before just giving up due to ProcDiePending || QueryCancelPending.

For the time being, I can just push the addition of the EINTR retry
without testing ProcDiePending || QueryCancelPending.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)
>From b7591beb919c3cfaa8090bda3977b7127de8de28 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 1 Jul 2022 17:16:33 +0200
Subject: [PATCH v3] retry ftruncate

---
 src/backend/storage/ipc/dsm_impl.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 82b7978aeb..145a02204b 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -417,8 +417,19 @@ dsm_impl_posix_resize(int fd, off_t size)
 {
 	int			rc;
 
-	/* Truncate (or extend) the file to the requested size. */
-	rc = ftruncate(fd, size);
+	/*
+	 * Truncate (or extend) the file to the requested size.  If we're
+	 * interrupted by a signal, retry; have caller handle any other error.
+	 */
+	for (;;)
+	{
+		errno = 0;
+		rc = ftruncate(fd, size);
+		if (rc == 0)
+			break;
+		if (errno != EINTR)
+			return rc;
+	}
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
-- 
2.30.2



Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-02 09:52:33 +1200, Thomas Munro wrote:
> On Sat, Jul 2, 2022 at 9:06 AM Andres Freund  wrote:
> > On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > Chris, do you have any additional details about the machine that lead to 
> > this
> > change? OS version, whether it might have been swapping, etc?
> >
> > I wonder if what happened is that posix_fallocate() used glibc's fallback
> > implementation because the kernel was old enough to not support fallocate()
> > for tmpfs.  Looks like support for fallocate() for tmpfs was added in 3.5
> > ([1]). So e.g. a rhel 6 wouldn't have had that.
> 
> With a quick test program on my Linux 5.10 kernel I see that an
> SA_RESTART signal handler definitely causes posix_fallocate() to
> return EINTR (can post trivial program).
> 
> A drive-by look at the current/modern kernel source supports this:
> shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
> to be the Linux-y way to say you want EINTR or restart as
> appropriate?), and it also undoes all partial progress too (not too
> surprising), which would explain why a perfectly timed machine gun
> stream of signals from our recovery conflict system can make an
> fallocate retry loop never terminate, for large enough sizes.

Yea :(

And even if we fix recovery to not do douse other processes in signals quite
that badly, there are plenty other sources of signals that can arrive at a
steady clip. So I think we need to do something to defuse this another way.

Ideas:

1) do the fallocate in smaller chunks, thereby making it much more likely to
   complete between two signal deliveries
2) block signals while calling posix_fallocate(). That won't work for
   everything (e.g. rapid SIGSTOP/SIGCONT), but that's not something we'd send
   ourselves, so whatever.
3) 1+2
4) ?

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-01 Thread Thomas Munro
On Sat, Jul 2, 2022 at 9:06 AM Andres Freund  wrote:
> On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > > Allow DSM allocation to be interrupted.
> > >
> > > Chris Travers reported that the startup process can repeatedly try to
> > > cancel a backend that is in a posix_fallocate()/EINTR loop and cause 
> > > it
> > > to loop forever.  Teach the retry loop to give up if an interrupt is
> > > pending.  Don't actually check for interrupts in that loop though,
> > > because a non-local exit would skip some clean-up code in the caller.
> >
> > That whole approach seems quite wrong to me. At the absolute very least the
> > code needs to check if interrupts are being processed in the current context
> > before just giving up due to ProcDiePending || QueryCancelPending.
> >
> > I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), 
> > rather
> > than the startup process signalling.

I agree it's not great.  It was a back-patchable bandaid in need of a
better solution.

> Chris, do you have any additional details about the machine that lead to this
> change? OS version, whether it might have been swapping, etc?
>
> I wonder if what happened is that posix_fallocate() used glibc's fallback
> implementation because the kernel was old enough to not support fallocate()
> for tmpfs.  Looks like support for fallocate() for tmpfs was added in 3.5
> ([1]). So e.g. a rhel 6 wouldn't have had that.

With a quick test program on my Linux 5.10 kernel I see that an
SA_RESTART signal handler definitely causes posix_fallocate() to
return EINTR (can post trivial program).

A drive-by look at the current/modern kernel source supports this:
shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
to be the Linux-y way to say you want EINTR or restart as
appropriate?), and it also undoes all partial progress too (not too
surprising), which would explain why a perfectly timed machine gun
stream of signals from our recovery conflict system can make an
fallocate retry loop never terminate, for large enough sizes.




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi Chris,

On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > On 2022-Jul-01, Andres Freund wrote:
> > 
> > > On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> > > > Nicola Contu reported two years ago to pgsql-general[1] that they were
> > > > having sporadic query failures, because EINTR is reported on some system
> > > > call.  I have been told that the problem persists, though it is very
> > > > infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> > > > different patch which also protects write(), but I think that's not
> > > > necessary.
> > > 
> > > What is the reason for the || ProcDiePending || QueryCancelPending bit? 
> > > What
> > > if there's dsm operations intentionally done while QueryCancelPending?
> > 
> > That mirrors the test for the other block in that function, which was
> > added by 63efab4ca139, whose commit message explains:
> > 
> > Allow DSM allocation to be interrupted.
> > 
> > Chris Travers reported that the startup process can repeatedly try to
> > cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
> > to loop forever.  Teach the retry loop to give up if an interrupt is
> > pending.  Don't actually check for interrupts in that loop though,
> > because a non-local exit would skip some clean-up code in the caller.
> 
> That whole approach seems quite wrong to me. At the absolute very least the
> code needs to check if interrupts are being processed in the current context
> before just giving up due to ProcDiePending || QueryCancelPending.
> 
> I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), rather
> than the startup process signalling.
> 
> There is an argument for allowing more things to be cancelled, but we'd need a
> retry loop for the !INTERRUPTS_CAN_BE_PROCESSED() case.

Chris, do you have any additional details about the machine that lead to this
change? OS version, whether it might have been swapping, etc?

I wonder if what happened is that posix_fallocate() used glibc's fallback
implementation because the kernel was old enough to not support fallocate()
for tmpfs.  Looks like support for fallocate() for tmpfs was added in 3.5
([1]). So e.g. a rhel 6 wouldn't have had that.

Greetings,

Andres Freund

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e2d12e22c59ce714008aa5266d769f8568d74eac




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> On 2022-Jul-01, Andres Freund wrote:
> 
> > On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> > > Nicola Contu reported two years ago to pgsql-general[1] that they were
> > > having sporadic query failures, because EINTR is reported on some system
> > > call.  I have been told that the problem persists, though it is very
> > > infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> > > different patch which also protects write(), but I think that's not
> > > necessary.
> > 
> > What is the reason for the || ProcDiePending || QueryCancelPending bit? What
> > if there's dsm operations intentionally done while QueryCancelPending?
> 
> That mirrors the test for the other block in that function, which was
> added by 63efab4ca139, whose commit message explains:
> 
> Allow DSM allocation to be interrupted.
> 
> Chris Travers reported that the startup process can repeatedly try to
> cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
> to loop forever.  Teach the retry loop to give up if an interrupt is
> pending.  Don't actually check for interrupts in that loop though,
> because a non-local exit would skip some clean-up code in the caller.

That whole approach seems quite wrong to me. At the absolute very least the
code needs to check if interrupts are being processed in the current context
before just giving up due to ProcDiePending || QueryCancelPending.

I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), rather
than the startup process signalling.

There is an argument for allowing more things to be cancelled, but we'd need a
retry loop for the !INTERRUPTS_CAN_BE_PROCESSED() case.

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-01 Thread Alvaro Herrera
On 2022-Jul-01, Andres Freund wrote:

> On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> > Nicola Contu reported two years ago to pgsql-general[1] that they were
> > having sporadic query failures, because EINTR is reported on some system
> > call.  I have been told that the problem persists, though it is very
> > infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> > different patch which also protects write(), but I think that's not
> > necessary.
> 
> What is the reason for the || ProcDiePending || QueryCancelPending bit? What
> if there's dsm operations intentionally done while QueryCancelPending?

That mirrors the test for the other block in that function, which was
added by 63efab4ca139, whose commit message explains:

Allow DSM allocation to be interrupted.

Chris Travers reported that the startup process can repeatedly try to
cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
to loop forever.  Teach the retry loop to give up if an interrupt is
pending.  Don't actually check for interrupts in that loop though,
because a non-local exit would skip some clean-up code in the caller.

Thanks for looking!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> Nicola Contu reported two years ago to pgsql-general[1] that they were
> having sporadic query failures, because EINTR is reported on some system
> call.  I have been told that the problem persists, though it is very
> infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> different patch which also protects write(), but I think that's not
> necessary.

What is the reason for the || ProcDiePending || QueryCancelPending bit? What
if there's dsm operations intentionally done while QueryCancelPending?

Greetings,

Andres Freund




EINTR in ftruncate()

2022-07-01 Thread Alvaro Herrera
Nicola Contu reported two years ago to pgsql-general[1] that they were
having sporadic query failures, because EINTR is reported on some system
call.  I have been told that the problem persists, though it is very
infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
different patch which also protects write(), but I think that's not
necessary.

Thomas M. produced some more obscure theories for other things that
could fail, but I think we should patch this problem first, which seems
the most obvious one, and deal with others if and when they are
reported.

[1] 
https://www.postgresql.org/message-id/CAMTZZh2V%2B0wJVgSqTVvXUAVMduF57Uxubvvw58%3DkbOae%2B53%2BQQ%40mail.gmail.com

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"
>From 466ed63b9b399c2914aa44f56f56e044341a77f5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 1 Jul 2022 17:16:33 +0200
Subject: [PATCH v2] retry ftruncate

---
 src/backend/storage/ipc/dsm_impl.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 873867e856..4f35e7e3d1 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -362,8 +362,21 @@ dsm_impl_posix_resize(int fd, off_t size)
 {
 	int			rc;
 
-	/* Truncate (or extend) the file to the requested size. */
-	rc = ftruncate(fd, size);
+	/*
+	 * Truncate (or extend) the file to the requested size.  We may get
+	 * interrupted.  If so, just retry unless there is an interrupt pending.
+	 * This avoids the possibility of looping forever if another backend is
+	 * repeatedly trying to interrupt us.
+	 */
+	for (;;)
+	{
+		errno = 0;
+		rc = ftruncate(fd, size);
+		if (rc == 0)
+			break;
+		if (errno != EINTR || ProcDiePending || QueryCancelPending)
+			return rc;
+	}
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
-- 
2.30.2