Re: md.c vs elog.c vs smgrreleaseall() in barrier

Andres Freund Mon, 17 Mar 2025 17:02:19 -0700

Hi,

On 2025-03-14 12:57:51 +1300, Thomas Munro wrote:
> On Fri, Mar 14, 2025 at 12:23 PM Andres Freund <and...@anarazel.de> wrote:
> 
> > > 3.  Considering errfinish()'s stated goal, a sort of backstop to help
> > > you cancel looping message-spewing code only, I wonder if we should
> > > just restrict the kinds of interrupts it processes, so that it only
> > > calls CFI() if we're definitely throwing ERROR or FATAL?
> >
> > I'm not even sure it'd be safe to ERROR out in all the relevant places...
> >
> > E.g.
> >                 /* implementation-specific initialization */
> >                 smgrsw[reln->smgr_which].smgr_open(reln);
> >
> >                 /* it is not pinned yet */
> >                 reln->pincount = 0;
> >                 dlist_push_tail(&unpinned_relns, &reln->node);
> >
> > doesn't this mean that ->pincount is uninitialized in case smgr_open() 
> > errors
> > out and that we'd loose track of the reln because it wasn't yet added to
> > unpinned_rels?
> 
> Ugh, right.


Patch for that attached.


> > > > If we go for that, I can see an argument for doing that in smgr.c 
> > > > instead of
> > > > md.c, afaict any plausible smgr implementation would run into this, as 
> > > > the
> > > > smgrcloseall() is triggered from the smgr level.
> > >
> > > Seems like maybe not a great idea to assume that future smgrs will be
> > > OK with being non-interruptible, if it can be avoided?
> >
> > I think you'd need a fairly large surgery of smgr.c to make that viable - I
> > rather doubt that smgr.c itself is actually safe against interrupts.
> >
> > I can see some arguable uses of smgr handling interrupts, say an smgr
> > implementation based on a network backed store, but you'd need rather 
> > massive
> > changes to actually be sure that smgr.c is called while accepting 
> > interrupts -
> > e.g. today sgmrwritev() will largely be called with interrupts held. Plenty
> > reads too.  I doubt that undoing a few HOLD_INTERRUPTS() is a meaningful 
> > part
> > of the necessary work.
> 
> Right, exactly the case I was thinking of: if some hypothetical future
> network smgr wants to be able to process *some* kinds of things
> carefully while waiting for the network.  I don't think we want to
> constrain ourselves to NFS-style "pretend it's local and make it
> non-interruptible" without any escape hatches, but you're quite right
> that that's probably a whole research project of its own and we just
> haven't refined the interrupt system enough for that yet, so yeah I
> see how you arrived at that conclusion and it makes sense.

Here's a proposed patch for this. It turns out that the bug might already be
reachable, even without defining FDDEBUG. There's a debug ereport() in
register_dirty_segment() - but it's hard to reach in practice.

I don't really know whether that means we ought to backpatch or not - which
makes me want to be conservative and not backpatch.

Greetings,

Andres Freund

>From 79c33df9e53038b7ebc9e1e6d5e575f6ef46e4f2 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 17 Mar 2025 19:41:54 -0400
Subject: [PATCH v2 1/2] smgr: Hold interrupts in most smgr functions

We need to hold interrupts across most of the smgr.c/md.c functions, as
otherwise interrupt processing, e.g. due to a debug elog/ereport, can trigger
procsignal processing, which in turn can trigger smgrreleaseall(). As the
relevant code is not reentrant, we quickly end up in a bad situation.

The only reason we haven't noticed this before is that there is only one
non-error ereport called in affected routines, in register_dirty_segments(),
and that one is extremely rarely reached. If one enables fd.c's FDDEBUG it's
easy to reproduce crashes.

It seems better to put the HOLD_INTERRUPTS()/RESUME_INTERRUPTS() in smgr.c,
instead of trying to push them down to md.c where possible: For one, every
smgr implementation would be vulnerable, for another, a good bit of smgr.c
code itself is affected too.

Eventually we might want a more targeted solution, allowing e.g. a networked
smgr implementation to be interrupted, but many other, more complicated,
problems would need to be fixed for that to be viable (e.g. smgr.c is often
called with interrupts already held).

Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
---
 src/backend/storage/smgr/smgr.c | 97 ++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index ebe35c04de5..53a09fe4aaa 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -40,6 +40,18 @@
  * themselves, as there could pointers to them in active use.  See
  * smgrrelease() and smgrreleaseall().
  *
+ * NB: We need to hold interrupts across most of the functions in this file,
+ * as otherwise interrupt processing, e.g. due to a debug elog/ereport, can
+ * trigger procsignal processing, which in turn can trigger
+ * smgrreleaseall(). None of the relevant code is reentrant.  It seems better
+ * to put the HOLD_INTERRUPTS()/RESUME_INTERRUPTS() here, instead of trying to
+ * push them down to md.c where possible: For one, every smgr implementation
+ * would be vulnerable, for another, a good bit of smgr.c code itself is
+ * affected too.  Eventually we might want a more targeted solution, allowing
+ * e.g. a networked smgr implementation to be interrupted, but many other,
+ * more complicated, problems would need to be fixed for that to be viable
+ * (e.g. smgr.c is often called with interrupts already held).
+ *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -53,6 +65,7 @@
 
 #include "access/xlogutils.h"
 #include "lib/ilist.h"
+#include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/ipc.h"
 #include "storage/md.h"
@@ -158,12 +171,16 @@ smgrinit(void)
 {
 	int			i;
 
+	HOLD_INTERRUPTS();
+
 	for (i = 0; i < NSmgr; i++)
 	{
 		if (smgrsw[i].smgr_init)
 			smgrsw[i].smgr_init();
 	}
 
+	RESUME_INTERRUPTS();
+
 	/* register the shutdown proc */
 	on_proc_exit(smgrshutdown, 0);
 }
@@ -176,11 +193,13 @@ smgrshutdown(int code, Datum arg)
 {
 	int			i;
 
+	HOLD_INTERRUPTS();
 	for (i = 0; i < NSmgr; i++)
 	{
 		if (smgrsw[i].smgr_shutdown)
 			smgrsw[i].smgr_shutdown();
 	}
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -206,6 +225,8 @@ smgropen(RelFileLocator rlocator, ProcNumber backend)
 
 	Assert(RelFileNumberIsValid(rlocator.relNumber));
 
+	HOLD_INTERRUPTS();
+
 	if (SMgrRelationHash == NULL)
 	{
 		/* First time through: initialize the hash table */
@@ -242,6 +263,8 @@ smgropen(RelFileLocator rlocator, ProcNumber backend)
 		dlist_push_tail(&unpinned_relns, &reln->node);
 	}
 
+	RESUME_INTERRUPTS();
+
 	return reln;
 }
 
@@ -283,6 +306,8 @@ smgrdestroy(SMgrRelation reln)
 
 	Assert(reln->pincount == 0);
 
+	HOLD_INTERRUPTS();
+
 	for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
 		smgrsw[reln->smgr_which].smgr_close(reln, forknum);
 
@@ -292,6 +317,8 @@ smgrdestroy(SMgrRelation reln)
 					&(reln->smgr_rlocator),
 					HASH_REMOVE, NULL) == NULL)
 		elog(ERROR, "SMgrRelation hashtable corrupted");
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -302,12 +329,16 @@ smgrdestroy(SMgrRelation reln)
 void
 smgrrelease(SMgrRelation reln)
 {
+	HOLD_INTERRUPTS();
+
 	for (ForkNumber forknum = 0; forknum <= MAX_FORKNUM; forknum++)
 	{
 		smgrsw[reln->smgr_which].smgr_close(reln, forknum);
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
 	}
 	reln->smgr_targblock = InvalidBlockNumber;
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -336,6 +367,8 @@ smgrdestroyall(void)
 {
 	dlist_mutable_iter iter;
 
+	HOLD_INTERRUPTS();
+
 	/*
 	 * Zap all unpinned SMgrRelations.  We rely on smgrdestroy() to remove
 	 * each one from the list.
@@ -347,6 +380,8 @@ smgrdestroyall(void)
 
 		smgrdestroy(rel);
 	}
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -362,12 +397,16 @@ smgrreleaseall(void)
 	if (SMgrRelationHash == NULL)
 		return;
 
+	HOLD_INTERRUPTS();
+
 	hash_seq_init(&status, SMgrRelationHash);
 
 	while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
 	{
 		smgrrelease(reln);
 	}
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -400,7 +439,13 @@ smgrreleaserellocator(RelFileLocatorBackend rlocator)
 bool
 smgrexists(SMgrRelation reln, ForkNumber forknum)
 {
-	return smgrsw[reln->smgr_which].smgr_exists(reln, forknum);
+	bool		ret;
+
+	HOLD_INTERRUPTS();
+	ret = smgrsw[reln->smgr_which].smgr_exists(reln, forknum);
+	RESUME_INTERRUPTS();
+
+	return ret;
 }
 
 /*
@@ -413,7 +458,9 @@ smgrexists(SMgrRelation reln, ForkNumber forknum)
 void
 smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 {
+	HOLD_INTERRUPTS();
 	smgrsw[reln->smgr_which].smgr_create(reln, forknum, isRedo);
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
 	if (nrels == 0)
 		return;
 
+	HOLD_INTERRUPTS();
+
 	FlushRelationsAllBuffers(rels, nrels);
 
 	/*
@@ -449,6 +498,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
 				smgrsw[which].smgr_immedsync(rels[i], forknum);
 		}
 	}
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -471,6 +522,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 	if (nrels == 0)
 		return;
 
+	HOLD_INTERRUPTS();
+
 	/*
 	 * Get rid of any remaining buffers for the relations.  bufmgr will just
 	 * drop them without bothering to write the contents.
@@ -522,6 +575,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 	}
 
 	pfree(rlocators);
+
+	RESUME_INTERRUPTS();
 }
 
 
@@ -538,6 +593,8 @@ void
 smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		   const void *buffer, bool skipFsync)
 {
+	HOLD_INTERRUPTS();
+
 	smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
 										 buffer, skipFsync);
 
@@ -550,6 +607,8 @@ smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		reln->smgr_cached_nblocks[forknum] = blocknum + 1;
 	else
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -563,6 +622,8 @@ void
 smgrzeroextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 			   int nblocks, bool skipFsync)
 {
+	HOLD_INTERRUPTS();
+
 	smgrsw[reln->smgr_which].smgr_zeroextend(reln, forknum, blocknum,
 											 nblocks, skipFsync);
 
@@ -575,6 +636,8 @@ smgrzeroextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		reln->smgr_cached_nblocks[forknum] = blocknum + nblocks;
 	else
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -588,7 +651,13 @@ bool
 smgrprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 			 int nblocks)
 {
-	return smgrsw[reln->smgr_which].smgr_prefetch(reln, forknum, blocknum, nblocks);
+	bool		ret;
+
+	HOLD_INTERRUPTS();
+	ret = smgrsw[reln->smgr_which].smgr_prefetch(reln, forknum, blocknum, nblocks);
+	RESUME_INTERRUPTS();
+
+	return ret;
 }
 
 /*
@@ -601,7 +670,13 @@ uint32
 smgrmaxcombine(SMgrRelation reln, ForkNumber forknum,
 			   BlockNumber blocknum)
 {
-	return smgrsw[reln->smgr_which].smgr_maxcombine(reln, forknum, blocknum);
+	uint32		ret;
+
+	HOLD_INTERRUPTS();
+	ret = smgrsw[reln->smgr_which].smgr_maxcombine(reln, forknum, blocknum);
+	RESUME_INTERRUPTS();
+
+	return ret;
 }
 
 /*
@@ -619,8 +694,10 @@ void
 smgrreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		  void **buffers, BlockNumber nblocks)
 {
+	HOLD_INTERRUPTS();
 	smgrsw[reln->smgr_which].smgr_readv(reln, forknum, blocknum, buffers,
 										nblocks);
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -653,8 +730,10 @@ void
 smgrwritev(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		   const void **buffers, BlockNumber nblocks, bool skipFsync)
 {
+	HOLD_INTERRUPTS();
 	smgrsw[reln->smgr_which].smgr_writev(reln, forknum, blocknum,
 										 buffers, nblocks, skipFsync);
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -665,8 +744,10 @@ void
 smgrwriteback(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 			  BlockNumber nblocks)
 {
+	HOLD_INTERRUPTS();
 	smgrsw[reln->smgr_which].smgr_writeback(reln, forknum, blocknum,
 											nblocks);
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -683,10 +764,14 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
 	if (result != InvalidBlockNumber)
 		return result;
 
+	HOLD_INTERRUPTS();
+
 	result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);
 
 	reln->smgr_cached_nblocks[forknum] = result;
 
+	RESUME_INTERRUPTS();
+
 	return result;
 }
 
@@ -731,6 +816,8 @@ smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks,
 {
 	int			i;
 
+	Assert(!INTERRUPTS_CAN_BE_PROCESSED());
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
@@ -784,7 +871,9 @@ smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks,
 void
 smgrregistersync(SMgrRelation reln, ForkNumber forknum)
 {
+	HOLD_INTERRUPTS();
 	smgrsw[reln->smgr_which].smgr_registersync(reln, forknum);
+	RESUME_INTERRUPTS();
 }
 
 /*
@@ -816,7 +905,9 @@ smgrregistersync(SMgrRelation reln, ForkNumber forknum)
 void
 smgrimmedsync(SMgrRelation reln, ForkNumber forknum)
 {
+	HOLD_INTERRUPTS();
 	smgrsw[reln->smgr_which].smgr_immedsync(reln, forknum);
+	RESUME_INTERRUPTS();
 }
 
 /*
-- 
2.48.1.76.g4e746b1a31.dirty

>From ae6b57a919e92af228748a3c3a60d00b455ccf12 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 17 Mar 2025 19:46:01 -0400
Subject: [PATCH v2 2/2] smgr: Make SMgrRelation initialization safer against
 errors

In case the smgr_open callback failed, the ->pincount field would not be
initialized and the relation would not be put onto the unpinned_relns list.

This buglet was introduced in 21d9c3ee4ef7. As that commit is only in HEAD, no
need to backpatch.

Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
---
 src/backend/storage/smgr/smgr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 53a09fe4aaa..24971304b85 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -255,12 +255,12 @@ smgropen(RelFileLocator rlocator, ProcNumber backend)
 			reln->smgr_cached_nblocks[i] = InvalidBlockNumber;
 		reln->smgr_which = 0;	/* we only have md.c at present */
 
-		/* implementation-specific initialization */
-		smgrsw[reln->smgr_which].smgr_open(reln);
-
 		/* it is not pinned yet */
 		reln->pincount = 0;
 		dlist_push_tail(&unpinned_relns, &reln->node);
+
+		/* implementation-specific initialization */
+		smgrsw[reln->smgr_which].smgr_open(reln);
 	}
 
 	RESUME_INTERRUPTS();
-- 
2.48.1.76.g4e746b1a31.dirty

Re: md.c vs elog.c vs smgrreleaseall() in barrier

Reply via email to