On Tue, Jul 21, 2020 at 1:17 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, Jul 7, 2020 at 12:55 PM Andres Freund <and...@anarazel.de> wrote:
> > What are you proposing? For now we could easily enough work around this
> > by just making it a on_proc_exit() callback, but that doesn't really
> > change the fundamental issue imo.
>
> I think it would be more correct for it to be an on_proc_exit()
> callback, because before_shmem_exit() callbacks can and do perform
> actions which rely on an awful lot of the system being still in a
> working state. RemoveTempRelationsCallback() is a good example: it
> thinks it can start and end transactions and make a bunch of catalog
> changes. I don't know that any of that could use JIT, but moving the
> JIT cleanup to the on_shmem_exit() stage seems better. At that point,
> there shouldn't be anybody doing anything that relies on being able to
> perform logical changes to the database; we're just shutting down
> low-level subsystems at that point, and thus presumably not doing
> anything that could possibly need JIT.
>

I looked at what actually llvm_shutdown() does? It frees up JIT stacks,
also if exists perf related resource, using LLVMOrcDisposeInstance() and
LLVMOrcUnregisterPerf(), that were dynamically allocated in
llvm_session_initialize through a JIT library function
LLVMOrcCreateInstance() [1].

It looks like there is no problem in moving llvm_shutdown to either
on_shmem_exit() or on_proc_exit().

[1] - https://llvm.org/doxygen/OrcCBindings_8cpp_source.html

>
> But I also agree that what pg_start_backup() was doing before v13 was
> wrong; that's why I committed
> 303640199d0436c5e7acdf50b837a027b5726594. The only reason I didn't
> back-patch it is because the consequences are so minor I didn't think
> it was worth worrying about. We could, though. I'd be somewhat
> inclined to both do that and also change LLVM to use on_proc_exit() in
> master, but I don't feel super-strongly about it.
>

Patch: v1-0001-Move-llvm_shutdown-to-on_proc_exit-list-from-befo.patch
Moved llvm_shutdown to on_proc_exit() call back list. Request to consider
this change for master, if possible <=13 versions. Basic JIT use cases and
regression tests are working fine with the patch.

Patches: PG11-0001-Fix-minor-problems-with-non-exclusive-backup-clea.patch
and PG12-0001-Fix-minor-problems-with-non-exclusive-backup-cleanup.patch
Request to consider the commit
303640199d0436c5e7acdf50b837a027b5726594(above two patches are for this
commit) to versions < 13, to fix the abort issue. Please note that the
above two patches have no difference in the code, just I made it applicable
on PG11.

Patch: v1-0001-Modify-cancel_before_shmem_exit-comments.patch
This patch, modifies cancel_before_shmem_exit() function comment to reflect
the safe usage of before_shmem_exit_list callback mechanism and also
removes the point "For simplicity, only the latest entry can be
removed*********" as this gives a meaning that there is still scope for
improvement in cancel_before_shmem_exit() search mechanism.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 5fae4b3a046789fb7b54e689c979b01cb322aaf9 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Thu, 23 Jul 2020 17:56:21 +0530
Subject: [PATCH v1] Move llvm_shutdown to on_proc_exit list from
 before_shmem_exit.

llvm_shutdown frees up dynamically allocated memory for jit
compilers in a session/backend. Having this as on_proc_exit doesn't
cause any harm.
---
 src/backend/jit/llvm/llvmjit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index af8b34aaaf..43bed78a52 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -683,7 +683,7 @@ llvm_session_initialize(void)
 	}
 #endif
 
-	before_shmem_exit(llvm_shutdown, 0);
+	on_proc_exit(llvm_shutdown, 0);
 
 	llvm_session_initialized = true;
 
-- 
2.25.1

From 3b0a048afd7ad6d8564a54d13397c72f6eadc5da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 24 Jul 2020 08:55:52 +0530
Subject: [PATCH v5] Fix minor problems with non-exclusive backup cleanup.

The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.

To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.

This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.
---
 src/backend/access/transam/xlog.c      | 32 +++++++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c | 17 ++------------
 src/backend/replication/basebackup.c   | 16 ++-----------
 src/include/access/xlog.h              |  3 ++-
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28292393d1..aad37d9509 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11329,23 +11329,30 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
  * system out of backup mode, thus making it a lot more safe to call from
  * an error handler.
  *
+ * The caller can pass 'arg' as 'true' or 'false' to control whether a warning
+ * is emitted.
+ *
  * NB: This is only for aborting a non-exclusive backup that doesn't write
  * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().
+ *
+ * NB: This gets used as a before_shmem_exit handler, hence the odd-looking
+ * signature.
  */
 void
-do_pg_abort_backup(void)
+do_pg_abort_backup(int code, Datum arg)
 {
+	bool	emit_warning = DatumGetBool(arg);
+
 	/*
 	 * Quick exit if session is not keeping around a non-exclusive backup
 	 * already started.
 	 */
-	if (sessionBackupState == SESSION_BACKUP_NONE)
+	if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
 		return;
 
 	WALInsertLockAcquireExclusive();
 	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-	Assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
 	XLogCtl->Insert.nonExclusiveBackups--;
 
 	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
@@ -11354,6 +11361,25 @@ do_pg_abort_backup(void)
 		XLogCtl->Insert.forcePageWrites = false;
 	}
 	WALInsertLockRelease();
+
+	if (emit_warning)
+		ereport(WARNING,
+				(errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
+}
+
+/*
+ * Register a handler that will warn about unterminated backups at end of
+ * session, unless this has already been done.
+ */
+void
+register_persistent_abort_backup_handler(void)
+{
+	static bool already_done = false;
+
+	if (already_done)
+		return;
+	before_shmem_exit(do_pg_abort_backup, DatumGetBool(true));
+	already_done = true;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 9731742978..cc5e97c38b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -42,18 +42,6 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
-/*
- * Called when the backend exits with a running non-exclusive base backup,
- * to clean up state.
- */
-static void
-nonexclusive_base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-	ereport(WARNING,
-			(errmsg("aborting backup due to backend exiting before pg_stop_backup was called")));
-}
-
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
@@ -101,10 +89,10 @@ pg_start_backup(PG_FUNCTION_ARGS)
 		tblspc_map_file = makeStringInfo();
 		MemoryContextSwitchTo(oldcontext);
 
+		register_persistent_abort_backup_handler();
+
 		startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
 										NULL, tblspc_map_file, false, true);
-
-		before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 	}
 
 	PG_RETURN_LSN(startpoint);
@@ -246,7 +234,6 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 		 * and tablespace map so they can be written to disk by the caller.
 		 */
 		stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
-		cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 
 		values[1] = CStringGetTextDatum(label_file->data);
 		values[2] = CStringGetTextDatum(tblspc_map_file->data);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 2eba0dc21a..3e53b3df6f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -66,7 +66,6 @@ static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *sta
 			 bool sizeonly);
 static void send_int8_string(StringInfoData *buf, int64 intval);
 static void SendBackupHeader(List *tablespaces);
-static void base_backup_cleanup(int code, Datum arg);
 static void perform_base_backup(basebackup_options *opt);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
@@ -231,17 +230,6 @@ static const struct exclude_list_item noChecksumFiles[] = {
 	{NULL, false}
 };
 
-
-/*
- * Called when ERROR or FATAL happens in perform_base_backup() after
- * we have started the backup - make sure we end it!
- */
-static void
-base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-}
-
 /*
  * Actually do a base backup for the specified tablespaces.
  *
@@ -280,7 +268,7 @@ perform_base_backup(basebackup_options *opt)
 	 * do_pg_stop_backup() should be inside the error cleanup block!
 	 */
 
-	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 	{
 		ListCell   *lc;
 		tablespaceinfo *ti;
@@ -389,7 +377,7 @@ perform_base_backup(basebackup_options *opt)
 
 		endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 
 
 	if (opt->includewal)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1d9c87cb82..e422a9dc8b 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -326,7 +326,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
 				   bool needtblspcmapfile);
 extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
 				  TimeLineID *stoptli_p);
-extern void do_pg_abort_backup(void);
+extern void do_pg_abort_backup(int code, Datum arg);
+extern void register_persistent_abort_backup_handler(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
-- 
2.25.1

From 303640199d0436c5e7acdf50b837a027b5726594 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 19 Dec 2019 09:06:54 -0500
Subject: [PATCH] Fix minor problems with non-exclusive backup cleanup.

The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.

To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.

This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.

Patch by me, reviewed by Kyotaro Horiguchi, Michael Paquier (who
preferred a different approach, but got outvoted), Fujii Masao,
and Tom Lane, and with comments by various others.

Discussion: http://postgr.es/m/CA+TgmobMjnyBfNhGTKQEDbqXYE3_rXWpc4CM63fhyerNCes3mA@mail.gmail.com
---
 src/backend/access/transam/xlog.c      | 32 +++++++++++++++++++++++---
 src/backend/access/transam/xlogfuncs.c | 17 ++------------
 src/backend/replication/basebackup.c   | 16 ++-----------
 src/include/access/xlog.h              |  3 ++-
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71b8389ba1..edee0c0f22 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11133,23 +11133,30 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
  * system out of backup mode, thus making it a lot more safe to call from
  * an error handler.
  *
+ * The caller can pass 'arg' as 'true' or 'false' to control whether a warning
+ * is emitted.
+ *
  * NB: This is only for aborting a non-exclusive backup that doesn't write
  * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().
+ *
+ * NB: This gets used as a before_shmem_exit handler, hence the odd-looking
+ * signature.
  */
 void
-do_pg_abort_backup(void)
+do_pg_abort_backup(int code, Datum arg)
 {
+	bool	emit_warning = DatumGetBool(arg);
+
 	/*
 	 * Quick exit if session is not keeping around a non-exclusive backup
 	 * already started.
 	 */
-	if (sessionBackupState == SESSION_BACKUP_NONE)
+	if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
 		return;
 
 	WALInsertLockAcquireExclusive();
 	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-	Assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
 	XLogCtl->Insert.nonExclusiveBackups--;
 
 	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
@@ -11158,6 +11165,25 @@ do_pg_abort_backup(void)
 		XLogCtl->Insert.forcePageWrites = false;
 	}
 	WALInsertLockRelease();
+
+	if (emit_warning)
+		ereport(WARNING,
+				(errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
+}
+
+/*
+ * Register a handler that will warn about unterminated backups at end of
+ * session, unless this has already been done.
+ */
+void
+register_persistent_abort_backup_handler(void)
+{
+	static bool already_done = false;
+
+	if (already_done)
+		return;
+	before_shmem_exit(do_pg_abort_backup, DatumGetBool(true));
+	already_done = true;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 1fccf29a36..5fd12152b2 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -44,18 +44,6 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
-/*
- * Called when the backend exits with a running non-exclusive base backup,
- * to clean up state.
- */
-static void
-nonexclusive_base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-	ereport(WARNING,
-			(errmsg("aborting backup due to backend exiting before pg_stop_backup was called")));
-}
-
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
@@ -103,10 +91,10 @@ pg_start_backup(PG_FUNCTION_ARGS)
 		tblspc_map_file = makeStringInfo();
 		MemoryContextSwitchTo(oldcontext);
 
+		register_persistent_abort_backup_handler();
+
 		startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
 										NULL, tblspc_map_file, false, true);
-
-		before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 	}
 
 	PG_RETURN_LSN(startpoint);
@@ -248,7 +236,6 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 		 * and tablespace map so they can be written to disk by the caller.
 		 */
 		stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
-		cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
 
 		values[1] = CStringGetTextDatum(label_file->data);
 		values[2] = CStringGetTextDatum(tblspc_map_file->data);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..1bb72a0a57 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -65,7 +65,6 @@ static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *sta
 						  bool sizeonly);
 static void send_int8_string(StringInfoData *buf, int64 intval);
 static void SendBackupHeader(List *tablespaces);
-static void base_backup_cleanup(int code, Datum arg);
 static void perform_base_backup(basebackup_options *opt);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
@@ -216,17 +215,6 @@ static const char *const noChecksumFiles[] = {
 	NULL,
 };
 
-
-/*
- * Called when ERROR or FATAL happens in perform_base_backup() after
- * we have started the backup - make sure we end it!
- */
-static void
-base_backup_cleanup(int code, Datum arg)
-{
-	do_pg_abort_backup();
-}
-
 /*
  * Actually do a base backup for the specified tablespaces.
  *
@@ -265,7 +253,7 @@ perform_base_backup(basebackup_options *opt)
 	 * do_pg_stop_backup() should be inside the error cleanup block!
 	 */
 
-	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 	{
 		ListCell   *lc;
 		tablespaceinfo *ti;
@@ -374,7 +362,7 @@ perform_base_backup(basebackup_options *opt)
 
 		endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
 
 
 	if (opt->includewal)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9b588c87a5..3fea1993bc 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -349,7 +349,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
 									 bool needtblspcmapfile);
 extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
 									TimeLineID *stoptli_p);
-extern void do_pg_abort_backup(void);
+extern void do_pg_abort_backup(int code, Datum arg);
+extern void register_persistent_abort_backup_handler(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
-- 
2.25.1

From b1c8e83b5c81102070a66d7761ee2aa8894d6ec2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 24 Jul 2020 09:35:54 +0530
Subject: [PATCH v1] Modify cancel_before_shmem_exit() comments

Currently, there is no mention of safe usage of cancel_before_shmem_exit()
in the function comment and also it has a point that, there is still
scope for improving the way cancel_before_shmem_exit() looks for callback
to removed from before_shmem_exit_list. So, this patch modifies the comment
to reflect how the caller needs to use before_shmem_exit_list mechanism.
---
 src/backend/storage/ipc/ipc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index bdbc2c3ac4..4ade61fe38 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -381,9 +381,9 @@ on_shmem_exit(pg_on_exit_callback function, Datum arg)
  *		cancel_before_shmem_exit
  *
  *		this function removes a previously-registered before_shmem_exit
- *		callback.  For simplicity, only the latest entry can be
- *		removed.  (We could work harder but there is no need for
- *		current uses.)
+ *		callback.  We only look at the latest entry for removal, as we
+ * 		expect the caller to use before_shmem_exit callback mechanism
+ * 		in the LIFO order. Un-ordered removal of callbacks may be risky.
  * ----------------------------------------------------------------
  */
 void
-- 
2.25.1

Reply via email to