On Mon, Oct 17, 2022 at 1:09 PM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Fri, Oct 14, 2022 at 09:56:31PM +0000, Cary Huang wrote:
> > I applied your v5 patch on the current master and run valgrind on it
> > while doing a basebackup with simulated error. No memory leak
> > related to backup is observed. Regression is also passing.
>
> Echoing with what I mentioned upthread in [1], I don't quite
> understand why this patch needs to touch basebackup.c, walsender.c
> and postgres.c.  In the case of a replication command processed by a
> WAL sender, memory allocations happen in the memory context created
> for replication commands, which is itself, as far as I understand, the
> message memory context when we get a 'Q' message for a simple query.
> Why do we need more code for a cleanup that should be already
> happening?  Am I missing something obvious?
>
> [1]: https://www.postgresql.org/message-id/YzPKpKEk/jmjh...@paquier.xyz

My bad, I missed that. You are right. We have "Replication command
context" as a child of "MessageContext" memory context for base backup
that gets cleaned upon error in PostgresMain() [1].

> xlogfuncs.c, by storing stuff in the TopMemoryContext of the process
> running the SQL commands pg_backup_start/stop() is different, of
> course.  Perhaps the point of centralizing the base backup context in
> xlogbackup.c makes sense, but my guess that it makes more sense to
> keep that with the SQL functions as these are the only ones in need of
> a cleanup, coming down to the fact that the start and stop functions
> happen in different queries, aka these are not bind to a message
> context.

Yes, they're not related to "MessageContext" memory context.

Please see the attached v6 patch that deals with memory leaks for
backup SQL-callable functions.

[1]
(gdb) bt
#0  MemoryContextDelete (context=0x558b7cd0de50) at mcxt.c:378
#1  0x0000558b7c655733 in MemoryContextDeleteChildren
(context=0x558b7ccda8c0) at mcxt.c:430
#2  0x0000558b7c65546d in MemoryContextReset (context=0x558b7ccda8c0)
at mcxt.c:309
#3  0x0000558b7c43b5cd in PostgresMain (dbname=0x558b7cd11fb8 "",
username=0x558b7ccd6298 "ubuntu")
    at postgres.c:4358
#4  0x0000558b7c364a88 in BackendRun (port=0x558b7cd09620) at postmaster.c:4482
#5  0x0000558b7c36431b in BackendStartup (port=0x558b7cd09620) at
postmaster.c:4210
#6  0x0000558b7c3603be in ServerLoop () at postmaster.c:1804
#7  0x0000558b7c35fb1b in PostmasterMain (argc=3, argv=0x558b7ccd4200)
at postmaster.c:1476
#8  0x0000558b7c229a0e in main (argc=3, argv=0x558b7ccd4200) at main.c:197

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 8fcf6c5c6bd60de215ad8e066cc9c759aa3a7dd5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 19 Oct 2022 06:18:12 +0000
Subject: [PATCH v6] Avoid memory leaks during backups using SQL-callable
 functions

Postgres currently can leak memory if a failure occurs while
taking backup using SQL-callable functions pg_backup_start() and
pg_backup_stop(). The palloc'd memory such as backup_state or
tablespace_map in xlogfuncs.c or any other, is left-out which may
cause memory bloat on the server eventually.

To solve the above problem, we create a separate memory context
and add clean up callback in PostgresMain()'s error handling code
using sigsetjmp(). The new memory context is created as a child of
TopMemoryContext in pg_backup_start() which gets deleted upon any
error or at the end of pg_backup_stop().
---
 src/backend/access/transam/xlogfuncs.c | 101 +++++++++++++++++++------
 src/backend/tcop/postgres.c            |   4 +
 src/include/access/xlogbackup.h        |   1 +
 3 files changed, 81 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index a801a94fe8..b7dd5cf8f3 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -45,6 +45,30 @@
 static BackupState *backup_state = NULL;
 static StringInfo tablespace_map = NULL;
 
+/* A workspace for SQL-callable backup functions. */
+static MemoryContext backupcontext = NULL;
+
+/*
+ * A session-level variable to define the scope of SQL-callable backup
+ * functions memory context clean up.
+ */
+static bool allowedToCleanBackupMemCxt = false;
+
+/*
+ * Clean up memory context created for SQL-callable backup functions.
+ */
+void
+BackupMemoryContextCleanup(void)
+{
+	if (backupcontext == NULL || allowedToCleanBackupMemCxt == false)
+		return;		/* Nothing to do. */
+
+	MemoryContextDelete(backupcontext);
+	backupcontext = NULL;
+	backup_state = NULL;
+	tablespace_map = NULL;
+}
+
 /*
  * pg_backup_start: set up for taking an on-line backup dump
  *
@@ -71,34 +95,41 @@ pg_backup_start(PG_FUNCTION_ARGS)
 				 errmsg("a backup is already in progress in this session")));
 
 	/*
-	 * backup_state and tablespace_map need to be long-lived as they are used
-	 * in pg_backup_stop().
+	 * Let's clean backup memory context while an error occurs within this
+	 * function.
 	 */
-	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
-
-	/* Allocate backup state or reset it, if it comes from a previous run */
-	if (backup_state == NULL)
-		backup_state = (BackupState *) palloc0(sizeof(BackupState));
-	else
-		MemSet(backup_state, 0, sizeof(BackupState));
+	allowedToCleanBackupMemCxt = true;
 
 	/*
-	 * tablespace_map may have been created in a previous backup, so take this
-	 * occasion to clean it.
+	 * backup_state and tablespace_map need to be long-lived as they are used
+	 * in pg_backup_stop(). Create a special memory context as a direct child
+	 * of TopMemoryContext so that the memory allocated is carried till the
+	 * pg_backup_stop() and cleaned in case of errors (see error handling code
+	 * using sigsetjmp() in PostgresMain()).
 	 */
-	if (tablespace_map != NULL)
-	{
-		pfree(tablespace_map->data);
-		pfree(tablespace_map);
-		tablespace_map = NULL;
-	}
+	if (backupcontext == NULL)
+		backupcontext = AllocSetContextCreate(TopMemoryContext,
+											  "on-line backup context",
+											  ALLOCSET_START_SMALL_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(backupcontext);
 
+	Assert(backup_state == NULL);
+	Assert(tablespace_map == NULL);
+	backup_state = (BackupState *) palloc0(sizeof(BackupState));
 	tablespace_map = makeStringInfo();
-	MemoryContextSwitchTo(oldcontext);
 
 	register_persistent_abort_backup_handler();
 	do_pg_backup_start(backupidstr, fast, NULL, backup_state, tablespace_map);
 
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Let's not clean backup memory context while an error occurs after this
+	 * function until we reach pg_backup_stop().
+	 */
+	allowedToCleanBackupMemCxt = false;
+
 	PG_RETURN_LSN(backup_state->startpoint);
 }
 
@@ -132,6 +163,7 @@ pg_backup_stop(PG_FUNCTION_ARGS)
 	bool		waitforarchive = PG_GETARG_BOOL(0);
 	char	   *backup_label;
 	SessionBackupState status = get_backup_status();
+	MemoryContext oldcontext;
 
 	/* Initialize attributes information in the tuple descriptor */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
@@ -143,8 +175,21 @@ pg_backup_stop(PG_FUNCTION_ARGS)
 				 errmsg("backup is not in progress"),
 				 errhint("Did you call pg_backup_start()?")));
 
+	/*
+	 * Let's clean backup memory context while an error occurs within this
+	 * function.
+	 */
+	allowedToCleanBackupMemCxt = true;
+
 	Assert(backup_state != NULL);
 	Assert(tablespace_map != NULL);
+	Assert(backupcontext != NULL);
+
+	/*
+	 * We switch to backup memory context to clean up the allocated memory even
+	 * if an error occurs.
+	 */
+	oldcontext = MemoryContextSwitchTo(backupcontext);
 
 	/* Stop the backup */
 	do_pg_backup_stop(backup_state, waitforarchive);
@@ -152,17 +197,23 @@ pg_backup_stop(PG_FUNCTION_ARGS)
 	/* Build the contents of backup_label */
 	backup_label = build_backup_content(backup_state, false);
 
+	/*
+	 * We switch back from backup memory context so that the results are
+	 * created in right memory context.
+	 */
+	MemoryContextSwitchTo(oldcontext);
+
 	values[0] = LSNGetDatum(backup_state->stoppoint);
 	values[1] = CStringGetTextDatum(backup_label);
 	values[2] = CStringGetTextDatum(tablespace_map->data);
 
-	/* Deallocate backup-related variables */
-	pfree(backup_state);
-	backup_state = NULL;
-	pfree(tablespace_map->data);
-	pfree(tablespace_map);
-	tablespace_map = NULL;
-	pfree(backup_label);
+	BackupMemoryContextCleanup();
+
+	/*
+	 * Reset session-level variable for further backups that may happen within
+	 * this session.
+	 */
+	allowedToCleanBackupMemCxt = false;
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a9a1851c94..09ee71b03e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -31,6 +31,7 @@
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
+#include "access/xlogbackup.h"
 #include "catalog/pg_type.h"
 #include "commands/async.h"
 #include "commands/prepare.h"
@@ -4277,6 +4278,9 @@ PostgresMain(const char *dbname, const char *username)
 		 */
 		AbortCurrentTransaction();
 
+		/* Clean up memory context created for SQL-callable backup functions. */
+		BackupMemoryContextCleanup();
+
 		if (am_walsender)
 			WalSndErrorCleanup();
 
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index 8ec3d88b0a..1f88511d61 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -37,5 +37,6 @@ typedef struct BackupState
 
 extern char *build_backup_content(BackupState *state,
 								  bool ishistoryfile);
+extern void BackupMemoryContextCleanup(void);
 
 #endif							/* XLOG_BACKUP_H */
-- 
2.34.1

Reply via email to