On Mon, Oct 17, 2022 at 1:09 PM Michael Paquier <[email protected]> wrote:
>
> On Fri, Oct 14, 2022 at 09:56:31PM +0000, Cary Huang wrote:
> > I applied your v5 patch on the current master and run valgrind on it
> > while doing a basebackup with simulated error. No memory leak
> > related to backup is observed. Regression is also passing.
>
> Echoing with what I mentioned upthread in [1], I don't quite
> understand why this patch needs to touch basebackup.c, walsender.c
> and postgres.c. In the case of a replication command processed by a
> WAL sender, memory allocations happen in the memory context created
> for replication commands, which is itself, as far as I understand, the
> message memory context when we get a 'Q' message for a simple query.
> Why do we need more code for a cleanup that should be already
> happening? Am I missing something obvious?
>
> [1]: https://www.postgresql.org/message-id/YzPKpKEk/[email protected]
My bad, I missed that. You are right. We have "Replication command
context" as a child of "MessageContext" memory context for base backup
that gets cleaned upon error in PostgresMain() [1].
> xlogfuncs.c, by storing stuff in the TopMemoryContext of the process
> running the SQL commands pg_backup_start/stop() is different, of
> course. Perhaps the point of centralizing the base backup context in
> xlogbackup.c makes sense, but my guess that it makes more sense to
> keep that with the SQL functions as these are the only ones in need of
> a cleanup, coming down to the fact that the start and stop functions
> happen in different queries, aka these are not bind to a message
> context.
Yes, they're not related to "MessageContext" memory context.
Please see the attached v6 patch that deals with memory leaks for
backup SQL-callable functions.
[1]
(gdb) bt
#0 MemoryContextDelete (context=0x558b7cd0de50) at mcxt.c:378
#1 0x0000558b7c655733 in MemoryContextDeleteChildren
(context=0x558b7ccda8c0) at mcxt.c:430
#2 0x0000558b7c65546d in MemoryContextReset (context=0x558b7ccda8c0)
at mcxt.c:309
#3 0x0000558b7c43b5cd in PostgresMain (dbname=0x558b7cd11fb8 "",
username=0x558b7ccd6298 "ubuntu")
at postgres.c:4358
#4 0x0000558b7c364a88 in BackendRun (port=0x558b7cd09620) at postmaster.c:4482
#5 0x0000558b7c36431b in BackendStartup (port=0x558b7cd09620) at
postmaster.c:4210
#6 0x0000558b7c3603be in ServerLoop () at postmaster.c:1804
#7 0x0000558b7c35fb1b in PostmasterMain (argc=3, argv=0x558b7ccd4200)
at postmaster.c:1476
#8 0x0000558b7c229a0e in main (argc=3, argv=0x558b7ccd4200) at main.c:197
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 8fcf6c5c6bd60de215ad8e066cc9c759aa3a7dd5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 19 Oct 2022 06:18:12 +0000
Subject: [PATCH v6] Avoid memory leaks during backups using SQL-callable
functions
Postgres currently can leak memory if a failure occurs while
taking backup using SQL-callable functions pg_backup_start() and
pg_backup_stop(). The palloc'd memory such as backup_state or
tablespace_map in xlogfuncs.c or any other, is left-out which may
cause memory bloat on the server eventually.
To solve the above problem, we create a separate memory context
and add clean up callback in PostgresMain()'s error handling code
using sigsetjmp(). The new memory context is created as a child of
TopMemoryContext in pg_backup_start() which gets deleted upon any
error or at the end of pg_backup_stop().
---
src/backend/access/transam/xlogfuncs.c | 101 +++++++++++++++++++------
src/backend/tcop/postgres.c | 4 +
src/include/access/xlogbackup.h | 1 +
3 files changed, 81 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index a801a94fe8..b7dd5cf8f3 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -45,6 +45,30 @@
static BackupState *backup_state = NULL;
static StringInfo tablespace_map = NULL;
+/* A workspace for SQL-callable backup functions. */
+static MemoryContext backupcontext = NULL;
+
+/*
+ * A session-level variable to define the scope of SQL-callable backup
+ * functions memory context clean up.
+ */
+static bool allowedToCleanBackupMemCxt = false;
+
+/*
+ * Clean up memory context created for SQL-callable backup functions.
+ */
+void
+BackupMemoryContextCleanup(void)
+{
+ if (backupcontext == NULL || allowedToCleanBackupMemCxt == false)
+ return; /* Nothing to do. */
+
+ MemoryContextDelete(backupcontext);
+ backupcontext = NULL;
+ backup_state = NULL;
+ tablespace_map = NULL;
+}
+
/*
* pg_backup_start: set up for taking an on-line backup dump
*
@@ -71,34 +95,41 @@ pg_backup_start(PG_FUNCTION_ARGS)
errmsg("a backup is already in progress in this session")));
/*
- * backup_state and tablespace_map need to be long-lived as they are used
- * in pg_backup_stop().
+ * Let's clean backup memory context while an error occurs within this
+ * function.
*/
- oldcontext = MemoryContextSwitchTo(TopMemoryContext);
-
- /* Allocate backup state or reset it, if it comes from a previous run */
- if (backup_state == NULL)
- backup_state = (BackupState *) palloc0(sizeof(BackupState));
- else
- MemSet(backup_state, 0, sizeof(BackupState));
+ allowedToCleanBackupMemCxt = true;
/*
- * tablespace_map may have been created in a previous backup, so take this
- * occasion to clean it.
+ * backup_state and tablespace_map need to be long-lived as they are used
+ * in pg_backup_stop(). Create a special memory context as a direct child
+ * of TopMemoryContext so that the memory allocated is carried till the
+ * pg_backup_stop() and cleaned in case of errors (see error handling code
+ * using sigsetjmp() in PostgresMain()).
*/
- if (tablespace_map != NULL)
- {
- pfree(tablespace_map->data);
- pfree(tablespace_map);
- tablespace_map = NULL;
- }
+ if (backupcontext == NULL)
+ backupcontext = AllocSetContextCreate(TopMemoryContext,
+ "on-line backup context",
+ ALLOCSET_START_SMALL_SIZES);
+
+ oldcontext = MemoryContextSwitchTo(backupcontext);
+ Assert(backup_state == NULL);
+ Assert(tablespace_map == NULL);
+ backup_state = (BackupState *) palloc0(sizeof(BackupState));
tablespace_map = makeStringInfo();
- MemoryContextSwitchTo(oldcontext);
register_persistent_abort_backup_handler();
do_pg_backup_start(backupidstr, fast, NULL, backup_state, tablespace_map);
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Let's not clean backup memory context while an error occurs after this
+ * function until we reach pg_backup_stop().
+ */
+ allowedToCleanBackupMemCxt = false;
+
PG_RETURN_LSN(backup_state->startpoint);
}
@@ -132,6 +163,7 @@ pg_backup_stop(PG_FUNCTION_ARGS)
bool waitforarchive = PG_GETARG_BOOL(0);
char *backup_label;
SessionBackupState status = get_backup_status();
+ MemoryContext oldcontext;
/* Initialize attributes information in the tuple descriptor */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
@@ -143,8 +175,21 @@ pg_backup_stop(PG_FUNCTION_ARGS)
errmsg("backup is not in progress"),
errhint("Did you call pg_backup_start()?")));
+ /*
+ * Let's clean backup memory context while an error occurs within this
+ * function.
+ */
+ allowedToCleanBackupMemCxt = true;
+
Assert(backup_state != NULL);
Assert(tablespace_map != NULL);
+ Assert(backupcontext != NULL);
+
+ /*
+ * We switch to backup memory context to clean up the allocated memory even
+ * if an error occurs.
+ */
+ oldcontext = MemoryContextSwitchTo(backupcontext);
/* Stop the backup */
do_pg_backup_stop(backup_state, waitforarchive);
@@ -152,17 +197,23 @@ pg_backup_stop(PG_FUNCTION_ARGS)
/* Build the contents of backup_label */
backup_label = build_backup_content(backup_state, false);
+ /*
+ * We switch back from backup memory context so that the results are
+ * created in right memory context.
+ */
+ MemoryContextSwitchTo(oldcontext);
+
values[0] = LSNGetDatum(backup_state->stoppoint);
values[1] = CStringGetTextDatum(backup_label);
values[2] = CStringGetTextDatum(tablespace_map->data);
- /* Deallocate backup-related variables */
- pfree(backup_state);
- backup_state = NULL;
- pfree(tablespace_map->data);
- pfree(tablespace_map);
- tablespace_map = NULL;
- pfree(backup_label);
+ BackupMemoryContextCleanup();
+
+ /*
+ * Reset session-level variable for further backups that may happen within
+ * this session.
+ */
+ allowedToCleanBackupMemCxt = false;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a9a1851c94..09ee71b03e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -31,6 +31,7 @@
#include "access/parallel.h"
#include "access/printtup.h"
#include "access/xact.h"
+#include "access/xlogbackup.h"
#include "catalog/pg_type.h"
#include "commands/async.h"
#include "commands/prepare.h"
@@ -4277,6 +4278,9 @@ PostgresMain(const char *dbname, const char *username)
*/
AbortCurrentTransaction();
+ /* Clean up memory context created for SQL-callable backup functions. */
+ BackupMemoryContextCleanup();
+
if (am_walsender)
WalSndErrorCleanup();
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index 8ec3d88b0a..1f88511d61 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -37,5 +37,6 @@ typedef struct BackupState
extern char *build_backup_content(BackupState *state,
bool ishistoryfile);
+extern void BackupMemoryContextCleanup(void);
#endif /* XLOG_BACKUP_H */
--
2.34.1