On Mon, Oct 17, 2022 at 1:09 PM Michael Paquier <mich...@paquier.xyz> wrote: > > On Fri, Oct 14, 2022 at 09:56:31PM +0000, Cary Huang wrote: > > I applied your v5 patch on the current master and run valgrind on it > > while doing a basebackup with simulated error. No memory leak > > related to backup is observed. Regression is also passing. > > Echoing with what I mentioned upthread in [1], I don't quite > understand why this patch needs to touch basebackup.c, walsender.c > and postgres.c. In the case of a replication command processed by a > WAL sender, memory allocations happen in the memory context created > for replication commands, which is itself, as far as I understand, the > message memory context when we get a 'Q' message for a simple query. > Why do we need more code for a cleanup that should be already > happening? Am I missing something obvious? > > [1]: https://www.postgresql.org/message-id/YzPKpKEk/jmjh...@paquier.xyz
My bad, I missed that. You are right. We have "Replication command context" as a child of "MessageContext" memory context for base backup that gets cleaned upon error in PostgresMain() [1]. > xlogfuncs.c, by storing stuff in the TopMemoryContext of the process > running the SQL commands pg_backup_start/stop() is different, of > course. Perhaps the point of centralizing the base backup context in > xlogbackup.c makes sense, but my guess that it makes more sense to > keep that with the SQL functions as these are the only ones in need of > a cleanup, coming down to the fact that the start and stop functions > happen in different queries, aka these are not bind to a message > context. Yes, they're not related to "MessageContext" memory context. Please see the attached v6 patch that deals with memory leaks for backup SQL-callable functions. [1] (gdb) bt #0 MemoryContextDelete (context=0x558b7cd0de50) at mcxt.c:378 #1 0x0000558b7c655733 in MemoryContextDeleteChildren (context=0x558b7ccda8c0) at mcxt.c:430 #2 0x0000558b7c65546d in MemoryContextReset (context=0x558b7ccda8c0) at mcxt.c:309 #3 0x0000558b7c43b5cd in PostgresMain (dbname=0x558b7cd11fb8 "", username=0x558b7ccd6298 "ubuntu") at postgres.c:4358 #4 0x0000558b7c364a88 in BackendRun (port=0x558b7cd09620) at postmaster.c:4482 #5 0x0000558b7c36431b in BackendStartup (port=0x558b7cd09620) at postmaster.c:4210 #6 0x0000558b7c3603be in ServerLoop () at postmaster.c:1804 #7 0x0000558b7c35fb1b in PostmasterMain (argc=3, argv=0x558b7ccd4200) at postmaster.c:1476 #8 0x0000558b7c229a0e in main (argc=3, argv=0x558b7ccd4200) at main.c:197 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From 8fcf6c5c6bd60de215ad8e066cc9c759aa3a7dd5 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Wed, 19 Oct 2022 06:18:12 +0000 Subject: [PATCH v6] Avoid memory leaks during backups using SQL-callable functions Postgres currently can leak memory if a failure occurs while taking backup using SQL-callable functions pg_backup_start() and pg_backup_stop(). The palloc'd memory such as backup_state or tablespace_map in xlogfuncs.c or any other, is left-out which may cause memory bloat on the server eventually. To solve the above problem, we create a separate memory context and add clean up callback in PostgresMain()'s error handling code using sigsetjmp(). The new memory context is created as a child of TopMemoryContext in pg_backup_start() which gets deleted upon any error or at the end of pg_backup_stop(). --- src/backend/access/transam/xlogfuncs.c | 101 +++++++++++++++++++------ src/backend/tcop/postgres.c | 4 + src/include/access/xlogbackup.h | 1 + 3 files changed, 81 insertions(+), 25 deletions(-) diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index a801a94fe8..b7dd5cf8f3 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -45,6 +45,30 @@ static BackupState *backup_state = NULL; static StringInfo tablespace_map = NULL; +/* A workspace for SQL-callable backup functions. */ +static MemoryContext backupcontext = NULL; + +/* + * A session-level variable to define the scope of SQL-callable backup + * functions memory context clean up. + */ +static bool allowedToCleanBackupMemCxt = false; + +/* + * Clean up memory context created for SQL-callable backup functions. + */ +void +BackupMemoryContextCleanup(void) +{ + if (backupcontext == NULL || allowedToCleanBackupMemCxt == false) + return; /* Nothing to do. */ + + MemoryContextDelete(backupcontext); + backupcontext = NULL; + backup_state = NULL; + tablespace_map = NULL; +} + /* * pg_backup_start: set up for taking an on-line backup dump * @@ -71,34 +95,41 @@ pg_backup_start(PG_FUNCTION_ARGS) errmsg("a backup is already in progress in this session"))); /* - * backup_state and tablespace_map need to be long-lived as they are used - * in pg_backup_stop(). + * Let's clean backup memory context while an error occurs within this + * function. */ - oldcontext = MemoryContextSwitchTo(TopMemoryContext); - - /* Allocate backup state or reset it, if it comes from a previous run */ - if (backup_state == NULL) - backup_state = (BackupState *) palloc0(sizeof(BackupState)); - else - MemSet(backup_state, 0, sizeof(BackupState)); + allowedToCleanBackupMemCxt = true; /* - * tablespace_map may have been created in a previous backup, so take this - * occasion to clean it. + * backup_state and tablespace_map need to be long-lived as they are used + * in pg_backup_stop(). Create a special memory context as a direct child + * of TopMemoryContext so that the memory allocated is carried till the + * pg_backup_stop() and cleaned in case of errors (see error handling code + * using sigsetjmp() in PostgresMain()). */ - if (tablespace_map != NULL) - { - pfree(tablespace_map->data); - pfree(tablespace_map); - tablespace_map = NULL; - } + if (backupcontext == NULL) + backupcontext = AllocSetContextCreate(TopMemoryContext, + "on-line backup context", + ALLOCSET_START_SMALL_SIZES); + + oldcontext = MemoryContextSwitchTo(backupcontext); + Assert(backup_state == NULL); + Assert(tablespace_map == NULL); + backup_state = (BackupState *) palloc0(sizeof(BackupState)); tablespace_map = makeStringInfo(); - MemoryContextSwitchTo(oldcontext); register_persistent_abort_backup_handler(); do_pg_backup_start(backupidstr, fast, NULL, backup_state, tablespace_map); + MemoryContextSwitchTo(oldcontext); + + /* + * Let's not clean backup memory context while an error occurs after this + * function until we reach pg_backup_stop(). + */ + allowedToCleanBackupMemCxt = false; + PG_RETURN_LSN(backup_state->startpoint); } @@ -132,6 +163,7 @@ pg_backup_stop(PG_FUNCTION_ARGS) bool waitforarchive = PG_GETARG_BOOL(0); char *backup_label; SessionBackupState status = get_backup_status(); + MemoryContext oldcontext; /* Initialize attributes information in the tuple descriptor */ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) @@ -143,8 +175,21 @@ pg_backup_stop(PG_FUNCTION_ARGS) errmsg("backup is not in progress"), errhint("Did you call pg_backup_start()?"))); + /* + * Let's clean backup memory context while an error occurs within this + * function. + */ + allowedToCleanBackupMemCxt = true; + Assert(backup_state != NULL); Assert(tablespace_map != NULL); + Assert(backupcontext != NULL); + + /* + * We switch to backup memory context to clean up the allocated memory even + * if an error occurs. + */ + oldcontext = MemoryContextSwitchTo(backupcontext); /* Stop the backup */ do_pg_backup_stop(backup_state, waitforarchive); @@ -152,17 +197,23 @@ pg_backup_stop(PG_FUNCTION_ARGS) /* Build the contents of backup_label */ backup_label = build_backup_content(backup_state, false); + /* + * We switch back from backup memory context so that the results are + * created in right memory context. + */ + MemoryContextSwitchTo(oldcontext); + values[0] = LSNGetDatum(backup_state->stoppoint); values[1] = CStringGetTextDatum(backup_label); values[2] = CStringGetTextDatum(tablespace_map->data); - /* Deallocate backup-related variables */ - pfree(backup_state); - backup_state = NULL; - pfree(tablespace_map->data); - pfree(tablespace_map); - tablespace_map = NULL; - pfree(backup_label); + BackupMemoryContextCleanup(); + + /* + * Reset session-level variable for further backups that may happen within + * this session. + */ + allowedToCleanBackupMemCxt = false; /* Returns the record as Datum */ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls))); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index a9a1851c94..09ee71b03e 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -31,6 +31,7 @@ #include "access/parallel.h" #include "access/printtup.h" #include "access/xact.h" +#include "access/xlogbackup.h" #include "catalog/pg_type.h" #include "commands/async.h" #include "commands/prepare.h" @@ -4277,6 +4278,9 @@ PostgresMain(const char *dbname, const char *username) */ AbortCurrentTransaction(); + /* Clean up memory context created for SQL-callable backup functions. */ + BackupMemoryContextCleanup(); + if (am_walsender) WalSndErrorCleanup(); diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h index 8ec3d88b0a..1f88511d61 100644 --- a/src/include/access/xlogbackup.h +++ b/src/include/access/xlogbackup.h @@ -37,5 +37,6 @@ typedef struct BackupState extern char *build_backup_content(BackupState *state, bool ishistoryfile); +extern void BackupMemoryContextCleanup(void); #endif /* XLOG_BACKUP_H */ -- 2.34.1