On Tue, Feb 14, 2023 at 4:07 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Tue, Feb 14, 2023 at 6:25 AM Andres Freund <and...@anarazel.de> wrote: > > > > Hi, > > > > On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote: > > > More concretely, it looks like GetWALRecordInfo() calls > > > CStringGetTextDatum/cstring_to_text in a way that accumulates way too > > > much memory in ExprContext. > > > > Additionally, we leak two stringinfos for each record. > > > > > > > This could be avoided by using a separate memory context that is reset > > > periodically, or something else along the same lines. > > > > Everything other than a per-row memory context that's reset each time seems > > hard to manage in this case. > > > > Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily > > dilligent about cleaning up O(1) memory, after not caring about O(N) > > memory... > > Thanks for reporting. I'll get back to you on this soon.
The memory usage goes up with many WAL records in GetWALRecordsInfo(). The affected functions are pg_get_wal_records_info() and pg_get_wal_records_info_till_end_of_wal(). I think the best way to fix this is to use a temporary memory context (like the jsonfuncs.c), reset it after every tuple is put into the tuple store. This fix keeps the memory under limits. I'm attaching the patches here. For HEAD, I'd want to be a bit defensive and use the temporary memory context for pg_get_wal_fpi_info() too. And, the fix also needs to be back-patched to PG15. [1] HEAD: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1105979 ubuntu 20 0 28.5g 28.4g 150492 R 80.7 93.0 1:47.12 postgres PATCHED: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13149 ubuntu 20 0 173244 156872 150688 R 79.0 0.5 1:25.09 postgres postgres=# select count(*) from pg_get_wal_records_info_till_end_of_wal('0/1000000'); count ---------- 35285649 (1 row) postgres=# select pg_backend_pid(); pg_backend_pid ---------------- 13149 (1 row) -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From 56ce258a6c4b4118a4cbe6612fbafb6cb172f7e3 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Wed, 15 Feb 2023 06:26:30 +0000 Subject: [PATCH v1] Limit memory usage of pg_walinspect functions Some of the pg_walinspect functions can overuse and leak memory across WAL records iterations and can cause OOM if there are many WAL records to output are present. Fix this by using a temporary memory context that's reset for each WAL record iteraion. Reported-by: Peter Geoghegan Author: Bharath Rupireddy Discussion: https://www.postgresql.org/message-id/CAH2-WznLEJjn7ghmKOABOEZYuJvkTk%3DGKU3m0%2B-XBAH%2BerPiJQ%40mail.gmail.com Backpatch-through: 15 --- contrib/pg_walinspect/pg_walinspect.c | 28 +++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c index 91b740ed27..9d429c6d73 100644 --- a/contrib/pg_walinspect/pg_walinspect.c +++ b/contrib/pg_walinspect/pg_walinspect.c @@ -304,6 +304,8 @@ pg_get_wal_fpi_info(PG_FUNCTION_ARGS) XLogRecPtr start_lsn; XLogRecPtr end_lsn; XLogReaderState *xlogreader; + MemoryContext old_cxt; + MemoryContext tmp_cxt; start_lsn = PG_GETARG_LSN(0); end_lsn = PG_GETARG_LSN(1); @@ -314,14 +316,26 @@ pg_get_wal_fpi_info(PG_FUNCTION_ARGS) xlogreader = InitXLogReaderState(start_lsn); + tmp_cxt = AllocSetContextCreate(CurrentMemoryContext, + "pg_get_wal_fpi_info temporary cxt", + ALLOCSET_DEFAULT_SIZES); + while (ReadNextXLogRecord(xlogreader) && xlogreader->EndRecPtr <= end_lsn) { + /* Use the tmp context so we can clean up after each tuple is done */ + old_cxt = MemoryContextSwitchTo(tmp_cxt); + GetWALFPIInfo(fcinfo, xlogreader); + /* clean up and switch back */ + MemoryContextSwitchTo(old_cxt); + MemoryContextReset(tmp_cxt); + CHECK_FOR_INTERRUPTS(); } + MemoryContextDelete(tmp_cxt); pfree(xlogreader->private_data); XLogReaderFree(xlogreader); @@ -440,23 +454,37 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; Datum values[PG_GET_WAL_RECORDS_INFO_COLS] = {0}; bool nulls[PG_GET_WAL_RECORDS_INFO_COLS] = {0}; + MemoryContext old_cxt; + MemoryContext tmp_cxt; InitMaterializedSRF(fcinfo, 0); xlogreader = InitXLogReaderState(start_lsn); + tmp_cxt = AllocSetContextCreate(CurrentMemoryContext, + "GetWALRecordsInfo temporary cxt", + ALLOCSET_DEFAULT_SIZES); + while (ReadNextXLogRecord(xlogreader) && xlogreader->EndRecPtr <= end_lsn) { + /* Use the tmp context so we can clean up after each tuple is done */ + old_cxt = MemoryContextSwitchTo(tmp_cxt); + GetWALRecordInfo(xlogreader, values, nulls, PG_GET_WAL_RECORDS_INFO_COLS); tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + /* clean up and switch back */ + MemoryContextSwitchTo(old_cxt); + MemoryContextReset(tmp_cxt); + CHECK_FOR_INTERRUPTS(); } + MemoryContextDelete(tmp_cxt); pfree(xlogreader->private_data); XLogReaderFree(xlogreader); -- 2.34.1
From 46e407f22b237265040e9decebd18896e31e496a Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Wed, 15 Feb 2023 06:30:29 +0000 Subject: [PATCH v1] PG15 Limit memory usage of pg_walinspect functions Some of the pg_walinspect functions can overuse and leak memory across WAL records iterations and can cause OOM if there are many WAL records to output are present. Fix this by using a temporary memory context that's reset for each WAL record iteraion. Reported-by: Peter Geoghegan Author: Bharath Rupireddy Discussion: https://www.postgresql.org/message-id/CAH2-WznLEJjn7ghmKOABOEZYuJvkTk%3DGKU3m0%2B-XBAH%2BerPiJQ%40mail.gmail.com Backpatch-through: 15 --- contrib/pg_walinspect/pg_walinspect.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c index 6b6c7d46e2..826c9345ae 100644 --- a/contrib/pg_walinspect/pg_walinspect.c +++ b/contrib/pg_walinspect/pg_walinspect.c @@ -332,6 +332,8 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; Datum values[PG_GET_WAL_RECORDS_INFO_COLS]; bool nulls[PG_GET_WAL_RECORDS_INFO_COLS]; + MemoryContext old_cxt; + MemoryContext tmp_cxt; InitMaterializedSRF(fcinfo, 0); @@ -340,18 +342,30 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, MemSet(values, 0, sizeof(values)); MemSet(nulls, 0, sizeof(nulls)); + tmp_cxt = AllocSetContextCreate(CurrentMemoryContext, + "GetWALRecordsInfo temporary cxt", + ALLOCSET_DEFAULT_SIZES); + while (ReadNextXLogRecord(xlogreader) && xlogreader->EndRecPtr <= end_lsn) { + /* Use the tmp context so we can clean up after each tuple is done */ + old_cxt = MemoryContextSwitchTo(tmp_cxt); + GetWALRecordInfo(xlogreader, values, nulls, PG_GET_WAL_RECORDS_INFO_COLS); tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + /* clean up and switch back */ + MemoryContextSwitchTo(old_cxt); + MemoryContextReset(tmp_cxt); + CHECK_FOR_INTERRUPTS(); } + MemoryContextDelete(tmp_cxt); pfree(xlogreader->private_data); XLogReaderFree(xlogreader); -- 2.34.1