Hi Tom, Richard,

On Mon, Jul 24, 2023 at 8:17 AM Richard Guo <guofengli...@gmail.com> wrote:
>
> Thanks for pushing it!

With this fix, I saw a noticeable increase in the memory consumption
of planner. I was running experiments mentioned in [1] The reason is
the Bitmapset created by bms_union() are not freed during planning and
when there are thousands of child joins involved, bitmapsets takes up
a large memory and there any a large number of bitmaps.

Attached 0002 patch fixes the memory consumption by calculating
appinfos only once and using them twice. The number look like below

Number of tables joined | without patch | with patch |
------------------------------------------------------
                      2 |      40.8 MiB |   40.3 MiB |
                      3 |     151.6 MiB |  146.9 MiB |
                      4 |     463.9 MiB |  445.5 MiB |
                      5 |    1663.9 MiB | 1563.3 MiB |

The memory consumption is prominent at higher number of joins as that
exponentially increases the number of child joins.

Attached setup.sql and queries.sql and patch 0001 were used to measure
memory similar to [1].

I don't think it's a problem with the patch itself. We should be able
to use Bitmapset APIs similar to what patch is doing. But there's a
problem with our Bitmapset implementation. It's not space efficient
for thousands of partitions. A quick calculation reveals this.

If the number of partitions is 1000, the matching partitions will
usually be 1000, 2000, 3000 and so on. Thus the bitmapset represnting
the relids will be {b 1000, 2000, 3000, ...}. To represent a single
1000th digit current Bitmapset implementation will allocate 1000/8 =
125 bytes of memory. A 5 way join will require 125 * 5 = 625 bytes of
memory. This is even true for lower relid numbers since they will be
1000 bits away e.g. (1, 1001, 2001, 3001, ...). So 1000 such child
joins require 625KiB memory. Doing this as many times as the number of
joins we get 120 * 625 KiB = 75 MiB which is closer to the memory
difference we see above.

Even if we allocate a list to hold 5 integers it will not take 625
bytes. I think we need to improve our Bitmapset representation to be
efficient in such cases. Of course whatever representation we choose
has to be space efficient for a small number of tables as well and
should gel well with our planner logic. So I guess some kind of
dynamic representation which changes the underlying layout based on
the contents is required. I have looked up past hacker threads to see
if this has been discussed previously.

I don't think this is the thread to discuss it and also I am not
planning to work on it in the immediate future. But I thought I would
mention it where I found it.

[1] 
https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
From 0f00fc48b9ebf73b54724ba82cec4a69d5f63846 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH 1/2] Report memory used for planning a query in EXPLAIN
 ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those can not be attributed only to a given
query and hence should not be reported any way.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c | 12 ++++++++++--
 src/backend/commands/prepare.c |  7 ++++++-
 src/backend/utils/mmgr/mcxt.c  | 19 +++++++++++++++++++
 src/include/commands/explain.h |  3 ++-
 src/include/utils/memutils.h   |  1 +
 5 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9f859949f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions,
 					planduration;
 		BufferUsage bufusage_start,
 					bufusage;
+		Size		mem_consumed;
 
 		if (es->buffers)
 			bufusage_start = pgBufferUsage;
 		INSTR_TIME_SET_CURRENT(planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 		/* plan the query */
 		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+						- mem_consumed;
 
 		/* calc differences of buffer counters. */
 		if (es->buffers)
@@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 
 		/* run it (if needed) and produce output */
 		ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
-					   &planduration, (es->buffers ? &bufusage : NULL));
+					   &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed);
 	}
 }
 
@@ -527,7 +531,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage)
+			   const BufferUsage *bufusage, const Size *mem_consumed)
 {
 	DestReceiver *dest;
 	QueryDesc  *queryDesc;
@@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
 		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
+
+		if (mem_consumed)
+			ExplainPropertyUInteger("Planning Memory", "bytes",
+										(uint64) *mem_consumed, es);
 	}
 
 	/* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..84544ce481 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	Size		mem_consumed;
 
 	if (es->buffers)
 		bufusage_start = pgBufferUsage;
 	INSTR_TIME_SET_CURRENT(planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 	/* Look it up in the hash table */
 	entry = FetchPreparedStatement(execstmt->name, true);
@@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+					- mem_consumed;
 
 	/* calc differences of buffer counters. */
 	if (es->buffers)
@@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 		if (pstmt->commandType != CMD_UTILITY)
 			ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
-						   &planduration, (es->buffers ? &bufusage : NULL));
+						   &planduration, (es->buffers ? &bufusage : NULL),
+						   &mem_consumed);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
 							  paramLI, queryEnv);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..43af271f33 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children,
 								 grand_totals.totalspace - grand_totals.freespace)));
 }
 
+/*
+ * Compute the memory used in the given context and its children.
+ *
+ * XXX: Instead of this separate function we could modify
+ * MemoryContextStatsDetail() to report used memory and disable printing the
+ * detailed stats.
+ */
+extern Size
+MemoryContextMemUsed(MemoryContext context)
+{
+	MemoryContextCounters grand_totals;
+
+	memset(&grand_totals, 0, sizeof(grand_totals));
+
+	MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false);
+
+	return grand_totals.totalspace - grand_totals.freespace;
+}
+
 /*
  * MemoryContextStatsInternal
  *		One recursion level for MemoryContextStats
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..21e3d2f309 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
 						   ExplainState *es, const char *queryString,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
-						   const BufferUsage *bufusage);
+						   const BufferUsage *bufusage,
+						   const Size *mem_consumed);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 21640d62a6..d7c477f229 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children,
 									 bool print_to_stderr);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
 												bool allow);
+extern Size MemoryContextMemUsed(MemoryContext context);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 extern void MemoryContextCheck(MemoryContext context);
-- 
2.25.1

From 1817c1662577cbe7bca98548315ab14c78b1d5de Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH 2/2] Reuse child_relids bitmapset in partitionwise joinrels

Bitmapsets containing child relids consume large memory when thousands of
partitions are involved. Create them once, use multiple times and free them up
once their use is over.

Ashutosh Bapat
---
 src/backend/optimizer/path/joinrels.c | 10 ++++++----
 src/backend/optimizer/util/relnode.c  | 18 +++---------------
 src/include/optimizer/pathnode.h      |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 015a0b3cbe..e4835c10fc 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1545,6 +1545,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptInfo *child_joinrel;
 		AppendRelInfo **appinfos;
 		int			nappinfos;
+		Bitmapset  *child_relids = NULL;
 
 		if (joinrel->partbounds_merged)
 		{
@@ -1640,9 +1641,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 											   child_rel2->relids);
 
 		/* Find the AppendRelInfo structures */
-		appinfos = find_appinfos_by_relids(root,
-										   bms_union(child_rel1->relids,
-													 child_rel2->relids),
+		child_relids = bms_union(child_rel1->relids, child_rel2->relids);
+		appinfos = find_appinfos_by_relids(root, child_relids,
 										   &nappinfos);
 
 		/*
@@ -1660,7 +1660,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		{
 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
 												 joinrel, child_restrictlist,
-												 child_sjinfo);
+												 child_sjinfo, appinfos,
+												 nappinfos);
 			joinrel->part_rels[cnt_parts] = child_joinrel;
 			joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts);
 			joinrel->all_partrels = bms_add_members(joinrel->all_partrels,
@@ -1678,6 +1679,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 									child_restrictlist);
 
 		pfree(appinfos);
+		bms_free(child_relids);
 	}
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 76dad17e33..50008cd1bc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -855,15 +855,15 @@ build_join_rel(PlannerInfo *root,
  * 'restrictlist': list of RestrictInfo nodes that apply to this particular
  *		pair of joinable relations
  * 'sjinfo': child join's join-type details
+ * 'appinfos' and 'nappinfos': AppendRelInfo array for child relids
  */
 RelOptInfo *
 build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 					 RelOptInfo *inner_rel, RelOptInfo *parent_joinrel,
-					 List *restrictlist, SpecialJoinInfo *sjinfo)
+					 List *restrictlist, SpecialJoinInfo *sjinfo,
+					 AppendRelInfo **appinfos, int nappinfos)
 {
 	RelOptInfo *joinrel = makeNode(RelOptInfo);
-	AppendRelInfo **appinfos;
-	int			nappinfos;
 
 	/* Only joins between "other" relations land here. */
 	Assert(IS_OTHER_REL(outer_rel) && IS_OTHER_REL(inner_rel));
@@ -871,16 +871,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* The parent joinrel should have consider_partitionwise_join set. */
 	Assert(parent_joinrel->consider_partitionwise_join);
 
-	/*
-	 * Find the AppendRelInfo structures for the child baserels.  We'll need
-	 * these for computing the child join's relid set, and later for mapping
-	 * Vars to the child rel.
-	 */
-	appinfos = find_appinfos_by_relids(root,
-									   bms_union(outer_rel->relids,
-												 inner_rel->relids),
-									   &nappinfos);
-
 	joinrel->reloptkind = RELOPT_OTHER_JOINREL;
 	joinrel->relids = adjust_child_relids(parent_joinrel->relids,
 										  nappinfos, appinfos);
@@ -995,8 +985,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 										nappinfos, appinfos,
 										parent_joinrel, joinrel);
 
-	pfree(appinfos);
-
 	return joinrel;
 }
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 001e75b5b7..23579ab8e7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -338,6 +338,7 @@ extern Bitmapset *get_param_path_clause_serials(Path *path);
 extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 										RelOptInfo *parent_joinrel, List *restrictlist,
-										SpecialJoinInfo *sjinfo);
+										SpecialJoinInfo *sjinfo,
+										AppendRelInfo **appinfos, int nappinfos);
 
 #endif							/* PATHNODE_H */
-- 
2.25.1

Attachment: queries.sql
Description: application/sql

Attachment: setup.sql
Description: application/sql

Reply via email to