Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-02-06 Thread Ashutosh Bapat
On Mon, Nov 6, 2023 at 2:48 PM Ashutosh Bapat
 wrote:
>
> explain.out in 0001 needed some adjustments. Without those CIbot shows
> failures. Fixed in the attached patchset. 0001 is just for diagnosis,
> not for actual commit. 0002 which is the actual patch has no changes
> wrt to the previous version.
>

First patch is no longer required. PFA rebased second patch.

-- 
Best Wishes,
Ashutosh Bapat
From 7cf3ad0fd9a7cfffa52a8faad759f16988427a70 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH] Reuse child_relids bitmapset in partitionwise joinrels

Bitmapsets containing child relids consume large memory when thousands of
partitions are involved. Create them once, use multiple times and free them up
once their use is over.

Ashutosh Bapat
---
 src/backend/optimizer/path/joinrels.c | 10 ++
 src/backend/optimizer/util/relnode.c  | 18 +++---
 src/include/optimizer/pathnode.h  |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..d9dbe0e930 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1526,6 +1526,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptInfo *child_joinrel;
 		AppendRelInfo **appinfos;
 		int			nappinfos;
+		Bitmapset  *child_relids = NULL;
 
 		if (joinrel->partbounds_merged)
 		{
@@ -1621,9 +1622,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 			   child_rel2->relids);
 
 		/* Find the AppendRelInfo structures */
-		appinfos = find_appinfos_by_relids(root,
-		   bms_union(child_rel1->relids,
-	 child_rel2->relids),
+		child_relids = bms_union(child_rel1->relids, child_rel2->relids);
+		appinfos = find_appinfos_by_relids(root, child_relids,
 		   );
 
 		/*
@@ -1641,7 +1641,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		{
 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
  joinrel, child_restrictlist,
- child_sjinfo);
+ child_sjinfo, appinfos,
+ nappinfos);
 			joinrel->part_rels[cnt_parts] = child_joinrel;
 			joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts);
 			joinrel->all_partrels = bms_add_members(joinrel->all_partrels,
@@ -1659,6 +1660,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 	child_restrictlist);
 
 		pfree(appinfos);
+		bms_free(child_relids);
 	}
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..98558f4661 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -869,15 +869,15 @@ build_join_rel(PlannerInfo *root,
  * 'restrictlist': list of RestrictInfo nodes that apply to this particular
  *		pair of joinable relations
  * 'sjinfo': child join's join-type details
+ * 'appinfos' and 'nappinfos': AppendRelInfo array for child relids
  */
 RelOptInfo *
 build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	 RelOptInfo *inner_rel, RelOptInfo *parent_joinrel,
-	 List *restrictlist, SpecialJoinInfo *sjinfo)
+	 List *restrictlist, SpecialJoinInfo *sjinfo,
+	 AppendRelInfo **appinfos, int nappinfos)
 {
 	RelOptInfo *joinrel = makeNode(RelOptInfo);
-	AppendRelInfo **appinfos;
-	int			nappinfos;
 
 	/* Only joins between "other" relations land here. */
 	Assert(IS_OTHER_REL(outer_rel) && IS_OTHER_REL(inner_rel));
@@ -885,16 +885,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* The parent joinrel should have consider_partitionwise_join set. */
 	Assert(parent_joinrel->consider_partitionwise_join);
 
-	/*
-	 * Find the AppendRelInfo structures for the child baserels.  We'll need
-	 * these for computing the child join's relid set, and later for mapping
-	 * Vars to the child rel.
-	 */
-	appinfos = find_appinfos_by_relids(root,
-	   bms_union(outer_rel->relids,
- inner_rel->relids),
-	   );
-
 	joinrel->reloptkind = RELOPT_OTHER_JOINREL;
 	joinrel->relids = adjust_child_relids(parent_joinrel->relids,
 		  nappinfos, appinfos);
@@ -1010,8 +1000,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 		nappinfos, appinfos,
 		parent_joinrel, joinrel);
 
-	pfree(appinfos);
-
 	return joinrel;
 }
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..90f062ce96 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -342,6 +342,7 @@ extern Bitmapset *get_param_path_clause_serials(Path *path);
 extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 		RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 		RelOptInfo *parent_joinrel, List *restrictlist,
-		

Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2023-11-06 Thread Ashutosh Bapat
explain.out in 0001 needed some adjustments. Without those CIbot shows
failures. Fixed in the attached patchset. 0001 is just for diagnosis,
not for actual commit. 0002 which is the actual patch has no changes
wrt to the previous version.

On Tue, Oct 31, 2023 at 11:06 AM Ashutosh Bapat
 wrote:
>
> On Fri, Sep 8, 2023 at 3:22 PM Ashutosh Bapat
>  wrote:
> >
> > On Fri, Jul 28, 2023 at 3:16 PM Ashutosh Bapat
> >  wrote:
> > >
> > > Hi Tom, Richard,
> > >
> > > On Mon, Jul 24, 2023 at 8:17 AM Richard Guo  
> > > wrote:
> > > >
> > > > Thanks for pushing it!
> > >
> > > With this fix, I saw a noticeable increase in the memory consumption
> > > of planner. I was running experiments mentioned in [1] The reason is
> > > the Bitmapset created by bms_union() are not freed during planning and
> > > when there are thousands of child joins involved, bitmapsets takes up
> > > a large memory and there any a large number of bitmaps.
> > >
> > > Attached 0002 patch fixes the memory consumption by calculating
> > > appinfos only once and using them twice. The number look like below
> > >
> > > Number of tables joined | without patch | with patch |
> > > --
> > >   2 |  40.8 MiB |   40.3 MiB |
> > >   3 | 151.6 MiB |  146.9 MiB |
> > >   4 | 463.9 MiB |  445.5 MiB |
> > >   5 |1663.9 MiB | 1563.3 MiB |
> > >
> > > The memory consumption is prominent at higher number of joins as that
> > > exponentially increases the number of child joins.
> > >
> > > Attached setup.sql and queries.sql and patch 0001 were used to measure
> > > memory similar to [1].
> > >
> > > I don't think it's a problem with the patch itself. We should be able
> > > to use Bitmapset APIs similar to what patch is doing. But there's a
> > > problem with our Bitmapset implementation. It's not space efficient
> > > for thousands of partitions. A quick calculation reveals this.
> > >
> > > If the number of partitions is 1000, the matching partitions will
> > > usually be 1000, 2000, 3000 and so on. Thus the bitmapset represnting
> > > the relids will be {b 1000, 2000, 3000, ...}. To represent a single
> > > 1000th digit current Bitmapset implementation will allocate 1000/8 =
> > > 125 bytes of memory. A 5 way join will require 125 * 5 = 625 bytes of
> > > memory. This is even true for lower relid numbers since they will be
> > > 1000 bits away e.g. (1, 1001, 2001, 3001, ...). So 1000 such child
> > > joins require 625KiB memory. Doing this as many times as the number of
> > > joins we get 120 * 625 KiB = 75 MiB which is closer to the memory
> > > difference we see above.
> > >
> > > Even if we allocate a list to hold 5 integers it will not take 625
> > > bytes. I think we need to improve our Bitmapset representation to be
> > > efficient in such cases. Of course whatever representation we choose
> > > has to be space efficient for a small number of tables as well and
> > > should gel well with our planner logic. So I guess some kind of
> > > dynamic representation which changes the underlying layout based on
> > > the contents is required. I have looked up past hacker threads to see
> > > if this has been discussed previously.
> > >
> > > I don't think this is the thread to discuss it and also I am not
> > > planning to work on it in the immediate future. But I thought I would
> > > mention it where I found it.
> > >
> > > [1] 
> > > https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
> > >
> >
> > Adding this small patch to the commitfest in case somebody finds it
> > worth fixing this specific memory consumption. With a new subject.
>
> Rebased patches.
> 0001 - is same as the squashed version of patches at
> https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com.
> 0002 is the actual fix described earlier
>
> --
> Best Wishes,
> Ashutosh Bapat



-- 
Best Wishes,
Ashutosh Bapat
From 84d59123ba6cec8717368ce9cefa290b711d7b3d Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH 2/2] Reuse child_relids bitmapset in partitionwise joinrels

Bitmapsets containing child relids consume large memory when thousands of
partitions are involved. Create them once, use multiple times and free them up
once their use is over.

Ashutosh Bapat
---
 src/backend/optimizer/path/joinrels.c | 10 ++
 src/backend/optimizer/util/relnode.c  | 18 +++---
 src/include/optimizer/pathnode.h  |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index d03ace50a1..1feb1f99c0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1526,6 +1526,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		

Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2023-10-30 Thread Ashutosh Bapat
On Fri, Sep 8, 2023 at 3:22 PM Ashutosh Bapat
 wrote:
>
> On Fri, Jul 28, 2023 at 3:16 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Tom, Richard,
> >
> > On Mon, Jul 24, 2023 at 8:17 AM Richard Guo  wrote:
> > >
> > > Thanks for pushing it!
> >
> > With this fix, I saw a noticeable increase in the memory consumption
> > of planner. I was running experiments mentioned in [1] The reason is
> > the Bitmapset created by bms_union() are not freed during planning and
> > when there are thousands of child joins involved, bitmapsets takes up
> > a large memory and there any a large number of bitmaps.
> >
> > Attached 0002 patch fixes the memory consumption by calculating
> > appinfos only once and using them twice. The number look like below
> >
> > Number of tables joined | without patch | with patch |
> > --
> >   2 |  40.8 MiB |   40.3 MiB |
> >   3 | 151.6 MiB |  146.9 MiB |
> >   4 | 463.9 MiB |  445.5 MiB |
> >   5 |1663.9 MiB | 1563.3 MiB |
> >
> > The memory consumption is prominent at higher number of joins as that
> > exponentially increases the number of child joins.
> >
> > Attached setup.sql and queries.sql and patch 0001 were used to measure
> > memory similar to [1].
> >
> > I don't think it's a problem with the patch itself. We should be able
> > to use Bitmapset APIs similar to what patch is doing. But there's a
> > problem with our Bitmapset implementation. It's not space efficient
> > for thousands of partitions. A quick calculation reveals this.
> >
> > If the number of partitions is 1000, the matching partitions will
> > usually be 1000, 2000, 3000 and so on. Thus the bitmapset represnting
> > the relids will be {b 1000, 2000, 3000, ...}. To represent a single
> > 1000th digit current Bitmapset implementation will allocate 1000/8 =
> > 125 bytes of memory. A 5 way join will require 125 * 5 = 625 bytes of
> > memory. This is even true for lower relid numbers since they will be
> > 1000 bits away e.g. (1, 1001, 2001, 3001, ...). So 1000 such child
> > joins require 625KiB memory. Doing this as many times as the number of
> > joins we get 120 * 625 KiB = 75 MiB which is closer to the memory
> > difference we see above.
> >
> > Even if we allocate a list to hold 5 integers it will not take 625
> > bytes. I think we need to improve our Bitmapset representation to be
> > efficient in such cases. Of course whatever representation we choose
> > has to be space efficient for a small number of tables as well and
> > should gel well with our planner logic. So I guess some kind of
> > dynamic representation which changes the underlying layout based on
> > the contents is required. I have looked up past hacker threads to see
> > if this has been discussed previously.
> >
> > I don't think this is the thread to discuss it and also I am not
> > planning to work on it in the immediate future. But I thought I would
> > mention it where I found it.
> >
> > [1] 
> > https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
> >
>
> Adding this small patch to the commitfest in case somebody finds it
> worth fixing this specific memory consumption. With a new subject.

Rebased patches.
0001 - is same as the squashed version of patches at
https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com.
0002 is the actual fix described earlier

-- 
Best Wishes,
Ashutosh Bapat
From e46e9cfe528ecbb04b17c21bb79c55cb8e23289b Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH 2/2] Reuse child_relids bitmapset in partitionwise joinrels

Bitmapsets containing child relids consume large memory when thousands of
partitions are involved. Create them once, use multiple times and free them up
once their use is over.

Ashutosh Bapat
---
 src/backend/optimizer/path/joinrels.c | 10 ++
 src/backend/optimizer/util/relnode.c  | 18 +++---
 src/include/optimizer/pathnode.h  |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 015a0b3cbe..e4835c10fc 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1545,6 +1545,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptInfo *child_joinrel;
 		AppendRelInfo **appinfos;
 		int			nappinfos;
+		Bitmapset  *child_relids = NULL;
 
 		if (joinrel->partbounds_merged)
 		{
@@ -1640,9 +1641,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 			   child_rel2->relids);
 
 		/* Find the AppendRelInfo structures */
-		appinfos = find_appinfos_by_relids(root,
-		   bms_union(child_rel1->relids,
-	 child_rel2->relids),
+		child_relids = 

Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2023-09-08 Thread Ashutosh Bapat
On Fri, Jul 28, 2023 at 3:16 PM Ashutosh Bapat
 wrote:
>
> Hi Tom, Richard,
>
> On Mon, Jul 24, 2023 at 8:17 AM Richard Guo  wrote:
> >
> > Thanks for pushing it!
>
> With this fix, I saw a noticeable increase in the memory consumption
> of planner. I was running experiments mentioned in [1] The reason is
> the Bitmapset created by bms_union() are not freed during planning and
> when there are thousands of child joins involved, bitmapsets takes up
> a large memory and there any a large number of bitmaps.
>
> Attached 0002 patch fixes the memory consumption by calculating
> appinfos only once and using them twice. The number look like below
>
> Number of tables joined | without patch | with patch |
> --
>   2 |  40.8 MiB |   40.3 MiB |
>   3 | 151.6 MiB |  146.9 MiB |
>   4 | 463.9 MiB |  445.5 MiB |
>   5 |1663.9 MiB | 1563.3 MiB |
>
> The memory consumption is prominent at higher number of joins as that
> exponentially increases the number of child joins.
>
> Attached setup.sql and queries.sql and patch 0001 were used to measure
> memory similar to [1].
>
> I don't think it's a problem with the patch itself. We should be able
> to use Bitmapset APIs similar to what patch is doing. But there's a
> problem with our Bitmapset implementation. It's not space efficient
> for thousands of partitions. A quick calculation reveals this.
>
> If the number of partitions is 1000, the matching partitions will
> usually be 1000, 2000, 3000 and so on. Thus the bitmapset represnting
> the relids will be {b 1000, 2000, 3000, ...}. To represent a single
> 1000th digit current Bitmapset implementation will allocate 1000/8 =
> 125 bytes of memory. A 5 way join will require 125 * 5 = 625 bytes of
> memory. This is even true for lower relid numbers since they will be
> 1000 bits away e.g. (1, 1001, 2001, 3001, ...). So 1000 such child
> joins require 625KiB memory. Doing this as many times as the number of
> joins we get 120 * 625 KiB = 75 MiB which is closer to the memory
> difference we see above.
>
> Even if we allocate a list to hold 5 integers it will not take 625
> bytes. I think we need to improve our Bitmapset representation to be
> efficient in such cases. Of course whatever representation we choose
> has to be space efficient for a small number of tables as well and
> should gel well with our planner logic. So I guess some kind of
> dynamic representation which changes the underlying layout based on
> the contents is required. I have looked up past hacker threads to see
> if this has been discussed previously.
>
> I don't think this is the thread to discuss it and also I am not
> planning to work on it in the immediate future. But I thought I would
> mention it where I found it.
>
> [1] 
> https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
>

Adding this small patch to the commitfest in case somebody finds it
worth fixing this specific memory consumption. With a new subject.



-- 
Best Wishes,
Ashutosh Bapat