Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> On 2022-Nov-14, Tom Lane wrote:
>> For discussion's sake, here's my current version of that 0004 patch,
>> rewritten to use list-of-bitmapset as the data structure.

> I feel that there should be more commentary that explains what a
> multi-bms is.  Not sure where, maybe just put it near the function that
> first appears in the file.

Right.  I split the new functions out to new files multibitmapset.h/.c,
so that the file headers can carry the overall explanation.  (I duplicated
the text between .h and .c, which is also true of bitmapset.h/.c, but
maybe that's overkill.)

>> * The multi_bms prefix is a bit wordy, so I was thinking of shortening
>> the function names to mbms_xxx.  Maybe that's too brief.

> I don't think the "ulti_" bytes add a lot, and short names are better.

Yeah, after sleeping on it I like mbms.

>> * This is a pretty short list of functions so far.  I'm not eager
>> to build out a bunch of dead code though.  Is it OK to leave it
>> with just this much functionality until someone needs more?

> I agree with not adding dead code.

I concluded that the only thing that makes this an odd set of functions
to start out with is the lack of mbms_is_member; it seems asymmetric
to have mbms_add_member but not mbms_is_member.  So I added that.
I'm content to let the rest grow out as needed.

>> * I'm a little hesitant about whether the API actually should be
>> List-of-Bitmapset, or some dedicated struct as I had in the previous
>> version of 0004.  This version is way less invasive in prepjointree.c
>> than that was, but the reason is there's ambiguity about what the
>> forced_null_vars Lists actually contain, which feels error-prone.

> Hmm ... if somebody makes a mistake, does the functionality break in
> obvious ways, or is it very hard to pinpoint what happened?

Now that Bitmapset is a full-fledged Node type, we can make use of
castNode checks to verify that the input Lists contain what we expect.
That seems probably sufficient to catch coding errors.

>> +multi_bms_add_member(List *mbms, int index1, int index2)

> Maybe s/index1/listidx/ or bitmapidx and s/index2/bitnr/ ?

Right.  I used listidx and bitidx.

                        regards, tom lane

diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index d7b4261b47..4368c30fdb 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	extensible.o \
 	list.o \
 	makefuncs.o \
+	multibitmapset.o \
 	nodeFuncs.o \
 	nodes.o \
 	outfuncs.o \
diff --git a/src/backend/nodes/README b/src/backend/nodes/README
index 64937fe254..489a67eb89 100644
--- a/src/backend/nodes/README
+++ b/src/backend/nodes/README
@@ -58,6 +58,7 @@ FILES IN THIS DIRECTORY (src/backend/nodes/)
     Specialized manipulation functions:
 	bitmapset.c	- Bitmapset support
 	list.c		- generic list support
+	multibitmapset.c - List-of-Bitmapset support
 	params.c	- Param support
 	tidbitmap.c	- TIDBitmap support
 	value.c		- support for value nodes
diff --git a/src/backend/nodes/meson.build b/src/backend/nodes/meson.build
index 8e0d4039f2..c4f3897ef2 100644
--- a/src/backend/nodes/meson.build
+++ b/src/backend/nodes/meson.build
@@ -3,6 +3,7 @@ backend_sources += files(
   'extensible.c',
   'list.c',
   'makefuncs.c',
+  'multibitmapset.c',
   'nodeFuncs.c',
   'nodes.c',
   'params.c',
diff --git a/src/backend/nodes/multibitmapset.c b/src/backend/nodes/multibitmapset.c
new file mode 100644
index 0000000000..9a2ac8c81b
--- /dev/null
+++ b/src/backend/nodes/multibitmapset.c
@@ -0,0 +1,158 @@
+/*-------------------------------------------------------------------------
+ *
+ * multibitmapset.c
+ *	  Lists of Bitmapsets
+ *
+ * A multibitmapset is useful in situations where members of a set can
+ * be identified by two small integers; for example, varno and varattno
+ * of a group of Vars within a query.  The implementation is a List of
+ * Bitmapsets, so that the empty set can be represented by NIL.  (But,
+ * as with Bitmapsets, that's not the only allowed representation.)
+ * The zero-based index of a List element is the first identifying value,
+ * and the (also zero-based) index of a bit within that Bitmapset is
+ * the second identifying value.  There is no expectation that the
+ * Bitmapsets should all be the same size.
+ *
+ * The available operations on multibitmapsets are intended to parallel
+ * those on bitmapsets, for example union and intersection.  So far only
+ * a small fraction of that has been built out; we'll add more as needed.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/nodes/multibitmapset.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/multibitmapset.h"
+
+
+/*
+ * mbms_add_member
+ *		Add a new member to a multibitmapset.
+ *
+ * The new member is identified by "listidx", the zero-based index of the
+ * List element it should go into, and "bitidx", which specifies the bit
+ * number to be set therein.
+ *
+ * This is like bms_add_member, but for multibitmapsets.
+ */
+List *
+mbms_add_member(List *a, int listidx, int bitidx)
+{
+	Bitmapset  *bms;
+	ListCell   *lc;
+
+	if (listidx < 0 || bitidx < 0)
+		elog(ERROR, "negative multibitmapset member index not allowed");
+	/* Add empty elements as needed */
+	while (list_length(a) <= listidx)
+		a = lappend(a, NULL);
+	/* Update the target element */
+	lc = list_nth_cell(a, listidx);
+	bms = lfirst_node(Bitmapset, lc);
+	bms = bms_add_member(bms, bitidx);
+	lfirst(lc) = bms;
+	return a;
+}
+
+/*
+ * mbms_add_members
+ *		Add all members of set b to set a.
+ *
+ * This is like bms_add_members, but for multibitmapsets.
+ */
+List *
+mbms_add_members(List *a, const List *b)
+{
+	ListCell   *lca,
+			   *lcb;
+
+	/* Add empty elements to a, as needed */
+	while (list_length(a) < list_length(b))
+		a = lappend(a, NULL);
+	/* forboth will stop at the end of the shorter list, which is fine */
+	forboth(lca, a, lcb, b)
+	{
+		Bitmapset  *bmsa = lfirst_node(Bitmapset, lca);
+		const Bitmapset *bmsb = lfirst_node(Bitmapset, lcb);
+
+		bmsa = bms_add_members(bmsa, bmsb);
+		lfirst(lca) = bmsa;
+	}
+	return a;
+}
+
+/*
+ * mbms_int_members
+ *		Reduce set a to its intersection with set b.
+ *
+ * This is like bms_int_members, but for multibitmapsets.
+ */
+List *
+mbms_int_members(List *a, const List *b)
+{
+	ListCell   *lca,
+			   *lcb;
+
+	/* Remove any elements of a that are no longer of use */
+	a = list_truncate(a, list_length(b));
+	/* forboth will stop at the end of the shorter list, which is fine */
+	forboth(lca, a, lcb, b)
+	{
+		Bitmapset  *bmsa = lfirst_node(Bitmapset, lca);
+		const Bitmapset *bmsb = lfirst_node(Bitmapset, lcb);
+
+		bmsa = bms_int_members(bmsa, bmsb);
+		lfirst(lca) = bmsa;
+	}
+	return a;
+}
+
+/*
+ * mbms_is_member
+ *		Is listidx/bitidx a member of A?
+ *
+ * This is like bms_is_member, but for multibitmapsets.
+ */
+bool
+mbms_is_member(int listidx, int bitidx, const List *a)
+{
+	const Bitmapset *bms;
+
+	/* XXX better to just return false for negative indexes? */
+	if (listidx < 0 || bitidx < 0)
+		elog(ERROR, "negative multibitmapset member index not allowed");
+	if (listidx >= list_length(a))
+		return false;
+	bms = list_nth_node(Bitmapset, a, listidx);
+	return bms_is_member(bitidx, bms);
+}
+
+/*
+ * mbms_overlap_sets
+ *		Identify the bitmapsets having common members in a and b.
+ *
+ * The result is a bitmapset of the list indexes of bitmapsets that overlap.
+ */
+Bitmapset *
+mbms_overlap_sets(const List *a, const List *b)
+{
+	Bitmapset  *result = NULL;
+	ListCell   *lca,
+			   *lcb;
+
+	/* forboth will stop at the end of the shorter list, which is fine */
+	forboth(lca, a, lcb, b)
+	{
+		const Bitmapset *bmsa = lfirst_node(Bitmapset, lca);
+		const Bitmapset *bmsb = lfirst_node(Bitmapset, lcb);
+
+		if (bms_overlap(bmsa, bmsb))
+			result = bms_add_member(result, foreach_current_index(lca));
+	}
+	return result;
+}
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index b4b9099eb6..f4cdb879c2 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -28,6 +28,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "nodes/makefuncs.h"
+#include "nodes/multibitmapset.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
@@ -2769,7 +2770,7 @@ reduce_outer_joins_pass1(Node *jtnode)
  *	state: state data collected by phase 1 for this node
  *	root: toplevel planner state
  *	nonnullable_rels: set of base relids forced non-null by upper quals
- *	forced_null_vars: list of Vars forced null by upper quals
+ *	forced_null_vars: multibitmapset of Vars forced null by upper quals
  */
 static void
 reduce_outer_joins_pass2(Node *jtnode,
@@ -2799,8 +2800,8 @@ reduce_outer_joins_pass2(Node *jtnode,
 		pass_nonnullable_rels = bms_add_members(pass_nonnullable_rels,
 												nonnullable_rels);
 		pass_forced_null_vars = find_forced_null_vars(f->quals);
-		pass_forced_null_vars = list_concat(pass_forced_null_vars,
-											forced_null_vars);
+		pass_forced_null_vars = mbms_add_members(pass_forced_null_vars,
+												 forced_null_vars);
 		/* And recurse --- but only into interesting subtrees */
 		Assert(list_length(f->fromlist) == list_length(state->sub_states));
 		forboth(l, f->fromlist, s, state->sub_states)
@@ -2897,7 +2898,7 @@ reduce_outer_joins_pass2(Node *jtnode,
 		if (jointype == JOIN_LEFT)
 		{
 			List	   *nonnullable_vars;
-			List	   *overlap;
+			Bitmapset  *overlap;
 
 			/* Find Vars in j->quals that must be non-null in joined rows */
 			nonnullable_vars = find_nonnullable_vars(j->quals);
@@ -2907,11 +2908,8 @@ reduce_outer_joins_pass2(Node *jtnode,
 			 * forced_null_vars overlap: we need to know if the overlap
 			 * includes any RHS variables.
 			 */
-			overlap = list_intersection(nonnullable_vars,
-										forced_null_vars);
-			if (overlap != NIL &&
-				bms_overlap(pull_varnos(root, (Node *) overlap),
-							right_state->relids))
+			overlap = mbms_overlap_sets(nonnullable_vars, forced_null_vars);
+			if (bms_overlap(overlap, right_state->relids))
 				jointype = JOIN_ANTI;
 		}
 
@@ -2964,8 +2962,8 @@ reduce_outer_joins_pass2(Node *jtnode,
 					/* OK to merge upper and local constraints */
 					local_nonnullable_rels = bms_add_members(local_nonnullable_rels,
 															 nonnullable_rels);
-					local_forced_null_vars = list_concat(local_forced_null_vars,
-														 forced_null_vars);
+					local_forced_null_vars = mbms_add_members(local_forced_null_vars,
+															  forced_null_vars);
 				}
 			}
 			else
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 317c10c2b9..33790a4f46 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -31,6 +31,7 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "nodes/multibitmapset.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
 #include "nodes/supportnodes.h"
@@ -1566,7 +1567,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level)
  * find_nonnullable_vars
  *		Determine which Vars are forced nonnullable by given clause.
  *
- * Returns a list of all level-zero Vars that are referenced in the clause in
+ * Returns the set of all level-zero Vars that are referenced in the clause in
  * such a way that the clause cannot possibly return TRUE if any of these Vars
  * is NULL.  (It is OK to err on the side of conservatism; hence the analysis
  * here is simplistic.)
@@ -1578,8 +1579,9 @@ find_nonnullable_rels_walker(Node *node, bool top_level)
  * the expression to have been AND/OR flattened and converted to implicit-AND
  * format.
  *
- * The result is a palloc'd List, but we have not copied the member Var nodes.
- * Also, we don't bother trying to eliminate duplicate entries.
+ * Attnos of the identified Vars are returned in a multibitmapset (a List of
+ * Bitmapsets).  List indexes correspond to relids (varnos), while the per-rel
+ * Bitmapsets hold varattnos offset by FirstLowInvalidHeapAttributeNumber.
  *
  * top_level is true while scanning top-level AND/OR structure; here, showing
  * the result is either FALSE or NULL is good enough.  top_level is false when
@@ -1608,7 +1610,9 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
 		Var		   *var = (Var *) node;
 
 		if (var->varlevelsup == 0)
-			result = list_make1(var);
+			result = mbms_add_member(result,
+									 var->varno,
+									 var->varattno - FirstLowInvalidHeapAttributeNumber);
 	}
 	else if (IsA(node, List))
 	{
@@ -1623,9 +1627,9 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
 		 */
 		foreach(l, (List *) node)
 		{
-			result = list_concat(result,
-								 find_nonnullable_vars_walker(lfirst(l),
-															  top_level));
+			result = mbms_add_members(result,
+									  find_nonnullable_vars_walker(lfirst(l),
+																   top_level));
 		}
 	}
 	else if (IsA(node, FuncExpr))
@@ -1657,7 +1661,12 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
 		switch (expr->boolop)
 		{
 			case AND_EXPR:
-				/* At top level we can just recurse (to the List case) */
+
+				/*
+				 * At top level we can just recurse (to the List case), since
+				 * the result should be the union of what we can prove in each
+				 * arm.
+				 */
 				if (top_level)
 				{
 					result = find_nonnullable_vars_walker((Node *) expr->args,
@@ -1689,7 +1698,7 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
 					if (result == NIL)	/* first subresult? */
 						result = subresult;
 					else
-						result = list_intersection(result, subresult);
+						result = mbms_int_members(result, subresult);
 
 					/*
 					 * If the intersection is empty, we can stop looking. This
@@ -1788,8 +1797,8 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
  * side of conservatism; hence the analysis here is simplistic.  In fact,
  * we only detect simple "var IS NULL" tests at the top level.)
  *
- * The result is a palloc'd List, but we have not copied the member Var nodes.
- * Also, we don't bother trying to eliminate duplicate entries.
+ * As with find_nonnullable_vars, we return the varattnos of the identified
+ * Vars in a multibitmapset.
  */
 List *
 find_forced_null_vars(Node *node)
@@ -1804,7 +1813,9 @@ find_forced_null_vars(Node *node)
 	var = find_forced_null_var(node);
 	if (var)
 	{
-		result = list_make1(var);
+		result = mbms_add_member(result,
+								 var->varno,
+								 var->varattno - FirstLowInvalidHeapAttributeNumber);
 	}
 	/* Otherwise, handle AND-conditions */
 	else if (IsA(node, List))
@@ -1815,8 +1826,8 @@ find_forced_null_vars(Node *node)
 		 */
 		foreach(l, (List *) node)
 		{
-			result = list_concat(result,
-								 find_forced_null_vars(lfirst(l)));
+			result = mbms_add_members(result,
+									  find_forced_null_vars((Node *) lfirst(l)));
 		}
 	}
 	else if (IsA(node, BoolExpr))
diff --git a/src/include/nodes/multibitmapset.h b/src/include/nodes/multibitmapset.h
new file mode 100644
index 0000000000..8c6695aeda
--- /dev/null
+++ b/src/include/nodes/multibitmapset.h
@@ -0,0 +1,39 @@
+/*-------------------------------------------------------------------------
+ *
+ * multibitmapset.h
+ *	  Lists of Bitmapsets
+ *
+ * A multibitmapset is useful in situations where members of a set can
+ * be identified by two small integers; for example, varno and varattno
+ * of a group of Vars within a query.  The implementation is a List of
+ * Bitmapsets, so that the empty set can be represented by NIL.  (But,
+ * as with Bitmapsets, that's not the only allowed representation.)
+ * The zero-based index of a List element is the first identifying value,
+ * and the (also zero-based) index of a bit within that Bitmapset is
+ * the second identifying value.  There is no expectation that the
+ * Bitmapsets should all be the same size.
+ *
+ * The available operations on multibitmapsets are intended to parallel
+ * those on bitmapsets, for example union and intersection.  So far only
+ * a small fraction of that has been built out; we'll add more as needed.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * src/include/nodes/multibitmapset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MULTIBITMAPSET_H
+#define MULTIBITMAPSET_H
+
+#include "nodes/bitmapset.h"
+#include "nodes/pg_list.h"
+
+extern List *mbms_add_member(List *a, int listidx, int bitidx);
+extern List *mbms_add_members(List *a, const List *b);
+extern List *mbms_int_members(List *a, const List *b);
+extern bool mbms_is_member(int listidx, int bitidx, const List *a);
+extern Bitmapset *mbms_overlap_sets(const List *a, const List *b);
+
+#endif							/* MULTIBITMAPSET_H */

Reply via email to