Hi,

while experimenting with BRIN indexes after a couple FOSDEM discussions,
I ran into the existing limitation that BRIN indexes don't handle array
scan keys. So BRIN indexes can be used for conditions like

    WHERE a IN (1,2,3,4,5)

but we essentially treat the values as individual scan keys, and for
each one we scan the BRIN index and build/update the bitmap. Which for
large indexes may be fairly expensive - the cost is proportional to the
number of values, so if building the bitmap for 1 value takes 10ms, for
100 values it'll take ~1000ms.

It's not hard to construct cases like this (e.g. when using indexes with
small pages_per_range values) etc. Of course, if the query does a lot of
other expensive stuff, this cost may be insignificant.

I'm not sure how often people do queries with such conditions. But I've
been experimenting with features that'd build such paths, so I took a
stab at a PoC, which can significantly reduce the time needed to build
the bitmaps. And there's a couple more interesting opportunities.


0001 - Support SK_SEARCHARRAY in BRIN minmax
--------------------------------------------
The 0001 part does a "naive" SK_SEARCHARRAY implementation for minmax.
It simply sets amsearcharray=true and then tweaks the consistent
function to handle both the scalar and array scan keys.

This is obviously rather inefficient, because the array is searched
linearly. So yes, we don't walk the index repeatedly, but we have to
compare each range to (almost-)all values.


0002 - Sort the array in brinrescan() and do binsearch
------------------------------------------------------
There's a simple way to optimize the naive approach by sorting the array
and then searching in this array. If the array is sorted, we can search
for the first value >= minvalue, and see if that is consistent (i.e. if
it's <= maxval).

In my experiments this cuts the time needed to build the bitmap for
array to pretty much the same as for a single value.

I think this is similar to the preprocessing of scan keys in b-tree, so
brinrescan() is a natural way to do the sort. The problem however is
where to store the result.

Ideally, we'd store it in BrinOpaque (just like BTScanOpaque in btree),
but the problem is we don't pass that to the consistent functions. Those
only get the ScanKeys and stuff extracted from BrinOpaque.

We might add a parameter to the "consistent" function, but that
conflicts with not wanting to break existing extensions implementing
their own BRIN indexes. We allow the opclasses to define "consistent"
with either 4 or 5 arguments. Adding an argument would mean 5 or 6
arguments, but because of the backwards compatibility we'd need to
support existing opclasses, and 5 is ambiguous :-/

In hindsight, I would probably not chose supporting both 4 and 5
arguments again. It makes it harder for us to maintain the code to make
life easier for extensions, but I'm not aware of any out-of-core BRIN
opclasses anyway. So I'd probably just change the API, it's pretty easy
to update existing extensions.

This patch however does a much simpler thing - it just replaces the
array in the SK_SEARCHARRAY scan key with a sorted one. That works for
for minmax, but not for bloom/inclusion, because those are not based on
sorting. And the ArrayType is not great for minmax either, because it
means we need to deconstruct it again and again, for each range. It'd be
much better to deconstruct the array once.

I'll get back to this ...


0003 - Support SK_SEARCHARRAY in BRIN inclusion
-----------------------------------------------
Trivial modification to support array scan keys, can't benefit from
sorting the array.


0004 - Support SK_SEARCHARRAY in BRIN bloom
-------------------------------------------
Trivial modification to support array scan keys, can't benefit from
sorted array either.

But we might "preprocess" the keys in a different way - bloom needs to
calculate two hashes per key, and at the moment it happens again and
again for each range. So if you have 1M ranges, and SK_SEARCHARRAY query
with 100 values, we'll do 100M calls to PROCNUM_HASH and 200M calls to
hash_uint32_extended(). And our hash functions are pretty expensive,
certainly compared to the fast functions often used for bloom filters.

So the preprocessing might actually calculate the hash functions once,
and then only reuse those in the "consistent" function.

0005 is a dirty PoC illustrating the benefit of caching the hashes.

Unfortunately, this complicates things, because it means:

* The scan key preprocessing is not universal for all BRIN opclasses,
  because some opclasses, i.e. each BRIN opclass might have optional
  BRIN_PROCNUM_PREPROCESS which would preprocess the keys the way the
  opclass would like.

* We can simply replace the array in the scan key the way minmax does
  that with the sorted array, because the data type is not the same
  (hashes are uint64).


When I started to write this e-mail I thought there's pretty much just
one way to move this forward:

1) Add a BRIN_PROCNUM_PREPROCESS to BRIN, doing the preprocessing (if
   not defined, the key is not preprocessed.

2) Store the preprocessed keys in BrinOpaque.

3) Modify the BRIN API to allow passing the preprocessed keys.

As mentioned earlier, I'm not sure how difficult would it be to maintain
backwards compatibility, considering the number of arguments of the
consistent function would be ambiguous.

Maybe the existence of BRIN_PROCNUM_PREPROCESS would be enough to decide
this - if it's decided, no keys are preprocessed (and the opclass would
not support SK_SEARCHARRAY).


But now I realize maybe we can do without adding parameters to the
"consistent" function. We might stash "preprocessed" scankeys into
BrinOpaque, and pass them to the consistent function instead of the
"original" scan keys (at least when the BRIN_PROCNUM_PREPROCESS). In
fact, I see ScanKey even allows AM-specific flags, maybe it'd be useful
to to mark the preprocessed keys.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 3de2f0cbe238c3fbbceca1e3a9ed3c205870a6dd Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Fri, 10 Feb 2023 16:07:57 +0100
Subject: [PATCH 1/6] Support SK_SEARCHARRAY in BRIN minmax

Set "amsearcharray=true" for BRIN, and extend the minmax opclass to
handle both scalar values and arrays. This allows handling conditions

    ... WHERE a IN (1, 2, 43, 2132, 134)

    ... WHERE a = ANY(ARRAY[34, 45, -1, 234])

    ... WHERE a <= ANY(ARRAY[4938, 282, 2934])

more efficiently - until now we simply built the bitmap for each
scalar value, so we walked the BRIN index many times. Which may be quite
expensive for indexes with many ranges (large tables and/or low
pages_per_range).

There's a couple problems / open questions / TODO items:

- The other opclasses don't handle SK_SEARCHARRAY yet.

- The array is always searched linearly, so this may be costly for large
  arrays (with many elements).

- The array is deconstructed again for each range. We should reuse this,
  somehow.
---
 src/backend/access/brin/brin.c        |   2 +-
 src/backend/access/brin/brin_minmax.c | 177 ++++++++++++++++++++------
 2 files changed, 140 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index de1427a1e0..a600f37c15 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -101,7 +101,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanunique = false;
 	amroutine->amcanmulticol = true;
 	amroutine->amoptionalkey = true;
-	amroutine->amsearcharray = false;
+	amroutine->amsearcharray = true;
 	amroutine->amsearchnulls = true;
 	amroutine->amstorage = true;
 	amroutine->amclusterable = false;
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 2431591be6..c054d8ff42 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -16,6 +16,7 @@
 #include "access/stratnum.h"
 #include "catalog/pg_amop.h"
 #include "catalog/pg_type.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
@@ -157,46 +158,146 @@ brin_minmax_consistent(PG_FUNCTION_ARGS)
 	attno = key->sk_attno;
 	subtype = key->sk_subtype;
 	value = key->sk_argument;
-	switch (key->sk_strategy)
+
+	/*
+	 * For regular (scalar) scan keys, we simply compare the value to the
+	 * range min/max values, and we're done. For SK_SEARCHARRAY keys we
+	 * need to deparse the array and loop through the values.
+	 */
+	if (likely(!(key->sk_flags & SK_SEARCHARRAY)))
 	{
-		case BTLessStrategyNumber:
-		case BTLessEqualStrategyNumber:
-			finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
-												 key->sk_strategy);
-			matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
-										value);
-			break;
-		case BTEqualStrategyNumber:
-
-			/*
-			 * In the equality case (WHERE col = someval), we want to return
-			 * the current page range if the minimum value in the range <=
-			 * scan key, and the maximum value >= scan key.
-			 */
-			finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
-												 BTLessEqualStrategyNumber);
-			matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
-										value);
-			if (!DatumGetBool(matches))
+		switch (key->sk_strategy)
+		{
+			case BTLessStrategyNumber:
+			case BTLessEqualStrategyNumber:
+				finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+													 key->sk_strategy);
+				matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
+											value);
+				break;
+			case BTEqualStrategyNumber:
+
+				/*
+				 * In the equality case (WHERE col = someval), we want to return
+				 * the current page range if the minimum value in the range <=
+				 * scan key, and the maximum value >= scan key.
+				 */
+				finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+													 BTLessEqualStrategyNumber);
+				matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
+											value);
+				if (!DatumGetBool(matches))
+					break;
+				/* max() >= scankey */
+				finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+													 BTGreaterEqualStrategyNumber);
+				matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+											value);
+				break;
+			case BTGreaterEqualStrategyNumber:
+			case BTGreaterStrategyNumber:
+				finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+													 key->sk_strategy);
+				matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+											value);
+				break;
+			default:
+				/* shouldn't happen */
+				elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+				matches = 0;
+				break;
+		}
+	}
+	else
+	{
+		ArrayType  *arrayval;
+		int16		elmlen;
+		bool		elmbyval;
+		char		elmalign;
+		int			num_elems;
+		Datum	   *elem_values;
+		bool	   *elem_nulls;
+
+		arrayval = DatumGetArrayTypeP(key->sk_argument);
+
+		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
+							 &elmlen, &elmbyval, &elmalign);
+
+		deconstruct_array(arrayval,
+						  ARR_ELEMTYPE(arrayval),
+						  elmlen, elmbyval, elmalign,
+						  &elem_values, &elem_nulls, &num_elems);
+
+		switch (key->sk_strategy)
+		{
+			case BTLessStrategyNumber:
+			case BTLessEqualStrategyNumber:
+
+				for (int j = 0; j < num_elems; j++)
+				{
+					Datum val = elem_values[j];
+
+					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+														 key->sk_strategy);
+					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
+												val);
+
+					if (DatumGetBool(matches))
+						break;
+				}
+				break;
+			case BTEqualStrategyNumber:
+
+				/*
+				 * In the equality case (WHERE col = someval), we want to return
+				 * the current page range if the minimum value in the range <=
+				 * scan key, and the maximum value >= scan key.
+				 *
+				 * XXX For any of the array values.
+				 */
+				for (int j = 0; j < num_elems; j++)
+				{
+					Datum val = elem_values[j];
+
+					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+														 BTLessEqualStrategyNumber);
+					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
+												val);
+					if (!DatumGetBool(matches))
+						continue;
+
+					/* max() >= scankey */
+					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+														 BTGreaterEqualStrategyNumber);
+					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+												val);
+
+					if (DatumGetBool(matches))
+						break;
+				}
+				break;
+			case BTGreaterEqualStrategyNumber:
+			case BTGreaterStrategyNumber:
+
+				for (int j = 0; j < num_elems; j++)
+				{
+					Datum val = elem_values[j];
+
+					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+														 key->sk_strategy);
+					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+												val);
+
+					if (DatumGetBool(matches))
+						break;
+				}
+				break;
+			default:
+				/* shouldn't happen */
+				elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+				matches = 0;
 				break;
-			/* max() >= scankey */
-			finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
-												 BTGreaterEqualStrategyNumber);
-			matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
-										value);
-			break;
-		case BTGreaterEqualStrategyNumber:
-		case BTGreaterStrategyNumber:
-			finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
-												 key->sk_strategy);
-			matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
-										value);
-			break;
-		default:
-			/* shouldn't happen */
-			elog(ERROR, "invalid strategy number %d", key->sk_strategy);
-			matches = 0;
-			break;
+		}
 	}
 
 	PG_RETURN_DATUM(matches);
-- 
2.39.1

From b567a8a34ea9b384cd2115700de56ea20b80510f Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Fri, 10 Feb 2023 22:35:43 +0100
Subject: [PATCH 2/6] Sort the array in brinrescan() and do binsearch

This allows optimizing the check in brin_minmax_consistent.
---
 src/backend/access/brin/brin.c        |  69 ++++++++++
 src/backend/access/brin/brin_minmax.c | 175 +++++++++++++++++++++-----
 2 files changed, 210 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index a600f37c15..40a631d376 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -38,6 +38,7 @@
 #include "utils/datum.h"
 #include "utils/guc.h"
 #include "utils/index_selfuncs.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -721,6 +722,16 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 	return totalpages * 10;
 }
 
+static int
+compare_array_values(const void *a, const void *b, void *arg)
+{
+	Datum	da = * (Datum *) a;
+	Datum	db = * (Datum *) b;
+	SortSupport	ssup = (SortSupport) arg;
+
+	return ApplySortComparator(da, false, db, false, ssup);
+}
+
 /*
  * Re-initialize state for a BRIN index scan
  */
@@ -736,6 +747,64 @@ brinrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	 * here someday, too.
 	 */
 
+	/*
+	 * Presort the array for SK_SEARCHARRAY scan keys.
+	 *
+	 * We simply stash the value back into the ScanKey, because that way it's
+	 * transparent for the opclass. But there's a couple issues with this:
+	 *
+	 * 1) On rescans, we'll preprocess the array again, unnecessarily. And the
+	 * memory is likely retained too, so this may be a memory leak.
+	 *
+	 * 2) It assumes we want to preprocess the keys into the same data type.
+	 * That works for minmax (where we just sort the array), but may not work
+	 * for other opclasses - e.g. for "bloom" we might want pre-compute the
+	 * h1/h2 hashes (two uint64 values) and "inclusion" might try building
+	 * a union of the values, or something like that. And none of this fits
+	 * into an array of the same data type (which may be swapped into ScanKey).
+	 *
+	 * FIXME Needs to handle NULLs correctly.
+	 */
+	for (int i = 0; i < nscankeys; i++)
+	{
+		ScanKey		key = &scankey[i];
+		ArrayType  *arrayval;
+		int16		elmlen;
+		bool		elmbyval;
+		char		elmalign;
+		int			num_elems;
+		Datum	   *elem_values;
+		bool	   *elem_nulls;
+		TypeCacheEntry *type;
+		SortSupportData ssup;
+
+		if (!(key->sk_flags & SK_SEARCHNULL))
+			continue;
+
+		arrayval = DatumGetArrayTypeP(key->sk_argument);
+
+		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
+							 &elmlen, &elmbyval, &elmalign);
+
+		deconstruct_array(arrayval,
+						  ARR_ELEMTYPE(arrayval),
+						  elmlen, elmbyval, elmalign,
+						  &elem_values, &elem_nulls, &num_elems);
+
+		type = lookup_type_cache(ARR_ELEMTYPE(arrayval), TYPECACHE_LT_OPR);
+
+		memset(&ssup, 0, sizeof(SortSupportData));
+		PrepareSortSupportFromOrderingOp(type->lt_opr, &ssup);
+
+		qsort_interruptible(elem_values, num_elems, sizeof(Datum),
+							compare_array_values, &ssup);
+
+		arrayval = construct_array_builtin(elem_values, num_elems,
+										   ARR_ELEMTYPE(arrayval));
+
+		key->sk_argument = PointerGetDatum(arrayval);
+	}
+
 	if (scankey && scan->numberOfKeys > 0)
 		memmove(scan->keyData, scankey,
 				scan->numberOfKeys * sizeof(ScanKeyData));
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index c054d8ff42..facc01901a 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -22,6 +22,7 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
+#include "utils/sortsupport.h"
 
 typedef struct MinmaxOpaque
 {
@@ -127,6 +128,63 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(updated);
 }
 
+
+static int
+compare_array_values(const void *a, const void *b, void *arg)
+{
+	Datum	da = * (Datum *) a;
+	Datum	db = * (Datum *) b;
+	SortSupport	ssup = (SortSupport) arg;
+
+	return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * lower_boundary
+ *		Determine lowest index so that (values[index] >= minvalue).
+ *
+ * The array of values is expected to be sorted, so this is the first value
+ * that may fall into the [minvalue, maxvalue] range, as it exceeds minval.
+ * It's not guaranteed, though, as it might exceed maxvalue too.
+ */
+static int
+lower_boundary(Datum *values, int nvalues, Datum minvalue, SortSupport ssup)
+{
+	int		start = 0,
+			end = (nvalues - 1);
+
+	/* everything exceeds minval and might match */
+	if (compare_array_values(&minvalue, &values[start], ssup) <= 0)
+		return 0;
+
+	/* nothing could match */
+	if (compare_array_values(&minvalue, &values[end], ssup) > 0)
+		return nvalues;
+
+	while ((end - start) > 0)
+	{
+		int midpoint;
+		int r;
+
+		midpoint = start + (end - start) / 2;
+
+		r = compare_array_values(&minvalue, &values[midpoint], ssup);
+
+		if (r > 0)
+			start = Max(midpoint, start + 1);
+		else
+			end = midpoint;
+	}
+
+	/* the value should meet the (v >=minvalue) requirement */
+	Assert(compare_array_values(&values[start], &minvalue, ssup) >= 0);
+
+	/* we know start can't be 0, so it's legal to subtract 1 */
+	Assert(compare_array_values(&values[start-1], &minvalue, ssup) < 0);
+
+	return start;
+}
+
 /*
  * Given an index tuple corresponding to a certain page range and a scan key,
  * return whether the scan key is consistent with the index tuple's min/max
@@ -223,6 +281,10 @@ brin_minmax_consistent(PG_FUNCTION_ARGS)
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 
+		/*
+		 * FIXME We shouldn't deconstruct the array over and over for each page
+		 * range. We should preprocess it once, and then just use it.
+		 */
 		deconstruct_array(arrayval,
 						  ARR_ELEMTYPE(arrayval),
 						  elmlen, elmbyval, elmalign,
@@ -232,19 +294,14 @@ brin_minmax_consistent(PG_FUNCTION_ARGS)
 		{
 			case BTLessStrategyNumber:
 			case BTLessEqualStrategyNumber:
-
-				for (int j = 0; j < num_elems; j++)
-				{
-					Datum val = elem_values[j];
-
-					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
-														 key->sk_strategy);
-					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
-												val);
-
-					if (DatumGetBool(matches))
-						break;
-				}
+				/*
+				 * Check the last (largest) value in the array - at least this
+				 * value has to exceed the range minval.
+				 */
+				finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+													 key->sk_strategy);
+				matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
+											elem_values[num_elems-1]);
 				break;
 			case BTEqualStrategyNumber:
 
@@ -253,44 +310,94 @@ brin_minmax_consistent(PG_FUNCTION_ARGS)
 				 * the current page range if the minimum value in the range <=
 				 * scan key, and the maximum value >= scan key.
 				 *
-				 * XXX For any of the array values.
+				 * We do this in two phases. We check the array min/max values to see
+				 * if there even can be a matching value, and if yes we do a binary
+				 * search to find the first value that exceeds range minval. And then
+				 * we check if it actually matches the range.
+				 *
+				 * XXX The first phase is probably unnecessary, because lower_bound()
+				 * does pretty much exactly that too.
 				 */
-				for (int j = 0; j < num_elems; j++)
 				{
-					Datum val = elem_values[j];
+					Datum val;
+					SortSupportData ssup;
+					int			lower;
+					TypeCacheEntry *type;
+
+					/* Is the first (smallest) value after the BRIN range? */
+					val = elem_values[0];
 
 					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
 														 BTLessEqualStrategyNumber);
-					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
-												val);
+					matches = FunctionCall2Coll(finfo, colloid, val, column->bv_values[1]);
+
+					/* minval > max(range values) */
 					if (!DatumGetBool(matches))
-						continue;
+						break;
+
+					/* Is the last (largest) value before the BRIN range? */
+					val = elem_values[num_elems-1];
 
-					/* max() >= scankey */
 					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
 														 BTGreaterEqualStrategyNumber);
-					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
-												val);
+					matches = FunctionCall2Coll(finfo, colloid, val, column->bv_values[0]);
 
-					if (DatumGetBool(matches))
+					/* maxval < min(range values) */
+					if (!DatumGetBool(matches))
 						break;
-				}
-				break;
-			case BTGreaterEqualStrategyNumber:
-			case BTGreaterStrategyNumber:
 
-				for (int j = 0; j < num_elems; j++)
-				{
-					Datum val = elem_values[j];
+					/*
+					 * OK, there might be some values matching the range. We have
+					 * to search them one by one, or perhaps try binsearch.
+					 */
+					type = lookup_type_cache(ARR_ELEMTYPE(arrayval), TYPECACHE_LT_OPR);
+
+					memset(&ssup, 0, sizeof(SortSupportData));
+					PrepareSortSupportFromOrderingOp(type->lt_opr, &ssup);
 
+					lower = lower_boundary(elem_values, num_elems, column->bv_values[0], &ssup);
+
+					/* no elements can possibly match */
+					if (lower == num_elems)
+					{
+						matches = BoolGetDatum(false);
+						break;
+					}
+
+					/*
+					 * OK, the first element must match the upper boundary too
+					 * (if it does not, no following elements can).
+					 */
+					val = elem_values[lower];
+
+					/*
+					 * In the equality case (WHERE col = someval), we want to return
+					 * the current page range if the minimum value in the range <=
+					 * scan key, and the maximum value >= scan key.
+					 */
 					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
-														 key->sk_strategy);
-					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+														 BTLessEqualStrategyNumber);
+					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[0],
 												val);
-
-					if (DatumGetBool(matches))
+					if (!DatumGetBool(matches))
 						break;
+					/* max() >= scankey */
+					finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+														 BTGreaterEqualStrategyNumber);
+					matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+												val);
+					break;
 				}
+			case BTGreaterEqualStrategyNumber:
+			case BTGreaterStrategyNumber:
+				/*
+				 * Check the first (smallest) value in the array - at least this
+				 * value has to be smaller than the range maxval.
+				 */
+				finfo = minmax_get_strategy_procinfo(bdesc, attno, subtype,
+													 key->sk_strategy);
+				matches = FunctionCall2Coll(finfo, colloid, column->bv_values[1],
+											elem_values[0]);
 				break;
 			default:
 				/* shouldn't happen */
-- 
2.39.1

From a0125596277647b593bc7d1b5df2bbbfe859cbd1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sat, 11 Feb 2023 19:43:38 +0100
Subject: [PATCH 3/6] Support SK_SEARCHARRAY in BRIN minmax-multi

Similar approach to minmax, but the issues with deconstructing the array
over and over are even more serious.
---
 src/backend/access/brin/brin_minmax_multi.c | 388 ++++++++++++++++----
 1 file changed, 322 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 0ace6035be..43e9143f2a 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2562,6 +2562,63 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(modified);
 }
 
+
+static int
+compare_array_values(const void *a, const void *b, void *arg)
+{
+	Datum	da = * (Datum *) a;
+	Datum	db = * (Datum *) b;
+	SortSupport	ssup = (SortSupport) arg;
+
+	return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * lower_boundary
+ *		Determine lowest index so that (values[index] >= minvalue).
+ *
+ * The array of values is expected to be sorted, so this is the first value
+ * that may fall into the [minvalue, maxvalue] range, as it exceeds minval.
+ * It's not guaranteed, though, as it might exceed maxvalue too.
+ */
+static int
+lower_boundary(Datum *values, int nvalues, Datum minvalue, SortSupport ssup)
+{
+	int		start = 0,
+			end = (nvalues - 1);
+
+	/* everything exceeds minval and might match */
+	if (compare_array_values(&minvalue, &values[start], ssup) <= 0)
+		return 0;
+
+	/* nothing could match */
+	if (compare_array_values(&minvalue, &values[end], ssup) > 0)
+		return nvalues;
+
+	while ((end - start) > 0)
+	{
+		int midpoint;
+		int r;
+
+		midpoint = start + (end - start) / 2;
+
+		r = compare_array_values(&minvalue, &values[midpoint], ssup);
+
+		if (r > 0)
+			start = Max(midpoint, start + 1);
+		else
+			end = midpoint;
+	}
+
+	/* the value should meet the (v >=minvalue) requirement */
+	Assert(compare_array_values(&values[start], &minvalue, ssup) >= 0);
+
+	/* we know start can't be 0, so it's legal to subtract 1 */
+	Assert(compare_array_values(&values[start-1], &minvalue, ssup) < 0);
+
+	return start;
+}
+
 /*
  * Given an index tuple corresponding to a certain page range and a scan key,
  * return whether the scan key is consistent with the index tuple's min/max
@@ -2591,6 +2648,15 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
 	serialized = (SerializedRanges *) PG_DETOAST_DATUM(column->bv_values[0]);
 	ranges = brin_range_deserialize(serialized->maxvalues, serialized);
 
+	/*
+	 * XXX Would it make sense to have a quick initial check on the whole
+	 * summary? We know most page ranges are not expected to match, and we
+	 * know the ranges/values are sorted so we could check global min/max
+	 * (essentially what regular minmax is doing) and bail if no match is
+	 * possible. That should be cheap and might save a lot on inspecting
+	 * the individual ranges/values.
+	 */
+
 	/* inspect the ranges, and for each one evaluate the scan keys */
 	for (rangeno = 0; rangeno < ranges->nranges; rangeno++)
 	{
@@ -2611,67 +2677,205 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
 			attno = key->sk_attno;
 			subtype = key->sk_subtype;
 			value = key->sk_argument;
-			switch (key->sk_strategy)
+
+			if (likely(!(key->sk_flags & SK_SEARCHARRAY)))
 			{
-				case BTLessStrategyNumber:
-				case BTLessEqualStrategyNumber:
-					finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
-															   key->sk_strategy);
-					/* first value from the array */
-					matches = FunctionCall2Coll(finfo, colloid, minval, value);
-					break;
+				switch (key->sk_strategy)
+				{
+					case BTLessStrategyNumber:
+					case BTLessEqualStrategyNumber:
+						finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																   key->sk_strategy);
+						/* first value from the array */
+						matches = FunctionCall2Coll(finfo, colloid, minval, value);
+						break;
 
-				case BTEqualStrategyNumber:
-					{
-						Datum		compar;
-						FmgrInfo   *cmpFn;
+					case BTEqualStrategyNumber:
+						{
+							Datum		compar;
+							FmgrInfo   *cmpFn;
+
+							/* by default this range does not match */
+							matches = false;
+
+							/*
+							 * Otherwise, need to compare the new value with
+							 * boundaries of all the ranges. First check if it's
+							 * less than the absolute minimum, which is the first
+							 * value in the array.
+							 */
+							cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																	   BTGreaterStrategyNumber);
+							compar = FunctionCall2Coll(cmpFn, colloid, minval, value);
+
+							/* smaller than the smallest value in this range */
+							if (DatumGetBool(compar))
+								break;
+
+							cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																	   BTLessStrategyNumber);
+							compar = FunctionCall2Coll(cmpFn, colloid, maxval, value);
+
+							/* larger than the largest value in this range */
+							if (DatumGetBool(compar))
+								break;
+
+							/*
+							 * We haven't managed to eliminate this range, so
+							 * consider it matching.
+							 */
+							matches = true;
 
-						/* by default this range does not match */
-						matches = false;
+							break;
+						}
+					case BTGreaterEqualStrategyNumber:
+					case BTGreaterStrategyNumber:
+						finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																   key->sk_strategy);
+						/* last value from the array */
+						matches = FunctionCall2Coll(finfo, colloid, maxval, value);
+						break;
 
-						/*
-						 * Otherwise, need to compare the new value with
-						 * boundaries of all the ranges. First check if it's
-						 * less than the absolute minimum, which is the first
-						 * value in the array.
-						 */
-						cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
-																   BTGreaterStrategyNumber);
-						compar = FunctionCall2Coll(cmpFn, colloid, minval, value);
+					default:
+						/* shouldn't happen */
+						elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+						matches = 0;
+						break;
+				}
+			}
+			else
+			{
+				/*
+				 * FIXME This is really wrong, because it deserializes the
+				 * array over and over for each value in the minmax-multi
+				 * summary range.
+				 *
+				 * FIXME In fact, this is even worse than for brin_minmax.c
+				 * because we deconstruct it for every range in the summary
+				 * (so if there are 32 values, that's 16 ranges, and we'll
+				 * deconstruct it again for each of those).
+				 *
+				 * XXX We could deconstruct it once, when we need it for the
+				 * first time. But even better we should do it only once while
+				 * preprocessing the scan keys.
+				 */
+				ArrayType  *arrayval;
+				int16		elmlen;
+				bool		elmbyval;
+				char		elmalign;
+				int			num_elems;
+				Datum	   *elem_values;
+				bool	   *elem_nulls;
 
-						/* smaller than the smallest value in this range */
-						if (DatumGetBool(compar))
-							break;
+				arrayval = DatumGetArrayTypeP(key->sk_argument);
 
-						cmpFn = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
-																   BTLessStrategyNumber);
-						compar = FunctionCall2Coll(cmpFn, colloid, maxval, value);
+				get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
+									 &elmlen, &elmbyval, &elmalign);
 
-						/* larger than the largest value in this range */
-						if (DatumGetBool(compar))
-							break;
+				deconstruct_array(arrayval,
+								  ARR_ELEMTYPE(arrayval),
+								  elmlen, elmbyval, elmalign,
+								  &elem_values, &elem_nulls, &num_elems);
+
+				switch (key->sk_strategy)
+				{
+					case BTLessStrategyNumber:
+					case BTLessEqualStrategyNumber:
+						finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																   key->sk_strategy);
+						/* first value from the array */
+						matches = FunctionCall2Coll(finfo, colloid, minval,
+													elem_values[num_elems-1]);
+						break;
+
+					case BTEqualStrategyNumber:
 
 						/*
-						 * We haven't managed to eliminate this range, so
-						 * consider it matching.
+						 * See brin_minmax.c for description of what this is doing.
 						 */
-						matches = true;
-
+						{
+							Datum val;
+							SortSupportData ssup;
+							int			lower;
+							TypeCacheEntry *type;
+
+							/* Is the first (smallest) value after the BRIN range? */
+							val = elem_values[0];
+
+							finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																	   BTLessEqualStrategyNumber);
+							matches = FunctionCall2Coll(finfo, colloid, val, maxval);
+
+							/* minval > max(range values) */
+							if (!DatumGetBool(matches))
+								break;
+
+							/* Is the last (largest) value before the BRIN range? */
+							val = elem_values[num_elems-1];
+
+							finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																	   BTGreaterEqualStrategyNumber);
+							matches = FunctionCall2Coll(finfo, colloid, val, minval);
+
+							/* maxval < min(range values) */
+							if (!DatumGetBool(matches))
+								break;
+
+							/*
+							 * OK, there might be some values matching the range. We have
+							 * to search them one by one, or perhaps try binsearch.
+							 */
+							type = lookup_type_cache(ARR_ELEMTYPE(arrayval), TYPECACHE_LT_OPR);
+
+							memset(&ssup, 0, sizeof(SortSupportData));
+							PrepareSortSupportFromOrderingOp(type->lt_opr, &ssup);
+
+							lower = lower_boundary(elem_values, num_elems, minval, &ssup);
+
+							/* no elements can possibly match */
+							if (lower == num_elems)
+							{
+								matches = BoolGetDatum(false);
+								break;
+							}
+
+							/*
+							 * OK, the first element must match the upper boundary too
+							 * (if it does not, no following elements can).
+							 */
+							val = elem_values[lower];
+
+							/*
+							 * In the equality case (WHERE col = someval), we want to return
+							 * the current page range if the minimum value in the range <=
+							 * scan key, and the maximum value >= scan key.
+							 */
+							finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																	   BTLessEqualStrategyNumber);
+							matches = FunctionCall2Coll(finfo, colloid, minval, val);
+							if (!DatumGetBool(matches))
+								break;
+							/* max() >= scankey */
+							finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																	   BTGreaterEqualStrategyNumber);
+							matches = FunctionCall2Coll(finfo, colloid, maxval, val);
+							break;
+						}
+					case BTGreaterEqualStrategyNumber:
+					case BTGreaterStrategyNumber:
+						finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																   key->sk_strategy);
+						/* last value from the array */
+						matches = FunctionCall2Coll(finfo, colloid, maxval,
+													elem_values[0]);
 						break;
-					}
-				case BTGreaterEqualStrategyNumber:
-				case BTGreaterStrategyNumber:
-					finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
-															   key->sk_strategy);
-					/* last value from the array */
-					matches = FunctionCall2Coll(finfo, colloid, maxval, value);
-					break;
 
-				default:
-					/* shouldn't happen */
-					elog(ERROR, "invalid strategy number %d", key->sk_strategy);
-					matches = 0;
-					break;
+					default:
+						/* shouldn't happen */
+						elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+						matches = 0;
+						break;
+				}
 			}
 
 			/* the range has to match all the scan keys */
@@ -2713,28 +2917,80 @@ brin_minmax_multi_consistent(PG_FUNCTION_ARGS)
 			attno = key->sk_attno;
 			subtype = key->sk_subtype;
 			value = key->sk_argument;
-			switch (key->sk_strategy)
+			if (likely(!(key->sk_flags & SK_SEARCHARRAY)))
 			{
-				case BTLessStrategyNumber:
-				case BTLessEqualStrategyNumber:
-				case BTEqualStrategyNumber:
-				case BTGreaterEqualStrategyNumber:
-				case BTGreaterStrategyNumber:
-
-					finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
-															   key->sk_strategy);
-					matches = FunctionCall2Coll(finfo, colloid, val, value);
-					break;
+				switch (key->sk_strategy)
+				{
+					case BTLessStrategyNumber:
+					case BTLessEqualStrategyNumber:
+					case BTEqualStrategyNumber:
+					case BTGreaterEqualStrategyNumber:
+					case BTGreaterStrategyNumber:
+
+						finfo = minmax_multi_get_strategy_procinfo(bdesc, attno, subtype,
+																   key->sk_strategy);
+						matches = FunctionCall2Coll(finfo, colloid, val, value);
+						break;
 
-				default:
-					/* shouldn't happen */
-					elog(ERROR, "invalid strategy number %d", key->sk_strategy);
-					matches = 0;
-					break;
+					default:
+						/* shouldn't happen */
+						elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+						matches = 0;
+						break;
+				}
+			}
+			else
+			{
+				/*
+				 * FIXME This is really wrong, because it deserializes the
+				 * array over and over for each value in the minmax-multi
+				 * summary.
+				 */
+				ArrayType  *arrayval;
+				int16		elmlen;
+				bool		elmbyval;
+				char		elmalign;
+				int			num_elems;
+				Datum	   *elem_values;
+				bool	   *elem_nulls;
+
+				SortSupportData ssup;
+				int			lower;
+				TypeCacheEntry *type;
+
+				arrayval = DatumGetArrayTypeP(key->sk_argument);
+
+				get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
+									 &elmlen, &elmbyval, &elmalign);
+
+				deconstruct_array(arrayval,
+								  ARR_ELEMTYPE(arrayval),
+								  elmlen, elmbyval, elmalign,
+								  &elem_values, &elem_nulls, &num_elems);
+
+				/* assume not maches */
+				matches = false;
+
+				/*
+				 * OK, there might be some values matching the range. We have
+				 * to search them one by one, or perhaps try binsearch.
+				 */
+				type = lookup_type_cache(ARR_ELEMTYPE(arrayval), TYPECACHE_LT_OPR);
+
+				memset(&ssup, 0, sizeof(SortSupportData));
+				PrepareSortSupportFromOrderingOp(type->lt_opr, &ssup);
+
+				lower = lower_boundary(elem_values, num_elems, value, &ssup);
+
+				if ((lower < num_elems) &&
+					(compare_array_values(&elem_values[lower], &value, &ssup) == 0))
+				{
+					matches = true;
+				}
 			}
 
 			/* the range has to match all the scan keys */
-			matching &= DatumGetBool(matches);
+			matching &= matches;
 
 			/* once we find a non-matching key, we're done */
 			if (!matching)
-- 
2.39.1

From f9d5aee070eef54d5898d4ee7862d7900aa59209 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sat, 11 Feb 2023 15:23:25 +0100
Subject: [PATCH 4/6] Support SK_SEARCHARRAY in BRIN inclusion

---
 src/backend/access/brin/brin_inclusion.c | 177 ++++++++++++++++-------
 1 file changed, 123 insertions(+), 54 deletions(-)

diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 248116c149..935452d09d 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -30,6 +30,7 @@
 #include "access/skey.h"
 #include "catalog/pg_amop.h"
 #include "catalog/pg_type.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
@@ -239,43 +240,20 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
 }
 
 /*
- * BRIN inclusion consistent function
- *
- * We're no longer dealing with NULL keys in the consistent function, that is
- * now handled by the AM code. That means we should not get any all-NULL ranges
- * either, because those can't be consistent with regular (not [IS] NULL) keys.
+ * Check consistency of a single scalar value with the BRIN range.
  *
- * All of the strategies are optional.
+ * Called for both scalar scankeys and for each value in SK_SEARCHARRAY.
  */
-Datum
-brin_inclusion_consistent(PG_FUNCTION_ARGS)
+static bool
+brin_inclusion_consistent_value(BrinDesc *bdesc, BrinValues *column,
+								AttrNumber attno,
+								StrategyNumber strategy, Oid subtype,
+								Oid colloid, Datum unionval, Datum query)
 {
-	BrinDesc   *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
-	BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
-	ScanKey		key = (ScanKey) PG_GETARG_POINTER(2);
-	Oid			colloid = PG_GET_COLLATION(),
-				subtype;
-	Datum		unionval;
-	AttrNumber	attno;
-	Datum		query;
 	FmgrInfo   *finfo;
 	Datum		result;
 
-	/* This opclass uses the old signature with only three arguments. */
-	Assert(PG_NARGS() == 3);
-
-	/* Should not be dealing with all-NULL ranges. */
-	Assert(!column->bv_allnulls);
-
-	/* It has to be checked, if it contains elements that are not mergeable. */
-	if (DatumGetBool(column->bv_values[INCLUSION_UNMERGEABLE]))
-		PG_RETURN_BOOL(true);
-
-	attno = key->sk_attno;
-	subtype = key->sk_subtype;
-	query = key->sk_argument;
-	unionval = column->bv_values[INCLUSION_UNION];
-	switch (key->sk_strategy)
+	switch (strategy)
 	{
 			/*
 			 * Placement strategies
@@ -294,49 +272,49 @@ brin_inclusion_consistent(PG_FUNCTION_ARGS)
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTOverRightStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTOverLeftStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTRightStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTOverRightStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTLeftStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTRightStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTOverLeftStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTBelowStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTOverAboveStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTOverBelowStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTAboveStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTOverAboveStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTBelowStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		case RTAboveStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTOverBelowStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 			/*
 			 * Overlap and contains strategies
@@ -352,9 +330,9 @@ brin_inclusion_consistent(PG_FUNCTION_ARGS)
 		case RTSubStrategyNumber:
 		case RTSubEqualStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
-													key->sk_strategy);
+													strategy);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_DATUM(result);
+			return (DatumGetBool(result));
 
 			/*
 			 * Contained by strategies
@@ -374,9 +352,9 @@ brin_inclusion_consistent(PG_FUNCTION_ARGS)
 													RTOverlapStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
 			if (DatumGetBool(result))
-				PG_RETURN_BOOL(true);
+				return (true);
 
-			PG_RETURN_DATUM(column->bv_values[INCLUSION_CONTAINS_EMPTY]);
+			return (column->bv_values[INCLUSION_CONTAINS_EMPTY]);
 
 			/*
 			 * Adjacent strategy
@@ -393,12 +371,12 @@ brin_inclusion_consistent(PG_FUNCTION_ARGS)
 													RTOverlapStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
 			if (DatumGetBool(result))
-				PG_RETURN_BOOL(true);
+				return (true);
 
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTAdjacentStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_DATUM(result);
+			return (DatumGetBool(result));
 
 			/*
 			 * Basic comparison strategies
@@ -428,9 +406,9 @@ brin_inclusion_consistent(PG_FUNCTION_ARGS)
 													RTRightStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
 			if (!DatumGetBool(result))
-				PG_RETURN_BOOL(true);
+				return (true);
 
-			PG_RETURN_DATUM(column->bv_values[INCLUSION_CONTAINS_EMPTY]);
+			return (column->bv_values[INCLUSION_CONTAINS_EMPTY]);
 
 		case RTSameStrategyNumber:
 		case RTEqualStrategyNumber:
@@ -438,30 +416,121 @@ brin_inclusion_consistent(PG_FUNCTION_ARGS)
 													RTContainsStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
 			if (DatumGetBool(result))
-				PG_RETURN_BOOL(true);
+				return (true);
 
-			PG_RETURN_DATUM(column->bv_values[INCLUSION_CONTAINS_EMPTY]);
+			return (column->bv_values[INCLUSION_CONTAINS_EMPTY]);
 
 		case RTGreaterEqualStrategyNumber:
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTLeftStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
 			if (!DatumGetBool(result))
-				PG_RETURN_BOOL(true);
+				return (true);
 
-			PG_RETURN_DATUM(column->bv_values[INCLUSION_CONTAINS_EMPTY]);
+			return (column->bv_values[INCLUSION_CONTAINS_EMPTY]);
 
 		case RTGreaterStrategyNumber:
 			/* no need to check for empty elements */
 			finfo = inclusion_get_strategy_procinfo(bdesc, attno, subtype,
 													RTLeftStrategyNumber);
 			result = FunctionCall2Coll(finfo, colloid, unionval, query);
-			PG_RETURN_BOOL(!DatumGetBool(result));
+			return (!DatumGetBool(result));
 
 		default:
 			/* shouldn't happen */
-			elog(ERROR, "invalid strategy number %d", key->sk_strategy);
-			PG_RETURN_BOOL(false);
+			elog(ERROR, "invalid strategy number %d", strategy);
+			return (false);
+	}
+}
+
+/*
+ * BRIN inclusion consistent function
+ *
+ * We're no longer dealing with NULL keys in the consistent function, that is
+ * now handled by the AM code. That means we should not get any all-NULL ranges
+ * either, because those can't be consistent with regular (not [IS] NULL) keys.
+ *
+ * All of the strategies are optional.
+ */
+Datum
+brin_inclusion_consistent(PG_FUNCTION_ARGS)
+{
+	BrinDesc   *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
+	BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
+	ScanKey		key = (ScanKey) PG_GETARG_POINTER(2);
+	Oid			colloid = PG_GET_COLLATION(),
+				subtype;
+	Datum		unionval;
+	AttrNumber	attno;
+	Datum		query;
+
+	/* This opclass uses the old signature with only three arguments. */
+	Assert(PG_NARGS() == 3);
+
+	/* Should not be dealing with all-NULL ranges. */
+	Assert(!column->bv_allnulls);
+
+	/* It has to be checked, if it contains elements that are not mergeable. */
+	if (DatumGetBool(column->bv_values[INCLUSION_UNMERGEABLE]))
+		PG_RETURN_BOOL(true);
+
+	attno = key->sk_attno;
+	subtype = key->sk_subtype;
+	query = key->sk_argument;
+	unionval = column->bv_values[INCLUSION_UNION];
+
+	/*
+	 * For regular (scalar) scan keys, we simply compare the value to the
+	 * range min/max values, and we're done. For SK_SEARCHARRAY keys we
+	 * need to deparse the array and loop through the values.
+	 */
+	if (likely(!(key->sk_flags & SK_SEARCHARRAY)))
+	{
+		bool tmp;
+
+		tmp = brin_inclusion_consistent_value(bdesc, column, attno,
+											  key->sk_strategy,
+											  subtype, colloid,
+											  unionval, query);
+		PG_RETURN_BOOL(tmp);
+	}
+	else
+	{
+		ArrayType  *arrayval;
+		int16		elmlen;
+		bool		elmbyval;
+		char		elmalign;
+		int			num_elems;
+		Datum	   *elem_values;
+		bool	   *elem_nulls;
+		bool		matches = false;
+
+		arrayval = DatumGetArrayTypeP(key->sk_argument);
+
+		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
+							 &elmlen, &elmbyval, &elmalign);
+
+		deconstruct_array(arrayval,
+						  ARR_ELEMTYPE(arrayval),
+						  elmlen, elmbyval, elmalign,
+						  &elem_values, &elem_nulls, &num_elems);
+
+		/* have to loop through all elements, having them sorted does not help */
+		for (int i = 0; i < num_elems; i++)
+		{
+			Datum 	query_element = elem_values[i];
+
+			matches = brin_inclusion_consistent_value(bdesc, column, attno,
+													  key->sk_strategy,
+													  subtype, colloid,
+													  unionval, query_element);
+
+			if (matches)
+				break;
+		}
+
+		/* we could get here for empty array, e.g. with "@> '{}'::point[]" */
+		PG_RETURN_BOOL(matches);
 	}
 }
 
-- 
2.39.1

From c67e0d995d1cf1c104270974624834940fa5ee16 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sat, 11 Feb 2023 20:50:03 +0100
Subject: [PATCH 5/6] Support SK_SEARCHARRAY in BRIN bloom

---
 src/backend/access/brin/brin_bloom.c | 100 ++++++++++++++++++++++-----
 1 file changed, 81 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index e4953a9d37..c2939ac809 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -125,6 +125,7 @@
 #include "access/stratnum.h"
 #include "catalog/pg_type.h"
 #include "catalog/pg_amop.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
@@ -596,26 +597,87 @@ brin_bloom_consistent(PG_FUNCTION_ARGS)
 		attno = key->sk_attno;
 		value = key->sk_argument;
 
-		switch (key->sk_strategy)
+		if (likely(!(key->sk_flags & SK_SEARCHARRAY)))
 		{
-			case BloomEqualStrategyNumber:
-
-				/*
-				 * In the equality case (WHERE col = someval), we want to
-				 * return the current page range if the minimum value in the
-				 * range <= scan key, and the maximum value >= scan key.
-				 */
-				finfo = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
-
-				hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid, value));
-				matches &= bloom_contains_value(filter, hashValue);
-
-				break;
-			default:
-				/* shouldn't happen */
-				elog(ERROR, "invalid strategy number %d", key->sk_strategy);
-				matches = 0;
-				break;
+			switch (key->sk_strategy)
+			{
+				case BloomEqualStrategyNumber:
+
+					/*
+					 * In the equality case (WHERE col = someval), we want to
+					 * return the current page range if the minimum value in the
+					 * range <= scan key, and the maximum value >= scan key.
+					 */
+					finfo = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
+
+					hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid, value));
+					matches &= bloom_contains_value(filter, hashValue);
+
+					break;
+				default:
+					/* shouldn't happen */
+					elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+					matches = 0;
+					break;
+			}
+		}
+		else
+		{
+			ArrayType  *arrayval;
+			int16		elmlen;
+			bool		elmbyval;
+			char		elmalign;
+			int			num_elems;
+			Datum	   *elem_values;
+			bool	   *elem_nulls;
+
+			arrayval = DatumGetArrayTypeP(key->sk_argument);
+
+			get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
+								 &elmlen, &elmbyval, &elmalign);
+
+			deconstruct_array(arrayval,
+							  ARR_ELEMTYPE(arrayval),
+							  elmlen, elmbyval, elmalign,
+							  &elem_values, &elem_nulls, &num_elems);
+
+			switch (key->sk_strategy)
+			{
+				case BloomEqualStrategyNumber:
+					{
+						/* assume no match */
+						matches = BoolGetDatum(false);
+
+						/*
+						 * In the equality case (WHERE col = someval), we want to
+						 * return the current page range if the minimum value in the
+						 * range <= scan key, and the maximum value >= scan key.
+						 */
+						finfo = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
+
+						for (int i = 0; i < num_elems; i++)
+						{
+							bool	tmp = false;
+							Datum	element = elem_values[i];
+
+							hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid,
+																		 element));
+							tmp = bloom_contains_value(filter, hashValue);
+
+							if (DatumGetBool(tmp))
+							{
+								matches = BoolGetDatum(true);
+								break;
+							}
+						}
+					}
+					break;
+				default:
+					/* shouldn't happen */
+					elog(ERROR, "invalid strategy number %d", key->sk_strategy);
+					matches = 0;
+					break;
+			}
 		}
 
 		if (!matches)
-- 
2.39.1

From d0dead5dedc0f91d7279d351a6432e2592f8d578 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sat, 11 Feb 2023 23:44:21 +0100
Subject: [PATCH 6/6] PoC caching hash values in BRIN bloom

Cache of hash values initialized when processing the first page range.
The cache is not
---
 src/backend/access/brin/brin_bloom.c | 59 ++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index c2939ac809..b57db6be14 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -129,6 +129,7 @@
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -260,6 +261,8 @@ typedef struct BloomFilter
 	char		data[FLEXIBLE_ARRAY_MEMBER];
 } BloomFilter;
 
+static uint64 *cache_h1 = NULL;
+static uint64 *cache_h2 = NULL;
 
 /*
  * bloom_init
@@ -405,6 +408,32 @@ bloom_contains_value(BloomFilter *filter, uint32 value)
 	return true;
 }
 
+/*
+ * bloom_contains_value
+ * 		Check if the bloom filter contains a particular value.
+ */
+static bool
+bloom_contains_hashes(BloomFilter *filter, uint64 h1, uint64 h2)
+{
+	int			i;
+
+	/* compute the requested number of hashes */
+	for (i = 0; i < filter->nhashes; i++)
+	{
+		/* h1 + h2 + f(i) */
+		uint32		h = (h1 + i * h2) % filter->nbits;
+		uint32		byte = (h / 8);
+		uint32		bit = (h % 8);
+
+		/* if the bit is not set, the value is not there */
+		if (!(filter->data[byte] & (0x01 << bit)))
+			return false;
+	}
+
+	/* all hashes found in bloom filter */
+	return true;
+}
+
 typedef struct BloomOpaque
 {
 	/*
@@ -440,6 +469,9 @@ brin_bloom_opcinfo(PG_FUNCTION_ARGS)
 		MAXALIGN((char *) result + SizeofBrinOpcInfo(1));
 	result->oi_typcache[0] = lookup_type_cache(PG_BRIN_BLOOM_SUMMARYOID, 0);
 
+	cache_h1 = NULL;
+	cache_h2 = NULL;
+
 	PG_RETURN_POINTER(result);
 }
 
@@ -641,6 +673,28 @@ brin_bloom_consistent(PG_FUNCTION_ARGS)
 							  elmlen, elmbyval, elmalign,
 							  &elem_values, &elem_nulls, &num_elems);
 
+			if (cache_h1 == NULL && cache_h2 == NULL)
+			{
+				MemoryContext oldcxt;
+
+				oldcxt = MemoryContextSwitchTo(bdesc->bd_context);
+				cache_h1 = palloc0(num_elems * sizeof(uint64));
+				cache_h2 = palloc0(num_elems * sizeof(uint64));
+				MemoryContextSwitchTo(oldcxt);
+
+				finfo = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
+
+				for (int i = 0; i < num_elems; i++)
+				{
+					Datum	element = elem_values[i];
+
+					hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid, element));
+
+					cache_h1[i] = hash_bytes_uint32_extended(hashValue, BLOOM_SEED_1) % filter->nbits;
+					cache_h2[i] = hash_bytes_uint32_extended(hashValue, BLOOM_SEED_2) % filter->nbits;
+				}
+			}
+
 			switch (key->sk_strategy)
 			{
 				case BloomEqualStrategyNumber:
@@ -658,11 +712,8 @@ brin_bloom_consistent(PG_FUNCTION_ARGS)
 						for (int i = 0; i < num_elems; i++)
 						{
 							bool	tmp = false;
-							Datum	element = elem_values[i];
 
-							hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid,
-																		 element));
-							tmp = bloom_contains_value(filter, hashValue);
+							tmp = bloom_contains_hashes(filter, cache_h1[i], cache_h2[i]);
 
 							if (DatumGetBool(tmp))
 							{
-- 
2.39.1

Reply via email to