Hi

And here is a proposed code change, alternative to the doc change.

I hope that it is ok to relax the restriction like so, as
`cost_bitmap_node_and` is already resigned to inputs not being independent:
* We estimate AND selectivity on the assumption that the inputs are
* independent.  This is probably often wrong, but we don't have the info
* to do better.

I've ran "make check", and one test have changed (the last one in the "test
extraction of restriction OR clauses from join OR clause" group - in an
expected way, IMO).
I am attaching the regression.diff for convenience.

Is this a generally desirable change to pursue?

Best regards, Dmytro


On Wed, Feb 26, 2025 at 7:31 PM Dmytro Astapov <dasta...@gmail.com> wrote:

> Hi!
>
> I am (still) very unsure if the code change I mentioned will make sense,
> but documentation chage could perhaps look like something along these lines?
>
>
>
> Best regards, Dmytro
>
>
> On Tue, Feb 25, 2025 at 9:14 PM Dmytro Astapov <dasta...@gmail.com> wrote:
>
>> Hi!
>>
>> I've been investigating why postgres does not do BitmapAnd of two
>> well-suited indexes, and reading indxpath.c
>>
>> In my case, there is a table (d date, col1 int, col2 int) -- types not
>> really important -- and there are two indices on (d,col1) and (d, col2).
>>
>> For queries that do WHERE d>=X AND col1=Y AND col2=Z postgres will never
>> BitmapAnd those two indices because both indexes include (d) and we have a
>> condition on (d). Here is a full example, which could also be seen here:
>> https://www.db-fiddle.com/f/uPLx5bRtDEoZw3Dx4kkwKh/0:
>>
>> begin;
>>
>> CREATE TABLE test_table (
>>     d date,
>>     col1 int,
>>     col2 int
>> );
>>
>> INSERT INTO test_table (d, col1, col2)
>> SELECT
>>     d.date,
>>     c1.val as col1,
>>     c2.val as col2
>> FROM
>>     generate_series(
>>         '2023-01-01'::date,
>>         '2023-12-31'::date,
>>         '1 day'::interval
>>     ) as d(date),
>>     generate_series(1, 1000) as c1(val),
>>     generate_series(1, 1000) as c2(val)
>> WHERE
>>     random() < 0.001;
>>
>> create index on test_table(col1,d);
>> create index on test_table(col2,d);
>>
>> -- This uses BitmapAnd
>> explain select * from test_table where col1=123 and col2=321;
>>
>> -- This does not use BitmapAnd, even though it could!
>> explain select * from test_table where col1=123 and col2=321 and d >=
>> '2023-05-05';
>>
>> I checked that BitmapAnd is rejected by this
>> <https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/indxpath.c#L1878>
>> line in choose_bitmap_and:
>>
>>    if (bms_overlap(pathinfo->clauseids, clauseidsofar))
>>       continue; /* consider it redundant */
>>
>> There is a comment on choose_bitmap_and that explains the rationale of
>> this check, but reading it I can't help but feel that what the comment
>> describes is this condition:
>>
>>    if (bms_is_subset(pathinfo->clauseids, clauseidsofar))
>>       continue; /* consider it redundant */
>>
>> And indeed, in my (admittedly not super-extensive) testing changing
>> bms_overlap to bms_is_subset leads to better faster execution plans.
>>
>> Is it possible that this condition could thus be relaxed?
>>
>> Even if I am wrong, and the condition absolutely should be bms_overlap, I
>> feel that this restriction is very very hard to discover and perhaps
>> https://www.postgresql.org/docs/current/indexes-bitmap-scans.html should
>> mention that compound indexes that have columns in common will never be
>> combined?
>>
>> Best regards, Dmytro
>>
>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index a43ca16d68..f34d82f2e9 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1767,16 +1767,16 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
 	 * because of the prefiltering step.  The cheapest of these is returned.
 	 *
 	 * We will only consider AND combinations in which no two indexes use the
-	 * same WHERE clause.  This is a bit of a kluge: it's needed because
-	 * costsize.c and clausesel.c aren't very smart about redundant clauses.
-	 * They will usually double-count the redundant clauses, producing a
-	 * too-small selectivity that makes a redundant AND step look like it
-	 * reduces the total cost.  Perhaps someday that code will be smarter and
-	 * we can remove this limitation.  (But note that this also defends
-	 * against flat-out duplicate input paths, which can happen because
-	 * match_join_clauses_to_index will find the same OR join clauses that
-	 * extract_restriction_or_clauses has pulled OR restriction clauses out
-	 * of.)
+	 * WHERE clauses that are subset of another. This is a bit of a kluge:
+	 * it's needed because costsize.c and clausesel.c aren't very smart about
+	 * redundant clauses. They will usually double-count the redundant
+	 * clauses, producing a too-small selectivity that makes a redundant AND
+	 * step look like it reduces the total cost. Perhaps someday that code
+	 * will be smarter and we can remove this limitation. (But note that this
+	 * also defends against flat-out duplicate input paths, which can happen
+	 * because match_join_clauses_to_index will find the same OR join clauses
+	 * that extract_restriction_or_clauses has pulled OR restriction clauses
+	 * out of.)
 	 *
 	 * For the same reason, we reject AND combinations in which an index
 	 * predicate clause duplicates another clause.  Here we find it necessary
@@ -1875,7 +1875,7 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
 
 			pathinfo = pathinfoarray[j];
 			/* Check for redundancy */
-			if (bms_overlap(pathinfo->clauseids, clauseidsofar))
+			if (bms_is_subset(pathinfo->clauseids, clauseidsofar))
 				continue;		/* consider it redundant */
 			if (pathinfo->preds)
 			{

Attachment: regression.diffs
Description: Binary data

Reply via email to