Hi Ilia, > 1. Cap selectivity with the last MCV item's frequency
I've attached a draft patch. It's split into 4 commits to make it easier to review. It adds the MCV cap for AND (equality, IN, ANY) clauses. Looking forward to your feedback. On 5/19/26 0:13, Ilia Evdokimov wrote: > Postgres only uses multi-column MCVs when the value we are looking for is > in the list. If not, it falls back into individual independent statistics > to estimate selectivity. > However, a miss in a multi-column MCV list still yields valuable > information that it currently throws away: we know that the combination's > frequency is strictly bounded by the frequency of the last (least common) > item in that MCV list. > > LGTM. If the multicolumn MCV statistics exists and the clause combination > is absent from the MCV-list, we can use the least frequent MCV item as an > upper bound. BTW, this only applies to AND-clauses. > Given that the inclusion-exclusion principle (P(A OR B) = P(A) + P(B) - P(A AND B)) is used to calculate OR paths, we could use the same cap to improve the overlap estimation (P(A AND B)). 2. Estimate selectivity as Postgres does for single-column values not in > MCVs > > ============================================================================= > While that significantly improves estimations, we could mirror what > Postgres already does for individual MCVs. Quote from the official > documentation: > > The approach is to use the fact that the value is not in the list, > combined with the knowledge of the frequencies for all of the MCVs: > > That is, add up all the frequencies for the MCVs and subtract them from > one, then divide by the number of other distinct values. > > To achieve this, we need to store an ndistinct estimation alongside the > MCVs that can be used for partial or entire column match. > > P(1, 1, 1) = (1 - sum(MCVs)) / (ndistinct(col_a, col_b, col_c) - > MCV_list_size) > ... > I think this is a cheap way to prevent bad estimations. The storage > overhead of adding an ndistinct field is trivial compared to the MCV list > itself, and the O(1) arithmetic during planning adds no measurable > overhead. I look forward to your feedback before drafting a patch. > > For this, the ndistinct extended statistics already exist. If both MCV and > ndistinct statistics are present on the same column set, the formula is > correct. There are already places in the code that compute ndistinct for > columns without extended ndistinct statistics (see estimate_num_groups) - > but it is worth thinking carefully about whether the added complexity is > justified before going down that path. > I think that the implementation would look similar to the attached patch. The only added complexity is getting the ndistinct estimation. There are 2 ways: - Rely on the extended ndistinct statistic if it exists - Calculate the ndistinct(col_a, col_b, col_c) statistic as part of the MCV. The storage cost is negligible compared to the MCV list and it keeps the MCV statistic complete, so it doesn't need to check if the ndistinct statistic exists. That would be a change on the MCV struct, meaning a change in the mcvlist->type. I can start working on a proposal for this second part. Let me know what you think. Best regards, Enrique.
From 145c56ff65785eb559e720f3c04b908a57c61940 Mon Sep 17 00:00:00 2001 From: Enrique Sanchez Cardoso <[email protected]> Date: Sun, 24 May 2026 19:09:26 +0200 Subject: [PATCH 4/4] Extend multi-column MCV cap to AND clauses inside OR expressions --- src/backend/statistics/extended_stats.c | 12 +++- src/backend/statistics/mcv.c | 68 +++++++++++++++---- .../statistics/extended_stats_internal.h | 3 +- src/test/regress/expected/stats_ext.out | 16 ++++- src/test/regress/sql/stats_ext.sql | 8 ++- 5 files changed, 90 insertions(+), 17 deletions(-) diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index da6f6315698..fc6f7905bb3 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -1915,7 +1915,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli overlap_basesel, mcv_totalsel, clause_sel, - overlap_sel; + overlap_sel, + clause_cap; /* * "Simple" selectivity of the next clause and its overlap @@ -1945,7 +1946,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli &mcv_basesel, &overlap_mcvsel, &overlap_basesel, - &mcv_totalsel); + &mcv_totalsel, + &clause_cap); /* * Combine the simple and multi-column estimates. @@ -1959,11 +1961,17 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli if (bms_is_member(listidx, simple_clauses)) clause_sel = simple_sel; else + { clause_sel = mcv_combine_selectivities(simple_sel, mcv_sel, mcv_basesel, mcv_totalsel); + /* Cap the contribution of values not found in the MCV. */ + if (clause_sel > clause_cap) + clause_sel = clause_cap; + } + overlap_sel = mcv_combine_selectivities(overlap_simple_sel, overlap_mcvsel, overlap_basesel, diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index 62761c58e33..df75087391c 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -1577,6 +1577,40 @@ mcv_cap_multiplier(List *clauses) return multiplier; } +/* + * mcv_compute_cap + * Compute a selectivity cap based on the least common MCV frequency. + * + * When one equality/IN clause covers each MCV dimension, value combinations + * not found in the MCV can't be more frequent than the least common tracked + * combination. The cap is: matched MCV frequency plus the number of + * non-MCV combinations times the least common MCV frequency. + * + * Returns 1.0 (no cap) when the clauses don't fully cover all dimensions + * or contain unsupported clause types. + */ +static Selectivity +mcv_compute_cap(MCVList *mcv, List *clauses, Selectivity mcv_sel, + int64 matched_count) +{ + int64 cap_mult; + int64 non_mcv_mult; + Selectivity cap; + + if (list_length(clauses) != mcv->ndimensions) + return 1.0; + + cap_mult = mcv_cap_multiplier(clauses); + non_mcv_mult = cap_mult - matched_count; + + if (non_mcv_mult <= 0) + return 1.0; + + cap = mcv_sel + non_mcv_mult * mcv->items[mcv->nitems - 1].frequency; + CLAMP_PROBABILITY(cap); + return cap; +} + /* * match the attribute/expression to a dimension of the statistic * @@ -2144,17 +2178,7 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, * combination is not among the most common, so it can't be more frequent * than the least common tracked combination. */ - if (mcv->ndimensions == list_length(clauses)) - { - int64 cap_mult = mcv_cap_multiplier(clauses); - int64 non_mcv_mult = cap_mult - matched_count; - - if (non_mcv_mult > 0) - { - *cap = s + non_mcv_mult * mcv->items[mcv->nitems - 1].frequency; - CLAMP_PROBABILITY(*cap); - } - } + *cap = mcv_compute_cap(mcv, clauses, s, matched_count); return s; } @@ -2202,11 +2226,16 @@ Selectivity mcv_clause_selectivity_or(PlannerInfo *root, StatisticExtInfo *stat, MCVList *mcv, Node *clause, bool **or_matches, Selectivity *basesel, Selectivity *overlap_mcvsel, - Selectivity *overlap_basesel, Selectivity *totalsel) + Selectivity *overlap_basesel, Selectivity *totalsel, + Selectivity *clause_cap) { Selectivity s = 0.0; bool *new_matches; int i; + int64 matched_count = 0; + + /* default: no cap on clause selectivity */ + *clause_cap = 1.0; /* build the OR-matches bitmap, if not built already */ if (*or_matches == NULL) @@ -2233,6 +2262,7 @@ mcv_clause_selectivity_or(PlannerInfo *root, StatisticExtInfo *stat, { s += mcv->items[i].frequency; *basesel += mcv->items[i].base_frequency; + matched_count++; if ((*or_matches)[i]) { @@ -2247,6 +2277,20 @@ mcv_clause_selectivity_or(PlannerInfo *root, StatisticExtInfo *stat, pfree(new_matches); + /* + * When there is one equality/IN clause per MCV dimension, cap the + * contribution of value combinations not found in the MCV. Each such + * combination is not among the most common, so it can't be more frequent + * than the least common tracked combination. + */ + if (is_andclause(clause)) + { + BoolExpr *bexpr = (BoolExpr *) clause; + + *clause_cap = mcv_compute_cap(mcv, bexpr->args, s, + matched_count); + } + return s; } diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h index 01b5f67b843..10f41f87564 100644 --- a/src/include/statistics/extended_stats_internal.h +++ b/src/include/statistics/extended_stats_internal.h @@ -140,6 +140,7 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root, Selectivity *basesel, Selectivity *overlap_mcvsel, Selectivity *overlap_basesel, - Selectivity *totalsel); + Selectivity *totalsel, + Selectivity *clause_cap); #endif /* EXTENDED_STATS_INTERNAL_H */ diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index 7ea244f7851..c712679b573 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -2931,7 +2931,7 @@ DROP TABLE mcv_lists_partial; -- P(a=0)=0.5 and P(b=0)=0.5, so the independence estimate is 0.25 * N. -- After building MCV statistics the cap limits the combined estimate to the -- least-common MCV frequency, eliminating most of the over-estimation. -CREATE TABLE mcv_cap (a INT, b INT) WITH (autovacuum_enabled = off); +CREATE TABLE mcv_cap (a INT, b INT, c INT DEFAULT 0) WITH (autovacuum_enabled = off); INSERT INTO mcv_cap SELECT 0, b FROM generate_series(1, 99) b, generate_series(1, 100) r; INSERT INTO mcv_cap @@ -2979,6 +2979,20 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN ( 200 | 0 (1 row) +-- partial MCV match inside OR (a=0, b=99) +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE c = 1 OR (a = 0 AND b IN (0, 99))'); + estimated | actual +-----------+-------- + 200 | 100 +(1 row) + +-- no MCV match inside OR +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE c = 1 OR (a = 0 AND b = 0)'); + estimated | actual +-----------+-------- + 100 | 0 +(1 row) + DROP TABLE mcv_cap; -- check the ability to use multiple MCV lists CREATE TABLE mcv_lists_multi ( diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql index 8e0b8c0eb5c..926dfaa4e6d 100644 --- a/src/test/regress/sql/stats_ext.sql +++ b/src/test/regress/sql/stats_ext.sql @@ -1468,7 +1468,7 @@ DROP TABLE mcv_lists_partial; -- P(a=0)=0.5 and P(b=0)=0.5, so the independence estimate is 0.25 * N. -- After building MCV statistics the cap limits the combined estimate to the -- least-common MCV frequency, eliminating most of the over-estimation. -CREATE TABLE mcv_cap (a INT, b INT) WITH (autovacuum_enabled = off); +CREATE TABLE mcv_cap (a INT, b INT, c INT DEFAULT 0) WITH (autovacuum_enabled = off); INSERT INTO mcv_cap SELECT 0, b FROM generate_series(1, 99) b, generate_series(1, 100) r; @@ -1497,6 +1497,12 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN ( -- no MCV match SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); +-- partial MCV match inside OR (a=0, b=99) +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE c = 1 OR (a = 0 AND b IN (0, 99))'); + +-- no MCV match inside OR +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE c = 1 OR (a = 0 AND b = 0)'); + DROP TABLE mcv_cap; -- check the ability to use multiple MCV lists -- 2.43.0
From 42a36e6d98d7642e87500de8f138e1f54160fe55 Mon Sep 17 00:00:00 2001 From: Enrique Sanchez Cardoso <[email protected]> Date: Sun, 24 May 2026 01:03:14 +0200 Subject: [PATCH 1/4] Cap selectivity when values are not in multi-column mcv Selectivity can't be > last MCV item (least common) selectivity when they are AND clauses and cover all the MCV dimensions. --- src/backend/statistics/extended_stats.c | 11 +++- src/backend/statistics/mcv.c | 43 ++++++++++++++- .../statistics/extended_stats_internal.h | 3 +- src/test/regress/expected/stats_ext.out | 52 +++++++++++++++++++ src/test/regress/sql/stats_ext.sql | 34 ++++++++++++ 5 files changed, 140 insertions(+), 3 deletions(-) diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 2b83355d26e..f8c38653bf9 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -1989,6 +1989,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli mcv_sel, mcv_basesel, mcv_totalsel, + mcv_cap, stat_sel; /* @@ -2006,7 +2007,8 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli mcv_sel = mcv_clauselist_selectivity(root, stat, stat_clauses, varRelid, jointype, sjinfo, rel, &mcv_basesel, - &mcv_totalsel); + &mcv_totalsel, + &mcv_cap); /* Combine the simple and multi-column estimates. */ stat_sel = mcv_combine_selectivities(simple_sel, @@ -2014,6 +2016,13 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli mcv_basesel, mcv_totalsel); + /* + * Cap to the least common MCV frequency when no MCV items + * matched. + */ + if (stat_sel > mcv_cap) + stat_sel = mcv_cap; + /* Factor this into the overall result */ sel *= stat_sel; } diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index 0b7da605a4c..df70d00cc3d 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -24,6 +24,7 @@ #include "statistics/statistics.h" #include "utils/array.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/fmgrprotos.h" #include "utils/lsyscache.h" #include "utils/selfuncs.h" @@ -1523,6 +1524,32 @@ pg_mcv_list_send(PG_FUNCTION_ARGS) return byteasend(fcinfo); } +/* + * mcv_is_all_equality_clauses + * Check if all clauses are simple equality conditions (OpExpr with eqsel + * restriction estimator). This mirrors the check done by + * dependency_is_compatible_clause() in dependencies.c. + */ +static bool +mcv_is_all_equality_clauses(List *clauses) +{ + ListCell *lc; + + foreach(lc, clauses) + { + Node *clause = (Node *) lfirst(lc); + + if (IsA(clause, RestrictInfo)) + clause = (Node *) ((RestrictInfo *) clause)->clause; + + if (!is_opclause(clause) || + get_oprrest(((OpExpr *) clause)->opno) != F_EQSEL) + return false; + } + + return true; +} + /* * match the attribute/expression to a dimension of the statistic * @@ -2047,7 +2074,8 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, List *clauses, int varRelid, JoinType jointype, SpecialJoinInfo *sjinfo, RelOptInfo *rel, - Selectivity *basesel, Selectivity *totalsel) + Selectivity *basesel, Selectivity *totalsel, + Selectivity *cap) { int i; MCVList *mcv; @@ -2057,6 +2085,9 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, /* match/mismatch bitmap for each MCV item */ bool *matches = NULL; + /* default: no cap on combined selectivity */ + *cap = 1.0; + /* load the MCV list stored in the statistics object */ mcv = statext_mcv_load(stat->statOid, rte->inh); @@ -2078,6 +2109,16 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, } } + /* + * When no MCV item matched and there is one equality clause per MCV + * dimension, cap the selectivity to the least common MCV frequency. The + * combination is not among the most common, so it can't be more frequent + * than the least common tracked combination. + */ + if (s == 0.0 && mcv->ndimensions == list_length(clauses) && + mcv_is_all_equality_clauses(clauses)) + *cap = mcv->items[mcv->nitems - 1].frequency; + return s; } diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h index c775442f2ee..01b5f67b843 100644 --- a/src/include/statistics/extended_stats_internal.h +++ b/src/include/statistics/extended_stats_internal.h @@ -129,7 +129,8 @@ extern Selectivity mcv_clauselist_selectivity(PlannerInfo *root, SpecialJoinInfo *sjinfo, RelOptInfo *rel, Selectivity *basesel, - Selectivity *totalsel); + Selectivity *totalsel, + Selectivity *cap); extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root, StatisticExtInfo *stat, diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index 37070c1a896..1ca26669bb1 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -2928,6 +2928,58 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 (1 row) DROP TABLE mcv_lists_partial; +-- P(a=0)=0.5 and P(b=0)=0.5, so the independence estimate is 0.25 * N. +-- After building MCV statistics the cap limits the combined estimate to the +-- least-common MCV frequency, eliminating most of the over-estimation. +CREATE TABLE mcv_cap (a INT, b INT) WITH (autovacuum_enabled = off); +INSERT INTO mcv_cap + SELECT 0, b FROM generate_series(1, 99) b, generate_series(1, 100) r; +INSERT INTO mcv_cap + SELECT a, 0 FROM generate_series(1, 99) a, generate_series(1, 100) r; +ANALYZE mcv_cap; +-- without MCV statistics: independence gives 0.5 * 0.5 * 19800 = 4950 rows +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0'); + estimated | actual +-----------+-------- + 4950 | 0 +(1 row) + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); + estimated | actual +-----------+-------- + 5000 | 100 +(1 row) + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); + estimated | actual +-----------+-------- + 4950 | 0 +(1 row) + +CREATE STATISTICS mcv_cap_stats (mcv) ON a, b FROM mcv_cap; +ANALYZE mcv_cap; +-- with MCV statistics: bounded by least MCV frequency +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0'); + estimated | actual +-----------+-------- + 100 | 0 +(1 row) + +-- IN/ANY equality clauses are not supported, partial MCV match (a=0, b=99) +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); + estimated | actual +-----------+-------- + 5050 | 100 +(1 row) + +-- IN/ANY equality clauses are not supported, no MCV match +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); + estimated | actual +-----------+-------- + 4950 | 0 +(1 row) + +DROP TABLE mcv_cap; -- check the ability to use multiple MCV lists CREATE TABLE mcv_lists_multi ( a INTEGER, diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql index 3cc6012b822..0f67363cd6d 100644 --- a/src/test/regress/sql/stats_ext.sql +++ b/src/test/regress/sql/stats_ext.sql @@ -1465,6 +1465,40 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 DROP TABLE mcv_lists_partial; +-- P(a=0)=0.5 and P(b=0)=0.5, so the independence estimate is 0.25 * N. +-- After building MCV statistics the cap limits the combined estimate to the +-- least-common MCV frequency, eliminating most of the over-estimation. +CREATE TABLE mcv_cap (a INT, b INT) WITH (autovacuum_enabled = off); + +INSERT INTO mcv_cap + SELECT 0, b FROM generate_series(1, 99) b, generate_series(1, 100) r; + +INSERT INTO mcv_cap + SELECT a, 0 FROM generate_series(1, 99) a, generate_series(1, 100) r; + +ANALYZE mcv_cap; + +-- without MCV statistics: independence gives 0.5 * 0.5 * 19800 = 4950 rows +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0'); + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); + +CREATE STATISTICS mcv_cap_stats (mcv) ON a, b FROM mcv_cap; +ANALYZE mcv_cap; + +-- with MCV statistics: bounded by least MCV frequency +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0'); + +-- IN/ANY equality clauses are not supported, partial MCV match (a=0, b=99) +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); + +-- IN/ANY equality clauses are not supported, no MCV match +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); + +DROP TABLE mcv_cap; + -- check the ability to use multiple MCV lists CREATE TABLE mcv_lists_multi ( a INTEGER, -- 2.43.0
From 3bdd48382fdcf0bd8eaaa6a328cde8a47eeee1ac Mon Sep 17 00:00:00 2001 From: Enrique Sanchez Cardoso <[email protected]> Date: Sun, 24 May 2026 13:32:24 +0200 Subject: [PATCH 2/4] Add support for IN/ANY clauses in multi-column MCV cap Extend the MCV-based cap to handle IN and ANY clauses, allowing the selectivity cap to apply when these clauses are used alongside equality filters. --- src/backend/statistics/mcv.c | 60 +++++++++++++++++++------ src/test/regress/expected/stats_ext.out | 6 +-- src/test/regress/sql/stats_ext.sql | 4 +- 3 files changed, 52 insertions(+), 18 deletions(-) diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index df70d00cc3d..2e75f19d8bd 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -1525,14 +1525,19 @@ pg_mcv_list_send(PG_FUNCTION_ARGS) } /* - * mcv_is_all_equality_clauses - * Check if all clauses are simple equality conditions (OpExpr with eqsel - * restriction estimator). This mirrors the check done by - * dependency_is_compatible_clause() in dependencies.c. + * mcv_cap_multiplier + * Compute a multiplier for capping combined selectivity to the least + * common MCV frequency when no MCV items matched. + * + * Returns 0 if the cap should not be applied (unsupported clause types). + * Returns >= 1 as the number of distinct value combinations the clauses + * could match: 1 for each equality clause, N for each IN/ANY clause with + * N elements. */ -static bool -mcv_is_all_equality_clauses(List *clauses) +static int64 +mcv_cap_multiplier(List *clauses) { + int64 multiplier = 1; ListCell *lc; foreach(lc, clauses) @@ -1542,12 +1547,34 @@ mcv_is_all_equality_clauses(List *clauses) if (IsA(clause, RestrictInfo)) clause = (Node *) ((RestrictInfo *) clause)->clause; - if (!is_opclause(clause) || - get_oprrest(((OpExpr *) clause)->opno) != F_EQSEL) - return false; + if (is_opclause(clause)) + { + /* Simple equality: factor 1 */ + if (get_oprrest(((OpExpr *) clause)->opno) != F_EQSEL) + return 0; + } + else if (IsA(clause, ScalarArrayOpExpr)) + { + ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause; + Node *arg; + ArrayType *arr; + + /* Only ANY/IN with equality operator */ + if (!saop->useOr || get_oprrest(saop->opno) != F_EQSEL) + return 0; + + arg = (Node *) lsecond(saop->args); + if (!IsA(arg, Const) || ((Const *) arg)->constisnull) + return 0; + + arr = DatumGetArrayTypeP(((Const *) arg)->constvalue); + multiplier *= ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr)); + } + else + return 0; /* unsupported clause type */ } - return true; + return multiplier; } /* @@ -2115,9 +2142,16 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, * combination is not among the most common, so it can't be more frequent * than the least common tracked combination. */ - if (s == 0.0 && mcv->ndimensions == list_length(clauses) && - mcv_is_all_equality_clauses(clauses)) - *cap = mcv->items[mcv->nitems - 1].frequency; + if (s == 0.0 && mcv->ndimensions == list_length(clauses)) + { + int64 cap_mult = mcv_cap_multiplier(clauses); + + if (cap_mult > 0) + { + *cap = cap_mult * mcv->items[mcv->nitems - 1].frequency; + CLAMP_PROBABILITY(*cap); + } + } return s; } diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index 1ca26669bb1..82ad9edb568 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -2965,18 +2965,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0' 100 | 0 (1 row) --- IN/ANY equality clauses are not supported, partial MCV match (a=0, b=99) +-- partial MCV match (a=0, b=99) SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); estimated | actual -----------+-------- 5050 | 100 (1 row) --- IN/ANY equality clauses are not supported, no MCV match +-- no MCV match SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); estimated | actual -----------+-------- - 4950 | 0 + 200 | 0 (1 row) DROP TABLE mcv_cap; diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql index 0f67363cd6d..8e0b8c0eb5c 100644 --- a/src/test/regress/sql/stats_ext.sql +++ b/src/test/regress/sql/stats_ext.sql @@ -1491,10 +1491,10 @@ ANALYZE mcv_cap; -- with MCV statistics: bounded by least MCV frequency SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0'); --- IN/ANY equality clauses are not supported, partial MCV match (a=0, b=99) +-- partial MCV match (a=0, b=99) SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); --- IN/ANY equality clauses are not supported, no MCV match +-- no MCV match SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 100)'); DROP TABLE mcv_cap; -- 2.43.0
From fc369280662c9183ccc50ac4ded144acadd84c15 Mon Sep 17 00:00:00 2001 From: Enrique Sanchez Cardoso <[email protected]> Date: Sun, 24 May 2026 14:40:39 +0200 Subject: [PATCH 3/4] Extend multicolumn MCV cap to partial IN/ANY matches --- src/backend/statistics/extended_stats.c | 5 +---- src/backend/statistics/mcv.c | 17 ++++++++++------- src/test/regress/expected/stats_ext.out | 2 +- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index f8c38653bf9..da6f6315698 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -2016,10 +2016,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli mcv_basesel, mcv_totalsel); - /* - * Cap to the least common MCV frequency when no MCV items - * matched. - */ + /* Cap the contribution of values not found in the MCV. */ if (stat_sel > mcv_cap) stat_sel = mcv_cap; diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index 2e75f19d8bd..62761c58e33 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -1526,8 +1526,8 @@ pg_mcv_list_send(PG_FUNCTION_ARGS) /* * mcv_cap_multiplier - * Compute a multiplier for capping combined selectivity to the least - * common MCV frequency when no MCV items matched. + * Compute a multiplier for capping combined selectivity to the least + * common MCV frequency. * * Returns 0 if the cap should not be applied (unsupported clause types). * Returns >= 1 as the number of distinct value combinations the clauses @@ -2105,6 +2105,7 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, Selectivity *cap) { int i; + int64 matched_count = 0; MCVList *mcv; Selectivity s = 0.0; RangeTblEntry *rte = root->simple_rte_array[rel->relid]; @@ -2133,22 +2134,24 @@ mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat, { *basesel += mcv->items[i].base_frequency; s += mcv->items[i].frequency; + matched_count++; } } /* - * When no MCV item matched and there is one equality clause per MCV - * dimension, cap the selectivity to the least common MCV frequency. The + * When there is one equality/IN clause per MCV dimension, cap the + * contribution of value combinations not found in the MCV. Each such * combination is not among the most common, so it can't be more frequent * than the least common tracked combination. */ - if (s == 0.0 && mcv->ndimensions == list_length(clauses)) + if (mcv->ndimensions == list_length(clauses)) { int64 cap_mult = mcv_cap_multiplier(clauses); + int64 non_mcv_mult = cap_mult - matched_count; - if (cap_mult > 0) + if (non_mcv_mult > 0) { - *cap = cap_mult * mcv->items[mcv->nitems - 1].frequency; + *cap = s + non_mcv_mult * mcv->items[mcv->nitems - 1].frequency; CLAMP_PROBABILITY(*cap); } } diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index 82ad9edb568..7ea244f7851 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -2969,7 +2969,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b = 0' SELECT * FROM check_estimated_rows('SELECT * FROM mcv_cap WHERE a = 0 AND b IN (0, 99)'); estimated | actual -----------+-------- - 5050 | 100 + 200 | 100 (1 row) -- no MCV match -- 2.43.0
