Ignore PlaceHolderVars when looking up statistics

When looking up statistical data about an expression, we failed to
look through PlaceHolderVar nodes, treating them as opaque.  This
could prevent us from matching an expression to base columns, index
expressions, or extended statistics, as examine_variable() relies on
strict structural matching.

As a result, queries involving PlaceHolderVar nodes often fell back to
default selectivity estimates, potentially leading to poor plan
choices.

This patch updates examine_variable() to strip PlaceHolderVars before
analysis.  This is safe during estimation because PlaceHolderVars are
transparent for the purpose of statistics lookup: they do not alter
the value distribution of the underlying expression.

To minimize performance overhead on this hot path, a lightweight
walker first checks for the presence of PlaceHolderVars.  The more
expensive mutator is invoked only when necessary.
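
The check-then-rewrite pattern described above can be illustrated with a
minimal, self-contained sketch.  It is not the code in selfuncs.c: the
Node type, the NodeKind tags, and the functions make_node(),
contains_placeholder(), strip_placeholders(), and
prepare_for_stats_lookup() are all hypothetical stand-ins on a toy
expression tree, chosen only to show why a cheap read-only pass can
guard a more expensive copying pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy node kinds: hypothetical stand-ins, not PostgreSQL's node tags. */
typedef enum { NODE_COLUMN, NODE_OP, NODE_PLACEHOLDER } NodeKind;

typedef struct Node
{
    NodeKind     kind;
    int          column_id; /* meaningful for NODE_COLUMN only */
    struct Node *left;      /* NODE_OP: left operand;
                             * NODE_PLACEHOLDER: the wrapped expression */
    struct Node *right;     /* NODE_OP: right operand, else NULL */
} Node;

/* Allocate a node.  This sketch never frees memory, for brevity. */
Node *
make_node(NodeKind kind, int column_id, Node *left, Node *right)
{
    Node *n = malloc(sizeof(Node));

    assert(n != NULL);
    n->kind = kind;
    n->column_id = column_id;
    n->left = left;
    n->right = right;
    return n;
}

/* Cheap read-only walker: allocates nothing, stops at the first hit. */
bool
contains_placeholder(const Node *n)
{
    if (n == NULL)
        return false;
    if (n->kind == NODE_PLACEHOLDER)
        return true;
    return contains_placeholder(n->left) || contains_placeholder(n->right);
}

/* Costlier mutator: builds a copy of the tree without the wrappers. */
Node *
strip_placeholders(const Node *n)
{
    if (n == NULL)
        return NULL;
    if (n->kind == NODE_PLACEHOLDER)
        return strip_placeholders(n->left);   /* replace wrapper by its content */
    return make_node(n->kind, n->column_id,
                     strip_placeholders(n->left),
                     strip_placeholders(n->right));
}

/* Entry point: copy the tree only if the walker found a wrapper. */
const Node *
prepare_for_stats_lookup(const Node *n)
{
    if (!contains_placeholder(n))
        return n;               /* common case: no allocation at all */
    return strip_placeholders(n);
}
```

In this sketch a tree without any placeholder node is returned as the
same pointer, so the common case pays only for one traversal; a tree
that does contain one is copied with each wrapper replaced by the
expression it wraps.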

There is one ensuing plan change in the regression tests, which is
expected and demonstrates the fix: the rowcount estimate becomes much
more accurate with this patch.

Back-patch to v18.  Although the issue predates v18, changes in that
release made it common enough to notice.  Given the lack of field
reports for older versions, I am not back-patching further.

Reported-by: Haowu Ge <[email protected]>
Author: Richard Guo <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 18

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/559f9e90dbbd5d72b1da802703317913280c5080

Modified Files
--------------
src/backend/utils/adt/selfuncs.c   | 93 +++++++++++++++++++++++++++++++++-----
src/test/regress/expected/join.out | 27 ++++++++++-
src/test/regress/sql/join.sql      | 10 ++++
3 files changed, 116 insertions(+), 14 deletions(-)
