Correctly calculate "MCV frequency" for a unique column.

In commit bd3e3e9e5, I over-hastily used 1 / rel->rows as the assumed
frequency of entries in a column that ANALYZE has found to be unique.
However, rel->rows is the number of table rows that are estimated to
pass the query's restriction conditions, so that we got a too-large
result if the query has selective restrictions.  What I should have
used is 1 / rel->tuples, since that is the estimated total number of
table rows.  The pre-existing code path that digs a frequency out of
the histogram produces a frequency relative to the whole table, so
surely this new alternative code path must do so as well.  Any
correction needed on the basis of selectivity must be done by the
user of the mcv_freq value.

Fixing this causes all the regression test plans changed by bd3e3e9e5
to revert to what they had been, except for the first change in
join.out.  As I correctly argued in bd3e3e9e5, in that test case we
have no stats and should not risk a hash join.  Evidently I was less
correct to argue that the other changes were improvements.

Reported-by: Joel Jacobson <[email protected]>
Diagnosed-by: Tender Wang <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: 
https://postgr.es/m/[email protected]

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/d80b0225010fd407c784bbecde116a28198b6eab

Modified Files
--------------
src/backend/utils/adt/selfuncs.c             |   9 +-
src/test/regress/expected/join.out           |  16 +-
src/test/regress/expected/partition_join.out | 432 ++++++++++++++-------------
3 files changed, 231 insertions(+), 226 deletions(-)

Reply via email to