Correctly calculate "MCV frequency" for a unique column. In commit bd3e3e9e5, I over-hastily used 1 / rel->rows as the assumed frequency of entries in a column that ANALYZE has found to be unique. However, rel->rows is the number of table rows that are estimated to pass the query's restriction conditions, so that we got a too-large result if the query has selective restrictions. What I should have used is 1 / rel->tuples, since that is the estimated total number of table rows. The pre-existing code path that digs a frequency out of the histogram produces a frequency relative to the whole table, so surely this new alternative code path must do so as well. Any correction needed on the basis of selectivity must be done by the user of the mcv_freq value.
Fixing this causes all the regression test plans changed by bd3e3e9e5 to revert to what they had been, except for the first change in join.out. As I correctly argued in bd3e3e9e5, in that test case we have no stats and should not risk a hash join. Evidently I was less correct to argue that the other changes were improvements. Reported-by: Joel Jacobson <[email protected]> Diagnosed-by: Tender Wang <[email protected]> Author: Tom Lane <[email protected]> Discussion: https://postgr.es/m/[email protected] Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/d80b0225010fd407c784bbecde116a28198b6eab Modified Files -------------- src/backend/utils/adt/selfuncs.c | 9 +- src/test/regress/expected/join.out | 16 +- src/test/regress/expected/partition_join.out | 432 ++++++++++++++------------- 3 files changed, 231 insertions(+), 226 deletions(-)
