There's a complaint in bug #14438 about poor estimation of join size
for a semijoin whose inner side is empty. I think the root of it is that,
having no statistics for the empty table, eqjoinsel_semi distrusts its
estimate of the number of distinct values on the inner side, and falls
back to a conservative calculation that has little to do with reality.
However, it's not really true that the nd2 estimate is based on nothing at
all, because we clamped it to be no more than the estimated size of the
inner relation (which we knew to be small). If we go ahead and use that
number as a valid estimate, we get a far better selectivity estimate ---
in the bug's example, the join size estimate goes from 16000-some to 2,
which is a tad closer to the correct value of 0.

## Advertising

Hence, I propose the attached patch. This would kick in whenever the
inner side of a semi/antijoin has no statistics and an estimated size
of less than 200 rows.
Thoughts?
regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 56943f2..d331adf 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** eqjoinsel_semi(Oid operator,
*** 2511,2520 ****
* We can apply this clamping both with respect to the base relation from
* which the join variable comes (if there is just one), and to the
* immediate inner input relation of the current join.
*/
if (vardata2->rel)
! nd2 = Min(nd2, vardata2->rel->rows);
! nd2 = Min(nd2, inner_rel->rows);
if (HeapTupleIsValid(vardata1->statsTuple))
{
--- 2511,2532 ----
* We can apply this clamping both with respect to the base relation from
* which the join variable comes (if there is just one), and to the
* immediate inner input relation of the current join.
+ *
+ * If we clamp, we can consider that nd2 is not a bogus default estimate.
*/
if (vardata2->rel)
! {
! if (nd2 >= vardata2->rel->rows)
! {
! nd2 = vardata2->rel->rows;
! isdefault2 = false;
! }
! }
! if (nd2 >= inner_rel->rows)
! {
! nd2 = inner_rel->rows;
! isdefault2 = false;
! }
if (HeapTupleIsValid(vardata1->statsTuple))
{

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers