Hi, all!

On 24.06.2023 14:23, Tomas Vondra wrote:
On 6/24/23 02:08, Tom Lane wrote:
Tomas Vondra <tomas.von...@enterprisedb.com> writes:
The problem is that the selectivity for "IS NULL" is estimated using the
table-level statistics. But the LEFT JOIN entirely breaks the idea that
the null_frac has anything to do with NULLs in the join result.
Right.

I wonder how to improve this, say by adjusting the IS NULL selectivity
when we know to operate on the outer side of the join. We're able to
do this for antijoins, so maybe we could do that here, somehow?
This mess is part of the long-term plan around the work I've been doing
on outer-join-aware Vars.  We now have infrastructure that can let
the estimator routines see "oh, this Var isn't directly from a scan
of its table, it's been passed through a potentially-nulling outer
join --- and I can see which one".  I don't have more than vague ideas
about what happens next, but that is clearly an essential step on the
road to doing better.

I was wondering if that work on outer-join-aware Vars could help with
this, but I wasn't following it very closely. I agree the ability to
check if the Var could be NULL due to an outer join seems useful, as it
says whether applying raw attribute statistics makes sense or not.

I was thinking about what to do for the case when that's not possible,
i.e. when the Var refers to the nullable side of the join. Knowing that this
is happening is clearly not enough - we need to know how many new NULLs
are "injected" into the join result, and "communicate" that to the
estimation routines.

Attached is a very ugly experimental patch doing that, and with it the
estimate changes to this:

                                  QUERY PLAN
   ----------------------------------------------------------------------
    Hash Left Join  (cost=3.25..18179.88 rows=999900 width=16)
                    (actual time=0.528..596.151 rows=999900 loops=1)
      Hash Cond: (large.id = small.id)
      Filter: ((small.id IS NULL) OR
               (large.a = ANY ('{1000,2000,3000,4000,5000}'::integer[])))
      Rows Removed by Filter: 100
      ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8)
                      (actual time=0.069..176.138 rows=1000000 loops=1)
      ->  Hash  (cost=2.00..2.00 rows=100 width=8)
                (actual time=0.371..0.373 rows=100 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 12kB
            ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8)
                          (actual time=0.032..0.146 rows=100 loops=1)
    Planning Time: 3.845 ms
    Execution Time: 712.405 ms
   (10 rows)

Seems nice, but. The patch is pretty ugly, I don't claim it works for
other queries or that this is exactly what we should do. It calculates
"unmatched frequency" next to eqjoinsel_inner, stashes that info into
sjinfo and the estimator (nulltestsel) then uses that to adjust the
nullfrac it gets from the statistics.

The good thing is this helps even for IS NULL checks on non-join-key
columns (where we don't switch to an antijoin), but there's a couple
things that I dislike ...

1) It's not restricted to outer joins or anything like that (this is
mostly just my laziness / interest in one particular query, but also
something the outer-join-aware patch might help with).

2) We probably don't want to pass this kind of information through
sjinfo. That was the simplest thing for an experimental patch, but I
suspect it's not the only piece of information we may need to pass to
the lower levels of estimation code.

3) I kinda doubt we actually want to move this responsibility (to
consider fraction of unmatched rows) to the low-level estimation
routines (e.g. nulltestsel and various others). AFAICS this just
"introduces NULLs" into the relation, so maybe we could "adjust" the
attribute statistics (in examine_variable?) by inflating null_frac and
modifying the other frequencies in MCV/histogram.

4) But I'm not sure we actually want to do that in these low-level
selectivity functions. The outer join essentially produces output with
two subsets - one with matches on the outer side, one without them. But
the side without matches has NULLs in all columns. In a way, we know
exactly how these columns are correlated - if we do the usual estimation
(even with the null_frac adjusted), we just throw this information away.
And when there's a lot of rows without a match, that seems bad.

So maybe we should split the join estimate into two parts, one for each
subset of the join result. One for the rows with a match (and then we
can just do what we do now, with the attribute stats we already have).
And one for the "unmatched part" where we know the values on the outer
side are NULL (and then we can easily "fake" stats with null_frac=1.0).


I really hope what I just wrote makes at least a little bit of sense.


regards

I am also interested in this problem.

I did some refactoring of the source code in the patch and moved the calculation of unmatched_fraction into eqjoinsel_inner. I added myself as a co-author of this commit, if you don't mind, and I'm going to continue working on it.
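
For reference, here is a sketch of the logic the patch ends up applying, if I read the code correctly (simplified from the attached patch; here nd1 is the ndistinct of large.id and nd2 that of small.id, and both id columns are unique so there are no MCV lists):

    /* eqjoinsel_inner(), no-MCV branch: fraction of side 1 without a match */
    unmatched_frac = (nd1 - nd2) / nd1;    /* (1000000 - 100) / 1000000 = 0.9999 */

    /* nulltestsel(): inflate the nullfrac taken from the statistics */
    freq_null = freq_null + unmatched_frac - freq_null * unmatched_frac;
                                           /* 0 + 0.9999 - 0 * 0.9999 = 0.9999 */

With stanullfrac = 0 for small.id this gives roughly 1000000 * 0.9999 = 999900 rows for "small.id IS NULL", which is about what the plan above shows.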


On 26.06.2023 12:22, Andrey Lepikhov wrote:
On 24/6/2023 17:23, Tomas Vondra wrote:
I really hope what I just wrote makes at least a little bit of sense.
Throw in one more example:

SELECT i AS id INTO l FROM generate_series(1,100000) i;
CREATE TABLE r (id int8, v text);
INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
ANALYZE l,r;
EXPLAIN ANALYZE
SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;

Here you can see the same kind of underestimation:
Hash Left Join  (... rows=500 width=14) (... rows=99999 ...)

So the eqjoinsel_unmatch_left() function should be modified for the case where nd1<nd2.

Unfortunately, the patch could not fix the cardinality estimate for this query; I'll try to look into it and figure out what is missing here.

postgres=# SELECT i AS id INTO l FROM generate_series(1,100000) i;
CREATE TABLE r (id int8, v text);
INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
ANALYZE l,r;
EXPLAIN ANALYZE
SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;
SELECT 100000
CREATE TABLE
INSERT 0 2
ANALYZE
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.04..1819.07 rows=1 width=14) (actual time=0.143..114.792 rows=99999 loops=1)
   Hash Cond: (l.id = r.id)
   Filter: (r.v IS NULL)
   Rows Removed by Filter: 1
   ->  Seq Scan on l  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.027..35.278 rows=100000 loops=1)
   ->  Hash  (cost=1.02..1.02 rows=2 width=10) (actual time=0.014..0.017 rows=2 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on r  (cost=0.00..1.02 rows=2 width=10) (actual time=0.005..0.007 rows=2 loops=1)
 Planning Time: 0.900 ms
 Execution Time: 126.180 ms
(10 rows)
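
If I read the patch correctly, this is expected with its current logic: the join clause is r.id = l.id, so nd1 is the ndistinct of r.id (2) and nd2 that of l.id (100000). Both columns are unique, so there are no MCV lists and eqjoinsel_inner() takes the branch that leaves unmatched_frac at zero:

    if (nd1 > nd2)
    {
        selec /= nd1;
        *unmatched_frac = (nd1 - nd2) * 1.0 / nd1;
    }
    else
    {
        selec /= nd2;
        *unmatched_frac = 0.0;    /* taken here: nd1 = 2, nd2 = 100000 */
    }

With unmatched_frac = 0 the adjustment in nulltestsel() is a no-op, so the estimate stays at rows=1. This looks like the nd1 < nd2 case Andrey mentioned above.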


As in the previous query, even with the patch applied, the cardinality estimate here is poor; I would even say it has become worse:

EXPLAIN ANALYZE
  SELECT * FROM large FULL JOIN small ON (large.id = small.id)
WHERE (large.a IS NULL);

MASTER:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=127921.69..299941.59 rows=56503 width=16) (actual time=795.092..795.094 rows=0 loops=1)
   Merge Cond: (small.id = large.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 1000000
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.038..0.046 rows=100 loops=1)
         Sort Key: small.id
         Sort Method: quicksort  Memory: 29kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.013..0.022 rows=100 loops=1)
   ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8) (actual time=363.016..649.103 rows=1000000 loops=1)
         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=363.012..481.480 rows=1000000 loops=1)
               Sort Key: large.id
               Sort Method: external merge  Disk: 17664kB
               ->  Seq Scan on large (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.009..111.166 rows=1000000 loops=1)
 Planning Time: 0.124 ms
 Execution Time: 797.139 ms
(15 rows)

With patch:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=3.25..18179.25 rows=999900 width=16) (actual time=261.480..261.482 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 1000000
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..92.827 rows=1000000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual time=0.032..0.034 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) (actual time=0.008..0.015 rows=100 loops=1)
 Planning Time: 0.151 ms
 Execution Time: 261.529 ms
(10 rows)
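
If I understand it correctly, the overestimate comes from the unmatched fraction being applied to Vars from either side of the join: eqjoinsel_inner() computes unmatched_frac = (1000000 - 100) / 1000000 = 0.9999 for the large side, and nulltestsel() then inflates the null fraction of large.a by it, even though in this query large.a can only be NULL for small rows without a match in large (and there are none). Assuming large.a has no NULLs in the base table (stanullfrac = 0):

    freq_null = 0.0 + 0.9999 - 0.0 * 0.9999;    /* -> 0.9999, i.e. ~999900 rows */

while the actual number of such rows is 0. So the adjustment probably needs to take into account which side of the join the Var comes from, not just the join type.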

In addition, I found a few more queries where the cardinality estimation has become better with the patch:


EXPLAIN ANALYZE
  SELECT * FROM small LEFT JOIN large ON (large.id = small.id)
WHERE (small.b IS NULL);

MASTER:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=32.74..18758.45 rows=55003 width=16) (actual time=0.100..0.104 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   ->  Seq Scan on large  (cost=0.00..14425.50 rows=1000050 width=8) (never executed)
   ->  Hash  (cost=32.60..32.60 rows=11 width=8) (actual time=0.089..0.091 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=11 width=8) (actual time=0.088..0.088 rows=0 loops=1)
               Filter: (b IS NULL)
               Rows Removed by Filter: 100
 Planning Time: 0.312 ms
 Execution Time: 0.192 ms
(10 rows)

With patch:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=2.01..18177.02 rows=1 width=16) (actual time=0.127..0.132 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (never executed)
   ->  Hash  (cost=2.00..2.00 rows=1 width=8) (actual time=0.112..0.114 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=1 width=8) (actual time=0.111..0.111 rows=0 loops=1)
               Filter: (b IS NULL)
               Rows Removed by Filter: 100
 Planning Time: 0.984 ms
 Execution Time: 0.237 ms
(10 rows)

EXPLAIN ANALYZE
  SELECT * FROM large FULL JOIN small ON (large.id = small.id)
WHERE (small.b IS NULL);

MASTER:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=127921.69..299941.59 rows=56503 width=16) (actual time=339.478..819.232 rows=999900 loops=1)
   Merge Cond: (small.id = large.id)
   Filter: (small.b IS NULL)
   Rows Removed by Filter: 100
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.129..0.136 rows=100 loops=1)
         Sort Key: small.id
         Sort Method: quicksort  Memory: 29kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.044..0.075 rows=100 loops=1)
   ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8) (actual time=339.260..605.444 rows=1000000 loops=1)
         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=339.254..449.930 rows=1000000 loops=1)
               Sort Key: large.id
               Sort Method: external merge  Disk: 17664kB
               ->  Seq Scan on large (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.032..104.484 rows=1000000 loops=1)
 Planning Time: 0.324 ms
 Execution Time: 859.705 ms
(15 rows)

With patch:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=3.25..18179.25 rows=999900 width=16) (actual time=0.162..349.683 rows=999900 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: (small.b IS NULL)
   Rows Removed by Filter: 100
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.021..95.972 rows=1000000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual time=0.125..0.127 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) (actual time=0.030..0.059 rows=100 loops=1)
 Planning Time: 0.218 ms
 Execution Time: 385.819 ms
(10 rows)


EXPLAIN ANALYZE
  SELECT * FROM large RIGHT JOIN small ON (large.id = small.id)
  WHERE (large.a IS NULL);

MASTER:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=127921.69..299941.59 rows=56503 width=16) (actual time=345.403..345.404 rows=0 loops=1)
   Merge Cond: (small.id = large.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 100
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.033..0.039 rows=100 loops=1)
         Sort Key: small.id
         Sort Method: quicksort  Memory: 29kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.012..0.020 rows=100 loops=1)
   ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8) (actual time=345.287..345.315 rows=101 loops=1)
         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=345.283..345.295 rows=101 loops=1)
               Sort Key: large.id
               Sort Method: external merge  Disk: 17664kB
               ->  Seq Scan on large (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.009..104.648 rows=1000000 loops=1)
 Planning Time: 0.098 ms
 Execution Time: 347.807 ms
(15 rows)

With patch:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=3.25..18179.25 rows=100 width=16) (actual time=209.838..209.842 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 100
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..91.571 rows=1000000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual time=0.034..0.036 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) (actual time=0.008..0.016 rows=100 loops=1)
 Planning Time: 0.168 ms
 Execution Time: 209.883 ms
(10 rows)
From e8653477fbe7321325122a2d5032798cd6a95a8b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@enterprisedb.com>
Date: Mon, 26 Jun 2023 15:39:11 +0300
Subject: [PATCH] Fix underestimated cardinality for a LEFT JOIN with an
 "IS NULL" restriction in the clause. The error is caused by an incorrect
 selectivity calculation for the "IS NULL" clause, which takes into account
 only table-level statistics and ignores the NULL values introduced into the
 result by the join operation. This patch fixes that by calculating the
 fraction of rows on one side of the join that have no matching row on the
 other side.

Co-authored-by: Alena Rybakina <a.rybak...@postgrespro.ru>
---
 src/backend/utils/adt/selfuncs.c | 35 +++++++++++++++++++++++++++++---
 src/include/nodes/pathnodes.h    |  3 +++
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c4fcd0076ea..8e18aa1dd2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -153,7 +153,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 							  bool isdefault1, bool isdefault2,
 							  AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 							  Form_pg_statistic stats1, Form_pg_statistic stats2,
-							  bool have_mcvs1, bool have_mcvs2);
+							  bool have_mcvs1, bool have_mcvs2, double *unmatched_frac);
 static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 							 VariableStatData *vardata1, VariableStatData *vardata2,
 							 double nd1, double nd2,
@@ -1710,6 +1710,9 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 		stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
 		freq_null = stats->stanullfrac;
 
+		if (sjinfo)
+			freq_null = freq_null + sjinfo->unmatched_frac - freq_null * sjinfo->unmatched_frac;
+
 		switch (nulltesttype)
 		{
 			case IS_NULL:
@@ -2313,13 +2316,24 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	}
 
 	/* We need to compute the inner-join selectivity in all cases */
+	/*
+	 * calculate fraction of right without of matching row on left
+	 *
+	 * FIXME Should be restricted to JOIN_LEFT, we should have similar logic
+	 * for JOIN_FULL.
+	 *
+	 * XXX Probably should calculate unmatched as fraction of the join result,
+	 * not of the relation on the right (because the matched part can have more
+	 * matches per row and thus grow). Not sure. Depends on how it's used later.
+	 */
 	selec_inner = eqjoinsel_inner(opfuncoid, collation,
 								  &vardata1, &vardata2,
 								  nd1, nd2,
 								  isdefault1, isdefault2,
 								  &sslot1, &sslot2,
 								  stats1, stats2,
-								  have_mcvs1, have_mcvs2);
+								  have_mcvs1, have_mcvs2,
+								  &sjinfo->unmatched_frac);
 
 	switch (sjinfo->jointype)
 	{
@@ -2407,7 +2421,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 				bool isdefault1, bool isdefault2,
 				AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 				Form_pg_statistic stats1, Form_pg_statistic stats2,
-				bool have_mcvs1, bool have_mcvs2)
+				bool have_mcvs1, bool have_mcvs2, double *unmatched_frac)
 {
 	double		selec;
 
@@ -2503,7 +2517,10 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 		}
 		CLAMP_PROBABILITY(matchfreq1);
 		CLAMP_PROBABILITY(unmatchfreq1);
+
+		*unmatched_frac = unmatchfreq1;
 		matchfreq2 = unmatchfreq2 = 0.0;
+
 		for (i = 0; i < sslot2->nvalues; i++)
 		{
 			if (hasmatch2[i])
@@ -2581,10 +2598,22 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 		double		nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;
 
 		selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
+
+		/*
+		 * XXX Should this look at nullfrac on either side? Probably depends on
+		 * if we're calculating fraction of NULLs or fraction of unmatched rows.
+		 */
+		// unmatchfreq = (1.0 - nullfrac1) * (1.0 - nullfrac2);
 		if (nd1 > nd2)
+		{
 			selec /= nd1;
+			*unmatched_frac = (nd1 - nd2) * 1.0 / nd1;
+		}
 		else
+		{
 			selec /= nd2;
+			*unmatched_frac = 0.0;
+		}
 	}
 
 	return selec;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c17b53f7adb..6bc63e648e6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2853,6 +2853,9 @@ struct SpecialJoinInfo
 	bool		semi_can_hash;	/* true if semi_operators are all hash */
 	List	   *semi_operators; /* OIDs of equality join operators */
 	List	   *semi_rhs_exprs; /* righthand-side expressions of these ops */
+
+	/* For outer join, fraction of rows without a match. */
+	Selectivity	unmatched_frac;
 };
 
 /*
-- 
2.34.1
