[HACKERS] Clamping reulst row number of joins.

Kyotaro HORIGUCHI Fri, 06 Mar 2015 00:36:03 -0800

Hello, I had a report that a query gets wired estimated row
numbers and it makes subsequent plans wrong.


Although the detailed explanation is shown later in this mail,
the problem here was that a query like following makes the path
with apparently wrong row number.

EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
         JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

0> Nested Loop  (cost=0.43..8.51 rows=68732 width=4)
1>   ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4)
2>   ->  Append  (cost=0.43..8.48 rows=2 width=4)
3>         ->  Index Only Scan using i_t1_a on t1
4>               (cost=0.43..8.46 rows=1 width=4)
5>               Index Cond: (a = "*VALUES*".column1)
6>         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=4)
7>               Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
8>               ->  Result  (cost=0.00..0.01 rows=1 width=0)

The Nested Loop at line 0 says that it emits 68832 rows even
though the underlying nodes, Values at line 1 and Append at line
2 are giving 1 and 2 respectively as their reults. This query
actually returns only 1 row. It is seemingly wrong and causes the
plans above go wrong.

This is caused by the Append node, which don't have statistics,
so eqjoinsel takes 0.5 as the default selectivity for the nested
loop. Even though it is defficult to get statistics for Append
node, the nested loop could get more sane row nubmer if it
doesn't ignore the parameterized path selected for the scan on
t1.

I suppose that join nodes can safely clamp the number of its
result rows by the product of the row numbers of underlying two
paths. And if it is correct, the attached patch can correct the
estimation like this.

>  Nested Loop  (cost=0.43..16.51 rows=2 width=4)
>    ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4)
>    ->  Append  (cost=0.43..16.48 rows=2 width=4)
>          ->  Index Only Scan using i_t1_a on t1 
>                        (cost=0.43..16.45 rows=1 width=4)
>                Index Cond: (a = "*VALUES*".column1)
>          ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=4)
>                Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)

Is it correct? And Does it have no bad side effects?
And is this applicable for 9.5?


The detailed test environment is described below.

Before this fix, the inner path of the top-level join is doing
hash because the inner path says that it will give 68732
rows. (But only 1 actually).

After this fix, the outer path declaring that it gives 2 rows so
the top-level join becomes nested loop and the inner path becomes
index scan.

=======
CREATE TABLE t1 (a int);
INSERT INTO t1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_t1_a ON t1 (a);
CREATE TABLE t2 AS SELECT * from t1;
ANALYZE t1;
ANALYZE t2;
-- emulating the problematic situation
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;

EXPLAIN
  SELECT tt2.a
  FROM (SELECT a
        FROM (SELECT a FROM t1
              UNION ALL SELECT 0) t
        JOIN (VALUES (1)) tt(x) ON tt.x = t.a) tt2
  JOIN t2 ON (tt2.a = t2.a);


======== The plan before fix
 Hash Join  (cost=30832.46..36230.60 rows=68732 width=4)
            (actual time=1258.880..1260.519 rows=1 loops=1)
   Hash Cond: ("*VALUES*".column1 = t2.a)
   ->  Nested Loop  (cost=0.43..8.51 rows=68732 width=8)
                    (actual time=0.071..0.104 rows=1 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4)
                                      (actual time=0.002..0.003 rows=1 loops=1)
         ->  Append  (cost=0.43..8.48 rows=2 width=4)
                     (actual time=0.063..0.093 rows=1 loops=1)
               ->  Index Only Scan using i_t1_a on t1
                        (cost=0.43..8.46 rows=1 width=4)
                        (actual time=0.059..0.075 rows=1 loops=1)
                     Index Cond: (a = "*VALUES*".column1)
                     Heap Fetches: 2
               ->  Subquery Scan on "*SELECT* 2" 
                     (cost=0.00..0.02 rows=1 width=4)
                     (actual time=0.010..0.010 rows=0 loops=1)
                     Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
                     Rows Removed by Filter: 1
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                                 (actual time=0.001..0.002 rows=1 loops=1)
   ->  Hash  (cost=14425.01..14425.01 rows=1000001 width=4)
             (actual time=1250.274..1250.274 rows=1000001 loops=1)
         Buckets: 131072  Batches: 16  Memory Usage: 3227kB
         ->  Seq Scan on t2  (cost=0.00..14425.01 rows=1000001 width=4)
                             (actual time=0.021..608.245 rows=1000001 loops=1)
 Planning time: 0.319 ms
 Execution time: 1260.792 ms

======== The plan after fix
 Nested Loop  (cost=0.86..17.43 rows=2 width=4)
              (actual time=0.103..0.118 rows=1 loops=1)
   Join Filter: ("*VALUES*".column1 = t2.a)
   ->  Nested Loop  (cost=0.43..16.51 rows=2 width=8)
                    (actual time=0.062..0.073 rows=1 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4) 
                                      (actual time=0.003..0.004 rows=1 loops=1)
         ->  Append  (cost=0.43..16.48 rows=2 width=4)
                     (actual time=0.051..0.060 rows=1 loops=1)
               ->  Index Only Scan using i_t1_a on t1
                     (cost=0.43..16.45 rows=1 width=4)
                     (actual time=0.045..0.046 rows=1 loops=1)
                     Index Cond: (a = "*VALUES*".column1)
                     Heap Fetches: 1
               ->  Subquery Scan on "*SELECT* 2" 
                     (cost=0.00..0.02 rows=1 width=4)
                     (actual time=0.005..0.005 rows=0 loops=1)
                     Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
                     Rows Removed by Filter: 1
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                                 (actual time=0.002..0.002 rows=1 loops=1)
   ->  Index Only Scan using i_t2_a on t2  (cost=0.42..0.45 rows=1 width=4)
                                  (actual time=0.026..0.027 rows=1 loops=1)
         Index Cond: (a = t1.a)
         Heap Fetches: 1
 Planning time: 0.968 ms
 Execution time: 0.239 ms
=========

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 9c56e26130188d24a42469e18f3b6b7d84dbbc54 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Fri, 6 Mar 2015 16:17:12 +0900
Subject: [PATCH] Clamp row number of join product by the row number calculated
 from joining paths.

When joins contains Appends containing paths selected parameterized
paths, the estimated rows for the join becomes far larger than the
product of the two row numbers of the joining paths. We can safely
clamp the join's row number by the product.
---
 src/backend/optimizer/path/costsize.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 78ef229..9a6643e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1855,6 +1855,14 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 		ntuples = outer_path_rows * inner_path_rows;
 	}
 
+	/*
+	 * In some cases the path has the row number of cartesian product without
+	 * any restriction on the joining tables. We can safely clamp the number
+	 * by ntuples.
+	 */
+	if (path->path.rows > ntuples)
+		path->path.rows = ntuples;
+	
 	/* CPU costs */
 	cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
 	startup_cost += restrict_qual_cost.startup;
@@ -2293,6 +2301,14 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	else
 		run_cost += bare_inner_cost;
 
+	/*
+	 * In some cases the path has the row number of cartesian product without
+	 * any restriction on the joining tables. We can safely clamp the number
+	 * by mergejointuples. See approx_tuple_count about mergejointuples.
+	 */
+	if (path->jpath.path.rows > mergejointuples)
+		path->jpath.path.rows = mergejointuples;
+	
 	/* CPU costs */
 
 	/*
@@ -2691,6 +2707,14 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 	}
 
 	/*
+	 * In some cases the path has the row number of cartesian product without
+	 * any restriction on the joining tables. We can safely clamp the number
+	 * by hashjointuples. See approx_tuple_count about hashjointuples.
+	 */
+	if (path->jpath.path.rows > hashjointuples)
+		path->jpath.path.rows = hashjointuples;
+	
+	/*
 	 * For each tuple that gets through the hashjoin proper, we charge
 	 * cpu_tuple_cost plus the cost of evaluating additional restriction
 	 * clauses that are to be applied at the join.  (This is pessimistic since
-- 
2.1.0.GIT

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Clamping reulst row number of joins.

Reply via email to