Re: [GENERAL] Why hash join cost calculation need reduction

高健 Thu, 13 Jun 2013 22:17:53 -0700

Hello:

Thanks for replying!




I understand it a little more.


And I compared the following statements:

First:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id;

                                                    QUERY
PLAN

------------------------------------------------------------------------------------------------------------------

 Hash Join  (cost=1.07..2.15 rows=3 width=84) (actual time=0.017..0.019
rows=3 loops=1)

   Hash Cond: (s.cust_id = c.cust_id)

   ->  Seq Scan on sales s  (cost=0.00..1.04 rows=4 width=42) (actual
time=0.004..0.004 rows=4 loops=1)

   ->  Hash  (cost=1.03..1.03 rows=3 width=42) (actual time=0.004..0.004
rows=3 loops=1)

         Buckets: 1024  Batches: 1  Memory Usage: 1kB

         ->  Seq Scan on customers c  (cost=0.00..1.03 rows=3 width=42)
(actual time=0.001..0.001 rows=3 loops=1)

 Total runtime: 0.046 ms

(7 rows)



Second:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id =2;

                                                 QUERY
PLAN

------------------------------------------------------------------------------------------------------------

 Nested Loop  (cost=0.00..2.10 rows=1 width=84) (actual time=0.000..0.000
rows=1 loops=1)

   ->  Seq Scan on sales s  (cost=0.00..1.05 rows=1 width=42) (actual
time=0.000..0.000 rows=1 loops=1)

         Filter: (cust_id = 2)

         Rows Removed by Filter: 3

   ->  Seq Scan on customers c  (cost=0.00..1.04 rows=1 width=42) (actual
time=0.000..0.000 rows=1 loops=1)

         Filter: (cust_id = 2)

         Rows Removed by Filter: 2

 Total runtime: 0.000 ms

(8 rows)



Third:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id <4;

                                                 QUERY
PLAN

------------------------------------------------------------------------------------------------------------

 Nested Loop  (cost=0.00..2.13 rows=1 width=84) (actual time=0.014..0.018
rows=3 loops=1)

   Join Filter: (s.cust_id = c.cust_id)

   Rows Removed by Join Filter: 9

   ->  Seq Scan on customers c  (cost=0.00..1.04 rows=1 width=42) (actual
time=0.007..0.007 rows=3 loops=1)

         Filter: (cust_id < 4)

   ->  Seq Scan on sales s  (cost=0.00..1.04 rows=4 width=42) (actual
time=0.000..0.000 rows=4 loops=3)

 Total runtime: 0.038 ms

(7 rows)



postgres=#



The first sql statement and third sql statment really drive the
final_cost_hashjoin function to be called.



I think  For the above third one,

 cost_qual_eval(&hash_qual_cost, hashclauses, root)  is  for  <s.cust_id =
c.cust_id>

And

cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root) is for
<s.cust_id = c.cust_id and c.cust_id <4>



I've  found the following calling relation:

hash_inner_and_outer  à try_hashjoin_path  à create_hashjoin_path
àfinal_cost_hashjoin



For the second sql statement ,

In the hash_inner_and_outer function,

 the < if ( hashclauses) > condition is false,  So there is no chance to
try a hashjoin path.



That is :

When I use  the  where condition such as  <cust_id=2>,

postgresql is clever enough to know it is better to make  seqscan and
filter ?

2013/6/13 Tom Lane <t...@sss.pgh.pa.us>

> Stephen Frost <sfr...@snowman.net> writes:
> > * 高健 (luckyjack...@gmail.com) wrote:
> >> Why the reduction  is needed here  for cost calculation?
>
> >       cost_qual_eval(&hash_qual_cost, hashclauses, root);
> > returns the costs for *just the quals which can be used for the
> > hashjoin*, while
> >       cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);
> > returns the costs for *ALL the quals*
>
> Right.  Note what it says in create_hashjoin_path:
>
>  * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
>  ...
>  * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
>  *        (this should be a subset of the restrict_clauses list)
>
> So the two cost_qual_eval() calls are *both* counting the cost of the
> hashclauses, and we have to undo that to get at just the cost of any
> additional clauses beside the hash clauses.  See the comment about the
> usage of qp_qual_cost further down:
>
>     /*
>      * For each tuple that gets through the hashjoin proper, we charge
>      * cpu_tuple_cost plus the cost of evaluating additional restriction
>      * clauses that are to be applied at the join.  (This is pessimistic
> since
>      * not all of the quals may get evaluated at each tuple.)
>      */
>     startup_cost += qp_qual_cost.startup;
>     cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
>     run_cost += cpu_per_tuple * hashjointuples;
>
>             regards, tom lane
>

Re: [GENERAL] Why hash join cost calculation need reduction

Reply via email to