Hello: Thanks for replying!
I understand it a little more. And I compared the following statements: First: postgres=# explain analyze select * from sales s inner join customers c on s.cust_id = c.cust_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Hash Join (cost=1.07..2.15 rows=3 width=84) (actual time=0.017..0.019 rows=3 loops=1) Hash Cond: (s.cust_id = c.cust_id) -> Seq Scan on sales s (cost=0.00..1.04 rows=4 width=42) (actual time=0.004..0.004 rows=4 loops=1) -> Hash (cost=1.03..1.03 rows=3 width=42) (actual time=0.004..0.004 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on customers c (cost=0.00..1.03 rows=3 width=42) (actual time=0.001..0.001 rows=3 loops=1) Total runtime: 0.046 ms (7 rows) Second: postgres=# explain analyze select * from sales s inner join customers c on s.cust_id = c.cust_id and c.cust_id =2; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..2.10 rows=1 width=84) (actual time=0.000..0.000 rows=1 loops=1) -> Seq Scan on sales s (cost=0.00..1.05 rows=1 width=42) (actual time=0.000..0.000 rows=1 loops=1) Filter: (cust_id = 2) Rows Removed by Filter: 3 -> Seq Scan on customers c (cost=0.00..1.04 rows=1 width=42) (actual time=0.000..0.000 rows=1 loops=1) Filter: (cust_id = 2) Rows Removed by Filter: 2 Total runtime: 0.000 ms (8 rows) Third: postgres=# explain analyze select * from sales s inner join customers c on s.cust_id = c.cust_id and c.cust_id <4; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..2.13 rows=1 width=84) (actual time=0.014..0.018 rows=3 loops=1) Join Filter: (s.cust_id = c.cust_id) Rows Removed by Join Filter: 9 -> Seq Scan on customers c (cost=0.00..1.04 rows=1 width=42) (actual time=0.007..0.007 rows=3 loops=1) Filter: (cust_id < 4) -> Seq Scan on sales s (cost=0.00..1.04 rows=4 width=42) (actual time=0.000..0.000 rows=4 loops=3) Total runtime: 0.038 ms (7 rows) postgres=# The first sql statement and third sql statment really drive the final_cost_hashjoin function to be called. I think For the above third one, cost_qual_eval(&hash_qual_cost, hashclauses, root) is for <s.cust_id = c.cust_id> And cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root) is for <s.cust_id = c.cust_id and c.cust_id <4> I've found the following calling relation: hash_inner_and_outer à try_hashjoin_path à create_hashjoin_path àfinal_cost_hashjoin For the second sql statement , In the hash_inner_and_outer function, the < if ( hashclauses) > condition is false, So there is no chance to try a hashjoin path. That is : When I use the where condition such as <cust_id=2>, postgresql is clever enough to know it is better to make seqscan and filter ? 2013/6/13 Tom Lane <t...@sss.pgh.pa.us> > Stephen Frost <sfr...@snowman.net> writes: > > * 高健 (luckyjack...@gmail.com) wrote: > >> Why the reduction is needed here for cost calculation? > > > cost_qual_eval(&hash_qual_cost, hashclauses, root); > > returns the costs for *just the quals which can be used for the > > hashjoin*, while > > cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root); > > returns the costs for *ALL the quals* > > Right. Note what it says in create_hashjoin_path: > > * 'restrict_clauses' are the RestrictInfo nodes to apply at the join > ... > * 'hashclauses' are the RestrictInfo nodes to use as hash clauses > * (this should be a subset of the restrict_clauses list) > > So the two cost_qual_eval() calls are *both* counting the cost of the > hashclauses, and we have to undo that to get at just the cost of any > additional clauses beside the hash clauses. See the comment about the > usage of qp_qual_cost further down: > > /* > * For each tuple that gets through the hashjoin proper, we charge > * cpu_tuple_cost plus the cost of evaluating additional restriction > * clauses that are to be applied at the join. (This is pessimistic > since > * not all of the quals may get evaluated at each tuple.) > */ > startup_cost += qp_qual_cost.startup; > cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple; > run_cost += cpu_per_tuple * hashjointuples; > > regards, tom lane >