Re: [HACKERS] One question about transformation ANY Sublinks into joins

2016-07-23 Thread Armor
After this sublink is pulled up as a semi join, the optimizer, when building
the join rel for it, also considers a plain hash join if a unique path can be
created for the RHS; for details, see make_join_rel in
src/backend/optimizer/path/joinrels.c.
In this case the hash join over the unique-ified RHS is cheaper than the
semi join, so the planner chose the hash join rather than the semi join.


--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Robert Haas"
Date:  Fri, Jul 22, 2016 00:23 AM
To:  "Armor"
Cc:  "pgsql-hackers"
Subject:  Re: [HACKERS] One question about transformation ANY Sublinks into joins



On Sun, Jul 17, 2016 at 5:33 AM, Armor  wrote:
> Hi
> I ran a simple query with the latest PG:
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schemas are as follows:
>
> postgres=# \d t1
>       Table "public.t1"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id1    | integer |
>  c1     | integer |
>
> postgres=# \d t2
>       Table "public.t2"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id2    | integer |
>  c2     | integer |
>
> I found that PG decides not to pull up this sublink because the WHERE
> clause of the sublink refers to Vars of the parent query; for details, see
> the function convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
> However, for such a simple sublink, which has no aggregates, no window
> functions and no LIMIT, maybe we could carefully pull up the predicates in
> the WHERE clause that refer to Vars of the parent query, then pull up the
> sublink and produce a query plan like the following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>    ->  Hash  (cost=46.16..46.16 rows=226 width=8)
>          ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] One question about transformation ANY Sublinks into joins

2016-07-21 Thread Dilip Kumar
On Thu, Jul 21, 2016 at 9:53 PM, Robert Haas  wrote:

> It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?


I guess a plain Hash Join will do here, because the inner side of the hash
is a HashAggregate with group key (t2.id2, t2.c2), and the hash join
condition is (t1.id1 = t2.id2) AND (t1.c1 = t2.c2).

So I think these together make sure that we don't get duplicate tuples
for one outer record.
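
The argument above can be illustrated with a minimal sketch in Python. It is
not PostgreSQL code: the rows are made-up sample data, and the helper names
semi_join, hash_join and hash_aggregate are hypothetical, chosen only to
mirror the plan nodes. It shows that a plain hash join against an inner side
de-duplicated on the full join key returns the same rows as a semi join,
while a plain hash join against the raw inner side does not.

```python
from collections import defaultdict

# Hypothetical sample rows as (id, c) pairs; t2 deliberately has duplicates.
t1 = [(1, 10), (2, 20), (3, 30)]
t2 = [(1, 10), (1, 10), (2, 99), (3, 30), (3, 30), (3, 30)]


def semi_join(outer, inner):
    """Emit each outer row at most once if any inner row matches it."""
    keys = set(inner)
    return [row for row in outer if row in keys]


def hash_join(outer, inner):
    """Plain inner hash join: one output row per matching inner row."""
    table = defaultdict(list)
    for row in inner:
        table[row].append(row)
    return [o for o in outer for _ in table.get(o, [])]


def hash_aggregate(rows):
    """Group on the whole (id2, c2) key, keeping one row per group."""
    return list(dict.fromkeys(rows))


semi = semi_join(t1, t2)
plain = hash_join(t1, t2)                  # duplicates leak through
dedup = hash_join(t1, hash_aggregate(t2))  # matches the semi join

print(semi)
print(plain)
print(dedup)
```

The join key here is the whole (id, c) pair, matching the two-column hash
condition in the plan; grouping on only one of the two columns would not
give the same guarantee.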

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] One question about transformation ANY Sublinks into joins

2016-07-21 Thread Robert Haas
On Sun, Jul 17, 2016 at 5:33 AM, Armor  wrote:
> Hi
> I ran a simple query with the latest PG:
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schemas are as follows:
>
> postgres=# \d t1
>       Table "public.t1"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id1    | integer |
>  c1     | integer |
>
> postgres=# \d t2
>       Table "public.t2"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id2    | integer |
>  c2     | integer |
>
> I found that PG decides not to pull up this sublink because the WHERE
> clause of the sublink refers to Vars of the parent query; for details, see
> the function convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
> However, for such a simple sublink, which has no aggregates, no window
> functions and no LIMIT, maybe we could carefully pull up the predicates in
> the WHERE clause that refer to Vars of the parent query, then pull up the
> sublink and produce a query plan like the following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>    ->  Hash  (cost=46.16..46.16 rows=226 width=8)
>          ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] One question about transformation ANY Sublinks into joins

2016-07-17 Thread Armor
Hi
I ran a simple query with the latest PG:
postgres=# explain select * from t1 where id1 in (select id2 from t2 where
c1=c2);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
           Filter: (t1.c1 = c2)
(5 rows)



and the table schemas are as follows:

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id1    | integer |
 c1     | integer |

postgres=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id2    | integer |
 c2     | integer |



I found that PG decides not to pull up this sublink because the WHERE clause
of the sublink refers to Vars of the parent query; for details, see the
function convert_ANY_sublink_to_join in
src/backend/optimizer/plan/subselect.c.
However, for such a simple sublink, which has no aggregates, no window
functions and no LIMIT, maybe we could carefully pull up the predicates in
the WHERE clause that refer to Vars of the parent query, then pull up the
sublink and produce a query plan like the following:

postgres=# explain select * from t1 where id1 in (select id2 from t2 where
c1=c2);
                               QUERY PLAN
------------------------------------------------------------------------
 Hash Join  (cost=49.55..99.23 rows=565 width=8)
   Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
   ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=46.16..46.16 rows=226 width=8)
         ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
               Group Key: t2.id2, t2.c2
               ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)
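
The intended result of this rewrite can be checked at the SQL level with a
minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a
stand-in for PostgreSQL and a handful of made-up rows; the alias u is also
only illustrative. It compares the correlated IN query with a hand-written
join against the distinct (id2, c2) pairs of t2, which is the shape of the
plan above. It says nothing about how the PostgreSQL planner itself would
perform the pull-up.

```python
import sqlite3

# In-memory database with the two tables from this thread and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id1 integer, c1 integer);
    CREATE TABLE t2 (id2 integer, c2 integer);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO t2 VALUES (1, 10), (1, 10), (2, 99), (3, 30), (3, 30);
""")

# Original form: the sublink's WHERE clause references t1.c1 (correlated).
correlated = conn.execute(
    "SELECT * FROM t1 WHERE id1 IN (SELECT id2 FROM t2 WHERE c1 = c2) "
    "ORDER BY id1"
).fetchall()

# Hand-written equivalent: the correlated predicate becomes a join clause,
# and DISTINCT on (id2, c2) plays the role of the HashAggregate.
pulled_up = conn.execute(
    "SELECT t1.* FROM t1 "
    "JOIN (SELECT DISTINCT id2, c2 FROM t2) AS u "
    "  ON t1.id1 = u.id2 AND t1.c1 = u.c2 "
    "ORDER BY t1.id1"
).fetchall()

print(correlated)
print(pulled_up)
```

Both queries return only the t1 rows that have a match, each exactly once,
even though t2 contains duplicate (id2, c2) pairs.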
   
--
Jerry Yu
https://github.com/scarbrofair