Hi,all
This is my ddl and test data.
CREATE TABLE level1_table
(
level1_no serial NOT NULL ,
level1_node_name varchar(255),
PRIMARY KEY (level1_no)
) WITHOUT OIDS;
CREATE TABLE level2_table
(
level2_no serial NOT NULL ,
parent_no int NOT NULL,
level1_node_name varchar(255),
PRIMARY KEY (level2_no)
) WITHOUT OIDS;
ALTER TABLE level2_table
ADD FOREIGN KEY (parent_no)
REFERENCES level1_table (level1_no)
ON UPDATE RESTRICT
ON DELETE RESTRICT
;
CREATE TABLE all_level_status
(
level int NOT NULL,
node_no int NOT NULL,
status int,
PRIMARY KEY (level, node_no)
) WITHOUT OIDS;
--create test data
INSERT INTO level1_table(
level1_no)
select generate_series(1,200);
INSERT INTO level2_table(
level2_no, parent_no)
select
level2_no,level2_no/50+1 as parent_no
from
generate_series(1,) level2_no ;
INSERT INTO all_level_status(
level, node_no, status)
select 1,level1_no,0
from level1_table;
INSERT INTO all_level_status(
level, node_no, status)
select 2,level2_no,0
from level2_table;
--analyze table
analyze level1_table;
analyze level2_table;
analyze all_level_status;
--execute sql
explain analyze
select * from level2_table l2
join (
select l1.* from level1_table l1
join all_level_status als on (als.level=1 and als.node_no=l1.level1_no)
where als.status=0
) normal_l1 on l2.parent_no=normal_l1.level1_no;
result :
"Hash Join (cost=16.39..198.92 rows=200 width=1044) (actual time=0.246..3.772
rows= loops=1)"
" Hash Cond: (l2.parent_no = als.node_no)"
" -> Seq Scan on level2_table l2 (cost=0.00..144.99 rows= width=524)
(actual time=0.011..0.840 rows= loops=1)"
" -> Hash (cost=16.34..16.34 rows=4 width=524) (actual time=0.226..0.226
rows=200 loops=1)"
"Buckets: 1024 Batches: 1 Memory Usage: 8kB"
"-> Merge Join (cost=10.93..16.34 rows=4 width=524) (actual
time=0.077..0.193 rows=200 loops=1)"
" Merge Cond: (als.node_no = l1.level1_no)"
" -> Index Scan using all_level_status_pkey on all_level_status
als (cost=0.29..109.10 rows=200 width=4) (actual time=0.021..0.069 rows=200
loops=1)"
"Index Cond: (level = 1)"
"Filter: (status = 0)"
" -> Sort (cost=10.64..11.14 rows=200 width=520) (actual
time=0.055..0.062 rows=200 loops=1)"
"Sort Key: l1.level1_no"
"Sort Method: quicksort Memory: 34kB"
"-> Seq Scan on level1_table l1 (cost=0.00..3.00 rows=200
width=520) (actual time=0.007..0.020 rows=200 loops=1)"
"Planning time: 1.158 ms"
"Execution time: 4.054 ms"
The main reason for the the big gap between estimated(200) and actual()
rows in line 1 lies in that, in actuality rows for the join conditon
column(node_no) in all_level_status table are not evenly distributed for the
filter condition column(level),but in row estimation,the optimizer takes the
assumption that they are evenly distributed.So as a lesson,I think we should
try to make the distribution of rows for the join condition column even when
possible.
Thus, can we draw the conclusion that the current structure of all_level_status
table is a bad design and should be avoided ?
Chao.
From: chang chao
Sent: Monday, May 16, 2016 17:23
To: pgsql-hackers@postgresql.org
Subject: The rewritting of join conditions caused a very slow query plan.
Hi,all
I have a query that is very slow,and the reason may be in the rewritting of
join conditions.
this is the simplied version table and the key part of the sql.
level1_table and level2_table hold the tree data nodes,
and all_level_status table holds the current status all all nodes of all levels.
(I know that there would be much less trouble in performance if
all_level_status was divided into two tables,namely,level1_status and
level2_status tables.)
table1: level1_table
level1_no PK:serial
level1_node_name :varchar
table2:level2_table
level2_no PK:serial
parent_no FK to level1_table.level1_no
level2_node_name :varchar
table3: all_level_status
level:1 OR 2 PK1
node_no:level1_table.level1_no or level2_table.level2_no PK2
status:0 OR 1(normal or abnormal)
The sql to find all level2 nodes whose parent level nodes are in normal status.
explain analyze
select * from level2_table l2
join (
select l1.* from level1_table l1
join all_level_status als on (als.level=1 and als.node_no=l1.level1_no)
where als.status=0
) normal_l1 on l2.parent_no=normal_l1.level1_no;
this is the query plan .
"Merge Join (cost=3.38..5.13 rows=3 width=158) (actual time=0.087..0.179
rows=21 loops=1)"
" Merge Cond: (als.node_no = l2.parent_no)"
" -> Merge Join (cost=1.63..7.66 rows=19 width=80) (actual time=0.067..0.126
rows=18 loops=1)"
" Merge Cond: (als.node_no = l1.level1_no)"
" -> Index Scan using all_level_status_pkey