Is there actually evidence that there's a lot of problems with bad join
orders? ISTM that's one of the areas where the planner actually does a
pretty good job.
Advertising
I put together a quick demonstration using AxBxC where AxB is empty but AxC is
not. Sure enough, postgres chooses AxC first, then xB, which results in extra
work. This is a contrived example, but it would be a pain to try to post a real
example with all the data and analysis. I think it is fair to say that if it is
making the wrong choice in this example, it is sometimes making the wrong choice
in practice. Cross-table statistics are supposed to help avoid this, right?
But I think it would only help if the system had the statistics for AxB. I
think I have been hearing other people propose systems which would track which
joins the system is actually using and then recommend to the user that those
statistics be gathered. I think we need to go beyond that to recommending
statistics (or automatically gathering statistics, or whatever) for joins that
*might* be used given different plans than the one currently chosen by the planner.
test=# create table A (a integer);
CREATE TABLE
Time: 60.151 ms
test=# create table B (b integer);
CREATE TABLE
Time: 3.270 ms
test=# create table C (c integer);
CREATE TABLE
Time: 2.421 ms
test=# insert into A (a) (select * from generate_series(1,10000,2));
INSERT 0 5000
Time: 67.829 ms
test=# insert into B (b) (select * from generate_series(2,10000,2));
INSERT 0 5000
Time: 60.031 ms
test=# insert into C (c) (select * from generate_series(1,1000,2));
INSERT 0 500
Time: 6.303 ms
test=# analyze A;
ANALYZE
Time: 69.669 ms
test=# analyze B;
ANALYZE
Time: 24.548 ms
test=# analyze C;
ANALYZE
Time: 2.936 ms
test=# explain select * from A, B, C where A.a = B.b and A.a = C.c;
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=113.50..216.50 rows=500 width=12)
Hash Cond: ("outer".b = "inner".a)
-> Seq Scan on b (cost=0.00..73.00 rows=5000 width=4)
-> Hash (cost=112.25..112.25 rows=500 width=8)
-> Hash Join (cost=9.25..112.25 rows=500 width=8)
Hash Cond: ("outer".a = "inner".c)
-> Seq Scan on a (cost=0.00..73.00 rows=5000 width=4)
-> Hash (cost=8.00..8.00 rows=500 width=4)
-> Seq Scan on c (cost=0.00..8.00 rows=500 width=4)
(9 rows)
Time: 4.807 ms
test=# select * from A, B, C where A.a = B.b and A.a = C.c;
a | b | c
---+---+---
(0 rows)
Time: 34.561 ms
test=# select count(*) from A, C where A.a = C.c;
count
-------
500
(1 row)
Time: 8.450 ms
test=# select count(*) from A, B where A.a = B.b;
count
-------
0
(1 row)
Time: 33.757 ms
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org