Is there actually evidence that there's a lot of problems with bad join
orders? ISTM that's one of the areas where the planner actually does a
pretty good job.

I put together a quick demonstration using AxBxC where AxB is empty but AxC is not. Sure enough, postgres chooses AxC first, then xB, which results in extra work. This is a contrived example, but it would be a pain to try to post a real example with all the data and analysis. I think it is fair to say that if it is making the wrong choice in this example, it is sometimes making the wrong choice in practice. Cross-table statistics are supposed to help avoid this, right? But I think it would only help if the system had the statistics for AxB. I think I have been hearing other people propose systems which would track which joins the system is actually using and then recommend to the user that those statistics be gathered. I think we need to go beyond that to recommending statistics (or automatically gathering statistics, or whatever) for joins that *might* be used given different plans than the one currently chosen by the planner.

test=# create table A (a integer);
Time: 60.151 ms
test=# create table B (b integer);
Time: 3.270 ms
test=# create table C (c integer);
Time: 2.421 ms
test=# insert into A (a) (select * from generate_series(1,10000,2));
INSERT 0 5000
Time: 67.829 ms
test=# insert into B (b) (select * from generate_series(2,10000,2));
INSERT 0 5000
Time: 60.031 ms
test=# insert into C (c) (select * from generate_series(1,1000,2));
INSERT 0 500
Time: 6.303 ms
test=# analyze A;
Time: 69.669 ms
test=# analyze B;
Time: 24.548 ms
test=# analyze C;
Time: 2.936 ms
test=# explain select * from A, B, C where A.a = B.b and A.a = C.c;
                                QUERY PLAN
 Hash Join  (cost=113.50..216.50 rows=500 width=12)
   Hash Cond: ("outer".b = "inner".a)
   ->  Seq Scan on b  (cost=0.00..73.00 rows=5000 width=4)
   ->  Hash  (cost=112.25..112.25 rows=500 width=8)
         ->  Hash Join  (cost=9.25..112.25 rows=500 width=8)
               Hash Cond: ("outer".a = "inner".c)
               ->  Seq Scan on a  (cost=0.00..73.00 rows=5000 width=4)
               ->  Hash  (cost=8.00..8.00 rows=500 width=4)
                     ->  Seq Scan on c  (cost=0.00..8.00 rows=500 width=4)
(9 rows)

Time: 4.807 ms
test=# select * from A, B, C where A.a = B.b and A.a = C.c;
 a | b | c
(0 rows)

Time: 34.561 ms
test=# select count(*) from A, C where A.a = C.c;
(1 row)

Time: 8.450 ms
test=# select count(*) from A, B where A.a = B.b;
(1 row)

Time: 33.757 ms

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?


Reply via email to