I would like to clarify a small doubt in this regard. In make_rels_by_clause_joins(), the planner tries to join the given relation old_rel with other relations only if a join clause exists between them. As I understand it, this is because when a join condition exists, it is better to join with that relation. However, if old_rel cannot be joined to any other relation by a clause, why can't the function try cross joins with the other relations right there?

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Friday, April 13, 2012 8:02 PM
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] Improving our clauseless-join heuristics

I looked into the behavior complained of here:
http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php
The problem query can be abstracted to

    select * from a, b, c, d
    where a.x = b.y and (a.z = c.c or a.z = d.d)

Table a is much larger than the others (in fact, in the given example c and d are known to be single rows), and there are indexes on the mentioned columns of a. In this situation, the best plan is to cross-join c and d, then use a BitmapOr indexscan to pick out the rows of a that satisfy the OR condition, and finally join that small number of rows to b. The planner will use a cross-join-first plan if we omit b and the first WHERE clause from the query; but in the query as given, it fails to discover that plan and falls back on a vastly inferior plan that involves forming the a/b join first.

The reason for this behavior is the anti-clauseless-join heuristics in join_search_one_level(). Without b, there are no join clauses available at join level 2, so the planner is forced to form all three 2-way cross joins; and then at level 3 it finds out that joining a to c/d works well. With b, we find the a/b join has a usable join clause so we form that join, and then we decide not to make any 2-way clauseless joins. So the c/d join is never constructed and there is no way to exploit the desirable indexscan at higher levels.

After some reflection I think that the blame should be pinned on have_relevant_joinclause(), which is essentially defined as "is there any join clause that can be evaluated at the join of these two relations?". I think it would work better to define it as "is there any join clause that both these relations participate in?".
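
To make the difference between those two definitions concrete, here is a small toy model in Python. It is only a sketch and not the planner's code: relations are modelled as sets of table names, each join clause is reduced to the set of tables it references, and the function names evaluable_at_join() and both_participate() are made up for this illustration.

```python
# Toy model (assumption): a relation is a frozenset of table names, and a join
# clause is represented only by the set of base tables it references.
A, B, C, D = (frozenset(name) for name in "abcd")

# The two join clauses of the example query.
CLAUSES = [
    frozenset("ab"),   # a.x = b.y
    frozenset("acd"),  # a.z = c.c OR a.z = d.d
]


def evaluable_at_join(rel1, rel2, clauses=CLAUSES):
    """First definition: some clause touches both sides and references
    nothing outside the join of rel1 and rel2."""
    joined = rel1 | rel2
    return any(
        bool(c & rel1) and bool(c & rel2) and c <= joined for c in clauses
    )


def both_participate(rel1, rel2, clauses=CLAUSES):
    """Second definition: some clause mentions both sides, even if it
    also references other relations."""
    return any(bool(c & rel1) and bool(c & rel2) for c in clauses)


if __name__ == "__main__":
    for name, (r1, r2) in {"a/b": (A, B), "c/d": (C, D), "b/c": (B, C)}.items():
        print(name, evaluable_at_join(r1, r2), both_participate(r1, r2))
```

In this model the a/b pair passes both tests, the b/c pair passes neither, and the c/d pair passes only the second test, because the OR clause mentions both c and d but also needs a before it can be evaluated.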
In the majority of real-world queries, join clauses relate exactly two relations, so that these two definitions are equivalent. However, when we do have join clauses involving 3 or more relations, such as the OR clause in this example, it's evidently useful to consider cross-product joins of the smaller relations so that the join clause can be applied during the scan of the largest table.

It would probably not be a good idea to back-patch such a change, since it might have consequences I can't foresee at the moment. But I'm strongly tempted to squeeze it into 9.2. Thoughts?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers