"Kevin Grittner" <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> wrote: >> I believe that the optimizable cases for EXISTS are those where the >> EXISTS() is either at the top level of WHERE, or just underneath a >> NOT, > The rest of the plan makes sense to me, but this part seems narrow. > There's probably a good reason for that which is beyond my depth, but > attached is a view that is used for calculating statistics for a > database which is primarily used for "case management" purposes. If > EXISTS could also be optimized in the contexts used there, it would be > great. I chewed on that for awhile. We can certainly optimize EXISTS that's appearing in the ON-condition of a regular JOIN, because that's not really semantically different from a WHERE-condition. But I don't think it's going to be reasonable to improve EXISTS in outer-JOIN ON conditions. There are a couple of problems. Consider
t1 LEFT JOIN t2 ON (t1.f1 = t2.f2 AND EXISTS(SELECT 1 FROM t3 WHERE t3.f3 = t1.fx AND t3.f4 = t2.fy)) To implement this with the correct semantics, we'd have to have the t1/t2 join and the t1/t2/t3 join going on in the same execution node, with two different join behaviors (LEFT and SEMI). There isn't any way AFAICS to factor it into two separate steps. That's unreasonably complicated, and it's not clear that you'd get any better performance anyway than the current implementation (which'd treat the EXISTS as a subplan). Even worse, let the EXISTS be a degenerate case: t1 LEFT JOIN t2 ON (t1.f1 = t2.f2 AND EXISTS(SELECT 1 FROM t3 WHERE t3.f3 = t1.fx)); We can't actually treat this EXISTS as a semijoin between t1 and t3, either before or after the LEFT JOIN; because then the behavior would be to drop t1 rows that have no t3 match, which is not what this query specifies. (Note: the other degenerate case, where the EXISTS depends only on t2, *could* be optimized since we could just let the semijoin be performed within the RHS of the LEFT JOIN.) So this is not something I'm going to tackle; at least not this devel cycle. One small step we can take in this direction, though, is to improve the planner's internal handling of the qual conditions for IN and EXISTS. Right now the process is just to throw the sub-select into the main range table and put the IN join conditions into the same place in WHERE that the IN-clause was to start with. The trouble with this is that the distribute_quals_to_rels processing has no idea that there's anything special about the IN join conditions. We got away with that for the limited case of IN clauses at the top level of WHERE, but it's become clear to me over the weekend that this has no hope of working for NOT EXISTS --- since that's effectively an outer join, it absolutely has to have the same kinds of qual-scheduling constraints as ordinary outer joins do. So we need a data structure that distribute_quals_to_rels can work with. What I think needs to happen is that the initial pass that pulls up optimizable IN/EXISTS sub-selects should not merge the SubLink's replacement qual clauses seamlessly, but put them underneath a new node type, say "FlattenedSubLink", that retains knowledge of the join it's representing. The FlattenedSubLink would survive only as far as distribute_quals_to_rels, which would distribute out the contained qual conditions instead of the FlattenedSubLink itself --- but only after marking them properly for the outer-join restrictions. This representation would make it feasible to work with IN/EXISTS that are inside JOIN ON conditions, which the present representation using a single in_info_list really can't do. The semantic issues are still there but at least the representation isn't getting in the way ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers