On Sat, Oct 15, 2022 at 3:27 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andy Fan <zhihui.fan1...@gmail.com> writes: > > After some more self review, I find my proposal has the following side > > effects. > > Yeah, I do not think this works at all. .... The discussion of outer join > reordering in optimizer/README says that that doesn't work, and while > I'm too lazy to construct an example right now, I believe it's true. > I came to this topic again recently and have finally figured out the reason. It looks to me that semi join is slightly different with outer join in this case. The following test case can show why we have to pull_up_sublinks_qual_recurse to either LHS or RHS rather than the JoinExpr. create table t1(a int, b int, c int); create table t2(a int, b int, c int); create table t3(a int, b int, c int); insert into t1 select 1, 1, 2; insert into t2 select 1, 2, 1; insert into t2 select 1, 1, 2; insert into t3 select 1, 1, 2; select * from t1 where exists (select 1 from t2 -- below references to t1 and t2 at the same time where exists (select 1 from t3 where t1.c = t2.c and t2.b = t3.b) and t1.a = t2.a); which can be transformed to SELECT * FROM t1 SEMI JOIN t2 ON t1.a = t2.a AND exists (select 1 from t3 where t1.c = t2.c and t2.b = t3.b) Here the semantics of the query is return the rows in T1 iff there is a row in t2 matches the whole clause (t1.a = t2.a AND exists..); But If we transform it to SELECT * FROM (t1 SEMI JOIN t2 ON t1.a = t2.a) SEMI JOIN t3 on t1.c = t2.c and t2.b = t3.b; The scan on T2 would stop if ONLY (t1.a = t2.a) matches and the following rows will be ignored. However the matched rows may doesn't match the exists clause! So in the above example, the correct result set will be 1 row. If we pull up the sublink above the JoinExpr, no row would be found. The attached is just a comment and a test case to help understand why we have to do things like this. -- Best Regards Andy Fan
v1-0001-test-case-and-comments-to-help-understand-why-we-.patch
Description: Binary data