On 11/29/24 05:33, Tom Lane wrote:
Avoid mislabeling of lateral references when pulling up a subquery.

If we are pulling up a subquery that's under an outer join, and
the subquery's target list contains a strict expression that uses
both a subquery variable and a lateral-reference variable, it's okay
to pull up the expression without wrapping it in a PlaceHolderVar.
That's safe because if the subquery variable is forced to NULL
by the outer join, the expression result will come out as NULL too,
so we don't have to force that outcome by evaluating the expression
below the outer join.  It'd be correct to wrap in a PHV, but that can
lead to very significantly worse plans, since we'd then have to use
a nestloop plan to pass down the lateral reference to where the
expression will be evaluated.
Pardon the noise, but I'm curious why the optimiser must choose NestLoop in the case of lateral reference.

It would be nice to provide alternatives. Because now we have some corner cases. For example, with pull-up correlated subqueries, we've got one degraded case. Look the following:

DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1(x int, y int);
CREATE TABLE t2(x int, y int);
INSERT INTO t1 (x,y)
  SELECT gs,-gs FROM generate_series(1,1E4) AS gs;
ANALYZE t1,t2;

EXPLAIN (ANALYZE, COSTS ON)
SELECT t1.* FROM t1 LEFT JOIN LATERAL (
  SELECT t3.* FROM t1 t3 WHERE t3.x=t1.x) AS t4
ON (t4.y IN (SELECT y FROM t2 WHERE t4.x=t2.x));

In previous versions Postgres executed this plan in milliseconds:

 Hash Left Join
   Hash Cond: (t1.x = t3.x)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t1 t3
               Filter: (ANY (y = (SubPlan 1).col1))
               SubPlan 1
                 ->  Seq Scan on t2
                       Filter: (t3.x = x)
 Planning Time: 0.175 ms
 Execution Time: 6.396 ms

But now we have seconds:

 Nested Loop Left Join
   ->  Seq Scan on t1
   ->  Nested Loop Semi Join
         Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y))
         ->  Seq Scan on t1 t3
               Filter: (x = t1.x)
         ->  Seq Scan on t2
 Planning Time: 1.309 ms
 Execution Time: 6780.217 ms

Correlated subquery pull-up is a nice optimisation, of course. So, why not let optimiser try a HashJoin like that (not a really generated plan, just my imagination):

 Hash Left Join
   Hash Cond: (t1.x = t3.x)
   ->  Seq Scan on t1
   Hash
   ->  Nested Loop Semi Join
         Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y))
         ->  Seq Scan on t1 t3
         ->  Seq Scan on t2

Does the optimiser have some internal limits to let such a path?

--
regards, Andrei Lepikhov


Reply via email to