[ https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-47633: ------------------------------------- Assignee: Bruce Robbins > Cache miss for queries using JOIN LATERAL with join condition > ------------------------------------------------------------- > > Key: SPARK-47633 > URL: https://issues.apache.org/jira/browse/SPARK-47633 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.2, 4.0.0, 3.5.1 > Reporter: Bruce Robbins > Assignee: Bruce Robbins > Priority: Major > Labels: pull-request-available > > For example: > {noformat} > CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); > CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); > create or replace temp view v1 as > select * > from t1 > join lateral ( > select c1 as a, c2 as b > from t2) > on c1 = a; > cache table v1; > explain select * from v1; > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false > :- LocalTableScan [c1#180, c2#181] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > false] as bigint)),false), [plan_id=113] > +- LocalTableScan [a#173, b#174] > {noformat} > Note that there is no {{InMemoryRelation}}. > However, if you move the join condition into the subquery, the cached plan is > used: > {noformat} > CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); > CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); > create or replace temp view v2 as > select * > from t1 > join lateral ( > select c1 as a, c2 as b > from t2 > where t1.c1 = t2.c1); > cache table v2; > explain select * from v2; > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179] > +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, > memory, deserialized, 1 replicas) > +- AdaptiveSparkPlan isFinalPlan=true > +- == Final Plan == > *(1) Project [c1#26, c2#27, a#19, b#20] > +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, > BuildLeft, false > :- BroadcastQueryStage 0 > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, false] as > bigint)),false), [plan_id=37] > : +- LocalTableScan [c1#26, c2#27] > +- *(1) LocalTableScan [a#19, b#20, c1#30] > +- == Initial Plan == > Project [c1#26, c2#27, a#19, b#20] > +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, > false > :- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, false] as > bigint)),false), [plan_id=37] > : +- LocalTableScan [c1#26, c2#27] > +- LocalTableScan [a#19, b#20, c1#30] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org