francis0407 opened a new pull request #24344: [SPARK-27440][SQL] Optimize 
uncorrelated predicate subquery
URL: https://github.com/apache/spark/pull/24344
 
 
   ## What changes were proposed in this pull request?
   
   This PR optimizes uncorrelated predicate subqueries (`InSubquery`, 
`Exists`).
   Currently, all predicate subqueries (`InSubquery`, `Exists`) are rewritten 
as semi-joins or anti-joins. However, an uncorrelated predicate subquery can be 
evaluated as a subplan instead of a join. We can first rewrite uncorrelated 
predicate subqueries as `Exists`, then optimize the result and compute it with 
a subquery physical plan, as is done for `ScalarSubquery`. 
   
   This PR adds a new optimizer rule: `RewriteUncorrelatedSubquery`.
   This rule rewrites uncorrelated `PredicateSubquery` expressions as `Exists` 
(it can also be used for ANY/SOME/ALL). In addition, we can apply `select 1` 
and `limit 1` to the subquery to reduce the result set. An `InSubquery` can be 
rewritten as an uncorrelated `Exists` only when the left-side values are 
literals and the subquery has no outer references. For example:
   ```SQL
   3 in (select b from t) => exists(select 1 from t where b = 3 limit 1)
   ```
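
The rewrite above can be checked on a toy dataset. The following is a minimal sketch that uses SQLite (through Python's standard `sqlite3` module) as a stand-in for Spark SQL, so it says nothing about Spark's planner; the tables `s` and `t` and the helper `compare_rewrite` are hypothetical names introduced only for this illustration. It runs both forms as a filter predicate and returns the rows each one keeps.

```python
import sqlite3


def compare_rewrite(t_values, literal=3):
    """Run the IN form and the rewritten EXISTS form as WHERE predicates.

    Hypothetical helper: SQLite stands in for Spark SQL here.
    Returns (rows kept by IN, rows kept by EXISTS).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE s (a INTEGER)")  # outer table (assumed)
    conn.execute("CREATE TABLE t (b INTEGER)")  # subquery table from the example
    conn.executemany("INSERT INTO s VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in t_values])

    # Original form: literal IN (uncorrelated subquery).
    in_rows = conn.execute(
        "SELECT a FROM s WHERE ? IN (SELECT b FROM t) ORDER BY a",
        (literal,),
    ).fetchall()

    # Rewritten form: EXISTS over the filtered subquery with select 1 / limit 1.
    exists_rows = conn.execute(
        "SELECT a FROM s "
        "WHERE EXISTS (SELECT 1 FROM t WHERE b = ? LIMIT 1) ORDER BY a",
        (literal,),
    ).fetchall()

    conn.close()
    return in_rows, exists_rows
```

Note that when `t.b` contains NULL and no value matches, the `IN` form evaluates to NULL while `EXISTS` evaluates to false. Both drop the row when used as a filter predicate, which is the only context this sketch exercises.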
   
   Also, this PR adds a new class `Exists`, the physical counterpart of the 
logical `Exists` expression, to be used inside `SparkPlan`.
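
To illustrate the idea of evaluating an uncorrelated `Exists` as a subplan rather than a join, here is a toy Python model. It is not the code in this PR: the class `UncorrelatedExists` and its methods `update_result` and `eval` are hypothetical names chosen for this sketch. The point it shows is that the subquery runs once, and every outer row then reuses the cached boolean.

```python
from typing import Callable, Iterable, Optional


class UncorrelatedExists:
    """Toy stand-in for an uncorrelated EXISTS evaluated as a subplan (hypothetical)."""

    def __init__(self, run_subquery: Callable[[], Iterable[object]]) -> None:
        self._run_subquery = run_subquery
        self._result: Optional[bool] = None
        self.executions = 0  # how many times the subquery actually ran

    def update_result(self) -> None:
        # Run the subquery once; any() stops at the first row, like `limit 1`.
        self.executions += 1
        self._result = any(True for _ in self._run_subquery())

    def eval(self, outer_row: object) -> bool:
        # The outer row is ignored because the subquery has no outer references.
        if self._result is None:
            raise RuntimeError("update_result() must be called before eval()")
        return self._result


t = [1, 3, 5]
exists = UncorrelatedExists(lambda: (1 for b in t if b == 3))
exists.update_result()

# Filtering three outer rows reuses the single cached result.
kept = [row for row in ["r1", "r2", "r3"] if exists.eval(row)]
```

Under this model the cost of the predicate is one subquery execution plus a constant-time check per outer row, which is the motivation the description gives for avoiding the semi-join.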
   
   
   ## How was this patch tested?
   
   Unit tests.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
