dbaliafroozeh opened a new pull request #26705: [SPARK-30072] Create dedicated planner for subqueries
URL: https://github.com/apache/spark/pull/26705
 
 
   ### What changes were proposed in this pull request?
   
   This PR changes subquery planning by calling the planner and plan 
preparation rules on the subquery plan directly. Previously, we created a 
`QueryExecution` instance for each subquery to obtain its `executedPlan`, which 
re-ran analysis and optimization on the subquery's already optimized plan. 
Running the analyzer again on an optimized query plan can have unwanted 
consequences, because some rules, for example `DecimalPrecision`, are not 
idempotent.
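A minimal sketch of the two subquery planning paths, assuming a toy model in which a plan is just a string and each phase (`analyze`, `optimize`, `plan`, `prepare`) records its name in a shared trace. None of these names are Spark APIs; they only illustrate which phases run on a subquery plan that the outer query has already optimized.

```python
from typing import Callable, List

# Toy model (hypothetical, not Spark's API): a "plan" is a string, and each
# phase records its name in a shared trace and wraps the plan label.
Phase = Callable[[str], str]

def phase(name: str, trace: List[str]) -> Phase:
    def run(plan: str) -> str:
        trace.append(name)
        return f"{name}({plan})"
    return run

def plan_subquery_before(optimized: str, trace: List[str]) -> str:
    # Old path: a full QueryExecution-style pipeline around a plan that the
    # outer query has already analyzed and optimized.
    analyze, optimize, plan, prepare = (
        phase(n, trace) for n in ("analyze", "optimize", "plan", "prepare"))
    return prepare(plan(optimize(analyze(optimized))))

def plan_subquery_after(optimized: str, trace: List[str]) -> str:
    # New path: only the physical planner and the preparation rules run.
    plan, prepare = (phase(n, trace) for n in ("plan", "prepare"))
    return prepare(plan(optimized))

before: List[str] = []
after: List[str] = []
plan_subquery_before("subquery", before)
plan_subquery_after("subquery", after)
print(before)  # ['analyze', 'optimize', 'plan', 'prepare']
print(after)   # ['plan', 'prepare']
```

In the old path the analyzer sees an optimized plan a second time, which is exactly where a non-idempotent analyzer rule can change the result.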
   
   As an example, consider the expression `1.7 * avg(a)` which after applying 
the `DecimalPrecision` rule becomes:
   
   ```
   promote_precision(1.7) * promote_precision(avg(a))
   ```
   
   After optimization, specifically the constant folding rule, this expression 
becomes:
   
   ```
   1.7 * promote_precision(avg(a))
   ```
   
   Now if we run the analyzer on this optimized query again, we will get:
   
   ```
   promote_precision(1.7) * promote_precision(promote_precision(avg(a)))
   ```
   
   The optimizer then rewrites this to:
   
   ```
   1.7 * promote_precision(promote_precision(avg(a)))
   ```
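The non-idempotence can be reproduced with a toy model. The sketch below is not Spark's Catalyst API: `Lit`, `Avg`, `PromotePrecision`, `Multiply`, `decimal_precision`, and `constant_fold` are hypothetical stand-ins. The toy `decimal_precision` rule skips a multiply whose left operand is already wrapped, so applying it twice in a row is harmless; constant folding then removes the wrapper from the literal, and the next analysis pass nests `promote_precision` around `avg(a)` a second time.

```python
from dataclasses import dataclass
from typing import Union

# Toy expression tree (hypothetical; not Spark's Catalyst classes).
@dataclass(frozen=True)
class Lit:
    value: str  # decimal literal kept as text, e.g. "1.7"

@dataclass(frozen=True)
class Avg:
    column: str

@dataclass(frozen=True)
class PromotePrecision:
    child: "Expr"

@dataclass(frozen=True)
class Multiply:
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Avg, PromotePrecision, Multiply]

def decimal_precision(e: Expr) -> Expr:
    """Toy analyzer rule: wrap both operands of a multiply in promote_precision,
    skipping a multiply whose left operand is already wrapped."""
    if isinstance(e, Multiply):
        if isinstance(e.left, PromotePrecision):
            return e  # looks "already promoted", so leave it alone
        return Multiply(PromotePrecision(decimal_precision(e.left)),
                        PromotePrecision(decimal_precision(e.right)))
    if isinstance(e, PromotePrecision):
        return PromotePrecision(decimal_precision(e.child))
    return e

def constant_fold(e: Expr) -> Expr:
    """Toy optimizer rule: promote_precision around a literal is constant,
    so replace it with the literal itself."""
    if isinstance(e, PromotePrecision):
        child = constant_fold(e.child)
        return child if isinstance(child, Lit) else PromotePrecision(child)
    if isinstance(e, Multiply):
        return Multiply(constant_fold(e.left), constant_fold(e.right))
    return e

def show(e: Expr) -> str:
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Avg):
        return f"avg({e.column})"
    if isinstance(e, PromotePrecision):
        return f"promote_precision({show(e.child)})"
    return f"{show(e.left)} * {show(e.right)}"

expr = Multiply(Lit("1.7"), Avg("a"))
once = constant_fold(decimal_precision(expr))   # first analyze + optimize
twice = constant_fold(decimal_precision(once))  # re-run on the optimized result
print(show(once))   # 1.7 * promote_precision(avg(a))
print(show(twice))  # 1.7 * promote_precision(promote_precision(avg(a)))
```

Both results compute the same value, yet `once != twice`: the toy rule only looks idempotent until another rule changes the shape it checks for.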
   
   As can be seen, re-running analysis and optimization on this expression 
results in an expression with extra nested `promote_precision` nodes. Adding 
unneeded nodes to the plan is problematic because it can prevent plan reuse: a 
plan that picked up the extra nodes no longer matches an otherwise identical 
plan.
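To see why the extra nodes matter for reuse, here is a toy reuse step that shares plans which compare structurally equal. This is a simplification assumed for illustration (plans as nested tuples, reuse keyed on plain equality, a hypothetical `count_after_reuse` helper), not Spark's actual reuse logic.

```python
from typing import Dict, List, Tuple

# Toy plan representation (hypothetical): nested tuples compare structurally.
Plan = Tuple

def count_after_reuse(plans: List[Plan]) -> int:
    """Toy reuse step: plans that compare equal are planned once and shared.
    Returns how many distinct plans remain after reuse."""
    shared: Dict[Plan, Plan] = {}
    for p in plans:
        shared.setdefault(p, p)
    return len(shared)

avg_a = ("avg", "a")
analyzed_once = ("multiply", ("literal", "1.7"), ("promote_precision", avg_a))
reanalyzed = ("multiply", ("literal", "1.7"),
              ("promote_precision", ("promote_precision", avg_a)))

print(count_after_reuse([analyzed_once, analyzed_once]))  # 1: reused
print(count_after_reuse([analyzed_once, reanalyzed]))     # 2: no reuse
```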
   
   We opted to introduce dedicated planners for subqueries, instead of making 
the `DecimalPrecision` rule idempotent, because this eliminates the entire 
category of problems caused by re-analyzing optimized plans. Another benefit 
is that planning time for subqueries is reduced.
   
   
   ### How was this patch tested?
   Unit tests
   
