[GitHub] [spark] WangGuangxin opened a new pull request, #36626: [SPARK-39249][SQL] Improve subexpression elimination for conditional expressions

GitBox Sat, 21 May 2022 07:55:41 -0700


WangGuangxin opened a new pull request, #36626:
URL: https://github.com/apache/spark/pull/36626


   ### What changes were proposed in this pull request?
   Currently, subexpression elimination for conditional expressions applies only when the subexpression is common to all `branchGroups`. We can further improve this when `alwaysEvaluatedInputs` and `branchGroups` share common expressions.
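A toy model of the selection rule described above may help. It is a hypothetical sketch, not Spark's `EquivalentExpressions` code. `Expr`, `subtrees`, `commonAcrossBranches` and `eligible` are illustrative names. Expressions are compared by their printed form as a stand-in for Catalyst's semantic equality, and the leaf `a` stands for an arbitrary expensive expression.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical toy model (not Spark's EquivalentExpressions) of which subexpressions
// of a conditional expression are safe to evaluate eagerly.
final class ConditionalCse {
    /** Minimal expression tree: an operator name plus child expressions. */
    static final class Expr {
        final String op;
        final List<Expr> children;

        Expr(String op, Expr... children) {
            this.op = op;
            this.children = List.of(children);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) return op;
            StringJoiner args = new StringJoiner(", ", op + "(", ")");
            for (Expr c : children) args.add(c.toString());
            return args.toString();
        }
    }

    /** Every subtree of e (including e itself), keyed by its printed form. */
    static Set<String> subtrees(Expr e) {
        Set<String> out = new LinkedHashSet<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, Set<String> out) {
        out.add(e.toString());
        for (Expr c : e.children) collect(c, out);
    }

    /** Previous rule: only subtrees that appear in every branch of the group. */
    static Set<String> commonAcrossBranches(List<Expr> branches) {
        Set<String> common = new LinkedHashSet<>(subtrees(branches.get(0)));
        for (Expr b : branches.subList(1, branches.size())) common.retainAll(subtrees(b));
        return common;
    }

    /** Extended rule: additionally, subtrees shared by the always-evaluated inputs and any branch. */
    static Set<String> eligible(List<Expr> alwaysEvaluated, List<Expr> branches) {
        Set<String> always = new LinkedHashSet<>();
        for (Expr e : alwaysEvaluated) always.addAll(subtrees(e));
        Set<String> result = commonAcrossBranches(branches);
        for (Expr b : branches) {
            Set<String> shared = subtrees(b);
            shared.retainAll(always); // evaluated anyway by the predicate, so eager evaluation is safe
            result.addAll(shared);
        }
        return result;
    }
}
```

Applied to `IF(IsNull(a), b, KnownNotNull(a))`, `commonAcrossBranches([b, KnownNotNull(a)])` is empty. By contrast, `eligible([IsNull(a)], [b, KnownNotNull(a)])` returns `{a}`.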
   
   ### Why are the changes needed?
   Take the following case as an example:
   ```
   IF(IsNull(a), b, KnownNotNull(a))
   ```
   `a` **may miss subexpression elimination opportunities** because it is not a common expression across all `branchGroups`. However, it is **safe** to treat `a` as a common subexpression and evaluate it eagerly, because it is part of the predicate, which is always evaluated. If `a` is an expensive expression, evaluating it a second time in the `KnownNotNull(a)` branch wastes time.
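The runtime effect on the expression above can be sketched as two hand-written code shapes. This is a hypothetical illustration, not Spark's generated code. `IfEvaluation`, `withoutCse`, `withCse` and `countEvaluations` are made-up names, and `a` is modeled as a `Supplier` so that each call counts as one evaluation.

```java
import java.util.function.Supplier;

// Hypothetical sketch of two code shapes for IF(IsNull(a), b, KnownNotNull(a)).
// `a` is a Supplier so that each get() stands for one evaluation of the expression.
final class IfEvaluation {
    /** Without CSE: `a` runs in the predicate and again in the false branch. */
    static Long withoutCse(Supplier<Long> a, Long b) {
        if (a.get() == null) return b;
        return a.get(); // KnownNotNull(a): the value is known to be non-null but is recomputed
    }

    /** With CSE: `a` is evaluated once up front, which is safe because the predicate needs it anyway. */
    static Long withCse(Supplier<Long> a, Long b) {
        Long aValue = a.get();
        if (aValue == null) return b;
        return aValue;
    }

    /** Runs one variant against a counting, non-null stand-in for `a` and returns how often `a` ran. */
    static int countEvaluations(boolean useCse) {
        int[] calls = {0};
        Supplier<Long> a = () -> {
            calls[0]++;
            return 42L; // pretend this is expensive to compute
        };
        if (useCse) withCse(a, 0L); else withoutCse(a, 0L);
        return calls[0];
    }
}
```

For a non-null `a`, `countEvaluations(false)` returns 2 and `countEvaluations(true)` returns 1. When `a` is null, both shapes evaluate it once.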
   
   This kind of expression is common when computing `sum` over a decimal type, because of https://github.com/apache/spark/pull/29026.
   
https://github.com/apache/spark/blob/291d155b3c514f8b590a6b078f7efd42a30e67f0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L125
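The per-row cost in an aggregation can be sketched with a toy model. We assume the linked `Sum.scala` line builds an update of roughly the shape `If(IsNull(child), sum, Add(sum, KnownNotNull(child)))`; check the link for the exact expression. `DecimalSumDemo`, `Row`, `Result`, `aggregate` and the `price * qty` child are all hypothetical.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a decimal sum whose per-row update has the assumed shape
// If(IsNull(child), sum, sum + KnownNotNull(child)), with child = price * qty.
final class DecimalSumDemo {
    /** One input row with two nullable decimal columns (made-up schema). */
    static final class Row {
        final BigDecimal price;
        final BigDecimal qty;

        Row(BigDecimal price, BigDecimal qty) {
            this.price = price;
            this.qty = qty;
        }
    }

    /** Final sum plus the number of times the child expression was evaluated. */
    static final class Result {
        final BigDecimal sum;
        final int childEvaluations;

        Result(BigDecimal sum, int childEvaluations) {
            this.sum = sum;
            this.childEvaluations = childEvaluations;
        }
    }

    static Result aggregate(List<Row> rows, boolean useCse) {
        int[] calls = {0};
        // The child expression: null if either input is null, else price * qty.
        Function<Row, BigDecimal> child = r -> {
            calls[0]++;
            return (r.price == null || r.qty == null) ? null : r.price.multiply(r.qty);
        };
        BigDecimal sum = BigDecimal.ZERO;
        for (Row r : rows) {
            if (useCse) {
                BigDecimal value = child.apply(r); // evaluated once per row
                if (value != null) sum = sum.add(value);
            } else if (child.apply(r) != null) {   // predicate: IsNull(child)
                sum = sum.add(child.apply(r));     // branch: KnownNotNull(child), evaluated again
            }
        }
        return new Result(sum, calls[0]);
    }
}
```

With two non-null rows and one null row, the sketch evaluates the child 5 times without elimination and 3 times with it. Both variants return `12.00` for the rows used in the test. The saving therefore grows with the number of non-null input rows.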
   
   Performance results on TPC-DS (10 TB):
   Query | With this PR | Without this PR | Speedup
   -- | -- | -- | --
   4 | 310.862 | 635.299 | 104.37%
   80 | 36.723 | 46.006 | 25.28%
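The formula behind the speedup column is not stated. However, the numbers are consistent with reading it as the ratio of the two timings minus one. This is our inference from the table, not a definition given in the PR:

```math
\text{Speedup} = \frac{T_{\text{without}}}{T_{\text{with}}} - 1,\qquad
\text{Q4: } \frac{635.299}{310.862} - 1 \approx 1.0437 = 104.37\%,\qquad
\text{Q80: } \frac{46.006}{36.723} - 1 \approx 0.2528 = 25.28\%
```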
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added more unit tests.
   


-- 
This is an automated message from the Apache Git Service.