WangGuangxin opened a new pull request, #36626: URL: https://github.com/apache/spark/pull/36626
### What changes were proposed in this pull request?

Currently, subexpression elimination for conditional expressions applies only when a subexpression is common across all `branchGroups`. This PR further improves it to cover expressions that are common between `alwaysEvaluatedInputs` and `branchGroups`.

### Why are the changes needed?

Take the following case as an example:

```
IF(IsNull(a), b, KnownNotNull(a))
```

`a` **may miss subexpression elimination chances** because it is not common to all `branchGroups`. However, it is **safe** to treat `a` as a common subexpression and evaluate it eagerly, because it is part of the predicate, which is always evaluated. If `a` is an expensive expression, evaluating it once in the predicate and again in a branch wastes time.

Expressions of this kind are common when we do `sum` on a decimal type because of https://github.com/apache/spark/pull/29026:

https://github.com/apache/spark/blob/291d155b3c514f8b590a6b078f7efd42a30e67f0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L125

Performance results on TPC-DS 10T:

Query | With this PR | Without this PR | Speed up
-- | -- | -- | --
4 | 310.862 | 635.299 | 104.37%
80 | 36.723 | 46.006 | 25.28%

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added more unit tests.
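
For readers who want to see the `IF(IsNull(a), b, KnownNotNull(a))` case concretely, here is a minimal sketch in Python. It is a toy model, not Spark's implementation: the `Expr` class and the helpers `subexprs`, `common_across_branches`, and `shared_with_always_evaluated` are hypothetical names invented for illustration, and `a` is assumed to be a non-leaf expression `expensive(x)`. The sketch compares the existing rule (a subexpression must appear in every branch) with the proposed extra rule (a subexpression appears in an always-evaluated input and in at least one branch).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical): an operator name plus children."""
    op: str
    children: Tuple["Expr", ...] = ()


def subexprs(e):
    """Collect all non-leaf subexpressions of e, including e itself.

    Leaves are skipped in this toy model because re-reading a column or
    literal is assumed to be cheap.
    """
    found = set()
    if e.children:
        found.add(e)
        for child in e.children:
            found |= subexprs(child)
    return found


def common_across_branches(branches):
    """Existing rule: keep only subexpressions present in every branch."""
    sets = [subexprs(b) for b in branches]
    return set.intersection(*sets) if sets else set()


def shared_with_always_evaluated(always_evaluated, branches):
    """Proposed extra rule: subexpressions that occur in an always-evaluated
    input and in at least one branch. They are computed anyway, so caching
    them adds no extra evaluation."""
    always_set = set().union(*(subexprs(e) for e in always_evaluated))
    branch_set = set().union(*(subexprs(b) for b in branches))
    return always_set & branch_set


# Model IF(IsNull(a), b, KnownNotNull(a)) with a = expensive(x).
a = Expr("expensive", (Expr("x"),))
b = Expr("b")
predicate = Expr("IsNull", (a,))
true_branch = b
false_branch = Expr("KnownNotNull", (a,))

old_rule = common_across_branches([true_branch, false_branch])
new_rule = shared_with_always_evaluated([predicate], [true_branch, false_branch])
```

In this model `old_rule` is empty, because `a` does not occur in the `b` branch, while `new_rule` contains exactly `a`. Evaluating `a` eagerly is safe here only because the predicate always runs; a subexpression that appears in just one branch and nowhere else would still not qualify.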
