bart-samwel commented on pull request #30565: URL: https://github.com/apache/spark/pull/30565#issuecomment-737334913
Do we pull common subexpressions out to just before the place where they are first used, or all the way in front of the entire filter? If we pull them out farther than the first use, we will get more exceptions and we will potentially slow things down.

Longer version: This is a lot scarier than for Project. The reason is that in a Project, you can be pretty certain that all expressions will be evaluated, unless they are inside conditionals. But filters typically contain conjunctions, and in a conjunction, short-circuit evaluation hides potential exceptions. If you then pull some of those expressions ahead of the entire filter, you may get errors where you didn't get errors before. This is especially scary when the ANSI dialect is enabled, which makes a lot of expressions raise errors.

E.g., in a filter like `WHERE d <> 0 AND a/d > 3 AND a/d < 23`, it's likely that it currently works because `d <> 0` short-circuits the other conjuncts when `d = 0`. But if you pull out the common subexpression `a/d` and evaluate it before the rest of the filter, then you'll get exceptions where you wouldn't currently get them.

Another thing to think about is expensive expressions. What if you have a filter `WHERE selective_filter AND f(expensive_expression) AND g(expensive_expression)`? If you pull the expensive expression in front of the whole filter, you evaluate it for every input row rather than only for the rows that pass `selective_filter`. But if we pull the expression to just before the conjunct that uses it first, then we're good.
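
The two concerns in this comment (newly exposed exceptions and extra evaluations) can be illustrated with a small Python sketch. This is not Spark code and does not reflect how the PR implements subexpression elimination. All names here (`CountingDiv`, `filter_no_cse`, `filter_hoist_before_filter`, `filter_hoist_at_first_use`, `ROWS`) are hypothetical, and Python's `ZeroDivisionError` is assumed as a stand-in for the division error raised under the ANSI dialect.

```python
from typing import Dict, List

Row = Dict[str, int]

# Sample input: the first row has d = 0 and relies on `d <> 0` to guard a/d.
ROWS: List[Row] = [
    {"a": 10, "d": 0},
    {"a": 10, "d": 2},
    {"a": 100, "d": 2},
    {"a": 3, "d": 1},
]


class CountingDiv:
    """Stand-in for the common subexpression a/d that counts its evaluations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, row: Row) -> float:
        self.calls += 1
        # Raises ZeroDivisionError when d == 0 (assumed analogue of ANSI mode).
        return row["a"] / row["d"]


def filter_no_cse(rows: List[Row], div: CountingDiv) -> List[Row]:
    # No elimination: each conjunct recomputes a/d, but `and` short-circuits,
    # so a/d is never evaluated for rows where d == 0.
    return [r for r in rows if r["d"] != 0 and div(r) > 3 and div(r) < 23]


def filter_hoist_before_filter(rows: List[Row], div: CountingDiv) -> List[Row]:
    # a/d is hoisted in front of the whole filter, so it runs for every row,
    # including rows that the first conjunct would have rejected.
    out = []
    for r in rows:
        q = div(r)
        if r["d"] != 0 and q > 3 and q < 23:
            out.append(r)
    return out


def filter_hoist_at_first_use(rows: List[Row], div: CountingDiv) -> List[Row]:
    # a/d is computed once, just before the first conjunct that uses it,
    # and only for rows that already passed the earlier conjunct.
    out = []
    for r in rows:
        if r["d"] == 0:
            continue
        q = div(r)
        if q > 3 and q < 23:
            out.append(r)
    return out
```

With `ROWS` as input, `filter_hoist_before_filter` fails on the first row, while the other two variants return the same rows. `filter_hoist_at_first_use` calls the shared expression at most once per row that survives `d != 0`, which is the placement the comment argues for.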
