bart-samwel commented on pull request #30565:
URL: https://github.com/apache/spark/pull/30565#issuecomment-737334913


   Do we pull common subexpressions out to just before the place where they are 
used, or all the way out to before the entire filter? If we pull them out 
farther, we will get more exceptions and potentially slow things down.
   
   Longer version:
   
   This is a lot scarier than for Project. In a Project, you can be fairly 
certain that all expressions will be evaluated, unless they are inside 
conditionals. But filters typically contain conjunctions, and in conjunctions, 
short-circuit evaluation hides potential exceptions. If you then pull some of 
those expressions ahead of the entire filter, you may get errors where you 
didn't get errors before. This is especially scary when the ANSI dialect is 
enabled, because it makes many expressions raise errors (e.g., on division by 
zero) instead of returning null.
   
   E.g., a filter like `WHERE d <> 0 AND a/d > 3 AND a/d < 23` most likely 
works today because `d <> 0` short-circuits the other conjuncts when `d = 0`. 
But if you pull out the common subexpression `a/d` and evaluate it before the 
rest of the filter, you'll get exceptions where you currently don't get them.
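   To make this concrete, here is a minimal Python sketch of the two evaluation 
orders. It is not Spark code: the filter is modeled as a plain Python function 
over `(a, d)` rows, and Python's `ZeroDivisionError` stands in for the 
divide-by-zero error Spark raises under the ANSI dialect. The helper names are 
hypothetical.

```python
# Hypothetical model of a row filter. Python's ZeroDivisionError stands in for
# the divide-by-zero error Spark raises under the ANSI dialect.

def original_filter(a, d):
    # `and` short-circuits: when d == 0, a / d is never evaluated.
    return d != 0 and a / d > 3 and a / d < 23

def hoisted_filter(a, d):
    # The common subexpression a / d is hoisted in front of the whole conjunction,
    # so it runs even for rows that `d != 0` would have rejected.
    ad = a / d
    return d != 0 and ad > 3 and ad < 23

def run(row_filter, rows):
    # Apply the filter to every row; report "error" if any row raises.
    try:
        return [row_filter(a, d) for a, d in rows]
    except ZeroDivisionError:
        return "error"

rows = [(10, 2), (10, 0), (100, 2)]
print(run(original_filter, rows))  # [True, False, False]
print(run(hoisted_filter, rows))   # error
```

   Both versions agree on every row where `d` is non-zero. Only the hoisted one 
fails on the `d = 0` row.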
   
   Another thing to think about is expensive expressions. What if you have a 
filter `WHERE selective_filter AND f(expensive_expression) AND 
g(expensive_expression)`? If you pull the expensive expression to the front, 
you evaluate it for every row, including all the rows that `selective_filter` 
would have rejected, so it runs many more times than if you don't.
   
   But if we pull the expression out only to just before the first conjunct 
that uses it, then we're good.
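   A minimal Python sketch of that placement, applied to the `a/d` filter 
above. The shared value is computed lazily by the first conjunct that needs 
it and then reused by later conjuncts. This only illustrates the semantics; 
the function name and the per-row cache are hypothetical and not how Spark's 
code generator would implement it.

```python
def lazy_cse_filter(rows):
    """Evaluate `d <> 0 AND a/d > 3 AND a/d < 23`, computing a/d lazily."""
    evaluations = 0
    results = []
    for a, d in rows:
        cached = []  # per-row slot; stays empty until a conjunct needs a / d

        def a_div_d():
            nonlocal evaluations
            if not cached:
                evaluations += 1
                cached.append(a / d)  # computed at the first conjunct that uses it
            return cached[0]          # later conjuncts reuse the cached value

        # `d != 0` still short-circuits, so a / d never runs when d == 0.
        results.append(d != 0 and a_div_d() > 3 and a_div_d() < 23)
    return results, evaluations

print(lazy_cse_filter([(10, 2), (10, 0), (100, 2)]))  # ([True, False, False], 2)
```

   This version raises no error on the `d = 0` row. It also evaluates `a/d` at 
most once per row, and only for rows that reach the first conjunct using it.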


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

