Kimahriman commented on pull request #32987:
URL: https://github.com/apache/spark/pull/32987#issuecomment-867716953


   My main assumption in creating this was that it's always faster to run an 
expression once in a function than twice inlined. If this creates a lot of 
extra subexpressions that pushes the code over the 1kb threshold for breaking 
into functions, then the alternative is that you are running a lot of duplicate 
inlined logic, so at the end of the day it all comes down to how often a 
subexpression created by this logic is only evaluated once. 
   
   The two extremes of performance impact I can think of would be:
   - Worst case: Without this logic, you have subexpressions that are just 
small enough to remain inlined. You add one conditional that creates a new 
subexpression that pushes your code over the (default) 1kb limit. That 
conditional never evaluates to true, so your conditional subexpression is 
evaluated once in a function rather than inlined, and all your other 
subexpressions are evaluated with a function call instead of inlined as well. 
This is somewhat bound by the number of subexpressions that can be fit inline 
in the first place, plus the function calls of the one-time evaluated 
conditional subexpressions.
   - Best case: Your existing subexpressions have already been broken out into 
functions before this change, or the new subexpression fits inline as well, and 
the conditional always evaluates to true, so you are running the conditional 
expression once instead of two or more times. This is essentially the existing 
logic where we create a subexpression for things that are always evaluated at 
least twice, so obviously a win here.
   
   Realistically things are going to fall somewhere in the middle. Where the 
extra function calls outweigh the deduped expression execution, who knows. But 
the upside here is pretty large, and I would expect most Spark users would 
expect this to logically happen (don't run the same code twice). If we want to 
leave it with the setting defaulted to disabled I'm fine with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to