cloud-fan commented on PR #32987:
URL: https://github.com/apache/spark/pull/32987#issuecomment-1258168996

   I don't have a good idea about how to fix this issue. Assume a subexpression 
appears in both a normal branch and a conditional branch:
   1. The conditional branch may never be hit at runtime. In that case, doing 
CSE (common subexpression elimination) is pure overhead.
   2. The conditional branch may always be hit at runtime. In that case, doing 
CSE helps a lot.
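
   A toy cost model makes the two cases concrete. This is only a sketch under 
simplifying assumptions, not something measured in Spark: `eval_cost`, 
`cse_overhead` and `p_hit` (the fraction of rows that reach the conditional 
branch) are hypothetical names introduced for illustration.

```python
def cost_without_cse(eval_cost: float, p_hit: float) -> float:
    # The normal branch always evaluates the subexpression; the conditional
    # branch evaluates it a second time only when it is hit.
    return eval_cost * (1.0 + p_hit)


def cost_with_cse(eval_cost: float, cse_overhead: float) -> float:
    # The subexpression is evaluated once, and we pay for storing the result
    # (assumed here to be a fixed per-row overhead).
    return eval_cost + cse_overhead


def cse_pays_off(eval_cost: float, cse_overhead: float, p_hit: float) -> bool:
    # CSE wins only when the saved re-evaluation outweighs the storage overhead.
    return cost_with_cse(eval_cost, cse_overhead) < cost_without_cse(eval_cost, p_hit)
```

   Under this model CSE pays off only when `p_hit > cse_overhead / eval_cost`, 
and `p_hit` is not known at compile time.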
   
   It's hard to make a decision at compile time, and the best option is 
probably JIT, which is very complicated. Another idea is to reduce the overhead 
of CSE. Its major overhead is the need for a mutable variable to store the 
result. The overhead is very minor if we just add a few mutable variables. So 
here is the idea:
   1. A subexpression still has a reference count. The count increases if the 
subexpression is referenced by a normal branch, or by all conditional branches 
of a conditional expression. (This is the behavior today.)
   2. Trigger CSE if the ref count is `> 1`. (This is the behavior today.)
   3. If the ref count of a subexpression is 1, and it's also referenced by a 
conditional branch, trigger CSE. Note that we can only trigger CSE in this 
case fewer than 10 times (the limit is configurable).
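
   The three rules can be sketched as a small decision function. This is an 
illustrative model only, not Spark code: `SubExprStats`, `CseBudget` and 
`max_conditional_cse` are hypothetical names, and treating the limit as a 
simple counter of how often rule 3 may fire is an assumption about how the 
configurable bound would work.

```python
from dataclasses import dataclass


@dataclass
class SubExprStats:
    ref_count: int               # counted as described in rule 1
    in_conditional_branch: bool  # also referenced by some conditional branch


class CseBudget:
    """Caps how many times the relaxed rule (rule 3) may fire."""

    def __init__(self, max_conditional_cse: int):
        # Hypothetical configurable limit on extra mutable variables.
        self.remaining = max_conditional_cse

    def should_eliminate(self, stats: SubExprStats) -> bool:
        # Rule 2: existing behavior, no budget is consumed.
        if stats.ref_count > 1:
            return True
        # Rule 3: relaxed rule, allowed only while budget remains.
        if stats.ref_count == 1 and stats.in_conditional_branch and self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

   With `CseBudget(2)`, the first two subexpressions that qualify only under 
rule 3 are eliminated and later ones are left alone, which bounds the number 
of extra mutable variables.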
   
   Also cc @rednaxelafx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

