cloud-fan commented on PR #32987: URL: https://github.com/apache/spark/pull/32987#issuecomment-1258168996
I don't have a good idea for how to fix this issue. Assume a subexpression appears in both a normal branch and a conditional branch:
1. The conditional branch may never be hit at runtime. Doing CSE (common subexpression elimination) is then pure overhead.
2. The conditional branch may always be hit at runtime. Doing CSE then benefits a lot.

It's hard to make this decision at compile time, and the best option is probably JIT, which is very complicated.

Another idea is to reduce the overhead of CSE. Its major overhead is the need for a mutable variable to store the result. The overhead is very minor if we only add a few mutable variables. So here is the idea:
1. A subexpression still has a reference count. The count increases if the subexpression is referenced by a normal branch, or by all conditional branches of a conditional expression. (This is the behavior today.)
2. Trigger CSE if the ref count is `> 1`. (This is the behavior today.)
3. If the ref count of a subexpression is 1, and it is also referenced by a conditional branch, trigger CSE. Note that we can only trigger CSE in this case fewer than 10 times (this is configurable).

Also cc @rednaxelafx
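
To make the three rules concrete, here is a minimal Python sketch of the selection step only. It is not Spark's implementation: `SubexprStats`, `select_for_cse`, and `conditional_budget` are hypothetical names, the reference count is assumed to have been computed already according to rule 1, and the cap is modeled as "at most `conditional_budget` extra eliminations", which is one possible reading of the limit described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubexprStats:
    """Hypothetical per-subexpression bookkeeping (not a Spark class)."""
    name: str
    # Rule 1: incremented for each normal-branch reference, or when ALL
    # conditional branches of a conditional expression reference it.
    ref_count: int
    # True if some conditional branch references the subexpression
    # without that reference being counted in ref_count.
    in_conditional_branch: bool


def select_for_cse(stats: List[SubexprStats],
                   conditional_budget: int = 10) -> List[str]:
    """Return the names of subexpressions that would be eliminated."""
    selected = []
    remaining = conditional_budget  # assumed cap for rule 3
    for s in stats:
        if s.ref_count > 1:
            # Rule 2: today's behavior, no budget needed.
            selected.append(s.name)
        elif s.ref_count == 1 and s.in_conditional_branch and remaining > 0:
            # Rule 3: speculative CSE, each costing one mutable variable.
            selected.append(s.name)
            remaining -= 1
    return selected


stats = [
    SubexprStats("a", ref_count=2, in_conditional_branch=False),
    SubexprStats("b", ref_count=1, in_conditional_branch=True),
    SubexprStats("c", ref_count=1, in_conditional_branch=False),
    SubexprStats("d", ref_count=1, in_conditional_branch=True),
]
print(select_for_cse(stats))
print(select_for_cse(stats, conditional_budget=1))
```

With the default budget, `a` is selected by rule 2 and `b` and `d` by rule 3. With a budget of 1, only the first rule-3 candidate (`b`) is selected in addition to `a`, which bounds the number of extra mutable variables.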
