peter-toth commented on PR #39722:
URL: https://github.com/apache/spark/pull/39722#issuecomment-1408744462
> @peter-toth This issue happens for a specific complex query that has a huge expression tree containing Add operators interleaved by non-Add operators.
>
> This query ends up consuming 10x more memory and ultimately causes executor failures.
>
> The issue is related to canonicalization. It causes problems in the executors because the codegen component relies on expression canonicalization to deduplicate expressions.
>
> When we have a large number of Adds interleaved by non-Add operators, I think [this line](https://github.com/apache/spark/pull/37851/files#diff-7278f2db37934522ee7c74b71525153234cff245cefaf996957e4a9ff3dbaacdR1171) ends up materializing a new canonicalized expression tree at every non-Add operator.
>
> In our case, the executor heap histogram shows that the additional memory is consumed by a large number of Add objects.
>
> Sorry for not being clear in the initial PR description. I will update it now.

So this is very likely because subexpression elimination uses `EquivalentExpressions.addExprTree()`, which recurses through the expression tree and canonicalizes each and every node. Canonicalization of a `CommutativeExpression` node doesn't reuse the canonicalized forms of its `CommutativeExpression` children. As a result, each `Add` node in a nested group of `Add`s gets its own large, separate canonicalized tree, and the total size grows roughly quadratically with the size of the group.

The new `MultiAdd` looks like a good way to fix this issue. However, as @cloud-fan mentioned, we should make it more general so that it handles other `CommutativeExpression`s too.
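The toy model below makes the growth concrete. It is not Catalyst code. `Leaf`, `Comm`, `Neg`, `Canonicalizer`, `canonicalize_every_node` and `add_chain` are all hypothetical names. The per-node cache stands in for Spark's `lazy val canonicalized`, and the node-by-node traversal stands in for `EquivalentExpressions.addExprTree()`.

The model assumes that a commutative node is canonicalized in three steps:

1. Flatten its raw same-operator descendants.
2. Sort the operands.
3. Rebuild a fresh tree from them.

It never reuses the canonical forms already cached for child nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical toy nodes, NOT Spark's Catalyst classes.
# eq=False keeps identity-based hashing, so every node object gets its own
# cache entry (a stand-in for a per-node `lazy val canonicalized`).
@dataclass(eq=False)
class Leaf:
    name: str


@dataclass(eq=False)
class Comm:
    """A commutative binary operator, e.g. '+' (Add) or '*' (Multiply)."""
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(eq=False)
class Neg:
    """Stands in for any non-commutative operator between the Adds."""
    child: "Expr"


Expr = Union[Leaf, Comm, Neg]


def render(e: Expr) -> str:
    """Deterministic string form, used as the sort key for operands."""
    if isinstance(e, Leaf):
        return e.name
    if isinstance(e, Neg):
        return f"-({render(e.child)})"
    return f"({render(e.left)} {e.op} {render(e.right)})"


class Canonicalizer:
    def __init__(self) -> None:
        self.cache: Dict[Expr, Expr] = {}  # per-node memo, like a lazy val
        self.comm_nodes_built = 0          # new Comm objects materialized

    def canonical(self, e: Expr) -> Expr:
        if e not in self.cache:
            self.cache[e] = self._compute(e)
        return self.cache[e]

    def _gather(self, e: Expr, op: str) -> List[Expr]:
        # Descends through the RAW children with the same operator; it does
        # not reuse canonical forms already cached for those child nodes.
        if isinstance(e, Comm) and e.op == op:
            return self._gather(e.left, op) + self._gather(e.right, op)
        return [self.canonical(e)]  # non-commutative node: stop flattening

    def _compute(self, e: Expr) -> Expr:
        if isinstance(e, Leaf):
            return e
        if isinstance(e, Neg):
            return Neg(self.canonical(e.child))
        operands = sorted(self._gather(e, e.op), key=render)
        result = operands[0]
        for operand in operands[1:]:  # rebuild a fresh left-deep tree
            result = Comm(e.op, result, operand)
            self.comm_nodes_built += 1
        return result


def canonicalize_every_node(root: Expr) -> Canonicalizer:
    """Visit every node and ask for its canonical form (iteratively)."""
    c = Canonicalizer()
    stack = [root]
    while stack:
        node = stack.pop()
        c.canonical(node)
        if isinstance(node, Comm):
            stack.extend([node.left, node.right])
        elif isinstance(node, Neg):
            stack.append(node.child)
    return c


def add_chain(n: int, op: str = "+") -> Expr:
    """Build ((x0 op x1) op x2) ... op xn, i.e. n nested Comm nodes."""
    e: Expr = Leaf("x0")
    for i in range(1, n + 1):
        e = Comm(op, e, Leaf(f"x{i}"))
    return e


if __name__ == "__main__":
    for n in (10, 100, 200):
        built = canonicalize_every_node(add_chain(n)).comm_nodes_built
        print(f"{n} Adds -> {built} canonical Add nodes")  # 55, 5050, 20100
```

In this model, the i-th `Add` of the chain rebuilds its own tree of i nodes. A chain of n `Add`s therefore materializes n(n+1)/2 new nodes instead of n.

A non-commutative node such as `Neg` stops the flattening, so each group of `Add`s pays the quadratic cost separately. Passing `op="*"` gives the same counts, which is why, in this toy model at least, the fix would need to cover every commutative operator rather than only `Add`.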
