peter-toth commented on PR #39722:
URL: https://github.com/apache/spark/pull/39722#issuecomment-1408744462

   > @peter-toth This issue happens for a specific complex query that has a 
huge expression tree containing Add operators interleaved by non-Add operators.
   > 
   > This query ends up consuming 10x more memory, which ultimately 
results in executor failures.
   > 
   > The issue is related to canonicalization, and it causes problems in 
the executors because the codegen component relies on expression 
canonicalization to deduplicate expressions.
   > 
   > When we have a large number of Adds interleaved by non-Add operators, I 
think [this line](https://github.com/apache/spark/pull/37851/files#diff-7278f2db37934522ee7c74b71525153234cff245cefaf996957e4a9ff3dbaacdR1171)
ends up materializing a new canonicalized expression tree at every non-Add 
operator.
   > 
   > In our case, analyzing the executor heap histogram shows that the 
additional memory is consumed by a large number of Add objects.
   > 
   > Sorry for not being clear in the initial PR description. I will update it 
now.
   
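To illustrate why deduplication depends on canonicalization, here is a toy Python model loosely based on the description above. `Leaf`, `Neg` (a stand-in for any non-`Add` operator), `Add`, `canonicalize` and `count_equivalent` are hypothetical names, not Spark's `EquivalentExpressions` API. Nested `Add`s are flattened, their canonicalized operands are sorted, and the tree is rebuilt, so `a + -(b + c)` and `-(c + b) + a` produce the same key. Note that every subexpression is canonicalized on its own, which is the per-node work that becomes expensive for large `Add` groups.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Neg:
    """Stand-in for any non-Add operator."""
    child: "Expr"


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Neg, Add]


def gather(e: Expr) -> List[Expr]:
    """Collect the operands of a group of directly nested Adds."""
    if isinstance(e, Add):
        return gather(e.left) + gather(e.right)
    return [e]


def canonicalize(e: Expr) -> Expr:
    """Flatten nested Adds, canonicalize and sort their operands, then rebuild."""
    if isinstance(e, Add):
        operands = sorted((canonicalize(x) for x in gather(e)), key=repr)
        result = operands[0]
        for operand in operands[1:]:
            result = Add(result, operand)
        return result
    if isinstance(e, Neg):
        return Neg(canonicalize(e.child))
    return e


def subexpressions(e: Expr) -> Iterator[Expr]:
    """Yield e and every node below it."""
    yield e
    if isinstance(e, Add):
        yield from subexpressions(e.left)
        yield from subexpressions(e.right)
    elif isinstance(e, Neg):
        yield from subexpressions(e.child)


def count_equivalent(roots: List[Expr]) -> Counter:
    """Count occurrences of each subexpression, keyed by its canonical form."""
    return Counter(canonicalize(s) for r in roots for s in subexpressions(r))
```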
   So this is very likely because subexpression elimination uses 
`EquivalentExpressions.addExprTree()`, which recursively walks the expression tree and 
canonicalizes every node. Since canonicalizing a 
`CommutativeExpression` node doesn't reuse the canonicalized form of its 
`CommutativeExpression` children, each `Add` node in that nested group of 
`Add`s ends up with its own large, separately built canonicalized tree.
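To put a rough number on this, here is a toy Python model (hypothetical `Leaf`, `Add`, `canonicalize_add` and `canonicalize_every_node`, not Spark's code). It assumes that canonicalizing an `Add` re-gathers its whole group of nested `Add`s from the original children and rebuilds a binary tree from scratch, and that a pass visiting every node canonicalizes each `Add` independently. For a single left-deep chain of n `Add`s, the k-th `Add` from the bottom rebuilds k nodes, so the pass allocates n(n+1)/2 `Add` objects in total. The model uses one uninterrupted group; interleaved non-`Add` operators would split the tree into several such groups.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Add]

allocated_adds = 0  # Add nodes created while canonicalizing


def new_add(left: Expr, right: Expr) -> Add:
    global allocated_adds
    allocated_adds += 1
    return Add(left, right)


def gather(e: Expr) -> List[Expr]:
    """Operands of the group of nested Adds rooted at e (explicit stack, no recursion)."""
    operands, stack = [], [e]
    while stack:
        node = stack.pop()
        if isinstance(node, Add):
            stack.extend((node.right, node.left))
        else:
            operands.append(node)
    return operands


def canonicalize_add(e: Add) -> Expr:
    # Re-gathers the whole group and rebuilds it, without reusing any
    # canonical form the child Adds may already have.
    operands = sorted(gather(e), key=repr)
    result = operands[0]
    for operand in operands[1:]:
        result = new_add(result, operand)
    return result


def all_nodes(root: Expr) -> Iterator[Expr]:
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        if isinstance(node, Add):
            stack.extend((node.left, node.right))


def canonicalize_every_node(root: Expr) -> int:
    """Canonicalize every Add in the tree independently and return how many
    Add nodes were allocated."""
    global allocated_adds
    allocated_adds = 0
    for node in all_nodes(root):
        if isinstance(node, Add):
            canonicalize_add(node)
    return allocated_adds


def chain(n: int) -> Expr:
    """x0 + x1 + ... + xn as a left-deep tree with n Add nodes."""
    e: Expr = Leaf("x0")
    for i in range(1, n + 1):
        e = Add(e, Leaf(f"x{i}"))
    return e


for n in (10, 100, 1000):
    # prints "10 55", "100 5050", "1000 500500", i.e. n(n+1)/2
    print(n, canonicalize_every_node(chain(n)))
```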
   The new `MultiAdd` looks like a good way to fix this issue, but as 
@cloud-fan mentioned, we should make it more general to handle other 
`CommutativeExpression`s too.
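As a rough illustration of the more general direction (an n-ary node for any commutative operator, not just `Add`), here is a toy Python sketch. `BinOp`, `MultiOp` and `canonicalize` are hypothetical names, not the `MultiAdd` from this PR or any Spark API, and the sketch assumes `+` and `*` are the only commutative operators. Each node's canonical form is computed once (memoized by `id`), and a child group with the same operator is spliced into the parent instead of being re-walked, so each commutative node maps to a single `MultiOp` rather than a freshly rebuilt chain of binary nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "+", "*", "-"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class MultiOp:
    """Hypothetical n-ary node holding a flattened group of one commutative operator."""
    op: str
    operands: Tuple["Expr", ...]


Expr = Union[Leaf, BinOp, MultiOp]
COMMUTATIVE_OPS = {"+", "*"}  # assumption for this toy


def canonicalize(e: Expr, cache: Dict[int, Expr]) -> Expr:
    """Canonicalize e, computing each node's canonical form at most once.

    The cache is keyed by id(), which is safe here because the whole input
    tree stays alive for the duration of the call.
    """
    if id(e) in cache:
        return cache[id(e)]
    if isinstance(e, BinOp) and e.op in COMMUTATIVE_OPS:
        operands: List[Expr] = []
        for child in (e.left, e.right):
            c = canonicalize(child, cache)
            if isinstance(c, MultiOp) and c.op == e.op:
                # Reuse the child's already-flattened group.
                operands.extend(c.operands)
            else:
                operands.append(c)
        result: Expr = MultiOp(e.op, tuple(sorted(operands, key=repr)))
    elif isinstance(e, BinOp):
        # Non-commutative operator: keep operand order, canonicalize children.
        result = BinOp(e.op, canonicalize(e.left, cache), canonicalize(e.right, cache))
    else:
        result = e
    cache[id(e)] = result
    return result
```

Splicing still copies the child's operand references into a new tuple, so this sketch reduces the number of allocated expression nodes rather than eliminating per-node work entirely.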


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
