db-scnakandala commented on PR #39722:
URL: https://github.com/apache/spark/pull/39722#issuecomment-1407791814

   > > * With the [[SPARK-40362][SQL] Fix BinaryComparison canonicalization 
#37851](https://github.com/apache/spark/pull/37851) in the expression 
canonicalization, a complex query with a large number of Add operations could 
end up consuming significantly more (sometimes > 10X) memory on the executors.
   > 
   > @db-scnakandala, can you please explain this issue a bit more? Does #37851 
cause performance regression? Why exactly? And why on executors?
   
   @peter-toth This issue happens for a specific complex query that has a huge 
expression tree containing Add operators interleaved with non-Add operators.
   
   This query ends up consuming 10x more memory, which ultimately results in 
executor failures.
   
   The issue is related to canonicalization. It shows up on the executors 
because the codegen component relies on expression canonicalization to 
deduplicate expressions. 
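
   As a rough illustration of deduplicating expressions by canonical form, 
here is a minimal Python sketch. It is a toy model, not Spark code: 
expressions are nested tuples, and `canonicalize` and `dedupe` are 
hypothetical helpers introduced only for this example.

```python
# Toy model only: expressions are nested tuples, not Spark Expression objects.
def canonicalize(expr):
    """Return a canonical form in which operands of commutative 'add' are ordered."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a column name or literal
    op, *args = expr
    args = [canonicalize(a) for a in args]
    if op == "add":
        # a + b and b + a map to the same canonical form
        args = sorted(args, key=repr)
    return (op, *args)


def dedupe(exprs):
    """Keep one representative per canonical form (hypothetical dedup cache)."""
    seen = {}
    for e in exprs:
        seen.setdefault(canonicalize(e), e)
    return list(seen.values())


exprs = [("add", "a", "b"), ("add", "b", "a"), ("mul", "a", "b")]
unique = dedupe(exprs)  # the two semantically equal adds collapse into one
```

   The point of the sketch is only that deduplication needs a canonical form 
for every expression it compares, so the cost of building canonical forms is 
paid wherever deduplication runs.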
   
   When we have a large number of Adds interleaved with non-Add operators, I 
think 
[this 
line](https://github.com/apache/spark/pull/37851/files#diff-7278f2db37934522ee7c74b71525153234cff245cefaf996957e4a9ff3dbaacdR1171)
 ends up materializing a new canonicalized expression tree at every non-Add 
operator.
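
   To make the shape of such a tree concrete, here is a small Python sketch 
that builds runs of Adds separated by a non-Add operator and counts the Add 
nodes allocated when each run is flattened, sorted, and rebuilt. Everything in 
it (`Node`, `flatten_adds`, `build`, the `abs` operator) is a hypothetical toy 
model. It does not reproduce Spark's canonicalization or the memory numbers 
above; it only shows that each non-Add boundary starts a separately rebuilt 
run of Adds.

```python
# Toy model only; class and function names are hypothetical, not Spark APIs.
class Node:
    def __init__(self, op, children=(), name=None):
        self.op, self.children, self.name = op, tuple(children), name


created = {"add": 0}  # counts Add nodes allocated by canonicalize()


def flatten_adds(node):
    """Collect the operands of a contiguous run of Add nodes."""
    if node.op == "add":
        return [x for c in node.children for x in flatten_adds(c)]
    return [node]


def key(node):
    """Structural key used to order operands deterministically."""
    return (node.op, node.name or "", tuple(key(c) for c in node.children))


def canonicalize(node):
    if node.op == "leaf":
        return node
    if node.op == "add":
        # Flatten the run, canonicalize each operand, sort, rebuild left-deep.
        operands = sorted((canonicalize(o) for o in flatten_adds(node)), key=key)
        result = operands[0]
        for o in operands[1:]:
            result = Node("add", [result, o])
            created["add"] += 1
        return result
    # Non-Add operator: a fresh node is built over canonicalized children.
    return Node(node.op, [canonicalize(c) for c in node.children])


def build(depth, width):
    """Build `depth` runs of `width` Adds, each run wrapped in a non-Add `abs`."""
    expr = Node("leaf", name="x0")
    for d in range(depth):
        for w in range(width):
            expr = Node("add", [expr, Node("leaf", name=f"x{d}_{w}")])
        expr = Node("abs", [expr])
    return expr


canonicalize(build(depth=3, width=4))
```

   In this toy model `created["add"]` ends up at `depth * width` (12 here), 
all of them new objects on top of the Adds in the original tree.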
   
   In our case, analyzing the executor heap histogram shows that the additional 
memory is consumed by a large number of Add objects.
   
   Sorry for not being clear in the initial PR description. I will update it 
now.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


