ahshahid opened a new pull request, #37824:
URL: https://github.com/apache/spark/pull/37824

   The change in the canonicalization of expressions to split in 2 parts ( 
precanonicalization & canonicalization) has broken the canonicalization logic 
involving commutative expressions if they are sub-expressions.
   The reason is that the Main expression invokes precanonicalization of 
children and if the child is a commutative type expression, the re-ordering 
gets skipped, which will not be the case if that child's canonicalize is called 
directly.
   so if a main expression is canonicalized and if it is compared to the main 
expression built from leaves to top with children's canonicalization explicitly 
invoked, the two main expression's canonicalization will differ.
   
   The fix is to ensure that for commutative expressions , the canonicalize and 
pre-canonicalize are same.
   Not sure if this fix will completely nullify the perf improvement intended 
by splitting canonicalization in two parts.
   If it completely nullifies the perf, then it would be better to ignore this 
PR and revert the code change of splitting the canonicalization in 2 parts.
   But if we keep the two step canonicalization, then this PR is valid and is 
needed.
   
   Introduced a trait which is implemented by Commutative expressions which 
equalizes precanonicalize and canonnicalize to fix the issue.
   
   There are bug tests added which show the issue on master.
   
   No user facing changes
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to