ahshahid opened a new pull request, #37824: URL: https://github.com/apache/spark/pull/37824
The change in the canonicalization of expressions to split in 2 parts ( precanonicalization & canonicalization) has broken the canonicalization logic involving commutative expressions if they are sub-expressions. The reason is that the Main expression invokes precanonicalization of children and if the child is a commutative type expression, the re-ordering gets skipped, which will not be the case if that child's canonicalize is called directly. so if a main expression is canonicalized and if it is compared to the main expression built from leaves to top with children's canonicalization explicitly invoked, the two main expression's canonicalization will differ. The fix is to ensure that for commutative expressions , the canonicalize and pre-canonicalize are same. Not sure if this fix will completely nullify the perf improvement intended by splitting canonicalization in two parts. If it completely nullifies the perf, then it would be better to ignore this PR and revert the code change of splitting the canonicalization in 2 parts. But if we keep the two step canonicalization, then this PR is valid and is needed. Introduced a trait which is implemented by Commutative expressions which equalizes precanonicalize and canonnicalize to fix the issue. There are bug tests added which show the issue on master. No user facing changes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
