ahshahid commented on PR #37824: URL: https://github.com/apache/spark/pull/37824#issuecomment-1241281980
Although this PR fixes the bug, the performance benchmark now takes 14 sec instead of 200 ms, which is not good. I also now appreciate much better why the precanonicalize phase is needed: canonicalizing an expression like a + b + c + d + e + f, which is a nested tree of Add nodes, requires flattening that tree, and doing so recursively at every level is costly. Precanonicalize avoids this by making only one such deep call. I will try the alternative approach of ensuring that hashCode is symmetric for commutative expressions, which should mean minimal changes while still fixing the bug. The ugly part there will be the specific handling of Seq[Expression] for Least and Greatest. Once I have modified this PR, I will ask for your input.

--
This is an automated message from the Apache Git Service.
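To make the flattening cost concrete, here is a minimal Python sketch of a toy model, assuming a simplified tree of `Attr` leaves and binary `Add` nodes. The classes `Attr`, `Add`, `VisitCounter` and the helpers `flatten`, `rebuild`, `reorder`, `canonicalize_every_level` and `canonicalize_once` are hypothetical and are not Catalyst's implementation. In this model, if every Add node flattens and reorders its own subtree, a left-deep chain of n operands walks n^2 - 1 nodes, while a single deep flatten at the root walks 2n - 1.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical toy model of a nested Add tree; not Spark's Catalyst classes.
@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Attr, Add]


@dataclass
class VisitCounter:
    visits: int = 0  # number of tree nodes walked while flattening


def flatten(e: Expr, counter: VisitCounter) -> List[Expr]:
    """Collect the leaf operands of a nested chain of Add nodes."""
    counter.visits += 1
    if isinstance(e, Add):
        return flatten(e.left, counter) + flatten(e.right, counter)
    return [e]


def rebuild(operands: List[Expr]) -> Expr:
    """Rebuild a left-deep Add chain: ((o1 + o2) + o3) + ..."""
    result = operands[0]
    for op in operands[1:]:
        result = Add(result, op)
    return result


def reorder(operands: List[Expr]) -> List[Expr]:
    # Deterministic order for commutative operands (a stand-in, not Spark's ordering).
    return sorted(operands, key=repr)


def canonicalize_every_level(e: Expr, counter: VisitCounter) -> Expr:
    """Each Add node canonicalizes its children, then flattens and reorders its
    whole subtree again; for a left-deep chain of n operands this walks n^2 - 1 nodes."""
    if isinstance(e, Add):
        left = canonicalize_every_level(e.left, counter)
        right = canonicalize_every_level(e.right, counter)
        return rebuild(reorder(flatten(Add(left, right), counter)))
    return e


def canonicalize_once(e: Expr, counter: VisitCounter) -> Expr:
    """Flatten and reorder the whole chain in one deep call (2n - 1 nodes)."""
    return rebuild(reorder(flatten(e, counter)))


# Usage: a + b + c + d + e + f, built left-deep as ((((a + b) + c) + d) + e) + f.
expr = rebuild([Attr(n) for n in "abcdef"])
slow, fast = VisitCounter(), VisitCounter()
assert canonicalize_every_level(expr, slow) == canonicalize_once(expr, fast)
print(slow.visits, fast.visits)  # 35 11
```

Both strategies produce the same canonical tree; only the number of times nested Add nodes are re-walked differs, and that gap grows quadratically with the length of the chain.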

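The alternative approach of a hashCode that is symmetric for commutative expressions can be sketched the same way. This is a minimal Python sketch assuming each toy expression exposes a hypothetical `sym_hash()` method; the class names mirror Spark's Add, Subtract, Least and Greatest, but the code is not Spark's hashCode. Commutative nodes sort their children's hashes before combining them, so reordered operands hash identically, while the non-commutative Subtract keeps the order. Least and Greatest show the Seq[Expression] case.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical toy expression classes whose names mirror Spark's Add, Subtract,
# Least and Greatest; they are not the real Catalyst implementations.
@dataclass(frozen=True)
class Attr:
    name: str

    def sym_hash(self) -> int:
        return hash(("Attr", self.name))


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

    def sym_hash(self) -> int:
        # Commutative: sort the child hashes so Add(a, b) and Add(b, a) agree.
        children = sorted((self.left.sym_hash(), self.right.sym_hash()))
        return hash(("Add", tuple(children)))


@dataclass(frozen=True)
class Subtract:
    left: "Expr"
    right: "Expr"

    def sym_hash(self) -> int:
        # Not commutative: keep the child hashes in their original order.
        return hash(("Subtract", self.left.sym_hash(), self.right.sym_hash()))


@dataclass(frozen=True)
class Least:
    children: Tuple["Expr", ...]

    def sym_hash(self) -> int:
        # Seq[Expression] case: argument order does not change the result,
        # so hash the sorted multiset of child hashes.
        return hash(("Least", tuple(sorted(c.sym_hash() for c in self.children))))


@dataclass(frozen=True)
class Greatest:
    children: Tuple["Expr", ...]

    def sym_hash(self) -> int:
        # Same Seq[Expression] handling as Least, with a different tag.
        return hash(("Greatest", tuple(sorted(c.sym_hash() for c in self.children))))


Expr = Union[Attr, Add, Subtract, Least, Greatest]

# Usage: reordered operands of commutative expressions hash identically.
a, b, c = Attr("a"), Attr("b"), Attr("c")
assert Add(a, b).sym_hash() == Add(b, a).sym_hash()
assert Least((a, b, c)).sym_hash() == Least((c, a, b)).sym_hash()
```

Sorting the child hashes instead of XOR-ing them keeps, for example, `Add(a, a)` and `Add(b, b)` apart, since the XOR of two equal hashes is always 0. A symmetric hash is only useful if equality for these expressions also ignores operand order; the hash contract only requires that objects comparing equal have equal hashes.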