HyukjinKwon commented on a change in pull request #34883:
URL: https://github.com/apache/spark/pull/34883#discussion_r769160607
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -222,6 +222,34 @@ abstract class Expression extends TreeNode[Expression] {
*/
def childrenResolved: Boolean = children.forall(_.resolved)
+ // Expression canonicalization is done in 2 phases:
+ // 1. Recursively canonicalize each node in the expression tree. This does
not change the tree
+ // structure and is more like "node-local" canonicalization.
+ // 2. Find adjacent commutative operators in the expression tree, reorder
them to get a
+ // static order and remove cosmetic variations. This may change the
tree structure
+ // dramatically and is more like a "global" canonicalization.
+ //
+ // The first phase is done by `preCanonicalized`. It's a `lazy val` which
recursively calls
+ // `preCanonicalized` on the children. This means that almost every node in
the expression tree
+ // will instantiate the `preCanonicalized` variable, which is good for
performance as you can
+ // reuse the canonicalization result of the children when you construct a
new expression node.
+ //
+ // The second phase is done by `canonicalized`, which simply calls
`Canonicalize` and is kind of
+ // the actual "user-facing API" of expression canonicalization. Only the
root node of the
+ // expression tree will instantiate the `canonicalized` variable. This is
different from
+ // `preCanonicalized`, because `canonicalized` does "global"
canonicalization and most of the time
+ // you cannot reuse the canonicalization result of the children.
+
+ /**
+ * An internal lazy val to implement expression canonicalization. It should
only be called in
+ * `canonicalized`, or in subclass's `preCanonicalized` when the subclass
overrides this lazy val
+ * to provide custom canonicalization logic.
+ */
+ lazy val preCanonicalized: Expression = {
Review comment:
Shall we explicitly make it `protected` since this isn't supposed to
call anywhere outside but only within `canonicalized`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]